I have this DataFrame in PySpark:
df = spark.createDataFrame([
    ("TenantId", "TennatId_1"),
    ("TimeGenerated", "2023-04-17T11:50:51.9013145Z"),
    ("ActivityType", "Connection"),
    ("CorrelationId", "608dd49a"),
    ("UserName", "test_1@test.cloud"),
    ("Name", "Name1"),
    ("Source", "Client"),
    ("Parameters", "{}"),
    ("SourceSystem", "Azure"),
    ("Type", "Check"),
    ("_ResourceId", "/subscriptions/5286ce"),
    ("TenantId", "TennatId_2"),
    ("TimeGenerated", "2023-04-17T11:50:51.944022Z"),
    ("ActivityType", "Connection"),
    ("CorrelationId", "11c0d75f0000"),
    ("UserName", "test_2@test.cloud"),
    ("Name", "Name2"),
    ("Source", "Client"),
    ("Parameters", "{}"),
    ("SourceSystem", "Azure"),
    ("Type", "Check"),
    ("_ResourceId", "/subscriptions/5286ce38-272f-4c54")], ["name", "rows"])
I want to pivot it so that each `name` becomes a column.
I tried the expression below:
from pyspark.sql.functions import expr

pivoted_df = df.groupBy("name") \
    .pivot("name") \
    .agg(expr("first(rows) as rows")) \
    .orderBy("name")
but I get this output:

and what I want is:

How can this be done?