You can do a collect_list aggregation before collecting the DataFrame to Python and converting the result to a NumPy array; the aggregation itself is the second snippet below.
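For context, here is a minimal sketch of an input DataFrame matching the output shown further down; the value column names x and y, and the Client labels a/b/c, are assumptions, since only Client is named in the original:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input data; 'x', 'y' and the client labels are assumed.
df = spark.createDataFrame(
    [('a', 10, 1), ('a', 15, 3), ('a', 20, 5), ('a', 25, 7), ('a', 30, 9),
     ('b', 1, 10), ('b', 2, 11), ('b', 3, 12), ('b', 4, 13), ('b', 5, 14),
     ('c', 100, 0), ('c', 150, 1), ('c', 200, 2), ('c', 250, 3), ('c', 300, 4)],
    ['Client', 'x', 'y'])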
import numpy as np
import pyspark.sql.functions as F

# Collect each client's value columns into one list of arrays, bring the
# grouped lists back to the driver, and stack them into a 3-D NumPy array.
a = np.array([
    row[1] for row in
    df.groupBy('Client')
      .agg(F.collect_list(F.array(*df.columns[1:])))
      .orderBy('Client')
      .collect()
])
print(repr(a))
array([[[ 10,   1],
        [ 15,   3],
        [ 20,   5],
        [ 25,   7],
        [ 30,   9]],

       [[  1,  10],
        [  2,  11],
        [  3,  12],
        [  4,  13],
        [  5,  14]],

       [[100,   0],
        [150,   1],
        [200,   2],
        [250,   3],
        [300,   4]]])
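One caveat: collect_list makes no guarantee about the order of elements within each group, so the row order inside each client's block can vary between runs. If a deterministic order matters, one option is to collect structs and sort them with sort_array; a minimal sketch, assuming the first value column x is the sort key:

import numpy as np
import pyspark.sql.functions as F

# Sorting the collected structs by their first field ('x', assumed here)
# makes the per-group row order deterministic.
rows = (
    df.groupBy('Client')
      .agg(F.sort_array(F.collect_list(F.struct('x', 'y'))).alias('pts'))
      .orderBy('Client')
      .collect()
)
a = np.array([[list(p) for p in r['pts']] for r in rows])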