#StackBounty: #apache-spark #pyspark #apache-spark-sql Multiple pyspark "window()" calls shows error when doing a "group…

Bounty: 50

This question is a follow-up to this answer. Spark displays an error when the following situation arises:

from pyspark.sql.functions import col, window

# Group results in 12-second windows of "foo", then by integer buckets of 2 for "bar"
fooWindow = window(col("foo"), "12 seconds")

# A sub-bucket that contains values in [0,2), [2,4), [4,6), ...
barWindow = window(col("bar").cast("timestamp"), "2 seconds").cast("struct<start:bigint,end:bigint>")

results = df.groupBy(fooWindow, barWindow).count()
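
For reference, here is a minimal, self-contained reproduction; the sample data and the timestamp values are made up purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: "foo" is a timestamp column, "bar" an integer column.
df = spark.createDataFrame(
    [("2019-01-01 00:00:01", 1), ("2019-01-01 00:00:05", 3)],
    ["foo", "bar"],
).withColumn("foo", col("foo").cast("timestamp"))

fooWindow = window(col("foo"), "12 seconds")
barWindow = window(col("bar").cast("timestamp"), "2 seconds").cast("struct<start:bigint,end:bigint>")

# Analysis happens eagerly, so this line raises the AnalysisException quoted below.
df.groupBy(fooWindow, barWindow).count()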

The error is:

"Multiple time window expressions would result in a cartesian product of rows, therefore they are currently not supported."

Is there some way to achieve the desired behavior?
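
One possible workaround (a sketch, not taken from the original post) is to keep the single time window on "foo" and build the "bar" buckets with plain arithmetic instead of a second window() call, so only one time window expression appears in the grouping:

from pyspark.sql.functions import col, floor, struct, window

fooWindow = window(col("foo"), "12 seconds")

# Rebuild the [0,2), [2,4), ... buckets by hand rather than via window(),
# sidestepping the multiple-time-window check.
barBucket = struct(
    (floor(col("bar") / 2) * 2).cast("bigint").alias("start"),
    ((floor(col("bar") / 2) + 1) * 2).cast("bigint").alias("end"),
).alias("barWindow")

results = df.groupBy(fooWindow, barBucket).count()

Because only fooWindow is a true time window expression, the cartesian-product restriction no longer fires, while the resulting struct column has the same start/end shape as the original barWindow.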

