I had a spreadsheet that looked like a prior "group by" had left many rows blank where I needed them to be filled with the data above them (see example picture below). I needed each account number to fill all the cells beneath it until the start of the next account number (i.e., A1234 needs to be in all the cells up to B4325, B4325 needs to be in all the cells up to C3452, and so on).
From this Stack Exchange answer by Benjamin Berhault I found this code and tailored it to my problem:
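SELECT rn, acct,
       -- ORDER BY rn makes "first" well defined within each group
       FIRST_VALUE(acct) OVER (PARTITION BY grp ORDER BY rn)
FROM (
    SELECT rn, acct,
           SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp
    FROM (
        -- OVER () has no ORDER BY, so rn follows whatever order the rows are read in
        SELECT ROW_NUMBER() OVER () AS rn, acct
        FROM dataset AS d
    ) AS sub1
) AS sub2;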
What I don't understand about this query is the ORDER BY clause in this part:
SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp
This line successfully creates a new grp column that is all 1's for the first account, all 2's for the second account, and so on. From there, the main query can use FIRST_VALUE with PARTITION BY grp to get the result I am looking for. What I do not understand is why ORDER BY rn causes the column to sum in that manner. I would have thought a PARTITION BY would be needed there, but that does not work.
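To make the behavior concrete, here is a minimal sketch of a reproduction, assuming a made-up six-row dataset. The rn values are typed in by hand instead of being generated with ROW_NUMBER(), and the column names grp_ordered and grp_unordered are only for illustration. It puts the same SUM side by side with and without ORDER BY rn in the window.

```sql
-- Hypothetical sample data; rn is written out by hand so the row order is explicit.
WITH dataset (rn, acct) AS (
    VALUES (1, 'A1234'), (2, ''), (3, ''),
           (4, 'B4325'), (5, ''),
           (6, 'C3452')
)
SELECT rn,
       acct,
       -- with ORDER BY rn: comes out as 1, 1, 1, 2, 2, 3
       SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp_ordered,
       -- with an empty OVER (): comes out as 3 on every row
       SUM(CASE WHEN acct <> '' THEN 1 END) OVER () AS grp_unordered
FROM dataset
ORDER BY rn;
```

Only the ORDER BY rn version produces a value that changes at each new account number, which is the column the outer FIRST_VALUE then partitions on.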
question from:
https://stackoverflow.com/questions/65598712/understanding-this-window-function-query-in-postgresql