Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
183 views
in Technique[技术] by (71.8m points)

postgresql - concatenate a column value for several rows based on condition

I have a table which has format like this (id is the pk)

id|timestamps         |year|month|day|groups_ids|status |SCHEDULED          |uid|
--|-------------------|----|-----|---|----------|-------|-------------------|---|
 1|2021-02-04 17:18:24|2020|    8|  9|         1|OK     |2020-08-09 00:00:00|  1|
 2|2021-02-04 17:18:09|2020|    9|  9|         1|OK     |2020-09-09 00:00:00|  1|
 3|2021-02-04 17:19:51|2020|   10|  9|         1|HOLD   |2020-10-09 00:00:00|  1|
 4|2021-02-04 17:19:04|2020|   10| 10|         2|HOLD   |2020-10-09 00:00:00|  1|
 5|2021-02-04 17:18:30|2020|   10| 11|         2|HOLD   |2020-10-09 00:00:00|  1|
 6|2021-02-04 17:18:57|2020|   10| 12|         2|OK     |2020-10-09 00:00:00|  1|
 7|2021-02-04 17:18:24|2020|    8|  9|         1|HOLD   |2020-08-09 00:00:00|  2|
 8|2021-02-04 17:18:09|2020|    9|  9|         2|HOLD   |2020-09-09 00:00:00|  2|
 9|2021-02-04 17:19:51|2020|   10|  9|         2|HOLD   |2020-10-09 00:00:00|  2|
10|2021-02-04 17:19:04|2020|   10| 10|         2|HOLD   |2020-10-09 00:00:00|  2|
11|2021-02-04 17:18:30|2020|   10| 11|         2|HOLD   |2020-10-09 00:00:00|  2|
12|2021-02-04 17:18:57|2020|   10| 12|         2|HOLD   |2020-10-09 00:00:00|  2|

the job is i want to extract every group_ids for each uid when the status is OK order by SCHEDULED ascended, and if there's no OK found in the record for the uid it will takes for the latest HOLD based on year month and day. After that I want to make a weighing score with each group_ids:

group_ids > score
1 > 100
2 > 80
3 > 60
4 > 50
5 > 10
6 > 50
7 > 0

so if [1,1,2] will be change to (100+100+80) = 280 it will look like this:

ids|uid|pattern|score|
---|---|-------|-----|
  1|  1|[1,1,2]|  280|
  2|  2|[2]    |   80|

It's pretty hard since i cannot found any operators like python for loop and append operators in PostgreSQL

question from:https://stackoverflow.com/questions/66058649/concatenate-a-column-value-for-several-rows-based-on-condition

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

step-by-step demo:db<>fiddle

SELECT 
    s.uid, s.values,
    sum(v.value) as score
FROM (
    SELECT DISTINCT ON (uid)
        uid,
        CASE 
            WHEN cardinality(ok_count) > 0 THEN ok_count
            ELSE ARRAY[last_value]
        END as values
    FROM (
        SELECT 
            *,
            ARRAY_AGG(groups_ids) FILTER (WHERE status = 'OK') OVER (PARTITION BY uid ORDER BY scheduled)as ok_count,
            first_value(groups_ids) OVER (PARTITION BY uid ORDER BY year, month DESC) as last_value
        FROM mytable
    ) s
    ORDER BY uid, scheduled DESC
) s,
  unnest(values) as u_group_id
  JOIN (VALUES 
      (1, 100), (2, 80), (3, 60), (4, 50), (5,10), (6, 50), (7, 0)
  ) v(group_id, value) ON v.group_id = u_group_id 
GROUP BY s.uid, s.values

Phew... quite complex. Let's have a look at the steps:

a)

SELECT 
    *,
    -- 1:
    ARRAY_AGG(groups_ids) FILTER (WHERE status = 'OK') OVER (PARTITION BY uid ORDER BY scheduled)as oks,
    -- 2:
    first_value(groups_ids) OVER (PARTITION BY uid ORDER BY year, month DESC) as last_value
FROM mytable
  1. Using the array_agg() window function to create an array of group_ids without loosing the other data as we would with simple GROUP BY. The FILTER clause is to put only the status = OK records into the array.
  2. Find the last group_id of a group (partition) using the first_value() window function. In descending order is returns the last value.

b)

SELECT DISTINCT ON (uid)                               -- 2
    uid,
    CASE                                               -- 1
        WHEN cardinality(ok_count) > 0 THEN ok_count
        ELSE ARRAY[last_value]
    END as values
FROM (
    ...
) s
ORDER BY uid, scheduled DESC                           -- 2
  1. The CASE clause either takes the previously created array (from step a1) or, if there is none, it takes the last value (from step a2), creates an one-elemented array.
  2. The DISTINCT ON clause returns only the first element of an ordered group. The group is your uid and the order is given by the column scheduled. Since you don't want the first, but last records within the group, you have to order it DESC to make the most recent one the topmost record. That is taken by the DISTINCT ON

c)

SELECT 
    uid,
    group_id
FROM (
    ...
) s,
  unnest(values) as group_id   -- 1
  1. The arrays should be extracted into one record per element. That helps to join the weighted values later.

d)

SELECT 
    s.uid, s.values,
    sum(v.weighted_value) as score                             -- 2
FROM (
    ...
) s,
  unnest(values) as u_group_id
  JOIN (VALUES 
      (1, 100), (2, 80), ...
  ) v(group_id, weighted_value) ON v.group_id = u_group_id     -- 1
GROUP BY s.uid, s.values                              -- 2
  1. Join your weighted value on the array elements. Naturally, this can be a table or query or whatever.
  2. Regroup the uid groups to calculate the SUM() of the weighted_values

Additional note:

You should avoid duplicate data storing. You don't need to store the date parts year, month and day if you also store the complete date. You can always calculate them from the date.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...