I have a table that stores a person, the start of their holiday and the end (stop) of their holiday.
From it, I need to count how many working days per month each person was on holiday, so I want to partition this table by month.
To get the public holidays, I'm using: https://github.com/christopherthompson81/pgsql_holidays
Let's assume I have a table for one person only, with only start/stop columns:
```sql
create table data (id int, start date, stop date);
```
This is the `network_days` function I wrote:
```sql
-- Counts Monday..Friday days in [start_date, stop_date) that are not Polish public holidays.
CREATE OR REPLACE FUNCTION network_days(start_date date, stop_date date) RETURNS bigint AS $$
SELECT count(*) FROM
generate_series(start_date, stop_date - interval '1 minute', interval '1 day') the_day
WHERE
extract('ISODOW' FROM the_day) < 6 AND the_day::date NOT IN (
  -- parameters are referenced by name; there is no table alias "o" here
  SELECT datestamp::date FROM holidays_poland(extract(year FROM start_date)::int, extract(year FROM stop_date)::int))
$$
LANGUAGE sql
STABLE;
```
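For trying the counting logic without the pgsql_holidays functions installed, here is a minimal sketch of a variant that takes the public holidays as a `date[]` argument. The name `working_days` and the array parameter are hypothetical, not part of the original function. As in `network_days`, `stop_date` is treated as exclusive.

```sql
-- Hypothetical variant of network_days: the public holidays are passed in as a
-- date[] instead of being read from holidays_poland(), so it runs without the
-- pgsql_holidays functions. stop_date is exclusive, as in the original.
CREATE OR REPLACE FUNCTION working_days(start_date date, stop_date date, holidays date[])
RETURNS bigint AS $$
    SELECT count(*)
    FROM generate_series(start_date::timestamp,        -- casting to timestamp keeps
                         (stop_date - 1)::timestamp,   -- the series time-zone independent
                         interval '1 day') AS g(the_day)
    WHERE extract(isodow FROM g.the_day) < 6           -- Monday..Friday only
      AND g.the_day::date <> ALL (holidays)            -- skip the listed holidays
$$ LANGUAGE sql IMMUTABLE;

-- 30.03.2020 up to (not including) 14.04.2020, with Easter Monday 2020 excluded
SELECT working_days(DATE '2020-03-30', DATE '2020-04-14', ARRAY[DATE '2020-04-13']);
-- returns 10
```

In the real setup the array could be built from the existing function, for example `ARRAY(SELECT datestamp::date FROM holidays_poland(2020, 2020))`.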
I also created a function with a query like this:
```sql
-- $2 = 2020
SELECT
    month, year, sum(value_per_day)
FROM (
    SELECT to_char(dt, 'mm') AS month, to_char(dt, 'yyyy') AS year, (network_days((
        CASE WHEN EXTRACT(year FROM df.start_date) < $2 THEN (SELECT date_trunc('year', df.start_date) + interval '1 year')::date
        ELSE df.start_date END), (CASE WHEN EXTRACT(year FROM df.stop_date) > $2 THEN (date_trunc('year', df.stop_date))::date
        ELSE df.stop_date END))::int::numeric / count(*) OVER (PARTITION BY id))::int AS value_per_day
    FROM intranet.dayoff df
    LEFT JOIN generate_series((
        CASE WHEN EXTRACT(year FROM df.start_date) < $2 THEN (SELECT date_trunc('year', df.start_date) + interval '1 year')::date
        ELSE df.start_date END), (CASE WHEN EXTRACT(year FROM df.stop_date) > $2 THEN (date_trunc('year', df.stop_date))::date
        ELSE df.stop_date END) - interval '1 day', interval '1 day') AS t (dt) ON extract('ISODOW' FROM dt) < 6
    WHERE
        extract(isodow FROM dt) < 6 AND (EXTRACT(year FROM start_date) = $2 OR EXTRACT(year FROM stop_date) = $2)) t
GROUP BY month, year
ORDER BY month;
```
based on: https://dba.stackexchange.com/questions/237745/postgresql-split-date-range-by-business-days-then-aggregate-by-month?rq=1
and I almost have it:
10 rows returned
| month | year | sum |
| ----- | ---- | ---- |
| 03 | 2020 | 2 |
| 04 | 2020 | 13 |
| 06 | 2020 | 1 |
| 11 | 2020 | 1 |
| 12 | 2020 | 2 |
| 05 | 2020 | 1 |
| 10 | 2020 | 2 |
| 08 | 2020 | 10 |
| 01 | 2020 | 1 |
| 02 | 2020 | 1 |
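The nested `CASE` expressions in the query clamp each range to the year passed as `$2`. A more compact way to express the same clamping is `GREATEST`/`LEAST`. The following is a minimal sketch, assuming the sample rows listed later in the question, an exclusive `stop`, and the literal 2020 standing in for `$2`. The view name `ranges_2020` is hypothetical.

```sql
-- Sample rows from the question; stop is exclusive, as in network_days.
CREATE TEMP TABLE data (id int, start date, stop date);
INSERT INTO data VALUES
    (1, '2019-12-31', '2020-01-01'),
    (2, '2020-03-30', '2020-04-14'),
    (3, '2020-05-01', '2020-05-03');

-- Hypothetical view: clamp every [start, stop) range to 2020 (standing in
-- for $2) and keep only the ranges that overlap that year at all.
CREATE TEMP VIEW ranges_2020 AS
SELECT id,
       GREATEST(start, make_date(2020, 1, 1)) AS start_in_year,
       LEAST(stop, make_date(2021, 1, 1))     AS stop_in_year
FROM data
WHERE start < make_date(2021, 1, 1)
  AND stop  > make_date(2020, 1, 1);

SELECT * FROM ranges_2020 ORDER BY id;
```

Row 1 drops out because its only day, 31.12.2019, lies before 2020. Unlike `date_trunc('year', start) + interval '1 year'`, `GREATEST` also yields 1 January 2020 for a range that starts more than one year before `$2`.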
So in the function I created, I'd need to add something like this:
```sql
dt NOT IN (SELECT datestamp::timestamptz FROM holidays_poland($2, $2))
```
but I end up with many conditions, and I feel like this is the wrong approach.
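For comparison, here is a minimal sketch of the per-day route with the holiday condition added. It expands each range into calendar days, drops weekends and holidays, and counts the remaining days per month. This counts whole days directly instead of spreading `network_days` evenly over the rows. It makes the following assumptions:

* The data are the sample rows listed later in the question.
* `stop` is exclusive.
* 2020 stands in for `$2`.
* A hypothetical temp table `holidays` replaces `holidays_poland(2020, 2020)`. It contains only the two public holidays that fall on weekdays inside the sample ranges: Easter Monday (13 April) and Labour Day (1 May).

```sql
-- Sample rows from the question; stop is exclusive.
CREATE TEMP TABLE data (id int, start date, stop date);
INSERT INTO data VALUES
    (1, '2019-12-31', '2020-01-01'),
    (2, '2020-03-30', '2020-04-14'),
    (3, '2020-05-01', '2020-05-03');

-- Hypothetical stand-in for holidays_poland(2020, 2020).
CREATE TEMP TABLE holidays (datestamp date);
INSERT INTO holidays VALUES ('2020-04-13'), ('2020-05-01');

-- Hypothetical view: one row per day of each range, keep Monday..Friday days
-- that are not holidays and lie in 2020, then count them per month.
CREATE TEMP VIEW working_days_per_month AS
SELECT date_trunc('month', d.dt)::date AS month_start,
       count(*)                        AS working_days
FROM data
CROSS JOIN LATERAL generate_series(data.start::timestamp,
                                   (data.stop - 1)::timestamp,
                                   interval '1 day') AS d(dt)
WHERE extract(isodow FROM d.dt) < 6
  AND d.dt::date NOT IN (SELECT datestamp FROM holidays)
  AND extract(year FROM d.dt) = 2020
GROUP BY 1
ORDER BY 1;

SELECT * FROM working_days_per_month;
```

With these rows, March gets 2 days and April 8 days. May has no row because 1 May is a holiday and 2 May is a Saturday.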
I feel like I should instead just split the table from:
| id | start | stop |
| -- | ----- | ---- |
| 1 | 31.12.2019 00:00:00 | 01.01.2020 00:00:00 |
| 2 | 30.03.2020 00:00:00 | 14.04.2020 00:00:00 |
| 3 | 01.05.2020 00:00:00 | 03.05.2020 00:00:00 |
to
| start | stop |
| ----- | ---- |
| 30.03.2020 00:00:00 | 01.04.2020 00:00:00 |
| 01.04.2020 00:00:00 | 14.04.2020 00:00:00 |
| 01.05.2020 00:00:00 | 03.05.2020 00:00:00 |
and then run the network_days function on each of these date ranges. However, I couldn't manage to partition the table in my query to get such a result.
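The split described above can be sketched with `generate_series` over month starts plus `GREATEST`/`LEAST`. This is a minimal sketch, assuming the sample rows from the question and an exclusive `stop`. The view name `month_pieces` and its column names are hypothetical.

```sql
-- Sample rows from the question; stop is exclusive.
CREATE TEMP TABLE data (id int, start date, stop date);
INSERT INTO data VALUES
    (1, '2019-12-31', '2020-01-01'),
    (2, '2020-03-30', '2020-04-14'),
    (3, '2020-05-01', '2020-05-03');

-- Hypothetical view: split every [start, stop) range at month boundaries,
-- giving one row per calendar month the range touches, clamped to that month.
CREATE TEMP VIEW month_pieces AS
SELECT data.id,
       GREATEST(data.start, m.month_start::date)                    AS piece_start,
       LEAST(data.stop, (m.month_start + interval '1 month')::date) AS piece_stop
FROM data
CROSS JOIN LATERAL generate_series(date_trunc('month', data.start::timestamp),
                                   date_trunc('month', (data.stop - 1)::timestamp),
                                   interval '1 month') AS m(month_start);

SELECT * FROM month_pieces ORDER BY id, piece_start;
-- id | piece_start | piece_stop
--  1 | 2019-12-31  | 2020-01-01
--  2 | 2020-03-30  | 2020-04-01
--  2 | 2020-04-01  | 2020-04-14
--  3 | 2020-05-01  | 2020-05-03
```

Each piece could then be passed to `network_days(piece_start, piece_stop)`, with the results summed per `date_trunc('month', piece_start)`. Adding a filter on the year (e.g. `extract(year FROM piece_start) = $2`) would also drop the 31.12.2019 piece, matching the table above.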
What do you think is the best way to calculate this?
question from:
https://stackoverflow.com/questions/65933007/count-days-per-month-from-days-off-table