Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
340 views
in Technique[技术] by (71.8m points)

sql - Islands and gaps problem on time-series data

I have a sample data and I have been trying to get the required data as shown below. I have able to implement some sort of Islands and gaps solution as well. Here is the most near version I reached near the expected data.

DROP TABLE IF EXISTS #data
CREATE TABLE #data(
    factor varchar(50),
    val int,
    [start_date] date, [end_date] date
)
Go
INSERT INTO #data VALUES
('a', 15, '2021-01-01', '2021-01-05'),
('a', 15, '2021-01-08', '2021-01-10'),
('a', 20, '2021-01-11', '2021-01-20'),
('a', 15, '2021-01-21', '2099-01-01'),
('b', 10, '2021-01-01', '2021-01-04'),
('b', 12, '2021-01-05', '2021-01-13'),
('b', 12, '2021-01-17', '2021-01-19'),
('b', 12, '2021-01-20', '2021-01-23'),
('b', 10, '2021-01-24', '2099-01-01');

WITH value_cte As (
    SELECT * ,
    RANK() OVER(PARTITION BY factor ORDER BY [start_date]) - RANK() OVER(PARTITION BY factor, val ORDER BY [start_date]) grp
    FROM #data

)
SELECT factor, val, MIN(start_date) st, MAX(end_date) ed
FROM value_cte
GROUP BY factor, val, grp
ORDER BY factor, st

Result from above query:

result achieved

Expected Result:

factor  val st          ed
a       15  2021-01-01  2021-01-05
a       15  2021-01-08  2021-01-10
a       20  2021-01-11  2021-01-20
a       15  2021-01-21  2099-01-01
b       10  2021-01-01  2021-01-04
b       12  2021-01-05  2021-01-13
b       12  2021-01-17  2021-01-23
b       10  2021-01-24  2099-01-01

Even if the value is same for the two consecutive islands and there is a gap then that gap should not be merged and if the two islands are continuous then they should be merged. Unfortunately, I can't change the source here (sample data structure)

question from:https://stackoverflow.com/questions/65940060/islands-and-gaps-problem-on-time-series-data

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use lag() to determine where the "islands" start -- that is, where there is no overlap. Then use a cumulative sum based on date arithmetic:

select factor, val, min(start_date), max(end_date)
from (select d.*,
             sum(case when prev_end_date >= dateadd(day, -1, start_date) then 0 else 1 end) over (partition by factor, val order by start_date) as grp
      from (select d.*,
                   lag(end_date) over (partition by factor, val order by start_date) as prev_end_date
            from data d
           ) d
     ) d
group by factor, val, grp
order by factor, min(start_date);

Here is a SQL Fiddle.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...