Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
453 views
in Technique[技术] by (71.8m points)

sql - Find min and max for subsets of consecutive rows - gaps and islands

Trying to build a query.

The input is ordered by rownumber in column 'rn' starting with 1 for each unique value in 'name' and defining a given sequence of entries in 'act'. In column 'act' it holds two values in multiple occurence, >sleep< and >wake<. The goal is to find for each consecutive set of rows of one of those values the minimum and maximum value of startt and endd.

This shall be the input:

name       act        rn     startt endd
---------- ---------- ------ ------ ------
jimmy      sleep      1      1      3
jimmy      wake       2      4      7
jimmy      wake       3      8      10
jimmy      sleep      4      11     13
karen      wake       1      1      4
karen      sleep      2      5      7
karen      wake       3      8      9
karen      wake       4      10     12
karen      wake       5      13     14
karen      sleep      6      15     17
karen      sleep      7      18     20

the desired output:

name       act        startt endd   
---------- ---------- ------ ------ 
jimmy      sleep      1      3      
jimmy      wake       4      10     
jimmy      sleep      11     13     
karen      wake       1      4      
karen      sleep      5      7      
karen      wake       8      14     
karen      sleep      15     20  

The source of the input does not provide further columns. The number of members in each subset can be very much higher then in this example.

I tried different ways of aggregating, but none worked. I believe using LEAD and LAGG and further trickery might get me there, but that appears to be awfully unelegant. I have the notion it is key to differentiate each subset, i.e. create an identifier unique to all its members. With this at hand an aggregate with min and max is simple. Maybe i'm wrong. Maybe it's impossible. Maybe a self join. Maybe a recursive cte. I don't know.

So: does anybody know how to get this? Help is much appreciated.

UPDATE:

Thank You to Gordon Linoff, shawnt00 and the other contributors who commented. With Your advice I feel major gaps in my toolbox of logic closing.

For the interested:

declare @t table (
    name nvarchar(10)
    ,act nvarchar (10)
    ,startt smallint
    ,endd smallint
    )

insert into @t (
    name
    ,act
    ,startt
    ,endd
    )
values
     ('jimmy','sleep', 1,3)
    ,('jimmy','wake', 4,7)
    ,('jimmy','wake', 8,10)
    ,('jimmy','sleep', 11,13)
    ,('karen','wake', 1,4)
    ,('karen','sleep', 5,7)
    ,('karen','wake', 8,9)
    ,('karen','wake', 10,12)
    ,('karen','wake', 13,14)
    ,('karen','sleep', 15,17)
    ,('karen','sleep', 18,20)

; --- all rows, no aggregating
with 
cte as (
select 
    name
    ,act
    ,row_number() over (partition by name order by name,startt) rn
    ,row_number() over (partition by name, act order by name,startt) act_n
    ,startt
    ,endd
from
    @t )
select
    name
    ,act
    ,startt
    ,endd
    ,rn
    ,act_n
    ,rn - act_n diff
from 
    cte
order by 
    name
    ,rn

;--- aggregating for the desired ouput
with 
cte as (
select 
    name
    ,act
    ,row_number() over (partition by name order by name,startt) rn
    ,row_number() over (partition by name, act order by name,startt) act_n
    ,startt
    ,endd
from 
    @t )
select
    name
    ,act
    ,min(startt) startt
    ,max(endd)   endd
    ,min(rn)     min_rn
    ,max(rn)     max_rn
from 
    cte
group by 
    name
    ,act
    ,rn - act_n
order by 
    name
    ,min(rn)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You want to find consecutive groups of similar rows and then aggregation. I like the difference of row numbers approach:

select name, act, min(startt) as startt, max(endd) as endd
from (select i.*,
             row_number() over (partition by name, act order by rn) as seqnum_na,
             row_number() over (partition by name order by rn) as seqnum_n
      from input i
     ) i
group by (seqnum_n - seqnum_na), name, act;

You can see how this works by looking at what the subquery does.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...