一、需求描述
1 原表T1某条记录(记做r1,相邻下一条为r2)的下一行记录的STARTDATE小于上一行ENDDATE,针对这样的记录做转换即:
r1 的STARTDATE 保持不变,ENDDATE为r1 的STARTDATE-1
r2 的STARTDATE 为r1的ENDDATE ,ENDDATE为r1 的ENDDATE
2 如果原表T1不存在相邻行“时间重叠”(即为1的定义)时保持原有数据不变。
# 文本版
#T1
seq id startdate enddate num
1 1 2021-04-20 2021-05-03 200
2 1 2021-05-01 2021-05-24 100
3 1 2021-05-18 2021-05-31 69
4 1 2021-05-20 2021-07-31 34
5 1 2021-08-05 2021-08-25 45
6 1 2021-08-15 2021-09-25 65
#输出结果
ID STARTDATE ENDDATE NUM
1 2021-04-20 2021-04-30 200
1 2021-05-01 2021-05-02 300
1 2021-05-03 2021-05-17 100
1 2021-05-18 2021-05-19 169
1 2021-05-20 2021-05-23 203
1 2021-05-24 2021-05-30 103
1 2021-05-31 2021-07-30 34
1 2021-08-05 2021-08-14 45
1 2021-08-15 2021-08-25 110
1 2021-08-26 2021-09-25 65
二、思路概述
1 需求延展
SEQ ID STARTDATE ENDDATE NUM
1 1 2021-04-20 2021-05-03 200
2 1 2021-05-01 2021-05-24 100
3 1 2021-05-18 2021-05-31 69
4 1 2021-05-20 2021-07-31 34
这里第4条记录同时叠加在第2和3条记录里。
2 思路概述
1) T0 通过上下行函数生成的时间序列
id new_DATE nextSTARTDATE preEndDATE rn
1 2021-05-24 2021-05-03 1
1 2021-05-03 2021-05-24 2021-05-01 2
1 2021-05-01 2021-05-03 2021-04-20 3
1 2021-04-20 2021-05-01 4
2) last 取出T0里的最后一条记录,为后面的矫正做准备。
new_Date preENDDATE id
2021-05-24 2021-05-03 1
3) normal 取出原始数据里不会出现时间叠加的记录,为后面的矫正做准备。
当前演示数据无记录,代码加注释可浮现。
4)T_Serial 统一定义STARTDATE、ENDDATE,首次修正T0。
id STARTDATE ENDDATE
1 2021-04-20 2021-04-30
1 2021-05-01 2021-05-03
1 2021-05-04 2021-05-24
5) T2 对时间没有重叠的记录进行修正(删除T0对应值,更新对应ENDDATE)。
当前示例结果集为空,即无需要修正。
6) T2关联T1(原始表),汇总后取得最终值
STARTDATE ENDDATE NUM
2021-04-20 2021-04-30 200
2021-05-01 2021-05-03 300
2021-05-04 2021-05-24 100
三、SQL代码
当前演示版本是Mysql 8.0.23,支持CTE、窗口函数的SQL Server 、Oracle 需要修改Order by 和ADDDATE 处语法。
Step0 创建表并初始化数据
DROP TABLE IF EXISTS test_ShenLiang2025;
CREATE TABLE test_ShenLiang2025 (
seq int DEFAULT NULL,
id int DEFAULT NULL,
STARTDATE date DEFAULT NULL,
ENDDATE date DEFAULT NULL,
NUM int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO test_ShenLiang2025 VALUES ('1', '1', '2021-04-20', '2021-05-03', '200');
INSERT INTO test_ShenLiang2025 VALUES ('2', '1', '2021-05-01', '2021-05-24', '100');
INSERT INTO test_ShenLiang2025 VALUES ('3', '1', '2021-05-18', '2021-05-31', '69');
INSERT INTO test_ShenLiang2025 VALUES ('4', '1', '2021-05-20', '2021-07-31', '34');
INSERT INTO test_ShenLiang2025 VALUES ('5', '1', '2021-08-05', '2021-08-25', '45');
INSERT INTO test_ShenLiang2025 VALUES ('6', '1', '2021-08-15', '2021-09-25', '65');
Step1 构建临时结果集以生成时间序列。
WITH T0 AS(
SELECT id,
new_DATE,
LEAD(NEW_DATE,1) OVER (PARTITION BY ID ORDER BY NEW_DATE ) nextSTARTDATE,
LAG(NEW_DATE,1) OVER (PARTITION BY ID ORDER BY NEW_DATE ) preENDDATE,
ROW_NUMBER()OVER(PARTITION BY ID ORDER BY new_DATE DESC) rn
FROM
(
SELECT DISTINCT ID,STARTDATE new_DATE FROM test_ShenLiang2025
WHERE seq in (1,2) -- 可加注释验证,当前仅取原表里2条记录
UNION
SELECT DISTINCT ID,ENDDATE new_DATE FROM test_ShenLiang2025
WHERE seq in (1,2) -- 可加注释验证,当前仅取原表里2条记录
ORDER BY new_DATE
)A
),last AS
( SELECT new_DATE,preENDDATE,id
FROM T0
WHERE nextSTARTDATE IS NULL
),normal AS
(
SELECT * FROM
(
SELECT id,
ENDDATE,
LEAD(STARTDATE,1) OVER (PARTITION BY ID ORDER BY ENDDATE ) nextSTARTDATE,
LAG(ENDDATE,1) OVER (PARTITION BY ID ORDER BY ENDDATE ) preENDDATE
FROM test_ShenLiang2025
)A
WHERE ENDDATE > preENDDATE AND ENDDATE < nextSTARTDATE
),T_Serial AS (
SELECT ID,ADDDATE(preENDDATE, INTERVAL 1 DAY ) STARTDATE,
new_DATE ENDDATE
FROM last
UNION
SELECT bottom_2.ID,bottom_2.new_DATE STARTDATE,
CASE WHEN rn =3 THEN bottom_2.nextSTARTDATE
ELSE ADDDATE(bottom_2.nextSTARTDATE, INTERVAL -1 DAY ) END ENDDATE
FROM last
JOIN T0 bottom_2
ON bottom_2.nextSTARTDATE<=last.preENDDATE AND bottom_2.id = last.id
),T2 AS(
SELECT B.ID,B.STARTDATE,B.ENDDATE FROM
(
SELECT A.*,ROW_NUMBER()OVER(PARTITION BY ID,STARTDATE ORDER BY ENDDATE) rn
FROM
(
SELECT A.ID,A.STARTDATE,A.ENDDATE
FROM T_Serial A
LEFT JOIN normal B
ON A.STARTDATE = B.ENDDATE AND A.ID = B.ID
WHERE B.ENDDATE IS NULL
UNION
SELECT A.ID,A.STARTDATE,B.ENDDATE
FROM T_Serial A
INNER JOIN normal B
ON ADDDATE(A.ENDDATE, INTERVAL 1 DAY ) = B.ENDDATE AND A.ID = B.ID
)A
)B WHERE rn =1
)
Step2 时间序列关联原表生成NUM字段。
SELECT T2.STARTDATE,T2.ENDDATE,SUM(T1.NUM) TOTAL FROM T2
JOIN test_ShenLiang2025 T1
ON T2.STARTDATE>=T1.STARTDATE
AND T2.ENDDATE<=T1.ENDDATE
GROUP BY T2.STARTDATE,T2.ENDDATE
ORDER BY T2.STARTDATE
Step4 查看结果
STARTDATE ENDDATE NUM
2021-04-20 2021-04-30 200
2021-05-01 2021-05-03 300
2021-05-04 2021-05-24 100
执行结果:
到此这篇关于时间序列错位还原之SQL实现案例详解的文章就介绍到这了,更多相关SQL时间错位与还原生成案例内容请搜索极客世界以前的文章或继续浏览下面的相关文章希望大家以后多多支持极客世界! |
请发表评论