Oracle數(shù)倉中判斷時(shí)間連續(xù)性的幾種SQL寫法示例
零、需求介紹
現(xiàn)有一張表數(shù)據(jù)如下:
此表是一張鏡像表,policyno列代表一個(gè)保單號,state列代表這個(gè)保單號在snapdate當(dāng)天的最后一次狀態(tài)(state每天可能會變很多次,鏡像表只保留snapdate時(shí)間點(diǎn)凌晨的最后一次狀態(tài)),snapdate代表當(dāng)天做鏡像的時(shí)間,現(xiàn)在有個(gè)需求,我們想取出來這個(gè)保單號連續(xù)保持某個(gè)狀態(tài)的起止時(shí)間,例如:
保單號sm1保持狀態(tài)1的起止時(shí)間為2021020120210202,然后在20210203時(shí)候變成了狀態(tài)2,又在20210204時(shí)候變成了狀態(tài)3,最終又在2021020520210209時(shí)間段保持在狀態(tài)1,然后鏡像表的程序可能期間出現(xiàn)過問題,在20210210開始到20210215日沒有鏡像成功,直到20210216日才恢復(fù),20210216~20210219日保單號sm1的狀態(tài)一直保持為1,后續(xù)還有可能繼續(xù)變,那么,上面說的保單sm1的幾個(gè)狀態(tài)的連續(xù)時(shí)間,我們想要的結(jié)果為:
POLICYNO STATE START_DATE END_DATEsm1 1 20210201 20210202sm1 2 20210203 20210203sm1 3 20210204 20210204sm1 1 20210205 20210209sm1 1 20210216 20210219.........................
我這里提供5種寫法,可以歸結(jié)為兩大類:
一類:通過使用分析函數(shù)或自關(guān)聯(lián)獲取數(shù)據(jù)連續(xù)性,構(gòu)造一個(gè)分組字段進(jìn)行分組求最大最小值。
二類:通過樹形層次查詢獲取連續(xù)性,獲取起止時(shí)間。
一、通過使用lag分析函數(shù)獲取前后時(shí)間,根據(jù)當(dāng)前時(shí)間與前后時(shí)間的差值進(jìn)行判斷獲取時(shí)間連續(xù)性標(biāo)志,然后使用sum()over()對連續(xù)性標(biāo)志進(jìn)行累加,從而生成一個(gè)新的臨時(shí)分組字段,最終根據(jù)policyno,state,臨時(shí)分組字段進(jìn)行分組取最大最小值
這里為了好理解,每一個(gè)處理步驟都單獨(dú)寫出來了,實(shí)際使用中可以簡寫一下:
with t as--求出來每條數(shù)據(jù)當(dāng)天的前一天鏡像時(shí)間 (select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a order by a.policyno, a.snapdate),t1 as--判斷當(dāng)天鏡像時(shí)間和前一天的鏡像時(shí)間+1是否相等,如果相等就置為0否則置為1,新增臨時(shí)字段lxzt意為:連續(xù)狀態(tài)標(biāo)志 (select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate),t2 as--根據(jù)lxzt字段進(jìn)行sum()over()求和,求出來一個(gè)新的用來做分組依據(jù)的字段,簡稱fzyj (select t1.*, sum(lxzt) over(order by policyno, snapdate) as fzyj from t1)select policyno,--最后根據(jù)policyno,state,fzyj進(jìn)行分組求最大最小值即為狀態(tài)連續(xù)的開始結(jié)束時(shí)間 state, -- fzyj, min(snapdate) as start_snap, max(snapdate) as end_snap from t2 group by policyno, state, fzyj order by fzyj;
二、不使用lag分析函數(shù),通過自關(guān)聯(lián)也能判斷出來哪些天連續(xù),然后后面操作步驟同上,這個(gè)寫法算是對lag()over()函數(shù)的一個(gè)回寫,擺脫對分析函數(shù)的依賴
下面這種寫法,需要讀兩次表,上面lag的方式是對這個(gè)寫法的一種優(yōu)化:
with t as (select a.policyno, a.state, a.snapdate, b.snapdate as snap2 from zyd.temp_0430 a, zyd.temp_0430 b where a.policyno = b.policyno(+) and a.state = b.state(+) and a.snapdate - 1 = b.snapdate(+) order by policyno, snapdate),t1 as (select t.*, case when snap2 is null then 1 else 0 end as lxzt from t order by policyno, snapdate),t2 as (select t1.*, sum(lxzt) over(order by policyno, snapdate) as fzyj from t1 order by policyno, snapdate)select policyno, state, fzyj, min(snapdate) as start_snap, max(snapdate) as end_snap from t2 group by policyno, state, fzyj order by fzyj;
三、通過構(gòu)造樹形結(jié)構(gòu),確定根節(jié)點(diǎn)和葉子節(jié)點(diǎn)來獲取狀態(tài)連續(xù)的開始和結(jié)束時(shí)間
先按照數(shù)據(jù)的連續(xù)性構(gòu)造顯示每層關(guān)系的樹狀結(jié)構(gòu):
with t as (select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno="sm1" order by a.policyno, a.snapdate),t1 as (select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate),t2 as (select t1.*, lpad("->", (level - 1) * 2, "->") || snapdate as 樹狀結(jié)構(gòu), level as 樹中層次, decode(level, 1, 1) 是否根節(jié)點(diǎn), decode(connect_by_isleaf, 1, 1) 是否葉子節(jié)點(diǎn), case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否樹杈, (prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by (prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate)select * from t2;
從上面能清晰的看出來,每一次連續(xù)狀態(tài)的開始日期作為每個(gè)樹的根,分支節(jié)點(diǎn)即樹杈和葉子節(jié)點(diǎn)的關(guān)系一步步拓展開來,分析上面數(shù)據(jù)我們能夠知道,如果我們想要獲取每個(gè)保單狀態(tài)連續(xù)時(shí)間范圍,以上面的數(shù)據(jù)現(xiàn)有分布方式,現(xiàn)在就可以:通過policyno,state,主根值進(jìn)行g(shù)roup by 取snapdate的最大最小值,類似前面兩個(gè)寫法的最終步驟;
接下來,我們這個(gè)第三種寫法就是按照這個(gè)方式寫:
with t as (select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno="sm1" order by a.policyno, a.snapdate),t1 as (select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate),t2 as (select t1.*, lpad("->", (level - 1) * 2, "->") || snapdate as 樹狀結(jié)構(gòu), level as 樹中層次, decode(level, 1, 1) 是否根節(jié)點(diǎn), decode(connect_by_isleaf, 1, 1) 是否葉子節(jié)點(diǎn), case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否樹杈, (prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by (prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate)select policyno, state, min(snapdate) as start_date, max(snapdate) as end_date from t2 group by policyno, state, 主根值 order by policyno, state;
四、參照過程三,既然已經(jīng)獲取了每條數(shù)據(jù)的主根值和葉子節(jié)點(diǎn)的值,這就代表了我們知道了每個(gè)保單狀態(tài)的連續(xù)開始和結(jié)束時(shí)間,那直接取出來葉子節(jié)點(diǎn)數(shù)據(jù),葉子節(jié)點(diǎn)主根值就是開始日期,葉子節(jié)點(diǎn)的值就是結(jié)束日期,這樣我們就不需再group by了
with t as (select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim from zyd.temp_0430 a --where policyno="sm1" order by a.policyno, a.snapdate),t1 as (select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate),t2 as (select t1.*, lpad("->", (level - 1) * 2, "->") || snapdate as 樹狀結(jié)構(gòu), level as 樹中層次, decode(level, 1, 1) 是否根節(jié)點(diǎn), decode(connect_by_isleaf, 1, 1) 是否葉子節(jié)點(diǎn), case when (connect_by_isleaf = 0 and level > 1) then 1 end 是否樹杈, (prior snapdate) as 根值, connect_by_root snapdate 主根值 from t1 start with (lxzt = 1) connect by (prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno) order by policyno, snapdate)select policyno, state, 主根值 as start_date, snapdate as end_date from t2 where 是否葉子節(jié)點(diǎn) = 1 order by policyno, snapdate
五、在Oracle10g之前,上面樹狀查詢的關(guān)鍵函數(shù) connect_by_root還不支持,如果使用樹形結(jié)構(gòu),可以通過sys_connect_by_path來實(shí)現(xiàn)
with t as (select a.policyno, a.state, a.snapdate, lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) as lag_tim --case when lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) is null then snapdate else lag(a.snapdate) over(partition by a.policyno, a.state order by a.snapdate) end as lag_tim from zyd.temp_0430 a order by a.policyno, a.snapdate),t1 as (select t.*, case when t.snapdate = t.lag_tim + 1 then 0 else 1 end as lxzt from t order by policyno, snapdate),t2 as (select t1.*, sys_connect_by_path(snapdate, ",") as pt, level, connect_by_isleaf as cb from t1 start with (lxzt = 1) connect by (prior snapdate = snapdate - 1 and prior state = state and prior policyno = policyno))select t2.*, regexp_substr(pt, "[^,]+", 1, 1) as start_date, regexp_substr(pt, "[^,]+", 1, regexp_count(pt, ",")) as end_date from t2 where cb = 1 order by policyno, state;
還有好多其他寫法,這里不再一一列舉!
總結(jié)
到此這篇關(guān)于Oracle數(shù)倉中判斷時(shí)間連續(xù)性的幾種SQL寫法的文章就介紹到這了,更多相關(guān)Oracle數(shù)倉判斷時(shí)間連續(xù)性內(nèi)容請搜索以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持!
