1
moyi97 333 天前
rownumber 确定第一条, lag 或者 lead 做个判断 取断掉那条, 应该就可以实现了吧
|
2
Jinnrry 333 天前 via Android
你这个第一条 end 和第二条 start 也不相等啊,这不差了 2 秒吗?
如果开始结束时间完全一致的时候,先分列,把 starttime 和 endtime 合并成一个字段,然后计数,如果某个时间点计数值是 1 ,那就说明不连续。 |
3
shuaishuaivip OP Hive SQL 实现问题:数据展示如下:
name1,start,end,start_time,end_time LSJA24390MS139778 55.8 55.8 ,2024-01-27 08:44:23 2024-01-27 08:44:23 LSJA24390MS139778 55.8 55.9, 2024-01-27 08:44:25 2024-01-27 08:44:35 LSJA24390MS139778 55.9 55.9 ,2024-01-27 08:44:37 2024-01-27 08:44:47 LSJA24390MS139778 55.9 56 , 2024-01-27 08:44:49 2024-01-27 08:44:51 LSJA24390MS139778 56 56.1, 2024-01-27 08:44:53 2024-01-27 08:45:11 LSJA24390MS139778 56.1 56.1, 2024-01-27 08:45:13 2024-01-27 08:45:21 LSJA24390MS139778 56.1 56.1, 2024-01-27 08:45:23 2024-01-27 08:45:23 LSJA24390MS139778 56.1 56.3, 2024-01-27 08:45:25 2024-01-27 08:45:51 LSJA24390MS139778 56.3 56.5, 2024-01-27 08:45:53 2024-01-27 08:46:21 LSJA24390MS139778 56.5 56.5, 2024-01-27 08:46:23 2024-01-27 08:46:23 LSJA24390MS139778 56.5 56.6, 2024-01-27 08:46:25 2024-01-27 08:46:51 LSJA24390MS139778 56.7 56.7, 2024-01-27 08:46:53 2024-01-27 08:46:59 LSJA24390MS139778 56.7 56.8, 2024-01-27 08:47:01 2024-01-27 08:47:11 LSJA24390MS139778 56.8 56.8, 2024-01-27 08:47:13 2024-01-27 08:47:21 LSJA24390MS139778 56.8 56.8, 2024-01-27 08:47:23 2024-01-27 08:47:23 LSJA24390MS139778 56.8 56.9, 2024-01-27 08:47:25 2024-01-27 08:47:35 LSJA24390MS139778 56.9 57, 2024-01-27 08:47:37 2024-01-27 08:47:47 LSJA24390MS139778 57 57, 2024-01-27 08:47:49 2024-01-27 08:47:51 LSJA24390MS139778 57 57, 2024-01-27 08:47:53 2024-01-27 08:47:59 LSJA24390MS139778 57 57.1, 2024-01-27 08:48:01 2024-01-27 08:48:11 LSJA24390MS139778 57.1 57.2, 2024-01-27 08:48:13 2024-01-27 08:48:21 LSJA24390MS139778 57.2 57.2, 2024-01-27 08:48:23 2024-01-27 08:48:23 LSJA24390MS139778 57.2 57.3, 2024-01-27 08:48:25 2024-01-27 08:48:35 LSJA24390MS139778 57.2 57.3, 2024-01-27 08:48:37 2024-01-27 08:48:47 实现成这样 name1,start,end,start_time,end_time LSJA24390MS139778 55.8 56.6 2024-01-27 08:44:23 2024-01-27 08:46:51 LSJA24390MS139778 56.7 57.3 2024-01-27 08:46:53 2024-01-27 08:48:35 LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47 |
4
shuaishuaivip OP start,和 end 说的不是时间哦
|
5
ericzy 333 天前 via Android
-- 创建一个临时表,用 lag 函数获取上一行的 end 值
with temp as ( select name1, start, end, start_time, end_time, lag(end) over (partition by name1 order by start_time) as prev_end from your_table ) -- 从临时表中筛选出断开的数据,即 start 不等于 prev_end 的数据 select name1, start, end, start_time, end_time from temp where start != prev_end or prev_end is null -- 来自 gpt ,仔细查看和使用 |
6
shuaishuaivip OP @ericzy gpt 我都翻烂了~行不通
|
7
BeiChuanAlex 333 天前
一张表解决不了就 2 张表分层实现
|
8
hpkaiq 333 天前
with t2 as (
select *, lead(`start`) over (partition by name order by `end_time`) as next_start from your_table ) ,t3 as ( select name,start_time,end_time from t2 where `end` != next_start or next_start is null ) ,t4 as (select * ,lag(`end_time`) over (partition by name order by `end_time`) as pre_end_time from t3) select name,step_end_time, min(`start`) `start`,max(`end`) `end`, min(start_time) start_time,max(end_time) end_time from ( select t2.*,step_end_time from ( select name,end_time step_end_time,if(pre_end_time is null,'1970-01-01 00:00:00',pre_end_time) step_start_time from t4 )step join t2 on step.name = t2.name and t2.end_time <= step_end_time and t2.start_time > step_start_time )tmp group by name,step_end_time order by name,step_end_time 小试一把,不知道行不行 |
9
hero1874 333 天前
st 就是 start ed 是 end
我是用 doris 写的 所以下面 0 在 hive 里应该是 null 把判断是否为 0 改成是否为 null 第一步 用 lag 函数取出上一个的 ed 值 select name, st, ed, start_time, end_time, lag(ed,1,0) over( partition by name order by start_time ) lag_ed1 -- 上一行的 ed 值 from form6 order by start_time 第二步 select t1.*, sum( case when lag_ed1 = 0 then 1 -- 第一行 when lag_ed1 !=st then 1 -- 断开了 when lag_ed1 = st then 0 -- 连续 end ) over( partition by name order by start_time) num from (select name, st, ed, start_time, end_time, lag(ed,1,0) over( partition by name order by start_time ) lag_ed1 from form6 order by start_time) t1 order by start_time 第三步 根据 name 和判断是否连续或者断开分组取数 select name, min(st) st, max(ed) ed, min(start_time)start_time, max(end_time) end_time from ( select t1.*, sum( case when lag_ed1 = 0 then 1 when lag_ed1 !=st then 1 when lag_ed1 = st then 0 end ) over( partition by name order by start_time) num from (select name, st, ed, start_time, end_time, lag(ed,1,0) over( partition by name order by start_time ) lag_ed1 from form6 order by start_time) t1 order by start_time )t3 group by name ,num |
10
lusansan 333 天前
select
name1 ,MIN(start_number) as start_number ,MAX(end_number) as end_number ,MIN(start_time) as start_time ,MAX(end_time) as end_time from ( SELECT * ,case when start_number is null then row_number() over (partition by name1 order by start_time) -1 else row_number() over (partition by name1 order by start_time) end as rn from ( SELECT name1 ,start_number ,end_number ,start_time ,end_time ,lead(start_number,1) over (partition by name1 order by start_time) lead_start_number FROM ( select name1,start_number,end_number,start_time,end_time from values ('LSJA24390MS139778' ,55.8,55.8 ,'2024-01-27 08:44:23' ,'2024-01-27 08:44:23') ,('LSJA24390MS139778' ,NULL,55.9 ,'2024-01-27 08:44:25' ,'2024-01-27 08:44:35') ,('LSJA24390MS139778' ,NULL,55.9 ,'2024-01-27 08:44:37' ,'2024-01-27 08:44:47') ,('LSJA24390MS139778' ,NULL,56 ,'2024-01-27 08:44:49' ,'2024-01-27 08:44:51') ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:44:53' ,'2024-01-27 08:45:11') ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:45:13' ,'2024-01-27 08:45:21') ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:45:23' ,'2024-01-27 08:45:23') ,('LSJA24390MS139778' ,NULL,56.3 ,'2024-01-27 08:45:25' ,'2024-01-27 08:45:51') ,('LSJA24390MS139778' ,NULL,56.5 ,'2024-01-27 08:45:53' ,'2024-01-27 08:46:21') ,('LSJA24390MS139778' ,NULL,56.5 ,'2024-01-27 08:46:23' ,'2024-01-27 08:46:23') ,('LSJA24390MS139778' ,NULL,56.6 ,'2024-01-27 08:46:25' ,'2024-01-27 08:46:51') ,('LSJA24390MS139778' ,56.7,56.7 ,'2024-01-27 08:46:53' ,'2024-01-27 08:46:59') ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:01' ,'2024-01-27 08:47:11') ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:13' ,'2024-01-27 08:47:21') ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:23' ,'2024-01-27 08:47:23') ,('LSJA24390MS139778' ,NULL,56.9 ,'2024-01-27 08:47:25' ,'2024-01-27 08:47:35') ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:37' ,'2024-01-27 08:47:47') ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:49' ,'2024-01-27 08:47:51') ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:53' ,'2024-01-27 08:47:59') ,('LSJA24390MS139778' ,NULL,57.1 ,'2024-01-27 08:48:01' ,'2024-01-27 08:48:11') ,('LSJA24390MS139778' ,NULL,57.2 ,'2024-01-27 08:48:13' ,'2024-01-27 08:48:21') ,('LSJA24390MS139778' ,NULL,57.2 ,'2024-01-27 08:48:23' ,'2024-01-27 08:48:23') ,('LSJA24390MS139778' ,NULL,57.3 ,'2024-01-27 08:48:25' ,'2024-01-27 08:48:35') ,('LSJA24390MS139778' ,57.2,57.3 ,'2024-01-27 08:48:37' ,'2024-01-27 08:48:47') as t (name1,start_number,end_number,start_time,end_time) ) p1 ) p2 WHERE start_number is not null or lead_start_number is not null ) t1 group by name1,rn ; name1 start_number end_number start_time end_time LSJA24390MS139778 55.8 56.6 2024-01-27 08:44:23 2024-01-27 08:46:51 LSJA24390MS139778 56.7 57.3 2024-01-27 08:46:53 2024-01-27 08:48:35 LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47 |
11
shuaishuaivip OP @lusansan 大佬,这个可行。
|