码迷,mamicode.com
首页 > 其他好文 > 详细

hive 之 Cube, Rollup介绍

时间:2019-06-20 18:49:06      阅读:418      评论:0      收藏:0      [点我收藏+]

标签:channel   函数   介绍   details   column   结果   组合   lang   oid   

1. GROUPING SETS

GROUPING SETS作为GROUP BY的子句,允许开发人员在GROUP BY语句后面指定多个统维度,可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起来。

为方便理解,以testdb.test_1为例:

hive> use testdb;
hive> desc test_1;

user_id        string ? ? ?id ? ? ? ? ? ? ? ?
device_id ? ? ?string ? ? ?设备类型:手机、平板? ? ? ? ? ? ?
os_id ? ? ? ? ?string ? ? ?操作系统类型:ios、android ? ? ? ? ? ?
app_id ? ? ? ? string? ? ? 手机app_id? ? ? ? ? ? ?
client_v ?     string ? ? ?客户端版本? ? ? ? ? ? ?
channel ? ? ? ?string ? ? ?渠道
grouping sets语句 等价hive语句
select device_id,os_id,app_id,count(user_id) from ?test_1 group by device_id,os_id,app_id grouping sets((device_id))? SELECT device_id,null,null,count(user_id) FROM test_1 group by device_id
select device_id,os_id,app_id,count(user_id) from ?test_1 group by device_id,os_id,app_id grouping sets((device_id,os_id)) SELECT device_id,os_id,null,count(user_id) FROM test_1 group by?device_id,os_id
select device_id,os_id,app_id,count(user_id) from ?test_1 group by device_id,os_id,app_id grouping sets((device_id,os_id),(device_id)) SELECT device_id,os_id,null,count(user_id) FROM test_1 group by device_id,os_id?UNION ALL?SELECT device_id,null,null,count(user_id) FROM test_1 group by device_id
select device_id,os_id,app_id,count(user_id) from ?test_1 group by device_id,os_id,app_id grouping sets((device_id),(os_id),(device_id,os_id),()) SELECT device_id,null,null,count(user_id) FROM test_1 group by device_id?UNION ALL?SELECT null,os_id,null,count(user_id) FROM test_1 group by os_id?UNION ALL?SELECT device_id,os_id,null,count(user_id) FROM test_1 group by device_id,os_id??UNION ALL?SELECT null,null,null,count(user_id) FROM test_1

2. CUBE函数

cube简称数据魔方,可以实现hive多个任意维度的查询,cube(a,b,c)则首先会对(a,b,c)进行group by,然后依次是(a,b),(a,c),(a),(b,c),(b),(c),最后在对全表进行group by,cube会统计所选列中值的所有组合的聚合

select device_id,os_id,app_id,client_v,channel,count(user_id)?
from test_1?
group by device_id,os_id,app_id,client_v,channel with cube;

等价于:

SELECT device_id,null,null,null,null ,count(user_id) FROM test_1 group by device_id
UNION ALL
SELECT null,os_id,null,null,null ,count(user_id) FROM test_1 group by os_id
UNION ALL
SELECT device_id,os_id,null,null,null ,count(user_id) FROM test_1 group by device_id,os_id
UNION ALL
SELECT null,null,app_id,null,null ,count(user_id) FROM test_1 group by app_id
UNION ALL
SELECT device_id,null,app_id,null,null ,count(user_id) FROM test_1 group by device_id,app_id
UNION ALL
SELECT null,os_id,app_id,null,null ,count(user_id) FROM test_1 group by os_id,app_id
UNION ALL
SELECT device_id,os_id,app_id,null,null ,count(user_id) FROM test_1 group by device_id,os_id,app_id
UNION ALL
SELECT null,null,null,client_v,null ,count(user_id) FROM test_1 group by client_v
UNION ALL
SELECT device_id,null,null,client_v,null ,count(user_id) FROM test_1 group by device_id,client_v
UNION ALL
SELECT null,os_id,null,client_v,null ,count(user_id) FROM test_1 group by os_id,client_v
UNION ALL
SELECT device_id,os_id,null,client_v,null ,count(user_id) FROM test_1 group by device_id,os_id,client_v
UNION ALL
SELECT null,null,app_id,client_v,null ,count(user_id) FROM test_1 group by app_id,client_v
UNION ALL
SELECT device_id,null,app_id,client_v,null ,count(user_id) FROM test_1 group by device_id,app_id,client_v
UNION ALL
SELECT null,os_id,app_id,client_v,null ,count(user_id) FROM test_1 group by os_id,app_id,client_v
UNION ALL
SELECT device_id,os_id,app_id,client_v,null ,count(user_id) FROM test_1 group by device_id,os_id,app_id,client_v
UNION ALL
SELECT null,null,null,null,channel ,count(user_id) FROM test_1 group by channel
UNION ALL
SELECT device_id,null,null,null,channel ,count(user_id) FROM test_1 group by device_id,channel
UNION ALL
SELECT null,os_id,null,null,channel ,count(user_id) FROM test_1 group by os_id,channel
UNION ALL
SELECT device_id,os_id,null,null,channel ,count(user_id) FROM test_1 group by device_id,os_id,channel
UNION ALL
SELECT null,null,app_id,null,channel ,count(user_id) FROM test_1 group by app_id,channel
UNION ALL
SELECT device_id,null,app_id,null,channel ,count(user_id) FROM test_1 group by device_id,app_id,channel
UNION ALL
SELECT null,os_id,app_id,null,channel ,count(user_id) FROM test_1 group by os_id,app_id,channel
UNION ALL
SELECT device_id,os_id,app_id,null,channel ,count(user_id) FROM test_1 group by device_id,os_id,app_id,channel
UNION ALL
SELECT null,null,null,client_v,channel ,count(user_id) FROM test_1 group by client_v,channel
UNION ALL
SELECT device_id,null,null,client_v,channel ,count(user_id) FROM test_1 group by device_id,client_v,channel
UNION ALL
SELECT null,os_id,null,client_v,channel ,count(user_id) FROM test_1 group by os_id,client_v,channel
UNION ALL
SELECT device_id,os_id,null,client_v,channel ,count(user_id) FROM test_1 group by device_id,os_id,client_v,channel
UNION ALL
SELECT null,null,app_id,client_v,channel ,count(user_id) FROM test_1 group by app_id,client_v,channel
UNION ALL
SELECT device_id,null,app_id,client_v,channel ,count(user_id) FROM test_1 group by device_id,app_id,client_v,channel
UNION ALL
SELECT null,os_id,app_id,client_v,channel ,count(user_id) FROM test_1 group by os_id,app_id,client_v,channel
UNION ALL
SELECT device_id,os_id,app_id,client_v,channel ,count(user_id) FROM test_1 group by device_id,os_id,app_id,client_v,channel
UNION ALL
SELECT null,null,null,null,null ,count(user_id) FROM test_1

3. ROLL UP函数

rollup可以实现从右到左递减多级的统计,显示统计某一层次结构的聚合

select device_id,os_id,app_id,client_v,channel,count(user_id)?
from test_1?
group by device_id,os_id,app_id,client_v,channel with rollup;

等价于:

select device_id,os_id,app_id,client_v,channel,count(user_id)?
from test_1?
group by device_id,os_id,app_id,client_v,channel?
grouping sets ((device_id,os_id,app_id,client_v,channel),(device_id,os_id,app_id,client_v),(device_id,os_id,app_id),(device_id,os_id),(device_id),());

4.Grouping_ID函数

当我们没有统计某一列时,它的值显示为null,这可能与列本身就有null值冲突,这就需要一种方法区分是没有统计还是值本来就是null。(写一个排列组合的算法,就马上理解了,grouping_id其实就是所统计各列二进制和)

例子如下:

Column1 (key) Column2 (value)
1 NULL
1 1
2 2
3 3
3 NULL
4 5

hql统计:

 ?SELECT key, value, GROUPING_ID, count(*) from T1 GROUP BY key, value WITH ROLLUP

结果如下:

?key value GROUPING_ID? count(*)?
NULL NULL 0 ? ? 00 6
1 NULL 1 ? ? 10 2
1 NULL 3 ? ? 11 1
1 1 3 ? ? 11 1
2 NULL 1 ? ? 10 1
2 2 3 ? ? 11 1
3 NULL 1 ? ? 10 2
3 NULL 3 ? ? 11 1
3 3 3 ? ? 11 1
4 NULL 1 ? ? 10 1
4 5 3 ? ? 11 1

GROUPING_ID转变为二进制,如果对应位上有值为null,说明这列本身值就是null。(通过类DataFilterNull.py 扫描,可以筛选过滤掉列中null、“”统计结果),

5. 窗口函数

hive窗口函数,感觉大部分都是在模仿oracle,有对oracle熟悉的,应该看下就知道怎么用。

具体参见:http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/ptf-window.html

参考文章

  1. https://blog.csdn.net/gua___gua/article/details/52523698

hive 之 Cube, Rollup介绍

标签:channel   函数   介绍   details   column   结果   组合   lang   oid   

原文地址:https://www.cnblogs.com/remainsu/p/11060580.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!