其他
实战案例:如何使用 Hive 进行 OLAP 分析?
The following article is from 大数据技术与数仓 Author 西贝
写在前面
如何实现数据的多维分析
一、简单介绍
SELECT a,
b,
SUM(c)
FROM tab1
GROUP BY a,b
GROUPING SETS ((a,b), a,b, ());
-- 与GROUP BY等价关系
SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b
UNION
SELECT a, null, SUM( c ) FROM tab1 GROUP BY a, null
UNION
SELECT null, b, SUM( c ) FROM tab1 GROUP BY null, b
UNION
SELECT null, null, SUM( c ) FROM tab1;
create table test_grouping__id(id int,amount decimal(10,2));
-- 插入数据
insert into table test_grouping__id values(1,null),(1,1),(2,2),(3,3),(3,null),(4,5);
--执行查询
SELECT id,
amount,
grouping__id,
count(*) cnt
FROM test_grouping__id
GROUP BY id,
amount
GROUPING sets(id,(id,amount),())
ORDER BY grouping__id
查询结果分析
示例模板
SELECT a,
b,
c
SUM(d)
FROM tab1
GROUP BY a,b,c
WITH ROLLUP
-- 等价于下面的方式
SELECT a,
b,
c,
SUM(d)
FROM tab1
GROUP BY a,b,c
GROUPING SETS ((a,b,c), (a,b), (a),());
CUBE
SELECT a,
b,
c
SUM(d)
FROM tab1
GROUP BY a,b,c
WITH CUBE
-- 等价于下面的方式
SELECT a,
b,
c,
SUM(d)
FROM tab1
GROUP BY a,b,c
GROUPING SETS ((a,b,c),(a,b),(b,c), (a,c),(a),(b),(c),());
二、使用案例
数据准备
字段名 | 列名称 | 说明 |
---|---|---|
user_id | 用户ID | 整数类型,用户ID |
item_id | 商品ID | 整数类型,商品ID |
category_id | 商品类目ID | 整数类型,商品所属类目ID |
behavior | 行为类型 | 字符串,枚举类型,包括(‘pv’, ‘buy’, ‘cart’, ‘fav’) |
access_time | 时间戳 | 行为发生的时间戳,单位秒 |
CREATE TABLE user_behavior
(
user_id int ,
item_id int,
category_id int,
behavior string,
access_time string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- 装载数据
1,101,1,pv,1511658000
2,102,1,pv,1511658000
3,103,1,pv,1511658000
4,104,2,cart,1511659329
5,105,2,buy,1511659326
6,106,3,fav,1511659323
7,101,1,pv,1511658010
8,102,1,buy,1511658200
9,103,1,cart,1511658030
10,107,3,fav,1511659332
GROUPING SETS使用
-- 查询每种用户行为的访问次数
SELECT
item_id,
category_id,
behavior,
COUNT(*) AS cnt,
GROUPING__ID
FROM user_behavior
GROUP BY item_id,category_id,behavior
GROUPING SETS ((category_id,behavior),behavior)
ORDER BY GROUPING__ID;
ROLLUP使用
-- 查询每种商品品类、每种用户行为的次数
-- 查询用户的总访问次数
SELECT
category_id,
behavior,
COUNT(*) AS cnt,
GROUPING__ID
FROM user_behavior
GROUP BY category_id,behavior
WITH ROLLUP
ORDER BY GROUPING__ID;
CUBE使用
-- 查询每种用户行为的次数
-- 查询每种商品品类、每种用户行为的次数
-- 查询用户的总访问次数
SELECT
category_id,
behavior,
COUNT(*) AS cnt,
GROUPING__ID
FROM user_behavior
GROUP BY category_id,behavior
WITH CUBE
ORDER BY GROUPING__ID;