一. explode单独使用。

1.1. 用于array类型的数据

table_name 表名array_col 为数组类型的字段new_col array_col被explode之后对应的列

select explode(array_col) as new_col from table_name

1.2. 用于map类型的数据

由于map是kay-value结构的,所以它在转换的时候会转换成两列,一列是kay转换而成的,一列是value转换而成的。

table_name 表名map_col 为map类型的字段may_key_col, may_value_col 分别对应map_col被拆分之后的map映射里的key 和 value

select explode(map_col) as (may_key_col, may_value_col) from table_name

二.explode结合lateral view使用

2.1 创建测试表,插入数据

CREATE table student_score(

stu_id string comment '学号',

stu_name string comment '姓名',

courses string comment '各个科目',

scores string comment '各个分数'

) comment '学生成绩表';

insert into student_score values

("1001", "张三","语文,数学,英语,历史,地理", "88,87,94,86,84"),

("1002", "李四", "语文,数学,英语,历史,地理", "78,89,75,79,68"),

("1003", "王五", "语文,数学,英语,历史,地理", "98,97,91,93,92"),

("1004", "朱六", "语文,数学,英语,历史,地理", "66,63,64,67,68");

2.2 测试explode 行转列

select a.stu_id, a.stu_name, table_view.course

from student_score a

lateral view explode(split(courses, ',')) table_view as `course`;

查询每个学生课程对应的分数,使用posexplode函数 先测试使用explode, 看看效果:

select a.stu_id, a.stu_name,

table_view1.course, table_view2.score

from student_score a

lateral view explode(split(courses, ',')) table_view1 as `course`

lateral view explode(split(scores, ',')) table_view2 as `score`;

出现这种情况是因为两个并列的explode的sql没办法识别每个科目对应的成绩是多少,对于多个数组的行转列可以使用posexplode函数。 例如使用如下查询语句:

select stu_id, stu_name, course, score

from student_score

lateral view posexplode(split(courses, ',')) table_view1 as a, course

lateral view posexplode(split(scores, ',')) table_view2 as b, score

where a = b;

三. 进阶使用

1、空格字符串函数:space

语法: space(int n) 返回值: string 说明:返回长度为n的空格字符串 举例:

hive> select space(10) from dual;

hive> select length(space(10)) from dual;

10

2、space函数与split函数结合,得到数组;

space函数与split函数结合,可以得到空格字符串数组 举例:

hive>select split(space(10), '');

[" "," "," "," "," "," "," "," "," "," ",""]

3、如何产生1-100的连续的数字?

结合space函数与split函数,posexplode函数,lateral view函数获得 实现方法一:

select

id_start+pos as id

from(

select

1 as id_start,

100 as id_end

) m lateral view posexplode(split(space(id_end-id_start), '')) t as pos, val

实现方法二:

select

row_number() over() as id

from

(select split(space(99), ' ') as x) t

lateral view

explode(x) ex;

备注:explode(x)和posexplode()均为炸裂函数,区别在于explode炸出一个值,posexplode不仅炸出一个值还附带索引号;

3. 如何产生开始日期到结束日期的连续的日期?

SELECT

DATE_ADD(START_DATE, pos)

FROM (

SELECT DISTINCT

"2023-03-13" AS START_DATE,

"2023-03-23" AS END_DATE

from order_detail

) s1 lateral VIEW posexplode(split(SPACE(DATEDIFF(END_DATE, START_DATE)), " ")) s2 AS pos, null_ele

四. explode和posexplode对比

数据表: 表名:default.class_info,分别是班级,姓名,成绩 3列

单列Explode: 需求:将student这一列中数据由一行变为多行(使用split和explode,并结合lateral view函数实现)

select

class,student_name

from

default.class_info

lateral view explode(split(student,',')) t as student_name

单列Posexplode

需求:想要给每班的每个同学按照顺序来一个编号(使用posexplode函数)

select

class,student_index + 1 as student_index,student_name

from

default.class_info

lateral view posexplode(split(student,',')) t as student_index,student_name;

注意:student_index+1 是因为index是从0开始的

多列Explode 需求:基于学生姓名和分数使其两两匹配,期望得到如下效果。 尝试: 先对两列进行explode

select

class,student_name,student_score

from

default.class_info

lateral view explode(split(student,',')) sn as student_name

lateral view explode(split(score,',')) sc as student_score

结果如下: 不太符合预期,如果对两列都进行explode的话,假设每列都有3个值,最终会变为3*3=9行,但我们实际只想要3行

解决办法:

我们进行两次posexplode,姓名和成绩都保留对应的序号,即便是变成了9行,通过where筛选只保留行号相同的index即可。

select

class,student_name,student_score

from

default.class_info

lateral view posexplode(split(student,',')) sn as student_index_sn,student_name

lateral view posexplode(split(score,',')) sc as student_index_sc,student_score

where

student_index_sn = student_index_sc;

需求:假设我们又想对同学的成绩进行一下排名(借助rank( ) 函数 )

select

class,

student_name,

student_score,

rank() over(partition by class order by student_score desc) as student_rank

from

default.class_info

lateral view posexplode(split(student,',')) sn as student_index_sn,student_name

lateral view posexplode(split(score,',')) sc as student_index_sc,student_score

where

student_index_sn = student_index_sc

order by class,student_rank;

补充:

若没有spilt函数,可能会用到array()

lateral view

posexplode (array()) as as f_keys,f_values

user_id bus_ bike_ taxi_ train_ 1001 503 89 708 2054 1002 24 17 1008 500 1003 80 50 500 400

user_id pay_type Amount 1001 1 503 1001 2 89 1001 3 708 1001 4 2054

select

user_id,

index_+1 as pay_type,

value as amount

from pay_record_log

lateral view

posexplode(ARRAY(bus_,bike_,taxi_,train_))t as index_,value

推荐文章

评论可见,请评论后查看内容,谢谢!!!
 您阅读本篇文章共花了: