Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.8k views
in Technique[技术] by (71.8m points)

hive sql - how to select the first n elements in hive array column and return the selected array

Please consider a hive table - Table as mentioned below.

user_id  interest_array
tom      [a,b,c,d,g,w]
bob      [e,d,s,d,g,w,s]
cat      [a]
harry    []
peter    NULL

I want to select the first 3 elements by sequence in 'interest_array' per row and return it as a array, the outout to be like below

user_id  output_array
tom      [a,b,c]
bob      [e,d,s]
cat      [a]
harry    []
peter    NULL

PS: the last two rows are not important, they are just corner case, I can just set them null if necessary.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

1. Simple method, but it will not work correctly if initial array can contain less elements ( result array will contain NULLs ).

 with mydata as(
    select array('a','b','c','d','g','w') as original_array
    )
    
    select original_array, array(original_array[0], original_array[1], original_array[2]) as first_3_array
    from mydata

Result:

original_array              first_3_array
["a","b","c","d","g","w"]   ["a","b","c"]

2. One more method using explode, works correctly with any arrays:

Explode array using posexplode, filter position<=2, collect array again:

with mydata as(
select array('a','b','c','d','g','w') as original_array
)

select original_array, collect_list(e.element) as first_3_array
from mydata
     lateral view outer posexplode(original_array) e as pos, element
where pos<=2
group by original_array

Result:

original_array              first_3_array
["a","b","c","d","g","w"]   ["a","b","c"]

3. More efficient method, without explode: Concatenate array with comma delimiter, use regexp to extract substring with up to 3 first elements, split again:

with mydata as(
select array('a') as original_array
)

select original_array, split(regexp_replace(regexp_extract(concat_ws(',', original_array),
                                                           '^(([^,]*,?){1,3})',1),
                                            ',$','') --remove last delimiter
                             ,',') as first_3_array
from mydata 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...