replace - How to remove duplicates in hive string?

Question

Welcome To Ask or Share your Answers For Others

replace - How to remove duplicates in hive string?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

replace - How to remove duplicates in hive string?

I have a comma-separated column(string) with duplicate values. I want to remove duplicates:
e.g.

column_name
-----------------
gun,gun,man,gun,man
shuttle,enemy,enemy,run
hit,chase

I want result like:

column_name
----------------
gun,man
shuttle,enemy,run
hit,chase

I am using hive database.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:56:09+0000

Option 1: keep last occurrence

This will keep the last occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'world,hello'

select  regexp_replace
        (
            column_name
           ,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\k<word>(?=,|$))'
           ,''
        )

from    mytable
;

+-------------------+
| gun,man           |
| shuttle,enemy,run |
| hit,chase         |
+-------------------+

Option 2: keep first occurrence

This will keep the first occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'hello,world'

select  reverse            
        (
            regexp_replace
            (
                reverse(column_name)
               ,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\k<word>(?=,|$))'
               ,''
            )
        )

from    mytable
;

Option 3: sorted

E.g. 'Cherry,Apple,Cherry,Cherry,Cherry,Banana,Apple' will result in 'Apple,Banana,Cherry'

select  regexp_replace
        (
            concat_ws(',',sort_array(split(column_name,',')))
           ,'(?<=^|,)(?<word>.*?)(,\k<word>(?=,|$))+'
           ,'${word}'
        )

from    mytable
;

Categories

replace - How to remove duplicates in hive string?

replace - How to remove duplicates in hive string?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Option 1: keep last occurrence

Option 2: keep first occurrence

Option 3: sorted

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags