Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
532 views
in Technique[技术] by (71.8m points)

hiveql - Storing and Query Blank Values in Hive Columns

I have a requirement for storing blank strings of length 1, 2, and 3 in some columns of my Hive table.

Storing:

  1. If my column type is char, then I see that the data is always trimmed before storing. i.e. length(column) is always 0
  2. If my column type is varchar then the data is not trimmed. so length(column) is 1, 2 and 3 respectively. So that solves my storing problem.

Querying:

  1. I am unable to query the column by value. say. select * from hive table where column = ' '; it only works if I do something like select * from hive table where length(column) > 0 and trim(column) = '';

Is there a way to handle this separately ? say I want to query those records where column value is of a blank string of length 3? How do I do this?

This is what i Tried (Note that the issues seems to be when the file is stored as parquet)

CREATE EXTERNAL TABLE IF NOT EXISTS DUMMY5 (
  col1 varchar(3)) 
  STORED AS PARQUET
LOCATION "/DUMMY5";

insert into DUMMY5 values ('  '); // 2 character strings
insert into DUMMY5 values ('   '); //3 character strings

select col1, length(col1) from DUMMY5;
+-------+------+--+
| col1  | _c1  |
+-------+------+--+
|       | 3    |
|       | 2    |
+-------+------+--+

select col1, length(col1) from DUMMY5 where col1 = '  '; // 0 record
select col1, length(col1) from DUMMY5  where col1 = '   '; // 0 record

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Running Hive 2.1.1

drop table dummy_tbl;
CREATE TABLE  dummy_tbl (
  col1 char(1),
  col2 varchar(1),
  col3 char(3),
  col4 varchar(3)) ;
  
  insert into dummy_tbl values ('  ', '  ', '  ', '  ');
  
   select  length(col1),  length(col2),  length(col3), length(col4) from dummy_tbl;

Result:

c0  c1  c2  c3
0   1   0   2

Varchar column works absolutely correct. col2 was trimmed on insert, it is documented. col4 varchar(2) works correctly, this query returns 1:

 select count(*) from dummy_tbl where col4='  '; --returns 1

And length of all char columns shows 0 and comparison ignoring spaces like it is documented:

select  count(*) from dummy_tbl where col1=' '; --single space --returns 1
select  count(*) from dummy_tbl where col1='  '; --two spaces --also returns 1 because it is ignoring spaces

You can use varchar with proper length. Or STRING type if you not sure about length.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...