Length of Arrays(Set), Map and String in Hive

Recently I was asked to calculate quantity of items inside each cell of a table, in face it is calculating the length of each cell. Some columns stores value in string, some are in array, some are in map, each type returns different value when they are empty.

Data Preparation

Assume we have a table that has four kinds of data:

  • each cell of the 1st row would contains data that has multiple values;
  • each cell of the 2nd row would contains only one value;
  • each cell of the 3rd row would be empty (not null);
  • each cell of the 4th row would be null.
select *
from tmp.ym_test

Result:

arraykey_valuestring
["item1","item2","item3"]{"key1":"value1","key2":"value2"}a sample string
["item1"]{"key1":"value1"}string
[]{}
nullnullnull

Calculating Length

select size(array), size(key_value), length(string)
from tmp.ym_test

Result:

arraykey_valuestring
3215
116
000
-1-1null

Conclusion

Based on the result above, the conclusion is clear that:

  1. 1st row & 2nd row:
    • length of an array or a map is the quantity of items of that value;
    • length of a string is the quantity of characters of that string, including spaces
  1. 3rd row:

    • all returns 0
  2. 4th row:

    • length of an array or a map value is -1;
    • still null for a string value

Not Over Yet

According to the last two conclusions, because of null values of array and map would be calculated as -1 by its length, if i wanted to calculate the sum of length of that row, the result would be short, because null is 0 to me but -1 to the computer; Besides that, the length of a string would be meaningless to me if it returns how many characters are there in the string, i only need to know that string is not empty, or the length of it is not 0, so just return 1 would be good for me. So modified version of the query would be:

select 
if(size(array) == -1, 0, size(array)), 
if(size(key_value) == -1, 0, size(key_value)), 
if(length(string) is null, 0, 1)
from tmp.ym_test

Result:

arraykey_valuestring
321
111
000
000

This looks much better.

No Comments Yet