add some useful array functions. revise readme document and project version.
aaronshan committed Jul 27, 2016
1 parent 8c979e4 commit 56821a2
Showing 17 changed files with 1,546 additions and 16 deletions.
65 changes: 54 additions & 11 deletions README.md
@@ -2,7 +2,7 @@

## Introduction

Some useful custom hive udf functions, especial json functions.
Some useful custom Hive UDF functions, especially array and JSON functions.

> Note:
> hive-third-functions supports hive-0.11.0 or higher.
@@ -35,7 +35,7 @@ It will generate hive-third-functions-${version}-shaded.jar in target directory.

You can also download the jar directly from the [release page](https://github.com/aaronshan/hive-third-functions/releases).

> current latest version is `2.0.0`
> current latest version is `2.1.0`
## Functions

@@ -52,6 +52,19 @@ You can also directly download file from [release page](https://github.com/aaron
| function| description |
|:--|:--|
|array_contains(array, value) -> boolean | whether ARRAY contains value or not.|
|array_intersect(array, array) -> array | returns the intersection of the two arrays, without duplicates.|
|array_max(array<E>) -> E | returns the maximum value of the input array.|
|array_min(array<E>) -> E | returns the minimum value of the input array.|
|array_join(array, delimiter, null_replacement) -> string | concatenates the elements of the given array using the delimiter and an optional `null_replacement` to replace nulls.|
|array_distinct(array) -> array | removes duplicate values from the array.|
|array_position(array, value) -> long | returns the position of the first occurrence of the value in the array (or 0 if not found).|
|array_remove(array, value) -> array | removes all elements equal to the value from the array.|
|array_reverse(array) -> array | reverses the order of the array's elements.|
|array_sort(array) -> array | sorts and returns the array. The elements of the array must be orderable.|
|array_concat(array, array) -> array | concatenates two arrays.|
|array_value_count(array, value) -> long | counts the elements of the array that equal the given value.|
|array_slice(array, start, length) -> array | returns a subset of the array starting at index `start` (or counting from the end if `start` is negative) with a length of `length`.|
|array_element_at(array<E>, index) -> E | returns the element of the array at the given index. If index < 0, `array_element_at` accesses elements from the last to the first.|
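The negative-index rule shared by `array_slice` and `array_element_at` can be modelled in a few lines of plain Java. This is only a sketch of the semantics described in the table, not the UDF implementation. The class `ArrayIndexSketch` and its helpers are hypothetical names, positions are assumed to be 1-based (suggested by `array_position` returning 0 for "not found"), and the out-of-range behaviour (null or an empty list) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; names and out-of-range behaviour are assumptions.
final class ArrayIndexSketch {
    private ArrayIndexSketch() {}

    // Maps a 1-based index (positive) or a from-the-end index (negative)
    // to a 0-based offset. Returns -1 for index 0 or anything out of range.
    static int toOffset(int size, int index) {
        int offset = index > 0 ? index - 1 : size + index;
        return (index == 0 || offset < 0 || offset >= size) ? -1 : offset;
    }

    // Element at the given index, or null when the index is out of range (assumed).
    static <E> E elementAt(List<E> array, int index) {
        int offset = toOffset(array.size(), index);
        return offset < 0 ? null : array.get(offset);
    }

    // At most `length` elements starting at `start`; the end is clamped to the array.
    static <E> List<E> slice(List<E> array, int start, int length) {
        int offset = toOffset(array.size(), start);
        if (offset < 0 || length <= 0) {
            return new ArrayList<>();
        }
        int end = offset + Math.min(length, array.size() - offset);
        return new ArrayList<>(array.subList(offset, end));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(16, 13, 12, 13, 18, 16, 9, 18);
        System.out.println(slice(xs, -2, 3));  // prints [9, 18]
        System.out.println(elementAt(xs, -1)); // prints 18
    }
}
```

With a start of -2 only two elements remain before the end of the array, so the requested length of 3 is clamped.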

### 3. date functions

@@ -62,7 +75,7 @@ You can also directly download file from [release page](https://github.com/aaron
|zodiac_cn(date_string \| date) -> string | converts a date to its zodiac sign, in Chinese. |
|type_of_day(date_string \| date) -> string | for Chinese calendars. Returns the type of the day (1: statutory holiday, 2: regular weekend, 3: regular working day, 4: make-up working day); returns -1 on error. |

### 4. JSON functions
### 4. json functions
| function| description |
|:--|:--|
|json_array_get(json, jsonPath) -> array(varchar) |returns the element at the specified index into the `json_array`. The index is zero-based.|
@@ -73,7 +86,7 @@ You can also directly download file from [release page](https://github.com/aaron
|json_extract_scalar(json, jsonPath) -> array(varchar) |like `json_extract`, but returns the result value as a string (as opposed to being encoded as JSON).|
|json_size(json, jsonPath) -> array(varchar) |like `json_extract`, but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.|

### 5. China Id Card functions
### 5. china id card functions

| function| description |
|:--|:--|
@@ -90,8 +103,21 @@ You can also directly download file from [release page](https://github.com/aaron
Put these statements into `${HOME}/.hiverc` or execute them in the Hive CLI.

```
add jar ${jar_location_dir}/hive-third-functions-1.0-SNAPSHOT-shaded.jar
add jar ${jar_location_dir}/hive-third-functions-${version}-shaded.jar
create temporary function array_contains as 'cc.shanruifeng.functions.array.UDFArrayContains';
create temporary function array_intersect as 'cc.shanruifeng.functions.array.UDFArrayIntersect';
create temporary function array_max as 'cc.shanruifeng.functions.array.UDFArrayMax';
create temporary function array_min as 'cc.shanruifeng.functions.array.UDFArrayMin';
create temporary function array_join as 'cc.shanruifeng.functions.array.UDFArrayJoin';
create temporary function array_distinct as 'cc.shanruifeng.functions.array.UDFArrayDistinct';
create temporary function array_position as 'cc.shanruifeng.functions.array.UDFArrayPosition';
create temporary function array_remove as 'cc.shanruifeng.functions.array.UDFArrayRemove';
create temporary function array_reverse as 'cc.shanruifeng.functions.array.UDFArrayReverse';
create temporary function array_sort as 'cc.shanruifeng.functions.array.UDFArraySort';
create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArrayConcat';
create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount';
create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice';
create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt';
create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek';
create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay';
create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn';
@@ -117,18 +143,18 @@ create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChin

You can run these statements in the Hive CLI to get the details of a function.
```
hive> describe function zodiacCn;
zodiacCn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
hive> describe function zodiac_cn;
zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
```

or

```
hive> describe function extended zodiacCn;
zodiacCn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
hive> describe function extended zodiac_cn;
zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac.
Example:
> select zodiacCn(date_string) from src;
> select zodiacCn(month, day) from src;
> select zodiac_cn(date_string) from src;
> select zodiac_cn(month, day) from src;
```

### example
@@ -148,6 +174,23 @@ select zodiac_cn('1989-01-08') => 魔羯座
select zodiac_en('1989-01-08') => Capricorn
```

```
select array_contains(array(16,12,18,9), 12) => true
select array_intersect(array(16,12,18,9,null), array(14,9,6,18,null)) => [null,9,18]
select array_max(array(16,13,12,13,18,16,9,18)) => 18
select array_min(array(16,12,18,9)) => 9
select array_join(array(16,12,18,9,null), '#','=') => 16#12#18#9#=
select array_distinct(array(16,13,12,13,18,16,9,18)) => [9,12,13,16,18]
select array_position(array(16,13,12,13,18,16,9,18), 13) => 2
select array_remove(array(16,13,12,13,18,16,9,18), 13) => [16,12,18,16,9,18]
select array_reverse(array(16,12,18,9)) => [9,18,12,16]
select array_sort(array(16,13,12,13,18,16,9,18)) => [9,12,13,13,16,16,18,18]
select array_concat(array(16,12,18,9,null), array(14,9,6,18,null)) => [16,12,18,9,null,14,9,6,18,null]
select array_value_count(array(16,13,12,13,18,16,9,18), 13) => 2
select array_slice(array(16,13,12,13,18,16,9,18), -2, 3) => [9,18]
select array_element_at(array(16,13,12,13,18,16,9,18), -1) => 18
```
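The `array_join` and `array_position` results above can be reproduced with a plain-Java model. This is a sketch of the documented behaviour, not the UDF code. `ArrayJoinSketch` is a hypothetical name, skipping nulls when no replacement is given is an assumption, and so is ignoring null elements during the position search.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not the Hive UDF implementation.
final class ArrayJoinSketch {
    private ArrayJoinSketch() {}

    // Joins elements with the delimiter. A null element is replaced by
    // nullReplacement, or skipped when nullReplacement is null (assumed).
    static String join(List<?> array, String delimiter, String nullReplacement) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Object element : array) {
            if (element == null && nullReplacement == null) {
                continue;
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(element == null ? nullReplacement : element.toString());
            first = false;
        }
        return sb.toString();
    }

    // 1-based position of the first element equal to value, or 0 if absent.
    static long position(List<?> array, Object value) {
        for (int i = 0; i < array.size(); i++) {
            Object element = array.get(i);
            if (element != null && element.equals(value)) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList(16, 12, 18, 9, null), "#", "=")); // prints 16#12#18#9#=
        System.out.println(position(Arrays.asList(16, 13, 12, 13, 18, 16, 9, 18), 13)); // prints 2
    }
}
```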

```
select id_card_info('110101198901084517') => {"valid":true,"area":"东城区","province":"北京市","gender":"男","city":"北京市"}
```
14 changes: 13 additions & 1 deletion pom.xml
@@ -6,7 +6,7 @@

<groupId>cc.shanruifeng</groupId>
<artifactId>hive-third-functions</artifactId>
<version>2.0.0</version>
<version>2.1.0</version>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -23,6 +23,7 @@
<dep.airlift.version>0.131</dep.airlift.version>
<dep.jackson.version>2.4.4</dep.jackson.version>
<dep.jmh.version>1.9.3</dep.jmh.version>
<dep.fastutil.version>6.5.9</dep.fastutil.version>
</properties>

<dependencyManagement>
@@ -80,6 +81,12 @@
<artifactId>jackson-databind</artifactId>
<version>${dep.jackson.version}</version>
</dependency>

<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>fastutil</artifactId>
<version>${dep.fastutil.version}</version>
</dependency>
</dependencies>
</dependencyManagement>

@@ -130,6 +137,11 @@
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>

<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>fastutil</artifactId>
</dependency>
</dependencies>

<build>
118 changes: 118 additions & 0 deletions src/main/java/cc/shanruifeng/functions/array/UDFArrayConcat.java
@@ -0,0 +1,118 @@
package cc.shanruifeng.functions.array;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.*;

/**
* @author ruifeng.shan
* @date 2016-07-26
* @time 17:30
*/
public class UDFArrayConcat extends GenericUDF {
private static final int ARG_COUNT = 2; // Number of arguments to this UDF
private transient ListObjectInspector leftArrayOI;
private transient ListObjectInspector rightArrayOI;
private transient ObjectInspector leftArrayElementOI;
private transient ObjectInspector rightArrayElementOI;

private transient ArrayList<Object> result = new ArrayList<Object>();
private transient ObjectInspectorConverters.Converter converter;

public UDFArrayConcat() {
}

@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
// Check if two arguments were passed
if (arguments.length != ARG_COUNT) {
throw new UDFArgumentLengthException(
"The function array_concat(array, array) takes exactly " + ARG_COUNT + " arguments.");
}

// Check that both arguments are of category LIST
for (int i = 0; i < ARG_COUNT; i++) {
if (!arguments[i].getCategory().equals(ObjectInspector.Category.LIST)) {
throw new UDFArgumentTypeException(i,
"\"" + org.apache.hadoop.hive.serde.serdeConstants.LIST_TYPE_NAME + "\" "
+ "expected at function array_concat, but "
+ "\"" + arguments[i].getTypeName() + "\" "
+ "is found");
}
}

leftArrayOI = (ListObjectInspector) arguments[0];
rightArrayOI = (ListObjectInspector) arguments[1];

leftArrayElementOI = leftArrayOI.getListElementObjectInspector();
rightArrayElementOI = rightArrayOI.getListElementObjectInspector();

// Check that both arrays have the same element type
if (!ObjectInspectorUtils.compareTypes(leftArrayElementOI, rightArrayElementOI)) {
throw new UDFArgumentTypeException(1,
"\"" + leftArrayElementOI.getTypeName() + "\""
+ " expected at function array_concat, but "
+ "\"" + rightArrayElementOI.getTypeName() + "\""
+ " is found");
}

// Check if the comparison is supported for this type
if (!ObjectInspectorUtils.compareSupported(leftArrayElementOI)) {
throw new UDFArgumentException("The function array_concat"
+ " does not support comparison for "
+ "\"" + leftArrayElementOI.getTypeName() + "\""
+ " types");
}

converter = ObjectInspectorConverters.getConverter(leftArrayElementOI, leftArrayElementOI);

return ObjectInspectorFactory.getStandardListObjectInspector(leftArrayElementOI);
}

@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
Object leftArray = arguments[0].get();
Object rightArray = arguments[1].get();

int leftArrayLength = leftArrayOI.getListLength(leftArray);
int rightArrayLength = rightArrayOI.getListLength(rightArray);

// Return null if either array is null or reports a negative length
if (leftArray == null || rightArray == null || leftArrayLength < 0 || rightArrayLength < 0) {
return null;
}

if (leftArrayLength == 0) {
return rightArray;
}

if (rightArrayLength == 0) {
return leftArray;
}

result.clear();

for (int i = 0; i < leftArrayLength; i++) {
Object arrayElement = leftArrayOI.getListElement(leftArray, i);
result.add(converter.convert(arrayElement));
}

for (int i = 0; i < rightArrayLength; i++) {
Object arrayElement = rightArrayOI.getListElement(rightArray, i);
result.add(converter.convert(arrayElement));
}

return result;
}

@Override
public String getDisplayString(String[] strings) {
assert (strings.length == ARG_COUNT);
return "array_concat(" + strings[0] + ", "
+ strings[1] + ")";
}
}
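For readers without a Hive environment, the behaviour of `evaluate` above can be approximated with plain `java.util` lists. This is a simplified model rather than the UDF itself: `ArrayConcatSketch` is a hypothetical name, and unlike the UDF, which reuses one `result` list across calls, the sketch allocates a new list each time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of array_concat; not the GenericUDF shown in the diff.
final class ArrayConcatSketch {
    private ArrayConcatSketch() {}

    // Returns null if either input is null; otherwise the left elements
    // followed by the right elements, keeping nulls and duplicates.
    static <E> List<E> concat(List<E> left, List<E> right) {
        if (left == null || right == null) {
            return null;
        }
        List<E> result = new ArrayList<>(left.size() + right.size());
        result.addAll(left);
        result.addAll(right);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(16, 12, 18, 9, null);
        List<Integer> right = Arrays.asList(14, 9, 6, 18, null);
        // prints [16, 12, 18, 9, null, 14, 9, 6, 18, null]
        System.out.println(concat(left, right));
    }
}
```

Allocating a fresh list costs more per row, but the caller never sees a previously returned result mutated by a later call.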
Original file line number Diff line number Diff line change
@@ -34,7 +34,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
// Check if two arguments were passed
if (arguments.length != ARG_COUNT) {
throw new UDFArgumentLengthException(
"The function array_contains(json, json_path) takes exactly " + ARG_COUNT + "arguments.");
"The function array_contains(array, value) takes exactly " + ARG_COUNT + " arguments.");
}

// Check if ARRAY_IDX argument is of category LIST
@@ -55,7 +55,7 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
if (!ObjectInspectorUtils.compareTypes(arrayElementOI, valueOI)) {
throw new UDFArgumentTypeException(VALUE_IDX,
"\"" + arrayElementOI.getTypeName() + "\""
+ " expected at function ARRAY_CONTAINS, but "
+ " expected at function array_contains, but "
+ "\"" + valueOI.getTypeName() + "\""
+ " is found");
}
@@ -84,6 +84,10 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException {
int arrayLength = arrayOI.getListLength(array);

// Check if array is null or empty or value is null
if (array == null) {
return result;
}

if (value == null || arrayLength <= 0) {
return result;
}
@@ -92,8 +96,7 @@ public Object evaluate(DeferredObject[] arguments) throws HiveException {
for (int i = 0; i < arrayLength; ++i) {
Object listElement = arrayOI.getListElement(array, i);
if (listElement != null) {
if (ObjectInspectorUtils.compare(value, valueOI,
listElement, arrayElementOI) == 0) {
if (ObjectInspectorUtils.compare(value, valueOI, listElement, arrayElementOI) == 0) {
result.set(true);
break;
}
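The null handling in this hunk can be summarised with a plain-Java model. This is a sketch, not the UDF: `ArrayContainsSketch` is a hypothetical name, and it assumes that `result` holds false whenever the early returns are taken, which the visible part of the diff does not show.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of array_contains; assumes `result` defaults to false.
final class ArrayContainsSketch {
    private ArrayContainsSketch() {}

    // False for a null array, a null value or an empty array; null elements
    // are skipped, and the search stops at the first match.
    static <E extends Comparable<E>> boolean contains(List<E> array, E value) {
        if (array == null || value == null || array.isEmpty()) {
            return false;
        }
        for (E element : array) {
            if (element != null && value.compareTo(element) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(Arrays.asList(16, 12, 18, 9), 12)); // prints true
        System.out.println(contains(Arrays.asList(16, null, 9), 12));   // prints false
    }
}
```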