
Fix protobuf serde errors #425

Open
angushe wants to merge 2 commits into master

Conversation

@angushe (Contributor) commented Dec 4, 2014

Hi,

This pull request attempts to fix the same problem described in pull request #400. The fix has been tested successfully on Hive 0.12/0.13 and Protobuf 2.4.1/2.5.0.

Any comments?

Thanks
Angus

@miltonwulei

I used this patch in cdh5.1.2 with Hive 0.12.0-cdh5.1.0 and confirmed that the bug in issue #400 was resolved! This patch looks good to me.

@cooper6581

I used this with Hive 0.13.1-cdh5.3.1 and Protobuf 2.5.0 in order to resolve Issue #400. Any chance of this getting merged soon? Thanks for this patch angushe!

@harelglik

Solves #400 for me on Hive 0.13.1 and Protobuf 2.5.0 on AWS AMI 3.3.1.
Great fix, will it get merged soon?

@alastrange

Used this to resolve issue #400 with Protobuf 2.5.0, Hive 0.14.0, and HDP 2.2. Thanks.


@Override
public int hashCode() {
    return descriptor.hashCode();
}
Contributor

Should this mix in structFields.hashCode() too? This is still correct, but it will have more hash collisions when comparing objects that have the same descriptor but different structFields (I don't know whether that's common or not).
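
For illustration, mixing structFields into the hash could look like the following sketch (this is the suggestion above spelled out, not code from this PR; it assumes structFields has a value-based hashCode()):

@Override
public int hashCode() {
    // Fold structFields into the hash so inspectors with the same descriptor
    // but different structFields hash differently (sketch only).
    int result = descriptor.hashCode();
    result = 31 * result + structFields.hashCode();
    return result;
}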

@isnotinvain (Contributor)

@rangadi I don't know too much about protobuf dynamic messages, would you mind giving this a look too?

There's a lot of casting and instanceof going on in here where there previously wasn't -- is that part of the direct fix for the issue, or is this just the only way to use DynamicMessage?

@rangadi (Contributor) commented May 20, 2015

We haven't used Hive serdes actively. I will take a look anyway.

Message.Builder builder;
if (data instanceof Message.Builder) {
    builder = (Message.Builder) data;
} else if (data instanceof Message) {
Contributor

Since this fix uses only the builder, do you ever expect a Message here?
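
If a plain Message can ever show up here, one way to normalize both cases would be something like the following sketch (illustrative only, not the PR's diff; Message.toBuilder() is standard protobuf Java API):

Message.Builder builder;
if (data instanceof Message.Builder) {
    builder = (Message.Builder) data;
} else if (data instanceof Message) {
    // Convert an immutable Message to a builder so the rest of the code
    // only has to deal with Message.Builder.
    builder = ((Message) data).toBuilder();
} else {
    throw new IllegalArgumentException(
        "Expected Message or Message.Builder, got " + data.getClass());
}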

@rangadi (Contributor) commented May 20, 2015

The fix looks good. I am not sure about Alex's comment on hashCode(). I just have one question: do we ever expect a Message object?

@joshk0 commented Feb 14, 2017

Let's assume this patch will never be merged. In that case, I would like to optimize this pull request's SEO.

I was seeing errors like this when running Hive queries on Protobuf external tables that require a MapReduce job. The errors did not show up on queries like:

SELECT * FROM protobuf_external_table LIMIT 1;

But when running a query like this:

SELECT DISTINCT(field.subfield) FROM protobuf_external_table;

I would get a stack trace:

17/02/14 10:09:37 [LocalJobRunner Map Task Executor #0]: ERROR mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable <LOTS OF BYTES>
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: FieldDescriptor does not match message type.
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.getField(GeneratedMessage.java:1536)
	at com.google.protobuf.GeneratedMessage$FieldAccessorTable.access$100(GeneratedMessage.java:1449)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:366)
	at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:228)
	at io.arbor.elephantbird.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:148)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:407)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:129)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
	... 10 more

Applying both revisions of this PR fixed the issue conclusively.
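
For anyone landing here from the stack trace: the IllegalArgumentException is protobuf's setField() complaining that the FieldDescriptor passed in does not belong to the builder's own Descriptor instance. A general way to avoid that kind of mismatch (an illustrative sketch with a hypothetical helper, not the PR's actual change) is to re-resolve the field by name against the builder's descriptor before calling setField():

import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;

// Hypothetical helper for illustration: look the field up on the builder's own
// descriptor so the descriptor instances are guaranteed to match.
static Message.Builder setFieldByName(Message.Builder builder, String fieldName, Object value) {
    FieldDescriptor field = builder.getDescriptorForType().findFieldByName(fieldName);
    if (field == null) {
        throw new IllegalArgumentException("No field named " + fieldName
            + " on " + builder.getDescriptorForType().getFullName());
    }
    return builder.setField(field, value);
}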

@sugix commented Mar 3, 2017

Thanks a lot for this, and please merge it ASAP. I applied the patch and it works like a charm now.

@isnotinvain (Contributor)

Looks like, way back when, we had some questions on this PR that never got answered. Anyone interested in taking a look? I think we can merge this if someone verifies it's still working and addresses the review feedback.

@agammishra

I am using elephant-bird-hive-4.15.jar, but I am still getting the same issue. Why?

@CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
