Schemas with repeating values at different levels in the message graph are incompatible with Avro #73

Open
salsferrazza opened this issue Dec 11, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@salsferrazza
Collaborator

When running with the following set of options:

wget -q -O - ftp://ftp.cmegroup.com/SBEFix/Production/secdef.dat.gz | gunzip - | txcode \
  --schema_file ~/src/datacast/transcoder/test/FIX50SP2.CME.xml \
  --factory fix \
  --source_file_format_type line_delimited \
  --message_type_inclusions=SecurityDefinition \
  --fix_header_tags 8,9,35,1128,49,56,34,52,10 \
  --destination_project_id $(gcloud config get-value project) \
  --output_type pubsub \
  --output_encoding json \
  --lazy_create_resources

The following error is thrown:

google.api_core.exceptions.InvalidArgument: 400 AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema. [detail: "[ORIGINAL ERROR] generic::invalid_argument: AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema. [google.rpc.error_details_ext] { message: \"AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema.\" }"

This is likely because the schema defined in FIX50SP2.CME.xml is deeply nested, and the same field name can appear at different levels of the entity hierarchy. A graph such as Object.Property1 and Object.Property2.Property1 appears to be incompatible with Avro, since both Property1 records seem to end up with the same full name (here, sbeMessage.NoLotTypeRules). Such graphs are nonetheless commonly encountered within legitimate FIX schema definitions.

Notably, the BigQuery output types do not exhibit this behavior, but the fastavro output type, which writes to local POSIX files, fails in the same way:

fastavro._schema_common.SchemaParseException: redefined named type: sbeMessage.NoLotTypeRules
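
To make the failure mode concrete, here is a minimal standard-library sketch that walks an Avro schema expressed as Python dicts and reports any record whose full name is defined more than once. It is a simplified model of Avro name resolution, not fastavro's or Pub/Sub's actual validation code. The schema below is hypothetical and simply mirrors the Object.Property1 / Object.Property2.Property1 shape described above.

```python
def find_redefined_names(schema, enclosing_ns=None, seen=None, dups=None):
    """Return the full names of records that are defined more than once."""
    seen = set() if seen is None else seen
    dups = [] if dups is None else dups
    if isinstance(schema, list):  # a union: check every branch
        for branch in schema:
            find_redefined_names(branch, enclosing_ns, seen, dups)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            # A record without its own namespace inherits the enclosing one.
            ns = schema.get("namespace", enclosing_ns)
            name = schema["name"]
            fullname = name if "." in name or not ns else f"{ns}.{name}"
            if fullname in seen:
                dups.append(fullname)
            seen.add(fullname)
            child_ns = fullname.rpartition(".")[0] or None
            for field in schema.get("fields", []):
                find_redefined_names(field["type"], child_ns, seen, dups)
        elif kind == "array":
            find_redefined_names(schema["items"], enclosing_ns, seen, dups)
        elif kind == "map":
            find_redefined_names(schema["values"], enclosing_ns, seen, dups)
        elif isinstance(kind, (dict, list)):
            find_redefined_names(kind, enclosing_ns, seen, dups)
    return dups


def property1():
    # Hypothetical repeating group; a fresh dict is built for each occurrence.
    return {
        "type": "array",
        "items": {
            "type": "record",
            "name": "Property1",
            "fields": [{"name": "value", "type": "string"}],
        },
    }


# Hypothetical schema: Property1 appears at two different nesting levels.
SCHEMA = {
    "type": "record",
    "name": "Object",
    "namespace": "sbeMessage",
    "fields": [
        {"name": "Property1", "type": property1()},
        {
            "name": "Property2",
            "type": {
                "type": "record",
                "name": "Property2",
                "fields": [{"name": "Property1", "type": property1()}],
            },
        },
    ],
}

print(find_redefined_names(SCHEMA))
```

Because neither nested record declares its own namespace, both inherit sbeMessage from the top-level record and collide on the same full name, which matches the shape of the errors reported above.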
@salsferrazza salsferrazza added the bug Something isn't working label Dec 11, 2022
@salsferrazza
Collaborator Author

https://stackoverflow.com/a/48131460

It's not well documented, but Avro allows you to reference previously defined names by using the full namespace for the name that is being referenced. In your case, the following code would result in only one class being generated, referenced by each array. It also DRYs up the schema nicely.

For each level of nested record, a namespace can be applied to distinguish that tier's fields from local homonyms.
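
One way the per-level namespace idea could be sketched, using only the standard library: give every record an explicit namespace derived from the chain of records that enclose it, so that homonyms at different depths get distinct full names. The function names and the schema here are hypothetical and are not part of the transcoder. The sketch does not rewrite string references to named types, and whether Pub/Sub accepts the resulting schema is untested.

```python
import copy


def qualify_records(schema, namespace):
    """Return a copy of `schema` in which every record has a positional namespace."""
    qualified = copy.deepcopy(schema)
    _qualify(qualified, namespace)
    return qualified


def _qualify(node, namespace):
    if isinstance(node, list):  # a union: qualify every branch
        for branch in node:
            _qualify(branch, namespace)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind == "record":
            node["namespace"] = namespace
            # Records nested inside this one live one level deeper.
            child_ns = f"{namespace}.{node['name']}"
            for field in node.get("fields", []):
                _qualify(field["type"], child_ns)
        elif kind == "array":
            _qualify(node["items"], namespace)
        elif kind == "map":
            _qualify(node["values"], namespace)
        elif isinstance(kind, (dict, list)):
            _qualify(kind, namespace)


def full_names(node):
    """Collect the full name of every record in an already qualified schema."""
    names = []
    if isinstance(node, list):
        for branch in node:
            names += full_names(branch)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind == "record":
            names.append(f"{node['namespace']}.{node['name']}")
            for field in node.get("fields", []):
                names += full_names(field["type"])
        elif kind == "array":
            names += full_names(node["items"])
        elif kind == "map":
            names += full_names(node["values"])
        elif isinstance(kind, (dict, list)):
            names += full_names(kind)
    return names


def property1():
    # Hypothetical repeating group; a fresh dict is built for each occurrence.
    return {
        "type": "array",
        "items": {
            "type": "record",
            "name": "Property1",
            "fields": [{"name": "value", "type": "string"}],
        },
    }


# Hypothetical schema with the same record name at two nesting levels.
SCHEMA = {
    "type": "record",
    "name": "Object",
    "fields": [
        {"name": "Property1", "type": property1()},
        {
            "name": "Property2",
            "type": {
                "type": "record",
                "name": "Property2",
                "fields": [{"name": "Property1", "type": property1()}],
            },
        },
    ],
}

for name in full_names(qualify_records(SCHEMA, "sbeMessage")):
    print(name)
```

With this scheme the two Property1 records become sbeMessage.Object.Property1 and sbeMessage.Object.Property2.Property1. The alternative described in the quoted answer, defining the record once and referring to it by full name elsewhere, only applies when both occurrences really have identical fields.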
