Commit

NIFI-13550 Added documentation about the ExcelReader Starting Row Strategy

This closes apache#9082

Signed-off-by: David Handermann <[email protected]>
dan-s1 authored and exceptionfactory committed Jul 15, 2024
1 parent 730b9c6 commit 1ff5ebd
Showing 1 changed file with 10 additions and 4 deletions.
@@ -25,6 +25,9 @@
The ExcelReader allows for interpreting input data as delimited Records. Each row in an Excel spreadsheet is a record
and each cell is considered a field. The reader allows for choosing which row to start from and which sheets
in a spreadsheet to ingest.
+When using the "Use Starting Row" strategy, the field names will be assumed to be the column names from the configured
+starting row. If there are any column(s) from the starting row which are blank, they are automatically assigned a field name
+using the cell number prefixed with "column_".
When using the "Infer Schema" strategy, the field names will be assumed to be the
cell numbers of each column prefixed with "column_". Otherwise, the names of fields can be supplied
when specifying the schema by using the Schema Text or looking up the schema in a Schema Registry.
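
Note on the hunk above: the naming rule it documents can be sketched in a few lines of Python. This is only an illustration, not NiFi's implementation. The function name `field_names_from_starting_row`, the zero-based cell numbering, and the treatment of whitespace-only cells as blank are all assumptions that the documentation text does not confirm.

```python
from typing import List, Optional


def field_names_from_starting_row(
    cells: List[Optional[str]], first_cell_number: int = 0
) -> List[str]:
    """Return one field name per cell of the configured starting row.

    Non-blank cells supply their own text as the field name. Blank cells
    fall back to "column_" followed by the cell number.

    Assumptions (hypothetical, not taken from NiFi's source):
    - cell numbers start at `first_cell_number` (0 by default);
    - None, "" and whitespace-only strings all count as blank.
    """
    names = []
    for offset, cell in enumerate(cells):
        text = (cell or "").strip()
        if text:
            names.append(text)
        else:
            names.append(f"column_{first_cell_number + offset}")
    return names


# Starting row with two blank header cells.
print(field_names_from_starting_row(["id", "", "name", None]))
# ['id', 'column_1', 'name', 'column_3']
```

If the real cell numbering turns out to be one-based, passing `first_cell_number=1` adapts the sketch without other changes.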
@@ -70,13 +73,16 @@ <h2>Schemas and Type Coercion</h2>
will be thrown.
</p>


-<h2>Schema Inference</h2>
+<h2>Use Starting Row and Schema Inference</h2>

<p>
While NiFi's Record API does require that each Record have a schema, it is often convenient to infer the schema based on the values in the data,
-rather than having to manually create a schema. This is accomplished by selecting a value of "Infer Schema" for the "Schema Access Strategy" property.
-When using this strategy, the Reader will determine the schema by first parsing all data in the FlowFile, keeping track of all fields that it has encountered
+rather than having to manually create a schema. This is accomplished by selecting either value of "Use Starting Row" or "Infer Schema" for the
+"Schema Access Strategy" property. When using the "Use Starting Row" strategy, the Reader will determine the schema by parsing the first ten rows
+after the configured starting row of the data in the FlowFile all the while keeping track of all fields that it has encountered
+and the type of each field. A schema is then formed that encompasses all encountered fields. A schema can even be inferred if there are blank lines
+within those ten rows, but if they are all blank, then this strategy will fail to create a schema.
+When using the "Infer Schema" strategy, the Reader will determine the schema by first parsing all data in the FlowFile, keeping track of all fields that it has encountered
and the type of each field. Once all data has been parsed, a schema is formed that encompasses all fields that have been encountered.
</p>
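
Note on the hunk above: the "Use Starting Row" inference it describes (sample the first ten rows after the starting row, record every field and its type, fail if every sampled row is blank) can be sketched roughly as follows. This is a simplified illustration, not NiFi's code. The names `infer_schema` and `SAMPLE_SIZE` are hypothetical, and Python type names stand in for NiFi's record field types.

```python
from typing import Dict, List, Optional, Sequence, Set

SAMPLE_SIZE = 10  # the "first ten rows" mentioned in the documentation


def infer_schema(
    rows: Sequence[Sequence[Optional[object]]],
    field_names: List[str],
    sample_size: int = SAMPLE_SIZE,
) -> Dict[str, Set[str]]:
    """Infer a field -> observed-types mapping from the sampled rows.

    `rows` holds the data rows that follow the configured starting row.
    A blank row is an empty sequence or one whose cells are all None or "".
    Blank rows and blank cells are skipped (an assumption of this sketch).
    """
    schema: Dict[str, Set[str]] = {}
    for row in rows[:sample_size]:
        for name, value in zip(field_names, row):
            if value is None or value == "":
                continue  # blank cell: contributes nothing to the schema
            schema.setdefault(name, set()).add(type(value).__name__)
    if not schema:
        # Mirrors the documented failure when all sampled rows are blank.
        raise ValueError("all sampled rows are blank; cannot infer a schema")
    return schema


# A blank row in the middle of the sample does not prevent inference.
print(infer_schema([[1, "a"], [], [2.5, None]], ["id", "label"]))
```

Under the "Infer Schema" strategy the same loop would run over every row instead of a ten-row sample, which is the trade-off the documentation describes: full coverage of the data at the cost of parsing the whole FlowFile first.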
