Commit 5be9e5b: first commit (0 parents)

57 files changed, +6432 -0 lines changed

.gitignore (new file, +13 lines)

*~
/pkg/
/tmp/
*.gemspec
.gradle/
/classpath/
build/
.idea
/.settings/
/.metadata/
.classpath
.project
/bin

LICENSE.txt (new file, +21 lines)

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md (new file, +272 lines)

# Apache POI Excel parser plugin for Embulk

Parses Microsoft Excel files (xls, xlsx) read by other file input plugins.
This plugin uses Apache POI.

## Overview

* **Plugin type**: parser
* **Guess supported**: no
* **Embulk version**: 0.10 or later

## Example

```yaml
in:
  type: any file input plugin type
  parser:
    type: poi_excel
    sheets: ["DQ10-orb"]
    skip_header_lines: 1 # the first row is a header.
    columns:
    - {name: row, type: long, value: row_number}
    - {name: get_date, type: timestamp, cell_column: A, value: cell_value}
    - {name: orb_type, type: string}
    - {name: orb_name, type: string}
    - {name: orb_shape, type: long}
    - {name: drop_monster_name, type: string}
```

If **value** is omitted, `cell_value` is used.
If **cell_column** is omitted when **value** is `cell_value`, the next column is used.

## Configuration

* **sheets**: sheet names. The wildcards `*` and `?` can be used. (list of string, required)
* **record_type**: record type. (`row`, `column` or `sheet`, default: `row`)
* **skip_header_lines**: number of rows to skip when **record_type**=`row` (number of columns to skip when **record_type**=`column`). Ignored when **record_type**=`sheet`. (integer, default: `0`)
* **columns**: column definitions. See below. (list of hash, required)
* **sheet_options**: per-sheet options. See below. (hash, default: null)

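The **sheets** wildcards behave like glob patterns over sheet names. The Java sketch below shows one way such matching can be implemented; the `SheetWildcard` class is hypothetical, and it assumes a case-sensitive match against the whole sheet name, `*` matching any run of characters (including none), `?` matching exactly one character, and selected sheets kept in workbook order. The plugin's actual rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: choosing sheets with `*` and `?` wildcards.
// Assumes a case-sensitive match against the whole sheet name.
final class SheetWildcard {

    // Translate a wildcard pattern into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");                             // any run of characters
            } else if (c == '?') {
                sb.append('.');                              // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(sb.toString());
    }

    static boolean matches(String wildcard, String sheetName) {
        return toRegex(wildcard).matcher(sheetName).matches();
    }

    // Keeps workbook order; a sheet is selected if any pattern matches it.
    static List<String> select(List<String> patterns, List<String> sheetNames) {
        List<String> selected = new ArrayList<>();
        for (String name : sheetNames) {
            for (String p : patterns) {
                if (matches(p, name)) {
                    selected.add(name);
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> sheets = Arrays.asList("DQ10-orb", "DQ10-item", "Summary");
        System.out.println(select(Arrays.asList("DQ10-*"), sheets)); // [DQ10-orb, DQ10-item]
        System.out.println(matches("Sheet?", "Sheet1"));            // true
    }
}
```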
### columns

* **name**: Embulk column name. (string, required)
* **type**: Embulk column type. (string, required)
* **value**: value type. See below. (string, default: `cell_value`)
* **column_number**: same as **cell_column**.
* **cell_column**: Excel column. See below. (string, default: next column when **record_type**=`row`)
* **cell_row**: Excel row number. See below. (integer, default: next row when **record_type**=`column`)
* **cell_address**: Excel cell address such as `A1` or `Sheet1!B3`. (string, optional)
* **numeric_format**: format used to convert a numeric (double) value to a string, such as `%4.2f`. (string, default: Java's `Double.toString()`)
* **attribute_name**: attribute names to retrieve when **value** is `cell_style`, `cell_font`, etc. See below. (list of string)
* **on_cell_error**: how to handle cell errors. See below. (string, default: `constant`)
* **formula_handling**: how to handle formulas. See below. (`evaluate` or `cashed_value`, default: `evaluate`)
* **on_evaluate_error**: how to handle formula evaluation errors. See below. (string, default: `exception`)
* **formula_replace**: replacements applied to a formula before it is evaluated. See below.
* **on_convert_error**: how to handle conversion errors. See below. (string, default: `exception`)
* **search_merged_cell**: how to search for the merged region when a cell is BLANK. (`none`, `linear_search`, `tree_search` or `hash_search`, default: `hash_search`)

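Excel keeps the value of a merged region only in its top-left cell, so the other cells of the region read as BLANK. One plausible reading of `hash_search` is a lookup table from every covered cell to its region; the sketch below implements that idea with a hypothetical `MergedCellIndex` class and plain-Java regions modeled on POI's `CellRangeAddress` (0-based rows and columns). It is not taken from the plugin's source. Compared with a linear scan over all regions, the table costs memory proportional to the number of merged cells but answers each lookup in constant time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hash-based merged-cell lookup (0-based rows/columns, as in POI).
final class MergedCellIndex {

    // A rectangular merged region with inclusive bounds, similar to POI's CellRangeAddress.
    static final class Region {
        final int firstRow, lastRow, firstCol, lastCol;

        Region(int firstRow, int lastRow, int firstCol, int lastCol) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
            this.firstCol = firstCol;
            this.lastCol = lastCol;
        }
    }

    // Every covered (row, col) maps to the region that covers it.
    private final Map<Long, Region> index = new HashMap<>();

    // Pre-index all covered cells: memory grows with merged cells, lookups are O(1).
    MergedCellIndex(Iterable<Region> regions) {
        for (Region r : regions) {
            for (int row = r.firstRow; row <= r.lastRow; row++) {
                for (int col = r.firstCol; col <= r.lastCol; col++) {
                    index.put(key(row, col), r);
                }
            }
        }
    }

    // Pack row and column into a single long key.
    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    // For a BLANK cell, returns {row, col} of the region's top-left cell, or null if not merged.
    int[] valueCellOf(int row, int col) {
        Region r = index.get(key(row, col));
        return (r == null) ? null : new int[] {r.firstRow, r.firstCol};
    }

    public static void main(String[] args) {
        // B2:C3 is rows 1..2 and columns 1..2 in 0-based indexes.
        MergedCellIndex idx = new MergedCellIndex(Collections.singletonList(new Region(1, 2, 1, 2)));
        int[] origin = idx.valueCellOf(2, 2);                // C3
        System.out.println(origin[0] + "," + origin[1]);    // 1,1 (B2)
        System.out.println(idx.valueCellOf(0, 0) == null);  // true (A1 is not merged)
    }
}
```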
### value

* `cell_value`: value of the cell.
* `cell_formula`: formula of the cell. (If the cell does not contain a formula, same as `cell_value`.)
* `cell_style`: all cell style attributes, returned as a JSON string. See **attribute_name**. (**type** must be `string`)
* `cell_font`: all cell font attributes, returned as a JSON string. See **attribute_name**. (**type** must be `string`)
* `cell_comment`: all cell comment attributes, returned as a JSON string. See **attribute_name**. (**type** must be `string`)
* `cell_type`: cell type, as returned by POI's `Cell.getCellType()`.
* `cell_cached_type`: type of the cached formula result, as returned by POI's `Cell.getCachedFormulaResultType()` when the cell type is `FORMULA`; otherwise the same as `cell_type` (`Cell.getCellType()`).
* `sheet_name`: sheet name.
* `row_number`: row number (1-origin).
* `column_number`: column number (1-origin).
* `constant`: constant value.
    * `constant.`*value*: the specified *value*.
    * `constant`: null.

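Settings such as `constant.0` or `cell_style.border` combine a keyword with a suffix after a period. A minimal sketch of splitting such a setting, using a hypothetical `ValueSpec` class and assuming the split happens at the first period, so the suffix may itself contain periods (as in `constant.1.5`):

```java
// Hypothetical sketch: splitting a `value` setting into keyword and suffix.
final class ValueSpec {
    final String keyword; // e.g. "constant", "cell_style", "cell_value"
    final String suffix;  // text after the first '.', or null if there is none

    private ValueSpec(String keyword, String suffix) {
        this.keyword = keyword;
        this.suffix = suffix;
    }

    static ValueSpec parse(String value) {
        int dot = value.indexOf('.');
        if (dot < 0) {
            return new ValueSpec(value, null);
        }
        return new ValueSpec(value.substring(0, dot), value.substring(dot + 1));
    }

    // `constant` yields null; `constant.X` yields "X".
    String constantValue() {
        if (!"constant".equals(keyword)) {
            throw new IllegalStateException("not a constant: " + keyword);
        }
        return suffix;
    }

    public static void main(String[] args) {
        System.out.println(parse("constant.0").constantValue());   // 0
        System.out.println(parse("constant").constantValue());     // null
        System.out.println(parse("cell_style.border").suffix);     // border
        System.out.println(parse("constant.1.5").constantValue()); // 1.5
    }
}
```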
### cell_column

Mainly used when **record_type**=`row`.

* `A`, `B`, `C`, ...: column letters in A1 notation.
* *number*: column number (1-origin).
* `+`: next column.
* `+`*name*: the column after the column of *name*.
* `+`*number*: the column *number* columns to the right.
* `-`: previous column.
* `-`*name*: the column before the column of *name*.
* `-`*number*: the column *number* columns to the left.
* `=`: same column.
* `=`*name*: the same column as *name*.

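The forms above can be read as a small resolution rule. The sketch below (hypothetical `CellColumnResolver` class) converts A1 letters to 1-origin numbers and resolves `+`, `-` and `=`. It assumes that relative forms count from the previous column definition, that a suffix made only of digits is a count while anything else is the name of a column defined earlier, and that an omitted **cell_column** behaves like `+`; it does not model columns whose **value** is not cell based, such as `row_number`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolving cell_column settings to 1-origin column numbers.
// Assumption: relative forms are based on the previous column definition.
final class CellColumnResolver {

    // A1 letters to a 1-origin number: "A" -> 1, "Z" -> 26, "AA" -> 27.
    static int lettersToNumber(String letters) {
        int n = 0;
        for (char c : letters.toUpperCase().toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column letter: " + letters);
            }
            n = n * 26 + (c - 'A' + 1);
        }
        return n;
    }

    private final Map<String, Integer> byName = new HashMap<>();
    private int previous = 0; // 0 means "before column A"

    // spec == null models an omitted cell_column, which behaves like "+".
    int resolve(String name, String spec) {
        int col;
        if (spec == null || spec.isEmpty()) {
            col = previous + 1;
        } else if ("+-=".indexOf(spec.charAt(0)) >= 0) {
            char op = spec.charAt(0);
            String rest = spec.substring(1);
            int base = previous;
            int step = 1;
            if (rest.matches("\\d+")) {
                step = Integer.parseInt(rest);  // "+3": three columns to the right
            } else if (!rest.isEmpty()) {
                base = columnOf(rest);          // "+name": relative to a named column
            }
            col = (op == '+') ? base + step : (op == '-') ? base - step : base;
        } else if (spec.matches("\\d+")) {
            col = Integer.parseInt(spec);       // plain 1-origin number
        } else {
            col = lettersToNumber(spec);        // A1 letters
        }
        if (col < 1) {
            throw new IllegalArgumentException("column out of range: " + spec);
        }
        byName.put(name, col);
        previous = col;
        return col;
    }

    private int columnOf(String name) {
        Integer c = byName.get(name);
        if (c == null) {
            throw new IllegalArgumentException("unknown column name: " + name);
        }
        return c;
    }

    public static void main(String[] args) {
        CellColumnResolver r = new CellColumnResolver();
        System.out.println(r.resolve("get_date", "A"));     // 1
        System.out.println(r.resolve("orb_type", null));    // 2
        System.out.println(r.resolve("orb_name", "+2"));    // 4
        System.out.println(r.resolve("memo", "=get_date")); // 1
        System.out.println(r.resolve("note", "+orb_name")); // 5
    }
}
```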
### cell_row

Mainly used when **record_type**=`column`.

* *number*: row number (1-origin).

### attribute_name

By default, when **value** is `cell_style`, `cell_font`, or `cell_comment`, all attributes are retrieved and converted into a JSON string.
(Because the result is a JSON string, **type** must be `string`.)

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_style}
```

When **attribute_name** is specified, only the listed attributes are retrieved and converted into a JSON string.

* **attribute_name**: attribute names. (list of string)

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_style, attribute_name: [border_top, border_bottom, border_left, border_right]}
```

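Selecting attributes with **attribute_name** amounts to filtering a name-to-value map before serializing it. A minimal sketch with a hypothetical `AttributeJson` class: the attribute values in it are made up, unknown names are simply skipped, and keys keep the order given in **attribute_name**. The plugin's real attribute set, value types and JSON layout may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep only the requested attributes and render them as a JSON object.
final class AttributeJson {

    // attributeNames == null models an omitted attribute_name (all attributes).
    static String toJson(Map<String, Object> all, List<String> attributeNames) {
        Map<String, Object> selected = new LinkedHashMap<>();
        if (attributeNames == null) {
            selected.putAll(all);
        } else {
            for (String name : attributeNames) {
                if (all.containsKey(name)) {     // assumption: unknown names are skipped
                    selected.put(name, all.get(name));
                }
            }
        }
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : selected.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null || v instanceof Number || v instanceof Boolean) {
                sb.append(v);                    // null, numbers and booleans unquoted
            } else {
                sb.append(quote(v.toString()));
            }
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> style = new LinkedHashMap<>(); // made-up attribute values
        style.put("border_top", 1);
        style.put("border_bottom", 0);
        style.put("border_left", 2);
        System.out.println(toJson(style, Arrays.asList("border_top", "border_left")));
        // {"border_top":1,"border_left":2}
    }
}
```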
You can also retrieve a single attribute by appending a period and the attribute name to `cell_style` or `cell_font`.
In this case the result is not a JSON string, so **type** must match the type of that attribute.

```yaml
columns:
- {name: foo, type: long, value: cell_style.border}
- {name: bar, type: long, value: cell_font.color}
```

For `cell_style` and `cell_font`, omitting **cell_column** targets the same column as the previous column definition.
(For `cell_value`, omitting **cell_column** moves to the next column.)

### on_cell_error

How to handle cell errors (`#DIV/0!`, `#REF!`, etc.).

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, on_cell_error: error_code}
```

* `constant`: set null. (default)
* `constant.`*value*: set the specified *value*.
* `error_code`: set the error code.
* `exception`: throw an exception.

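The four settings amount to a small policy switch. A minimal sketch with a hypothetical `CellErrorPolicy` class, assuming the error reaches it as its display text (such as `#DIV/0!`) and that `error_code` outputs that text; reading the error from a POI cell is not shown.

```java
// Hypothetical sketch of applying an on_cell_error setting to an Excel error cell.
final class CellErrorPolicy {
    private final String setting;

    CellErrorPolicy(String setting) {
        this.setting = setting;
    }

    // Returns the value to output for a cell whose error text is errorCode (e.g. "#DIV/0!").
    String apply(String errorCode) {
        if (setting.equals("constant")) {
            return null;                                     // default: null
        }
        if (setting.startsWith("constant.")) {
            return setting.substring("constant.".length());  // fixed replacement value
        }
        if (setting.equals("error_code")) {
            return errorCode;                                // pass the error text through
        }
        if (setting.equals("exception")) {
            throw new IllegalStateException("cell error: " + errorCode);
        }
        throw new IllegalArgumentException("unknown on_cell_error: " + setting);
    }

    public static void main(String[] args) {
        System.out.println(new CellErrorPolicy("constant").apply("#REF!"));     // null
        System.out.println(new CellErrorPolicy("constant.-1").apply("#REF!"));  // -1
        System.out.println(new CellErrorPolicy("error_code").apply("#DIV/0!")); // #DIV/0!
    }
}
```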
### formula_handling

How formulas are handled.

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, formula_handling: cashed_value}
```

* `evaluate`: evaluate the formula. (default)
* `cashed_value`: use the cached value stored in the cell.

### on_evaluate_error

How to handle errors that occur while evaluating a formula.

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, on_evaluate_error: constant}
```

* `constant`: set null.
* `constant.`*value*: set the specified *value*.
* `exception`: throw an exception. (default)

### formula_replace

Rewrites the formula before it is evaluated. Each entry replaces text matching **regex** with **to**.

```yaml
columns:
- {name: foo, type: string, cell_column: A, value: cell_value, formula_replace: [{regex: aaa, to: "A${row}"}, {regex: bbb, to: "B${row}"}]}
```

`${row}` is replaced with the current row number, and `${column}` with the current column string.

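A minimal sketch of the replacement step with a hypothetical `FormulaReplace` class. It assumes that the rules are applied in order with Java regular expressions, that `${row}` and `${column}` are filled in before each replacement, and that the resulting text is inserted literally (hence `Matcher.quoteReplacement`). With the rules from the example and row 5, `SUM(aaa:bbb)` becomes `SUM(A5:B5)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;

// Hypothetical sketch of formula_replace: each rule's regex is replaced with its
// `to` text after ${row} and ${column} have been filled in.
final class FormulaReplace {

    static final class Rule {
        final String regex;
        final String to;

        Rule(String regex, String to) {
            this.regex = regex;
            this.to = to;
        }
    }

    static String apply(String formula, List<Rule> rules, int row, String column) {
        String result = formula;
        for (Rule rule : rules) {
            String to = rule.to
                    .replace("${row}", Integer.toString(row))  // literal placeholder swap
                    .replace("${column}", column);
            // quoteReplacement: treat the text literally (no $1-style group references).
            result = result.replaceAll(rule.regex, Matcher.quoteReplacement(to));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(new Rule("aaa", "A${row}"), new Rule("bbb", "B${row}"));
        System.out.println(apply("SUM(aaa:bbb)", rules, 5, "C")); // SUM(A5:B5)
    }
}
```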
### on_convert_error

How to handle conversion errors, e.g. converting an Excel boolean to an Embulk timestamp.

```yaml
columns:
- {name: foo, type: timestamp, format: "%Y/%m/%d", cell_column: A, value: cell_value, on_convert_error: constant.9999/12/31}
```

* `constant`: set null.
* `constant.`*value*: set the specified *value*.
* `exception`: throw an exception. (default)

### sheet_options

Options for individual sheets.

```yaml
parser:
  type: poi_excel
  sheets: [Sheet1, Sheet2]
  columns:
  - {name: date, type: timestamp, cell_column: A}
  - {name: foo, type: string}
  - {name: bar, type: long}
  sheet_options:
    Sheet1:
      skip_header_lines: 1
      columns:
        foo: {cell_column: B}
        bar: {cell_column: C}
    Sheet2:
      skip_header_lines: 0
      columns:
        foo: {cell_column: D}
        bar: {value: constant.0}
```

**sheet_options** is a map keyed by sheet name.
Each value can contain **skip_header_lines** and **columns**.

**columns** is a map keyed by column name.
Each value accepts the same settings as an entry of **columns** in **parser**, except `name` and `type`.

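The per-sheet **columns** entries can be viewed as a key-by-key override of the **parser**-level definitions. The sketch below uses plain string maps and a hypothetical `SheetOptionsMerge` class; it assumes that a sheet-level key simply replaces the parser-level one and rejects `name` and `type`, which the plugin's own handling may treat differently. With the example above, `bar` on `Sheet2` ends up with `value: constant.0`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: effective settings of one column on one sheet.
final class SheetOptionsMerge {

    static Map<String, String> effective(Map<String, String> base, Map<String, String> sheetOverride) {
        Map<String, String> merged = new LinkedHashMap<>(base); // copy; base stays unchanged
        if (sheetOverride != null) {
            for (Map.Entry<String, String> e : sheetOverride.entrySet()) {
                String key = e.getKey();
                if (key.equals("name") || key.equals("type")) {
                    throw new IllegalArgumentException(key + " cannot be overridden per sheet");
                }
                merged.put(key, e.getValue()); // sheet-level value wins
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> bar = new LinkedHashMap<>();
        bar.put("name", "bar");
        bar.put("type", "long");
        // Sheet1: bar: {cell_column: C}
        System.out.println(effective(bar, Collections.singletonMap("cell_column", "C")));
        // {name=bar, type=long, cell_column=C}
        // Sheet2: bar: {value: constant.0}
        System.out.println(effective(bar, Collections.singletonMap("value", "constant.0")));
        // {name=bar, type=long, value=constant.0}
    }
}
```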
## Install

1. Download the POM.
```
$ curl https://repo1.maven.org/maven2/io/github/hishidama/embulk/embulk-parser-excel-poi/0.2.0/embulk-parser-excel-poi-0.2.0.pom > embulk-parser-excel-poi-0.2.0.pom
```

2. Install the dependencies.
```
$ mvn install -f embulk-parser-excel-poi-0.2.0.pom
```

3. Download the JAR into the local Maven repository.
```
$ export M2_REPO=$HOME/.m2/repository
$ curl https://repo1.maven.org/maven2/io/github/hishidama/embulk/embulk-parser-excel-poi/0.2.0/embulk-parser-excel-poi-0.2.0.jar > $M2_REPO/io/github/hishidama/embulk/embulk-parser-excel-poi/0.2.0/embulk-parser-excel-poi-0.2.0.jar
```

4. Add the following setting to `$HOME/.embulk/embulk.properties`.
```
plugins.parser.poi_excel=maven:io.github.hishidama.embulk:excel-poi:0.2.0
```

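Step 3 places the JAR where Maven's standard local-repository layout expects it: the group ID with each dot turned into a directory, then the artifact ID, the version, and `artifactId-version.jar`. The sketch below (hypothetical `LocalRepoPath` class) computes that path. It also encodes an assumption inferred from step 4: that `maven:io.github.hishidama.embulk:excel-poi:0.2.0` for a parser plugin refers to the artifact `embulk-parser-excel-poi`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: where a Maven artifact's JAR lives in a local repository ($HOME/.m2/repository).
final class LocalRepoPath {

    static Path jarPath(Path m2Repo, String groupId, String artifactId, String version) {
        Path dir = m2Repo;
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part); // io.github.hishidama.embulk -> io/github/hishidama/embulk
        }
        return dir.resolve(artifactId)
                .resolve(version)
                .resolve(artifactId + "-" + version + ".jar");
    }

    // Assumed naming convention: parser plugin "excel-poi" -> artifact "embulk-parser-excel-poi".
    static String parserArtifactId(String pluginName) {
        return "embulk-parser-" + pluginName;
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        String artifact = parserArtifactId("excel-poi");
        System.out.println(jarPath(repo, "io.github.hishidama.embulk", artifact, "0.2.0"));
    }
}
```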
## Build

```
$ ./gradlew test
$ ./gradlew package
```

### Build to local Maven repository

```
./gradlew generatePomFileForMavenJavaPublication
mvn install -f build/publications/mavenJava/pom-default.xml
./gradlew publishToMavenLocal
```
