@@ -53,57 +53,33 @@ The algorithm is defined with the `operator` (`-op`) parameter. By default, it i
## Usage
You can call the functions directly with given args as
```
- usage: kgtk graph-embeddings [-h] [-i INPUT_FILE_PATH ] [-o OUTPUT_FILE_PATH ]
- [-l] [-T] [- ot] [-r True|False] [-d] [-s]
+ usage: kgtk graph-embeddings [-h] [-i INPUT_FILE ] [-o OUTPUT_FILE] [-l] [-T ]
+ [-ot] [-r True|False] [-d] [-s]
[-c dot|cos|l2|squared_l2]
- [-op linear|diagonal|complex_diagonal|translation ]
- [-e] [- b True|False] [-w] [-bs]
+ [-op RESCAL|DistMult|ComplEx|TransE] [-e ]
+ [-b True|False] [-w] [-bs]
[-lf ranking|logistic|softmax] [-lr] [-ef]
[-dr True|False] [-ge True|False]
+ [--no-output-header [True|False]]
[-v [optional True|False]]
- [--column-separator COLUMN_SEPARATOR]
- [--input-format INPUT_FORMAT]
- [--compression-type COMPRESSION_TYPE]
- [--error-limit ERROR_LIMIT]
- [--use-mgzip [optional True|False]]
- [--mgzip-threads MGZIP_THREADS]
- [--gzip-in-parallel [optional True|False]]
- [--gzip-queue-size GZIP_QUEUE_SIZE]
- [--mode {NONE,EDGE,NODE,AUTO}]
- [--force-column-names FORCE_COLUMN_NAMES [FORCE_COLUMN_NAMES ...]]
- [--header-error-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--skip-header-record [optional True|False]]
- [--unsafe-column-name-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--initial-skip-count INITIAL_SKIP_COUNT]
- [--every-nth-record EVERY_NTH_RECORD]
- [--record-limit RECORD_LIMIT]
- [--tail-count TAIL_COUNT]
- [--repair-and-validate-lines [optional True|False]]
- [--repair-and-validate-values [optional True|False]]
- [--blank-required-field-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--comment-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--empty-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--fill-short-lines [optional True|False]]
- [--invalid-value-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--long-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--prohibited-list-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--short-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
- [--truncate-long-lines [TRUNCATE_LONG_LINES]]
- [--whitespace-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}]
+
+ Generate graph embedding in kgtk tsv format, here we use PytorchBigGraph as low-level implementation

optional arguments:
-h, --help show this help message and exit
- -i INPUT_FILE_PATH, --input-file INPUT_FILE_PATH
- The KGTK input file. (default=-)
- -o OUTPUT_FILE_PATH, --output-file OUTPUT_FILE_PATH
- The KGTK output file. (default=-).
+ -i INPUT_FILE, --input-file INPUT_FILE
+ The KGTK input file. (May be omitted or '-' for
+ stdin.)
+ -o OUTPUT_FILE, --output-file OUTPUT_FILE
+ The KGTK output file. (May be omitted or '-' for
+ stdout.)
-l , --log Setting the log path [Default: None]
-T , --temporary_directory
Specify the directory location to store temporary
file
-ot , --output_format
- Outputformat for embeddings [Default: w2v] Choice: kgtk
- | w2v | glove
+ Outputformat for embeddings [Default: w2v] Choice:
+ kgtk | w2v | glove
-r True|False, --retain_temporary_data True|False
When operating on the graph, some temporary files will be
generated; set True to retain these files
@@ -122,8 +98,9 @@ optional arguments:
-op RESCAL|DistMult|ComplEx|TransE, --operator RESCAL|DistMult|ComplEx|TransE
The transformation to apply to the embedding of one of
the sides of the edge (typically the right-hand one)
- before comparing it with the other one. It reflects
- which model that embedding uses. [Default:ComplEx]
+ before comparing it with the other one. It
+ reflectswhich model that embedding uses.
+ [Default:ComplEx]
-e , --num_epochs The number of times the training loop iterates over
all the edges.[Default:100]
-b True|False, --bias True|False
@@ -145,111 +122,20 @@ optional arguments:
The fraction of edges withheld from training and used
to track evaluation metrics during training.
[Default: 0.0, train on all edges]
- -dr True|False, --dynamic_relaitons True|False
+ -dr True|False, --dynamic_relations True|False
Whether use dynamic relations (when graphs with a
large number of relations) [Default: True]
-ge True|False, --global_emb True|False
Whether use global embedding, if enabled, add to each
embedding a vector that is common to all the entities
of a certain type. This vector is learned during
training.[Default: False]
+ --no-output-header [True|False]
+ When true, do not write a header to the output file
+ (default=False).

-v [optional True|False], --verbose [optional True|False]
Print additional progress messages (default=False).
-
- File options:
- Options affecting processing.
-
- --column-separator COLUMN_SEPARATOR
- Column separator (default=<TAB>).
- --input-format INPUT_FORMAT
- Specify the input format (default=None).
- --compression-type COMPRESSION_TYPE
- Specify the compression type (default=None).
- --error-limit ERROR_LIMIT
- The maximum number of errors to report before failing
- (default=1000)
- --use-mgzip [optional True|False]
- Execute multithreaded gzip. (default=False).
- --mgzip-threads MGZIP_THREADS
- Multithreaded gzip thread count. (default=3).
- --gzip-in-parallel [optional True|False]
- Execute gzip in parallel. (default=False).
- --gzip-queue-size GZIP_QUEUE_SIZE
- Queue size for parallel gzip. (default=1000).
- --mode {NONE,EDGE,NODE,AUTO}
- Determine the KGTK file mode
- (default=KgtkReaderMode.AUTO).
-
- Header parsing:
- Options affecting header parsing.
-
- --force-column-names FORCE_COLUMN_NAMES [FORCE_COLUMN_NAMES ...]
- Force the column names (default=None).
- --header-error-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a header error is detected.
- Only ERROR or EXIT are supported
- (default=ValidationAction.EXIT).
- --skip-header-record [optional True|False]
- Skip the first record when forcing column names
- (default=False).
- --unsafe-column-name-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a column name is unsafe
- (default=ValidationAction.REPORT).
-
- Pre-validation sampling:
- Options affecting pre-validation data line sampling.
-
- --initial-skip-count INITIAL_SKIP_COUNT
- The number of data records to skip initially
- (default=do not skip).
- --every-nth-record EVERY_NTH_RECORD
- Pass every nth record (default=pass all records).
- --record-limit RECORD_LIMIT
- Limit the number of records read (default=no limit).
- --tail-count TAIL_COUNT
- Pass this number of records (default=no tail
- processing).
-
- Line parsing:
- Options affecting data line parsing.
-
- --repair-and-validate-lines [optional True|False]
- Repair and validate lines (default=False).
- --repair-and-validate-values [optional True|False]
- Repair and validate values (default=False).
- --blank-required-field-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a line with a blank node1,
- node2, or id field (per mode) is detected
- (default=ValidationAction.EXCLUDE).
- --comment-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a comment line is detected
- (default=ValidationAction.EXCLUDE).
- --empty-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when an empty line is detected
- (default=ValidationAction.EXCLUDE).
- --fill-short-lines [optional True|False]
- Fill missing trailing columns in short lines with
- empty values (default=False).
- --invalid-value-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a data cell value is invalid
- (default=ValidationAction.COMPLAIN).
- --long-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a long line is detected
- (default=ValidationAction.COMPLAIN).
- --prohibited-list-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a data cell contains a
- prohibited list (default=ValidationAction.COMPLAIN).
- --short-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a short line is detected
- (default=ValidationAction.COMPLAIN).
- --truncate-long-lines [TRUNCATE_LONG_LINES]
- Remove excess trailing columns in long lines
- (default=False).
- --whitespace-line-action {PASS,REPORT,EXCLUDE,COMPLAIN,ERROR,EXIT}
- The action to take when a whitespace line is detected
- (default=ValidationAction.EXCLUDE).
-
```
## Examples
@@ -274,7 +160,7 @@ The output_file.tsv may look like:
### Example 2
Running with more specific parameters (TransE algorithm and 200-dimensional vectors):
```
- kgtk graph-embeddings
+ kgtk graph-embeddings \
--input-file input_file.tsv \
--output-file output_file.tsv \
--dimension 200 \
@@ -296,7 +182,7 @@ The `output_file.tsv` may look like:
### Example 3
Using glove format to generate graph embeddings
```
- kgtk graph-embeddings
+ kgtk graph-embeddings \
--input-file input_file.tsv \
--output-file output_file.tsv \
--output_format glove
@@ -313,7 +199,7 @@ The `output_file.tsv` may look like:
### Example 4
Using kgtk format to generate graph embeddings
```
- kgtk graph-embeddings
+ kgtk graph-embeddings \
--input-file input_file.tsv \
--output-file output_file.tsv \
--output_format kgtk --no-output-header