Commit
1 parent 4f52d28, commit b64550d. Showing 6 changed files with 349 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
# Using the API | ||
|
||
## Installing | ||
|
||
Currently, it is only possible to use language bindings from another Bazel project, as the ODict JAR is not yet on Maven Central and Python's dependency on the shared ODict library makes it difficult to distribute through `pip`. Fortunately, setting up ODict in another Bazel project is easy. | ||
|
||
Just add the following to your `WORKSPACE` file: | ||
|
||
```python | ||
http_archive( | ||
name = "odict", | ||
sha256 = "b58fd3432a6f84865c67a16ef6718be12ecd6b9b32c12dfd917c0a899807062f", | ||
strip_prefix = "odict-1.4.5", | ||
url = "https://github.com/TheOpenDictionary/odict/archive/1.4.5.tar.gz", | ||
) | ||
|
||
load("@odict//bazel:odict_deps.bzl", "odict_deps") | ||
|
||
odict_deps() | ||
|
||
load("@odict//bazel:odict_extra_deps.bzl", "odict_extra_deps") | ||
|
||
odict_extra_deps() | ||
``` | ||
|
||
then require either `@odict//java` or `@odict//python` in your respective Bazel rules. | ||
|
||
If any API usage is unclear, you may be able to get a better idea of how to use the APIs by looking at ODict's unit tests. | ||
|
||
## Go | ||
|
||
ODict is written in Go, so its public Go API is available out of the box: | ||
|
||
```go | ||
import ( | ||
odict "github.com/TheOpenDictionary/odict/go" | ||
) | ||
|
||
func main() { | ||
// Write a dictionary from a local ODXML file | ||
odict.CompileDictionary("mydict.xml") | ||
|
||
// Write an XML string to a binary | ||
odict.WriteDictionary("<dictionary></dictionary>", "mydict.odict") | ||
|
||
// Read a compiled dictionary into memory | ||
dict := odict.ReadDictionary("mydict.odict") | ||
|
||
odict.IndexDictionary( | ||
"mydict.odict", | ||
false, // Set to "true" to force-index | ||
) | ||
|
||
// Search an indexed dictionary by ID | ||
results := odict.SearchDictionary( | ||
dict.id, | ||
"dog", | ||
false, // Set to "true" if you need to match the given word exactly | ||
) | ||
} | ||
|
||
``` | ||
|
||
## Java | ||
|
||
While a [standalone Java client](https://github.com/TheOpenDictionary/odict-java) for ODict _used_ to exist, it has since been superseded by a Java binding that uses the ODict core's cgo dynamic library that lives in this repo. | ||
|
||
Fortunately, the new ODict Java interface is extremely easy to use and will always stay up-to-date with the latest upstream changes to the ODict format. | ||
|
||
```java | ||
// Import statement | ||
import org.odict.Dictionary; | ||
|
||
void main() { | ||
// Compile a dictionary | ||
Dictionary.compile("path/to/file"); | ||
|
||
// Write a new dictionary | ||
Dictionary.write("an XML string", "path/to/output.odict"); | ||
|
||
// Load a dictionary | ||
Dictionary dict = new Dictionary("path/to/dictionary.odict"); | ||
|
||
// Lookup an entry by word | ||
System.out.println(dict.lookup("giraffe")); | ||
|
||
// Index the dictionary | ||
dict.index(); | ||
|
||
// Perform a fuzzy-search | ||
System.out.println(dict.search("full text")); | ||
} | ||
``` | ||
|
||
## Python | ||
|
||
The Python interface for ODict is similar to that of Java and Go: | ||
|
||
```python | ||
# Import statement | ||
from python.odict import Dictionary | ||
|
||
def main(): | ||
    # Compile a dictionary | ||
    Dictionary.compile("path/to/file") | ||
|
||
    # Write a new dictionary | ||
    Dictionary.write("an XML string", "path/to/output.odict") | ||
|
||
    # Load a dictionary (named to avoid shadowing the built-in `dict`) | ||
    dictionary = Dictionary("path/to/dictionary.odict") | ||
|
||
    # Look up an entry by word | ||
    print(dictionary.lookup("giraffe")) | ||
|
||
    # Index the dictionary | ||
    dictionary.index() | ||
|
||
    # Perform a fuzzy search | ||
    print(dictionary.search("full text")) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# CLI Reference | ||
|
||
## Creating Dictionaries | ||
|
||
To create a dictionary, you'll first need to write a dictionary using the ODict [markup language](odxml.md) and save it as an ".xml" file. Once you're confident your file is in the correct format, compiling it to an ".odict" dictionary is as simple as running: | ||
|
||
``` | ||
$ odict c mydictionary.xml | ||
``` | ||
|
||
The output file will always be a corresponding ".odict" dictionary that appears in the same directory as the source file. | ||
|
||
## Searching Dictionaries | ||
|
||
There are two ways to search a compiled .odict file: a case-insensitive entry lookup, which outputs a single entry, or a full-text fuzzy search, which outputs an array of matching entries. | ||
|
||
Let's look at each respectively. | ||
|
||
### Entry Lookup | ||
|
||
Looking up entries is super-duper easy. Just run: | ||
|
||
``` | ||
$ odict l mydictionary.odict "word" | ||
``` | ||
|
||
and a full-bodied JSON object will print out if there is a match. | ||
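
If you want to consume that JSON from a script, one option is to shell out to the CLI. The following is only a sketch: the helper name `lookup_entry` is made up for illustration, and it assumes that `odict` is on your `PATH` and that nothing is printed when there is no match.

```python
import json
import subprocess


def lookup_entry(dictionary_path, word, run=subprocess.run):
    """Run `odict l` and parse its JSON output.

    Assumption: the CLI prints nothing when there is no match,
    in which case this helper returns None.
    """
    completed = run(
        ["odict", "l", dictionary_path, word],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    output = completed.stdout.strip()
    return json.loads(output) if output else None
```

The `run` parameter exists only so the subprocess call can be swapped out in tests; in normal use you would call `lookup_entry("mydictionary.odict", "word")`.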
|
||
### Fuzzy Search | ||
|
||
Fuzzy searching uses [Bleve](https://blevesearch.com/) under the hood, so your dictionary must be indexed before it can be searched. Indexes are stored in a temporary directory whose location depends on your OS, so you'll need to re-index the dictionary on each new device you access it from. Indexing can take quite some time for a particularly large dictionary. | ||
|
||
There are two ways to index a dictionary. You can index it ahead of time by running: | ||
|
||
``` | ||
$ odict i mydictionary.odict | ||
``` | ||
|
||
or you can index it before you run your first search query by passing an `-i` flag to the search command: | ||
|
||
``` | ||
$ odict s -i mydictionary.odict "my query" | ||
``` | ||
|
||
If you omit this flag, ODict will automatically use the existing index for the provided file, if there is one. Each .odict file has a unique identifier baked into it, so after you index a dictionary once, ODict will always know where to find that index in the future. | ||
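
The two search invocations differ only in the `-i` flag, which a small wrapper can make explicit. This is a sketch under assumptions: the function names are hypothetical, `odict` is assumed to be on your `PATH`, and the search output is assumed to be a JSON array.

```python
import json
import subprocess


def build_search_command(dictionary_path, query, index_first=False):
    """Build the argument list for `odict s`, optionally adding `-i`."""
    command = ["odict", "s"]
    if index_first:
        command.append("-i")  # index the dictionary before searching
    return command + [dictionary_path, query]


def search(dictionary_path, query, index_first=False, run=subprocess.run):
    """Run the search and parse stdout (assumed to be a JSON array)."""
    completed = run(
        build_search_command(dictionary_path, query, index_first),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

A typical pattern would be to pass `index_first=True` on the first query on a new machine and leave it off afterwards.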
|
||
## Dumping Dictionaries | ||
|
||
While developing an ODict application, it is often helpful to inspect the underlying structure of a dictionary without picking through code. For this, the ODict CLI has a `dump` command that converts a compiled binary back into a rough approximation of its [original XML](odxml.md). The approximation is rough because the library might add back certain XML attributes or ID fields that were not present in the original document used to create the file. | ||
|
||
Using `dump` is easy: | ||
|
||
``` | ||
$ odict d mydictionary.odict outputfile.xml | ||
``` | ||
|
||
## Merging Dictionaries | ||
|
||
The ODict CLI can also merge two compiled dictionaries and blend their definitions together. This feature uses [mergo](https://github.com/imdario/mergo) to merge the underlying dictionary structs and is currently not as customizable as it should be. Right now ODict performs a full merge, so you may wind up with an entry with duplicate definitions if your two dictionaries contain similar definitions for the same word. | ||
|
||
However, you can always `dump` the merged file, edit it, then re-compile it. | ||
|
||
To merge two dictionaries, run: | ||
|
||
``` | ||
$ odict m mydictionary1.odict mydictionary2.odict output.odict | ||
``` |
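
The merge, dump, edit, and re-compile cycle described above can be scripted. The sketch below only assembles the three CLI invocations already shown in this reference; the function names and default file names are hypothetical, and `odict` is assumed to be on your `PATH`.

```python
import subprocess


def merge_review_commands(first, second, merged="merged.odict", dumped="merged.xml"):
    """Return the commands for a merge, dump, re-compile cycle, in order."""
    return [
        ["odict", "m", first, second, merged],  # merge the two dictionaries
        ["odict", "d", merged, dumped],         # dump the result to XML for editing
        ["odict", "c", dumped],                 # re-compile the (edited) XML
    ]


def run_commands(commands, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        run(command, check=True)
```

In practice you would run the first two commands, edit the dumped XML by hand to remove duplicate definitions, and only then run the final compile step.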
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# ODXML | ||
|
||
ODict XML (ODXML) is the XML variant used to define and structure ODict dictionaries. All compiled ODict dictionaries originate from ODXML files and can easily be converted back to XML via the CLI [`dump` command](cli.md#dumping-dictionaries). | ||
|
||
An example dictionary might look like this: | ||
|
||
```xml | ||
<!-- Dictionary Root --> | ||
<dictionary name="My Dictionary"> | ||
<!-- Entry --> | ||
<entry term="Doggo"> | ||
<!-- Etymology --> | ||
<ety> | ||
<!-- Usage (typically determined by part-of-speech) --> | ||
<usage pos="n"> | ||
<!-- Definition --> | ||
<definition>A cute pupper</definition> | ||
<!-- Definition Group --> | ||
<group description="Slang for dog"> | ||
<definition>Common way of saying a dog is a cutie</definition> | ||
</group> | ||
</usage> | ||
</ety> | ||
</entry> | ||
</dictionary> | ||
``` | ||
|
||
Pretty easy to read, right? | ||
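
Because ODXML is ordinary XML, it can also be read with any standard XML parser. As a rough illustration, here is a sketch that walks the example above using Python's built-in `xml.etree.ElementTree`; the helper name `list_definitions` is made up for this example and is not part of ODict.

```python
import xml.etree.ElementTree as ET

ODXML = """
<dictionary name="My Dictionary">
  <entry term="Doggo">
    <ety>
      <usage pos="n">
        <definition>A cute pupper</definition>
        <group description="Slang for dog">
          <definition>Common way of saying a dog is a cutie</definition>
        </group>
      </usage>
    </ety>
  </entry>
</dictionary>
"""


def list_definitions(odxml_text):
    """Return (term, part of speech, definition) tuples in document order."""
    root = ET.fromstring(odxml_text)
    rows = []
    for entry in root.iter("entry"):
        for usage in entry.iter("usage"):
            # iter() descends into <group> elements as well
            for definition in usage.iter("definition"):
                rows.append((entry.get("term"), usage.get("pos"), definition.text))
    return rows


for row in list_definitions(ODXML):
    print(row)
```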
|
||
Now let's break this down. | ||
|
||
--- | ||
|
||
## `<dictionary>` | ||
|
||
The dictionary node is the root of every source file, and a file will not compile without one. ODict looks for this node by default when compiling. | ||
|
||
### Attributes | ||
|
||
| Name | Description | Required? | | ||
| ---- | ------------------------------------- | --------- | | ||
| name | A descriptive name for the dictionary | :x: | | ||
|
||
### Children | ||
|
||
- [`<entry>`](#entry) | ||
|
||
--- | ||
|
||
## `<entry>` | ||
|
||
Entries are the primary entry point to the dictionary and represent **unique terms**. They are used as lookup keys internally by ODict, so it is important there are no duplicate entries. | ||
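
Since duplicate terms cause trouble, it can be worth checking a source file before compiling it. The sketch below is not part of ODict: `find_duplicate_terms` is a hypothetical helper, and it compares terms case-insensitively on the assumption that terms differing only by case would collide.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_terms(odxml_text):
    """Return the sorted, case-folded terms that appear on more than one <entry>."""
    root = ET.fromstring(odxml_text)
    counts = Counter(
        (entry.get("term") or "").casefold() for entry in root.iter("entry")
    )
    return sorted(term for term, count in counts.items() if count > 1)


SAMPLE = """
<dictionary>
  <entry term="Dog"></entry>
  <entry term="dog"></entry>
  <entry term="Cat"></entry>
</dictionary>
"""

print(find_duplicate_terms(SAMPLE))
```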
|
||
### Attributes | ||
|
||
| Name | Description | Required? | | ||
| ---- | ----------------------------- | ------------------ | | ||
| term | The word the entry represents | :white_check_mark: | | ||
|
||
### Children | ||
|
||
- [`<ety>`](#ety) |