Commit

Merge pull request #57 from philipperemy/453
Upgrade to 4.5.3 the latest CoreNLP
philipperemy committed Mar 20, 2023
2 parents 5fe87c4 + fd7343d commit a0f2c0e
Showing 12 changed files with 48 additions and 353 deletions.
24 changes: 8 additions & 16 deletions .github/workflows/main.yml
@@ -1,33 +1,25 @@
name: Stanford NLP Wrapper CI
name: Stanford NLP OpenIE CI

on: [push, pull_request]
on: [ push, pull_request ]

jobs:
build:

runs-on: ubuntu-latest
strategy:
max-parallel: 4
max-parallel: 1
matrix:
python-version: [3.6]

python-version: [ "3.8", "3.9", "3.10" ]
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v3
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
sudo apt-get install graphviz
pip install flake8 tox
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --max-complexity 10 --max-line-length 127 --statistics
pip install --upgrade pip
pip install tox
- name: Test with tox
run: |
pip --version
pip3 --version
tox
29 changes: 17 additions & 12 deletions README.md
@@ -1,15 +1,22 @@
# Python3 wrapper for Stanford OpenIE

[![Downloads](https://static.pepy.tech/badge/stanford-openie)](https://pepy.tech/project/stanford-openie)
[![Downloads](https://static.pepy.tech/badge/stanford-openie/month)](https://pepy.tech/project/stanford-openie)
![Stanford NLP Wrapper CI](https://github.com/philipperemy/Stanford-OpenIE-Python/workflows/Stanford%20NLP%20Wrapper%20CI/badge.svg)

Open information extraction (open IE) refers to the extraction of structured relation triples from plain text, such that the schema for these relations does not need to be specified in advance. For example, Barack Obama was born in Hawaii would create a triple `(Barack Obama; was born in; Hawaii)`, corresponding to the open domain relation "was born in". CoreNLP is a Java implementation of an open IE system as described in the paper:
*Supports the latest CoreNLP library 4.5.3 (2023-03-10).*

More information can be found here : http://nlp.stanford.edu/software/openie.html
Open information extraction (open IE) refers to the extraction of structured relation triples from plain text, such that
the schema for these relations does not need to be specified in advance. For example, Barack Obama was born in Hawaii
would create a triple `(Barack Obama; was born in; Hawaii)`, corresponding to the open domain relation "was born in".
CoreNLP is a Java implementation of an open IE system as described in the paper:

The OpenIE library is only available in english: https://stanfordnlp.github.io/CoreNLP/human-languages.html
More information can be found [here](http://nlp.stanford.edu/software/openie.html). The OpenIE library is only available
in [english](https://stanfordnlp.github.io/CoreNLP/human-languages.html).

## Installation

You need python3 and Java installed. Java is used by the CoreNLP library.
You need python3 and Java (JRE) installed. Java is used by the CoreNLP library.

```bash
pip install stanford_openie
@@ -46,8 +53,9 @@ with StanfordOpenIE(properties=properties) as client:
print('|-', triple)
print('[...]')
```

*Expected output*

*Expected output*

```
|- {'subject': 'Barack Obama', 'relation': 'was', 'object': 'born'}
|- {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}
@@ -59,18 +67,15 @@ with StanfordOpenIE(properties=properties) as client:
|- {'subject': 'Menapolus', 'relation': 'son of', 'object': 'Ithagenes'}
|- {'subject': 'Menapolus', 'relation': 'was Among', 'object': 'immigrants'}
```

It will generate a [GraphViz DOT](http://www.graphviz.org/) in `graph.png`:

<div align="center">
<img src="img/out.png"><br><br>
</div>

*Note*: Make sure GraphViz is installed beforehand. Try to run the `dot` command to see if this is the case. If not, run `sudo apt-get install graphviz` if you're running on Ubuntu.

## V1

Still available here [v1](v1).
*Note*: Make sure GraphViz is installed beforehand. Try to run the `dot` command to see if this is the case. If not,
run `sudo apt-get install graphviz` if you're running on Ubuntu.
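
The check described in the note can also be done from Python before calling `generate_graphviz_graph`. The following is a minimal sketch, not part of this commit; `graphviz_available` is a hypothetical helper name, and it only tests whether a `dot` executable is on the `PATH`.

```python
import shutil


def graphviz_available() -> bool:
    """Hypothetical helper: True if the GraphViz `dot` binary is on PATH."""
    # shutil.which returns the executable's full path, or None if not found.
    return shutil.which('dot') is not None


if __name__ == '__main__':
    if graphviz_available():
        print('GraphViz found.')
    else:
        print('GraphViz not found; on Ubuntu, try: sudo apt-get install graphviz')
```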

## References

22 changes: 15 additions & 7 deletions openie/openie.py
@@ -3,6 +3,7 @@
from pathlib import Path
from subprocess import Popen
from sys import stderr
from typing import Optional
from zipfile import ZipFile

import wget
@@ -12,12 +13,13 @@ class StanfordOpenIE:

def __init__(
self,
core_nlp_version: str = '4.1.0',
install_dir_path: str = None,
*args, **kwargs
core_nlp_version: str = '4.5.3', # https://stanfordnlp.github.io/CoreNLP/history.html
install_dir_path: Optional[str] = None,
*args,
**kwargs
):
if install_dir_path is None:
default_path = Path('~/.stanfordnlp_resources/').expanduser()
default_path = Path('~/.stanfordnlp_resources').expanduser()
self.install_dir = os.environ.get("OPENIE_INSTALL_PATH", default_path)
else:
self.install_dir = Path(install_dir_path)
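
The precedence in this hunk (explicit argument first, then the `OPENIE_INSTALL_PATH` environment variable, then the default under the home directory) can be sketched in isolation as follows. This is an illustrative sketch, not the code in the commit: `resolve_install_dir` is a hypothetical name, and wrapping the environment value in `Path` is an assumption about intent, since `os.environ.get` returns a plain `str` when the variable is set.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_install_dir(install_dir_path: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the constructor's precedence rules."""
    # 1. An explicit argument always wins.
    if install_dir_path is not None:
        return Path(install_dir_path)
    # 2. Otherwise use OPENIE_INSTALL_PATH if set, else 3. the default path.
    default_path = Path('~/.stanfordnlp_resources').expanduser()
    # Assumption: normalise to Path so callers always get the same type.
    return Path(os.environ.get('OPENIE_INSTALL_PATH', default_path))
```

Returning a `Path` in every branch keeps later path operations such as `/` joins working the same way regardless of where the directory came from.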
@@ -52,8 +54,10 @@ def annotate(self, text: str, properties_key: str = None, properties: dict = Non
:return: Depending on simple_format: full or simpler format of triples <subject, relation, object>.
"""
# https://stanfordnlp.github.io/CoreNLP/openie.html
core_nlp_output = self.client.annotate(text=text, annotators=['openie'], output_format='json',
properties_key=properties_key, properties=properties)
core_nlp_output = self.client.annotate(
text=text, annotators=['openie'], output_format='json',
properties_key=properties_key, properties=properties
)
if simple_format:
triples = []
for sentence in core_nlp_output['sentences']:
@@ -67,7 +71,11 @@ def annotate(self, text: str, properties_key: str = None, properties: dict = Non
else:
return core_nlp_output

def generate_graphviz_graph(self, text: str, png_filename: str = './out/graph.png'):
def generate_graphviz_graph(
self,
text: str,
png_filename: str = './out/graph.png'
):
"""
:param (str | unicode) text: raw text for the CoreNLPServer to parse
:param (list | string) png_filename: list of annotators to use
6 changes: 4 additions & 2 deletions setup.py
@@ -2,7 +2,7 @@

setup(
name='stanford-openie',
version='1.3.0',
version='1.3.1',
description='Minimalist wrapper around Stanford OpenIE',
author='Philippe Remy',
license='MIT',
@@ -12,6 +12,8 @@
install_requires=[
'wget',
'stanfordnlp',
'six'
'six',
'pydot',
'protobuf<=3.20'
]
)
8 changes: 4 additions & 4 deletions tox.ini
@@ -2,10 +2,10 @@
envlist = py3

[testenv]
# Because the download URL is too unstable. Sometimes their website is down...
commands = bash -ec "mkdir -p ~/.stanfordnlp_resources/"
bash -ec "cat lib/* > ~/.stanfordnlp_resources/stanford-corenlp-full-2018-02-27.zip"
deps = flake8
pylint
commands = pylint --disable=R,C,W,E1136 openie
flake8 openie --count --max-line-length 127 --select=E9,F63,F7,F82 --show-source --statistics
python main.py
passenv = *
install_command = pip install {packages}
whitelist_externals = *
75 changes: 0 additions & 75 deletions v1/README.md

This file was deleted.
