users_guide_eng
IlyaKozlov edited this page Dec 8, 2020 · 3 revisions
Dedoc reads documents in different formats (doc, docx, odt, csv and others) and extracts:
- Metadata
- Text and text formatting metadata (bold, italic, font size)
- Logical structure (optional).
Dedoc is meant to be used through its REST API, but it can also work through Python.
You can run dedoc in a Docker container or directly on your system.
- Ensure that Docker is installed
- Clone project
git clone https://gitlab.at.ispras.ru/Ilya/dedoc_project.git
cd dedoc_project/
- Build the container and run it.
docker build . -t dedoc_container
docker run -p 1231:1231 --rm dedoc_container:latest python3.5 /dedoc/main.py
Now you can check whether dedoc is running: open localhost:1231 and make sure you can read the online documentation. If dedoc is running, you can read about the output format here.
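The same check can be scripted instead of done in a browser. The following is a minimal sketch, not part of dedoc itself: the helper name is_dedoc_running is hypothetical, and it assumes only that the start page at localhost:1231 answers a plain GET with status 200 when the service is up. It uses the standard library, so it needs nothing beyond Python 3.

```python
import urllib.error
import urllib.request


def is_dedoc_running(base_url="http://localhost:1231", timeout=5.0):
    """Return True if an HTTP GET on base_url answers with status 200.

    Hypothetical helper: it only probes the start page and does not
    check that documents are actually parsed correctly.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    if is_dedoc_running():
        print("dedoc is running")
    else:
        print("dedoc is not reachable")
```

If the container was started with a different port mapping, pass the matching URL as base_url.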
Dedoc was tested on Ubuntu 18 and Python 3.5. You can install dedoc directly on your system by following steps similar to those described in the Dockerfile.
We have launched dedoc and want to make sure that it works correctly:
- go to http://localhost:1231
- click Supported Formats (bottom of the page)
- click any "result in html"
- Expected result should look like this:
Example with python + requests.
import json
import os

import requests

# Specify the file name and directory.
directory_path = "..."
file_name = "..."
file_path = os.path.join(directory_path, file_name)

with open(file_path, "rb") as file:
    files = {"file": (file_name, file)}
    # Put additional parameters into the data dict; see the online docs for the available parameters.
    data = {}
    # Send the request while the file is still open and get the response.
    response = requests.post("http://localhost:1231/upload", files=files, data=data)

# Check that everything is OK.
if response.status_code != 200:
    raise Exception("Failed to parse file, status code {}".format(response.status_code))

# Parse the result from JSON.
result = json.loads(response.content.decode())
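Once the response is parsed, it can help to look at its overall shape before reading the output format documentation in detail. The sketch below is an illustration only: describe_top_level is a hypothetical helper, and the sample dictionary is a made-up stand-in for result, not dedoc's actual output. It makes no assumption about which keys dedoc returns.

```python
import json


def describe_top_level(parsed):
    """Map each top-level key of a parsed JSON value to the type name of its value."""
    if not isinstance(parsed, dict):
        # A non-object root (list, string, number) is reported under a single pseudo-key.
        return {"<root>": type(parsed).__name__}
    return {key: type(value).__name__ for key, value in parsed.items()}


# Made-up stand-in for `result`; the real keys are defined by dedoc's output format.
sample = json.loads('{"a": {"b": 1}, "c": [1, 2], "d": "text"}')

# Overview of the top-level structure.
print(describe_top_level(sample))

# Full pretty-printed dump; ensure_ascii=False keeps non-Latin text readable.
print(json.dumps(sample, indent=2, ensure_ascii=False))
```

In practice, replace sample with the result variable from the example above.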