GitHub - chchench/textract: Golang module for extracting text from XML-based MS Office documents

This code is still at a very early stage of development. I will need to find the time to work on it :)

Description

A simple library for extracting text content from popular document types such as Word, PowerPoint, Excel, and PDF.

I started developing this module because I need it for another application I've been building, and I was looking for something that is royalty-free and high-performance. I intend to add support for additional document types over time.

This initial version only supports Word/docx documents.

Installation

After you have installed Go, run this command to install the textract package:

go get github.com/chchench/textract

Roadmap

1.0.0 - Initial release, supporting text extraction from (post-2007) Word/docx files

License

Making the source code to this app available under

