docx4js

a docx/pptx/xlsx parser, here's a demo.

API

import docx4js from "docx4js"

docx4js.load("~/test.docx").then(docx=>{
	//you can render docx to anything (react elements, tree, dom, and etc) by giving a function
	docx.render(function createElement(type,props,children){
		return {type,props,children}
	})

	//or use a event handler for more flexible control
	const ModelHandler=require("docx4js/lib/openxml/docx/model-handler").default
	class MyModelhandler extends ModelHandler{
		onp({type,children,node,...}, node, officeDocument){

		}
	}

	const handler=new MyModelhandler()
	handler.on("*",function({type,children,node,...}, node, officeDocument){
		console.log("found model:"+type)
	})
	handler.on("r",function({type,children,node,...}, node, officeDocument){
		console.log("found a run")
	})

	docx.parse(handler)

	//you can change content on docx.officeDocument.content, and then save
	docx.officeDocument.content("w\\:t").text("hello")
	docx.save("~/changed.docx")
})

//you can create a blank docx
docx4js.create().then(docx=>{
	//do anything you want
	docx.save("~/new.docx")
})

*please note 3.x is totally different from 2.x, everything is breaking change.

*please note 2.x is totally different from 1.x, everything is breaking change.

docx4js is a javascript docx parser.

The original goal is to support docx, pptx, and xlsx, but it's a huge work, so I limited to docx so far.

In sake of performance, the implementation doesn't keep parsed structure. It only traverse docx content, and identify docx model, then call passed visitors one by one. No matter content, and styles, are all with the same stratigy. This method makes it do more with less memory.

There are lots of information in docx, but the client application usually only cares about part of them, such as content only, structure only, some styles, or some attributes. The client application is able to handle special word model by TYPE.

Attributes of word model usually affects styles, but I don't understand all of them, so I'm lazy just to iterate every attribute, and some unknown child elements, so client application is possible to catch all information you know.

pptx supported since 3.1.30

Features

environment

nodejs
browser
- IE9+
- firefox
- chrome

identified models

section
header
footer
paragraph
inline
numbering
heading
shape
- group
- line
- roundRect
- rect
image
hyperlink
table
- row
- cell
control.
- checkbox
  - checked
- comboBox
  - value
  - options: {displayText, value}
- date
  - value
  - format
  - locale
- dropDownList
  - value
  - options: {displayText, value}
- gallery
- picture
- richtext
- text
  - value
text
- softHyphen
- noBreakHyphen
- tab
- symbol
field
- date
- hyperlink
- ref
OLE: {type:"object", embed, prog, data}
diagram
equation
bookmark
range
br
chart

style

document default style
named style
style inheritance
paragraph
character
numbering
section
table

ChangeLog

* ~~identify OLE object~~

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 280 Commits
.github/workflows		.github/workflows
__tests__		__tests__
dist		dist
docs		docs
lib		lib
src		src
templates		templates
.gitignore		.gitignore
.npmignore		.npmignore
.travis.yml		.travis.yml
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
yarn.lock		yarn.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

docx4js

API

pptx supported since 3.1.30

Features

environment

ChangeLog

License

About

Releases

Packages

Used by 342

Contributors 3

Languages

lalalic/docx4js

Folders and files

Latest commit

History

Repository files navigation

docx4js

API

pptx supported since 3.1.30

Features

environment

ChangeLog

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Used by 342

Contributors 3

Languages

Packages