Add serialization via SLAXML:xml()
* Also fixes #10
* Also fixes #11
Phrogz committed Oct 23, 2018
1 parent 8bfc922 commit 8a3e0c9
Showing 9 changed files with 544 additions and 239 deletions.
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,4 +1,4 @@
Copyright (c) 2013 Gavin Kistner
Copyright (c) 2013-2018 Gavin Kistner

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

115 changes: 111 additions & 4 deletions README.md
@@ -35,7 +35,7 @@ parser = SLAXML:parser{
startElement = function(name,nsURI,nsPrefix) end, -- When "<foo" or <x:foo is seen
attribute = function(name,value,nsURI,nsPrefix) end, -- attribute found on current element
closeElement = function(name,nsURI) end, -- When "</foo>" or </x:foo> or "/>" is seen
text = function(text) end, -- text and CDATA nodes
text = function(text,cdata) end, -- text and CDATA nodes (cdata is true for cdata nodes)
comment = function(content) end, -- comments
pi = function(target,content) end, -- processing instructions e.g. "<?yes mon?>"
}
@@ -76,9 +76,10 @@ The returned table is a 'document' composed of tables for elements, attributes,
* <strong>`someEl.type`</strong> : the string `"element"`
* <strong>`someEl.name`</strong> : the string name of the element (without any namespace prefix)
* <strong>`someEl.nsURI`</strong> : the namespace URI for this element; `nil` if no namespace is applied
* <strong>`someEl.nsPrefix`</strong> : the namespace prefix string for this element; `nil` if no prefix is applied
* <strong>`someEl.attr`</strong> : a table of attributes, indexed by name and index
* `local value = someEl.attr['attribute-name']` : any namespace prefix of the attribute is not part of the name
* `local someAttr = someEl.attr[1]` : an single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
* `local someAttr = someEl.attr[1]` : a single attribute table (see below); useful for iterating all attributes of an element, or for disambiguating attributes with the same name in different namespaces
* <strong>`someEl.kids`</strong> : an array table of child elements, text nodes, comment nodes, and processing instructions
* <strong>`someEl.el`</strong> : an array table of child elements only
* <strong>`someEl.parent`</strong> : reference to the parent element or document table
@@ -87,10 +88,12 @@ The returned table is a 'document' composed of tables for elements, attributes,
* <strong>`someAttr.name`</strong> : the name of the attribute (without any namespace prefix)
* <strong>`someAttr.value`</strong> : the string value of the attribute (with XML and numeric entities unescaped)
* <strong>`someAttr.nsURI`</strong> : the namespace URI for the attribute; `nil` if no namespace is applied
* <strong>`someAttr.nsPrefix`</strong> : the namespace prefix string; `nil` if no prefix is applied
* <strong>`someAttr.parent`</strong> : reference to the owning element table
* **Text** - for both CDATA and normal text nodes
* <strong>`someText.type`</strong> : the string `"text"`
* <strong>`someText.name`</strong> : the string `"#text"`
* <strong>`someText.cdata`</strong> : `true` if the text was from a CDATA block
* <strong>`someText.value`</strong> : the string content of the text node (with XML and numeric entities unescaped for non-CDATA elements)
* <strong>`someText.parent`</strong> : reference to the parent element table
* **Comment**
@@ -126,13 +129,109 @@ print(elementText(para)) --> "Hello you crazy World!"

### A Simpler DOM

If you want the DOM tables to be simpler-to-serialize you can supply the `simple` option via:
If you want the DOM tables to be easier to inspect you can supply the `simple` option via:

```lua
local dom = SLAXML:dom(myXML,{ simple=true })
```

In this case no table will have a `parent` attribute, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
In this case the document will have no `root` property, no table will have a `parent` property, elements will not have the `el` collection, and the `attr` collection will be a simple array (without values accessible directly via attribute name). In short, the output will be a strict hierarchy with no internal references to other tables, and all data represented in exactly one spot.
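
Because a simple DOM has no `doc.root`, code that needs the root element has to look for it among `doc.kids`. The following is a minimal sketch, assuming a hand-built table in the shape described above; `findRoot` is a hypothetical helper and not part of SLAXML:

```lua
-- Hypothetical helper (not part of SLAXML): returns the first element
-- child of a document table, or nil if there is none.
local function findRoot(doc)
  for _,kid in ipairs(doc.kids) do
    if kid.type=="element" then return kid end
  end
end

-- A hand-built table in the shape described for simple=true:
-- no `parent`, no `el`, no `root`, and `attr` is a plain array.
local doc = {
  type="document", name="#doc",
  kids={
    { type="comment", name="#comment", value=" before " },
    { type="element", name="r", kids={},
      attr={ { type="attribute", name="id", value="1" } } },
  }
}

local root = findRoot(doc)
print(root.name)          --> r
print(root.attr[1].value) --> 1
```

Since attribute values are not indexed by name in this mode, looking up an attribute means scanning the `attr` array in the same way.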


### Serializing the DOM

You can serialize any DOM table to an XML string by passing it to the `SLAXML:xml()` method:

```lua
local SLAXML = require 'slaxdom'
local doc = SLAXML:dom(myxml)
-- ...modify the document...
local xml = SLAXML:xml(doc)
```

The `xml()` method takes an optional table of options as its second argument:

```lua
local xml = SLAXML:xml(doc,{
indent = 2, -- each pi/comment/element/text node on its own line, indented by this many spaces
indent = '\t', -- ...or, supply a custom string to use for indentation
sort = true, -- sort attributes by name, with no-namespace attributes coming first
omit = {...} -- an array of namespace URIs; removes elements and attributes in these namespaces
})
```

When using the `indent` option, you likely want to ensure that you parsed your DOM using the `stripWhitespace` option. This will prevent you from having whitespace text nodes between elements that are then placed on their own indented line.
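
If a DOM was already parsed without `stripWhitespace`, one option is to prune whitespace-only text nodes before serializing with `indent`. This is a minimal sketch, assuming the DOM table shapes documented above; `pruneWhitespace` is a hypothetical helper and not part of SLAXML. It leaves CDATA nodes alone on the assumption that their whitespace is intentional, and it does not touch the `el` collection, which holds elements only:

```lua
-- Hypothetical helper (not part of SLAXML): removes text nodes that contain
-- only whitespace from `node.kids`, recursively, and returns the node.
local function pruneWhitespace(node)
  if not node.kids then return node end
  local kept = {}
  for _,kid in ipairs(node.kids) do
    -- A text node is "blank" when it has no non-whitespace character.
    local blank = kid.type=="text" and not kid.cdata and not kid.value:find("%S")
    if not blank then
      kept[#kept+1] = pruneWhitespace(kid)
    end
  end
  node.kids = kept
  return node
end

-- A hand-built element with whitespace text nodes around one child.
local el = {
  type="element", name="r", attr={},
  kids={
    { type="text", name="#text", value="\n  " },
    { type="element", name="e", attr={}, kids={} },
    { type="text", name="#text", value="\n" },
  }
}

pruneWhitespace(el)
print(#el.kids)        --> 1
print(el.kids[1].name) --> e
```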

Some examples showing the serialization options:

```lua
local xml = [[
<!-- a simple document showing sorting and namespace culling -->
<r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3">
<e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta" />
<a:wrap><f/></a:wrap>
</r>
]]

local dom = SLAXML:dom(xml, {stripWhitespace=true})

print(SLAXML:xml(dom))
--> <!-- a simple document showing sorting and namespace culling --><r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3"><e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta"/><a:wrap><f/></a:wrap></r>

print(SLAXML:xml(dom, {indent=2}))
--> <!-- a simple document showing sorting and namespace culling -->
--> <r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2" xmlns:a="uri3">
-->   <e a:foo="f" x:alpha="a" a:bar="b" alpha="y" beta="beta"/>
-->   <a:wrap>
-->     <f/>
-->   </a:wrap>
--> </r>

print(SLAXML:xml(dom.root.kids[2]))
--> <a:wrap><f/></a:wrap>
-- NOTE: you can serialize any DOM table node, not just documents

print(SLAXML:xml(dom.root.kids[1], {indent=2, sort=true}))
--> <e alpha="y" beta="beta" a:bar="b" a:foo="f" x:alpha="a"/>
-- NOTE: attributes with no namespace come first

print(SLAXML:xml(dom, {indent=2, omit={'uri3'}}))
--> <!-- a simple document showing sorting and namespace culling -->
--> <r c="1" z="3" b="2" xmlns="uri1" xmlns:x="uri2">
-->   <e x:alpha="a" alpha="y" beta="beta"/>
--> </r>
-- NOTE: Omitting a namespace omits:
-- * namespace declaration(s) for that space
-- * attributes prefixed for that namespace
-- * elements in that namespace, INCLUDING DESCENDANTS

print(SLAXML:xml(dom, {indent=2, omit={'uri3', 'uri2'}}))
--> <!-- a simple document showing sorting and namespace culling -->
--> <r c="1" z="3" b="2" xmlns="uri1">
-->   <e alpha="y" beta="beta"/>
--> </r>

print(SLAXML:xml(dom, {indent=2, omit={'uri1'}}))
--> <!-- a simple document showing sorting and namespace culling -->
-- NOTE: Omitting namespace for the root element removes everything
```

Serialization of elements and attributes ignores the `nsURI` property in favor of the `nsPrefix` property. As such, you can construct DOMs that serialize to invalid XML:

```lua
local el = {
type="element",
nsPrefix="oops", name="root",
attr={
{type="attribute", name="xmlns:nope", value="myuri"},
{type="attribute", nsPrefix="x", name="wow", value="myuri"}
}
}
print( SLAXML:xml(el) )
--> <oops:root xmlns:nope="myuri" x:wow="myuri"/>
```

So, if you want to use a `foo` prefix on an element or attribute, be sure to add an appropriate `xmlns:foo` attribute defining that namespace on an ancestor element.
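
One way to catch this before serializing is to walk the tree and compare the prefixes in use against the `xmlns:*` attributes in scope. The sketch below assumes that namespace declarations appear as attributes named `xmlns:prefix` with no `nsPrefix` of their own, as in the example above; `undeclaredPrefixes` is a hypothetical helper and not part of SLAXML:

```lua
-- Hypothetical helper (not part of SLAXML): collects namespace prefixes used
-- on elements or attributes without an xmlns:prefix declaration on the
-- element itself or on an ancestor. Returns a set: { prefix = true, ... }.
local function undeclaredPrefixes(node, declared, missing)
  -- The "xml" prefix is bound by definition and never needs a declaration.
  declared, missing = declared or { xml=true }, missing or {}
  if node.type=="element" then
    -- Copy inherited declarations so siblings do not see each other's.
    local scope = {}
    for prefix in pairs(declared) do scope[prefix] = true end
    for _,a in ipairs(node.attr or {}) do
      local prefix = not a.nsPrefix and a.name:match("^xmlns:(.+)$")
      if prefix then scope[prefix] = true end
    end
    if node.nsPrefix and not scope[node.nsPrefix] then
      missing[node.nsPrefix] = true
    end
    for _,a in ipairs(node.attr or {}) do
      if a.nsPrefix and not scope[a.nsPrefix] then missing[a.nsPrefix] = true end
    end
    declared = scope
  end
  for _,kid in ipairs(node.kids or {}) do
    undeclaredPrefixes(kid, declared, missing)
  end
  return missing
end

-- The same hand-built element as in the example above.
local el = {
  type="element",
  nsPrefix="oops", name="root",
  attr={
    {type="attribute", name="xmlns:nope", value="myuri"},
    {type="attribute", nsPrefix="x", name="wow", value="myuri"}
  }
}

local missing = undeclaredPrefixes(el)
print(missing.oops) --> true
print(missing.x)    --> true
print(missing.nope) --> nil
```

The check only looks at prefixes, matching how the serializer works; it does not verify that `nsURI` values agree with the declared URIs.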


## Known Limitations / TODO
@@ -157,6 +256,14 @@ In this case no table will have a `parent` attribute, elements will not have the

## History

### v0.8 2018-Oct-23
+ Adds `SLAXML:xml()` to serialize the DOM back to XML.
+ Adds `nsPrefix` properties to the DOM tables for elements and attributes (needed for round-trip serialization).
+ Fixes test suite to work on Lua 5.2 and 5.3.
+ Fixes Issue #10, allowing DOM parser to handle comments/PIs after the root element.
+ Fixes Issue #11, causing DOM parser to preserve whitespace text nodes on the document.
+ **Backwards-incompatible change**: Removes `doc.root` key from DOM when `simple=true` is specified.

### v0.7 2014-Sep-26
+ Decodes entities above 127 as UTF8 (decimal and hexadecimal).
- The encoding specified by the document is (still) ignored.
107 changes: 94 additions & 13 deletions slaxdom.lua
@@ -4,24 +4,23 @@ function SLAXML:dom(xml,opts)
if not opts then opts={} end
local rich = not opts.simple
local push, pop = table.insert, table.remove
local stack = {}
local doc = { type="document", name="#doc", kids={} }
local current = doc
local doc = {type="document", name="#doc", kids={}}
local current,stack = doc, {doc}
local builder = SLAXML:parser{
startElement = function(name,nsURI)
local el = { type="element", name=name, kids={}, el=rich and {} or nil, attr={}, nsURI=nsURI, parent=rich and current or nil }
startElement = function(name,nsURI,nsPrefix)
local el = { type="element", name=name, kids={}, el=rich and {} or nil, attr={}, nsURI=nsURI, nsPrefix=nsPrefix, parent=rich and current or nil }
if current==doc then
if doc.root then error(("Encountered element '%s' when the document already has a root '%s' element"):format(name,doc.root.name)) end
doc.root = el
doc.root = rich and el or nil
end
push(current.kids,el)
if current.el then push(current.el,el) end
current = el
push(stack,el)
end,
attribute = function(name,value,nsURI)
attribute = function(name,value,nsURI,nsPrefix)
if not current or current.type~="element" then error(("Encountered an attribute %s=%s but I wasn't inside an element"):format(name,value)) end
local attr = {type='attribute',name=name,nsURI=nsURI,value=value,parent=rich and current or nil}
local attr = {type='attribute',name=name,nsURI=nsURI,nsPrefix=nsPrefix,value=value,parent=rich and current or nil}
if rich then current.attr[name] = value end
push(current.attr,attr)
end,
@@ -30,11 +29,10 @@ function SLAXML:dom(xml,opts)
pop(stack)
current = stack[#stack]
end,
text = function(value)
if current.type~='document' then
if current.type~="element" then error(("Received a text notification '%s' but was inside a %s"):format(value,current.type)) end
push(current.kids,{type='text',name='#text',value=value,parent=rich and current or nil})
end
text = function(value,cdata)
-- documents may only have text node children that are whitespace: https://www.w3.org/TR/xml/#NT-Misc
if current.type=='document' and not value:find('^%s+$') then error(("Document has non-whitespace text at root: '%s'"):format(value:gsub('[\r\n\t]',{['\r']='\\r', ['\n']='\\n', ['\t']='\\t'}))) end
push(current.kids,{type='text',name='#text',cdata=cdata and true or nil,value=value,parent=rich and current or nil})
end,
comment = function(value)
push(current.kids,{type='comment',name='#comment',value=value,parent=rich and current or nil})
@@ -46,4 +44,87 @@ function SLAXML:dom(xml,opts)
builder:parse(xml,opts)
return doc
end

local escmap = {["<"]="&lt;", [">"]="&gt;", ["&"]="&amp;", ['"']="&quot;", ["'"]="&apos;"}
local function esc(s) return s:gsub('[<>&"]', escmap) end

-- opts.indent: number of spaces, or string
function SLAXML:xml(n,opts)
opts = opts or {}
local out = {}
local tab = opts.indent and (type(opts.indent)=="number" and string.rep(" ",opts.indent) or opts.indent) or ""
local ser = {}
local omit = {}
if opts.omit then for _,s in ipairs(opts.omit) do omit[s]=true end end

function ser.document(n)
for _,kid in ipairs(n.kids) do
if ser[kid.type] then ser[kid.type](kid,0) end
end
end

function ser.pi(n,depth)
depth = depth or 0
table.insert(out, tab:rep(depth)..'<?'..n.name..' '..n.value..'?>')
end

function ser.element(n,depth)
if n.nsURI and omit[n.nsURI] then return end
depth = depth or 0
local indent = tab:rep(depth)
local name = n.nsPrefix and n.nsPrefix..':'..n.name or n.name
local result = indent..'<'..name
if n.attr and n.attr[1] then
local sorted = n.attr
if opts.sort then
sorted = {}
for i,a in ipairs(n.attr) do sorted[i]=a end
table.sort(sorted,function(a,b)
if a.nsPrefix and b.nsPrefix then
return a.nsPrefix==b.nsPrefix and a.name<b.name or a.nsPrefix<b.nsPrefix
elseif not (a.nsPrefix or b.nsPrefix) then
return a.name<b.name
elseif b.nsPrefix then
return true
else
return false
end
end)
end

local attrs = {}
for _,a in ipairs(sorted) do
if (not a.nsURI or not omit[a.nsURI]) and not (omit[a.value] and a.name:find('^xmlns:')) then
attrs[#attrs+1] = ' '..(a.nsPrefix and (a.nsPrefix..':') or '')..a.name..'="'..esc(a.value)..'"'
end
end
result = result..table.concat(attrs,'')
end
result = result .. (n.kids and n.kids[1] and '>' or '/>')
table.insert(out, result)
if n.kids and n.kids[1] then
for _,kid in ipairs(n.kids) do
if ser[kid.type] then ser[kid.type](kid,depth+1) end
end
table.insert(out, indent..'</'..name..'>')
end
end

function ser.text(n,depth)
if n.cdata then
table.insert(out, tab:rep(depth)..'<![CDATA['..n.value..']]>')
else
table.insert(out, tab:rep(depth)..esc(n.value))
end
end

function ser.comment(n,depth)
table.insert(out, tab:rep(depth)..'<!--'..n.value..'-->')
end

ser[n.type](n,0)

return table.concat(out, opts.indent and '\n' or '')
end

return SLAXML
2 changes: 1 addition & 1 deletion slaxml-0.7-0.rockspec
@@ -1,5 +1,5 @@
package = "SLAXML"
version = "0.7-0"
version = "0.8-0"
source = {
url = "https://github.com/Phrogz/SLAXML.git"
}
20 changes: 11 additions & 9 deletions slaxml.lua
@@ -1,9 +1,9 @@
--[=====================================================================[
v0.7 Copyright © 2013-2014 Gavin Kistner <[email protected]>; MIT Licensed
v0.8 Copyright © 2013-2018 Gavin Kistner <[email protected]>; MIT Licensed
See http://github.com/Phrogz/SLAXML for details.
--]=====================================================================]
local SLAXML = {
VERSION = "0.7",
VERSION = "0.8",
_call = {
pi = function(target,content)
print(string.format("<?%s %s?>",target,content))
@@ -25,11 +25,13 @@ local SLAXML = {
if nsURI then io.write(" (ns='",nsURI,"')") end
io.write("\n")
end,
text = function(text)
print(string.format(" text: %q",text))
text = function(text,cdata)
print(string.format(" %s: %q",cdata and 'cdata' or 'text',text))
end,
closeElement = function(name,nsURI,nsPrefix)
print(string.format("</%s>",name))
io.write("</")
if nsPrefix then io.write(nsPrefix,":") end
print(name..">")
end,
}
}
@@ -71,7 +73,7 @@ function SLAXML:parse(xml,options)
end
end
local entityMap = { ["lt"]="<", ["gt"]=">", ["amp"]="&", ["quot"]='"', ["apos"]="'" }
local entitySwap = function(orig,n,s) return entityMap[s] or n=="#" and utf8(tonumber('0'..s)) or orig end
local entitySwap = function(orig,n,s) return entityMap[s] or n=="#" and utf8(tonumber('0'..s)) or orig end
local function unescape(str) return gsub( str, '(&(#?)([%d%a]+);)', entitySwap ) end

local function finishText()
@@ -82,7 +84,7 @@ function SLAXML:parse(xml,options)
text = gsub(text,'%s+$','')
if #text==0 then text=nil end
end
if text then self._call.text(unescape(text)) end
if text then self._call.text(unescape(text),false) end
end
end

@@ -180,7 +182,7 @@ function SLAXML:parse(xml,options)
first, last, match1 = find( xml, '^<!%[CDATA%[(.-)%]%]>', pos )
if first then
finishText()
if self._call.text then self._call.text(match1) end
if self._call.text then self._call.text(match1,true) end
pos = last+1
textStart = pos
return true
@@ -233,7 +235,7 @@ function SLAXML:parse(xml,options)

while pos<#xml do
if state=="text" then
if not (findPI() or findComment() or findCDATA() or findElementClose()) then
if not (findPI() or findComment() or findCDATA() or findElementClose()) then
if startElement() then
state = "attributes"
else
5 changes: 5 additions & 0 deletions test/files/commentwrapper.xml
@@ -0,0 +1,5 @@
<!-- before -->

<r/>

<!-- after -->
11 changes: 11 additions & 0 deletions test/files/state.scxml
@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" xmlns:nv="http://nvidia.com/drive/ar/scxml" xmlns:dumb="nope" version="1">
<state id="AwaitingChoice" nv:loc="0 0 400 300">
<state id="UpToDate" nv:rgba="0 0.5 1 0.2" nv:loc="10 10 100 40">
<transition event="ota.available" nv:anchor="e1" target="UpdateAvailable" dumb:status="very" type="internal"/>
</state>
</state>
<dumb:wrapper>
<state />
</dumb:wrapper>
</scxml>