Cheat sheet
This is a digest of most of the methods documented at nokogiri.org. Reading the source can help, too.
Topics not covered: RelaxNG validation and Builder.
Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return XML (like to_xml, to_html and inner_html) will return a string encoded like the source document.
More Resources
- Official Tutorials
- XPath Reference
- XPath Reference 2
- XPath Reference 3
- CSS Selector Reference
- CSS Selector Reference 2
- Mechanize
- Loofah
- Nokogumbo
- HTML/XML parsing tools
- StackOverflow top questions
Nokogiri::HTML::Document Nokogiri::XML::Document
doc = Nokogiri(string_or_io) # Nokogiri will try to guess what type of document you are attempting to parse
doc = Nokogiri::HTML(string_or_io) # [, url, encoding, options, &block]
doc = Nokogiri::XML(string_or_io) # [, url, encoding, options, &block]
# set options with block {|config| config.noblanks.noent.noerror.strict }
# OR with a bitmask {|config| config.options = Nokogiri::XML::ParseOptions::NOBLANKS | Nokogiri::XML::ParseOptions::NOENT}
# https://nokogiri.org/rdoc/Nokogiri/XML/ParseOptions.html
# doc = Nokogiri.parse(...)
# doc = Nokogiri::XML.parse(...) #shortcut to Nokogiri::XML::Document.parse
# doc = Nokogiri::HTML.parse(...) #shortcut to Nokogiri::HTML::Document.parse
# document namespaces
doc.collect_namespaces
doc.remove_namespaces!
doc.namespaces
# shortcuts for creating new nodes
doc.create_cdata(string, &block)
doc.create_comment(string, &block)
doc.create_element(name, *args, &block) # Create an element
doc.create_element "div" # <div></div>
doc.create_element "div", :class => "container" # <div class='container'></div>
doc.create_element "div", "contents" # <div>contents</div>
doc.create_element "div", "contents", :class => "container" # <div class='container'>contents</div>
doc.create_element("div") { |node| node['class'] = "container" } # <div class='container'></div> (parentheses are required when passing a brace block)
doc.create_entity
doc.create_text_node(string, &block)
doc.root
doc.root=node
# A document is a Node, so see working_with_a_node
Nokogiri::XML::DocumentFragment Nokogiri::HTML::DocumentFragment
Generally speaking, unless you expect to have a DOCTYPE and a single root node, you don’t have a document, you have a fragment. For HTML, another rule of thumb is that documents have html and body tags, and fragments usually do not.
A fragment is a Node, but is not a Document. If you need to call methods that are only available on Document, like create_element, call fragment.document.create_element.
fragment = Nokogiri::XML.fragment(string)
fragment = Nokogiri::HTML.fragment(string, encoding = nil)
# Note: Searching a fragment relative to the document root with xpath
# will probably not return what you expect. You should search relative to
# the current context instead. e.g.
fragment.xpath('//*').size #=> 0
fragment.xpath('.//*').size #=> 229
Working with a Nokogiri::XML::Node
node = Nokogiri::XML::Node.new('name', document) # initialize a new node
node = document.create_element('name') # shortcut
node.document
node.name # alias of node.node_name
node.name= # alias of node.node_name=
node.read_only?
node.blank?
# Type of Node
node.type # alias of node.node_type
node.cdata? # type == CDATA_SECTION_NODE
node.comment? # type == COMMENT_NODE
node.element? # type == ELEMENT_NODE alias node.elem?
node.fragment? # type == DOCUMENT_FRAG_NODE (Document fragment node)
node.html? # type == HTML_DOCUMENT_NODE
node.text? # type == TEXT_NODE
node.xml? # type == DOCUMENT_NODE (Document node type)
# other types not covered by a convenience method
# ATTRIBUTE_DECL: Attribute declaration type
# ATTRIBUTE_NODE: Attribute node type
# DOCB_DOCUMENT_NODE: DOCB document node type
# DOCUMENT_TYPE_NODE: Document type node type
# DTD_NODE: DTD node type
# ELEMENT_DECL: Element declaration type
# ENTITY_DECL: Entity declaration type
# ENTITY_NODE: Entity node type
# ENTITY_REF_NODE: Entity reference node type
# NAMESPACE_DECL: Namespace declaration type
# NOTATION_NODE: Notation node type
# PI_NODE: PI node type
# XINCLUDE_END: XInclude end type
# XINCLUDE_START: XInclude start type
# Attributes, like a hash that maps string keys to string values
node['src'] # aliases: node.get_attribute, node.attr.
node['src'] = 'value' # alias node.set_attribute
node.key?('src') # alias node.has_attribute?
node.keys
node.values
node.delete('src') # alias of node.remove_attribute
node.each { |attr_name, attr_value| }
# Node includes Enumerable, which works on these attribute names and values
# Attribute Nodes
node.attribute('src') # Get the attribute node with name src
# Returns a Nokogiri::XML::Attr, a subclass of Nokogiri::XML::Node
# that provides +.content=+ and +.value=+ to modify the attribute value
node.attribute_nodes # returns an array of this Node's attributes as Attr objects.
node.attribute_with_ns('src', 'namespace') # Get the attribute node with name and namespace
node.attributes # Returns a hash containing the node's attributes.
# The key is the attribute name without any namespace,
# the value is a Nokogiri::XML::Attr representing the attribute.
# If you need to distinguish attributes with the same name, but with different namespaces, use #attribute_nodes instead.
# Traversing / Modifying
# +node_or_tags+ can be a Node, a DocumentFragment, a NodeSet, or a string containing markup.
## Self
node.traverse { |node| } # yields all children and self to a block, _recursively_.
node.remove # alias of node.unlink # Unlink this node from its current context.
node.replace(node_or_tags)
# Replace this Node with +node_or_tags+.
# Returns the reparented node (if +node_or_tags+ is a Node),
# or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
node.swap(node_or_tags) # like +replace+, but returns self to support chaining
## Siblings
node.next # alias of node.next_sibling # Returns the next sibling node
node.next=(node_or_tags) # alias of node.add_next_sibling
# Inserts node_or_tags after this node (as a sibling).
# Returns the reparented node (if +node_or_tags+ is a Node)
# or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
node.after(node_or_tags) # like +next=+, but returns self to support chaining
node.next_element # Returns the next Nokogiri::XML::Element sibling node.
node.previous # alias of node.previous_sibling # Returns the previous sibling node
node.previous=(node_or_tags) # alias of node.add_previous_sibling
# Inserts node_or_tags before this node (as a sibling).
# Returns the reparented node (if +node_or_tags+ is a Node)
# or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
node.before(node_or_tags) # just like +previous=+, but returns self to support chaining
node.previous_element # Returns the previous Nokogiri::XML::Element sibling node.
## Parent
node.parent
node.parent=(node)
## Children
node.child # returns a Node
node.children # Get the list of children of this node as a NodeSet
node.children=(node_or_tags)
# Set the inner html for this Node
# Returns the reparented node (if +node_or_tags+ is a Node),
# or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
node.elements # alias: node.element_children # Get the list of child Elements of this node as a NodeSet.
node.add_child(node_or_tags)
# Add +node_or_tags+ as a child of this Node.
# Returns the reparented node (if +node_or_tags+ is a Node),
# or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
node << node_or_tags # like above, but returns self to support chaining, e.g. root << child1 << child2
node.first_element_child # Returns the first child node of this node that is an element.
node.last_element_child # Returns the last child node of this node that is an element.
## Content / Children
node.content # aliases node.text node.inner_text node.to_str
node.content=(string) # Set the Node's content to a Text node containing +string+. The string gets XML escaped, and will not be interpreted as markup.
node.inner_html # (*args) children.map { |x| x.to_html(*args) }.join
node.inner_html=(node_or_tags)
# Sets the inner html of this Node to +node_or_tags+
# Returns self.
# Also see related method +children=+
## Searching below (see Working with a Nodeset below)
# see docs for namespace bindings, variable bindings, and custom xpath functions via a handler class
node.search(*paths) # alias: node / path # paths can be XPath or CSS
node.at(*paths) # alias node % path # Search for the first occurrence of path. Returns nil if nothing is found, otherwise a Node. (like search(path, ns).first)
node.xpath(*paths) # search for XPath queries
node.at_xpath(*paths) # like xpath(*paths).first
node.css(*rules) # search for CSS rules
node.at_css(*rules) # like css(*rules).first
node > selector # Search this node's immediate children using a CSS selector
# Searching above
node.ancestors # list of ancestor nodes, closest to furthest, as a NodeSet.
node.ancestors(selector) # ancestors that match the selector
# Where am I?
node.path # Returns the path associated with this Node
node.css_path # Get the path to this node as a CSS expression
node.matches?(selector) # does this node match this selector?
node.line # line number from input
node.pointer_id # internal pointer number
# Namespaces
node.add_namespace(prefix, href) # alias of node.add_namespace_definition
# Adds a namespace definition with prefix using href value. The result is as
# if parsed XML for this node had included an attribute
# 'xmlns:prefix=value'. A default namespace for this node (“xmlns=”) can be
# added by passing 'nil' for prefix. Namespaces added this way will not show
# up in #attributes, but they will be included as an xmlns attribute when
# the node is serialized to XML.
node.default_namespace=(url)
# Adds a default namespace supplied as a string url href, to self. The
# consequence is as an xmlns attribute with supplied argument were present
# in parsed XML. A default namespace set with this method will not show up
# in #attributes, but when this node is serialized to XML an “xmlns”
# attribute will appear. See also #namespace and #namespace=
node.namespace # returns the default namespace set on this node (as with an “xmlns=” attribute), as a Namespace object.
node.namespace=(ns)
# Set the default namespace on this node (as would be defined with an
# “xmlns=” attribute in XML source), as a Namespace object ns . Note that a
# Namespace added this way will NOT be serialized as an xmlns attribute for
# this node. You probably want #default_namespace= instead, or perhaps
# #add_namespace_definition with a nil prefix argument.
node.namespace_definitions
# returns namespaces defined on self element directly, as an array of
# Namespace objects. Includes both a default namespace (as in “xmlns=”), and
# prefixed namespaces (as in “xmlns:prefix=”).
node.namespace_scopes
# returns namespaces in scope for self – those defined on self element
# directly or any ancestor node – as an array of Namespace objects. Default
# namespaces (“xmlns=” style) for self are included in this array; Default
# namespaces for ancestors, however, are not. See also #namespaces
node.namespaced_key?(attribute, namespace)
# Returns true if attribute is set with namespace
node.namespaces # Returns a Hash of {prefix => value} for all namespaces on this node and its ancestors.
# This method returns the same namespaces as #namespace_scopes.
#
# Returns namespaces in scope for self – those defined on self element
# directly or any ancestor node – as a Hash of attribute-name/value pairs.
# Note that the keys in this hash are the XML attributes that would be used to
# define this namespace, such as “xmlns:prefix”, not just the prefix.
# Default namespace set on self will be included with key “xmlns”. However,
# default namespaces set on ancestor will NOT be, even if self has no
# explicit default namespace.
# see also attribute_with_ns
# Rubyisms
node <=> another_node # Compare two Node objects with respect to their Document. Nodes from different documents cannot be compared.
# uses xmlXPathCmpNodes "Compare two nodes w.r.t document order"
node == another_node # compares pointer_id
node.clone # alias node.dup # Copy this node. An optional depth may be passed in, but it defaults to a deep copy. 0 is a shallow copy, 1 is a deep copy.
# Visitor pattern
node.accept(visitor) # calls visitor.visit(self)
# Write it out (sorted from most flexible/hardest to use to least flexible/easiest to use)
node.write_to(io, *options)
# Write Node to +io+ with +options+. +options+ modify the output of
# this method. Valid options are:
#
# * +:encoding+ for changing the encoding
# * +:indent_text+ the indentation text, defaults to one space
# * +:indent+ the number of +:indent_text+ to use, defaults to 2
# * +:save_with+ a combination of SaveOptions constants.
# SaveOptions
# AS_BUILDER: Save builder created document
# AS_HTML: Save as HTML
# AS_XHTML: Save as XHTML
# AS_XML: Save as XML
# DEFAULT_HTML: the default for HTML document
# DEFAULT_XHTML: the default for XHTML document
# DEFAULT_XML: the default for XML documents
# FORMAT: Format serialized xml
# NO_DECLARATION: Do not include declarations
# NO_EMPTY_TAGS: Do not include empty tags
# NO_XHTML: Do not save XHTML
# e.g. node.write_to(io, :encoding => 'UTF-8', :indent => 2)
node.write_html_to(io, options={}) # uses write_to with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
node.write_xhtml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
node.write_xml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XML option
node.serialize # Serialize Node to a string using +options+, provided as a hash or block. Uses write_to (via StringIO)
# node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML)
# node.serialize(:encoding => 'UTF-8') do |config|
# config.format.as_xml
# end
node.to_html(options={}) # serializes with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
node.to_xhtml(options={}) # serializes with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
node.to_xml(options={}) # serializes with :save_with => DEFAULT_XML option
node.to_s # document.xml? ? to_xml : to_html
node.inspect
node.pretty_print(pp) # to enhance pp
# Utility
node.encode_special_chars(str) # Encodes special characters :P
node.fragment(tags) # Create a DocumentFragment containing tags that is relative to this context node.
node.parse(string_or_io, options={})
# Parse +string_or_io+ as a document fragment within the context of
# *this* node. Returns an XML::NodeSet containing the nodes parsed from
# +string_or_io+.
# External subsets, like DTD declarations
node.create_external_subset(name, external_id, system_id)
node.create_internal_subset(name, external_id, system_id)
node.external_subset
node.internal_subset
# Other:
node.description # Fetch the Nokogiri::HTML::ElementDescription for this node. Returns nil on XML documents and on unknown tags.
# e.g. if node is an <img> tag: Nokogiri::HTML::ElementDescription['img'] #=> #<Nokogiri::HTML::ElementDescription: img embedded image >
node.decorate! # Decorate this node with the decorators set up in this node's Document. Used internally to provide Slop support and Hpricot compatibility via Nokogiri::Hpricot
node.do_xinclude # options as a block or hash
# Do xinclude substitution on the subtree below node. If given a block, a
# Nokogiri::XML::ParseOptions object initialized from +options+, will be
# passed to it, allowing more convenient modification of the parser options.
Working with a Nokogiri::XML::NodeSet
nodes = Nokogiri::XML::NodeSet.new(document, list=[])
# Set operations
nodes | other_nodeset # UNION, i.e. merging the sets, returning a new set
nodes + other_nodeset # UNION, i.e. merging the sets, returning a new set
nodes & other_nodeset # INTERSECTION # i.e. return a new NodeSet with the common nodes only
nodes - other_nodeset # DIFFERENCE Returns a new NodeSet containing the nodes in this NodeSet that aren't in other_nodeset
nodes.include?(node)
nodes.empty?
nodes.length # alias nodes.size
nodes.delete(node) # Delete node from the Nodeset, if it is a member. Returns the deleted node if found, otherwise returns nil.
# List operations (includes Enumerable)
nodes.each { |node| }
nodes.first
nodes.last
nodes.reverse # Returns a new NodeSet containing all the nodes in the NodeSet in reverse order
nodes.index(node) # returns the numeric index or nil
nodes[3] # element at index 3
nodes[3,4] # return a NodeSet of size 4, starting at index 3
nodes[3..6] # or return a NodeSet using a range of indexes
# alias nodes.slice
nodes.pop # Removes the last element from set and returns it, or nil if the set is empty
nodes.push(node) # alias nodes << node # Append node to the NodeSet.
nodes.shift # Returns the first element of the NodeSet and removes it. Returns nil if the set is empty.
nodes.filter(expr) # Filter this list for nodes that match an XPATH or CSS query
# find_all { |node| node.matches?(expr) }
nodes.children # Returns a new NodeSet containing all the children of all the nodes in the NodeSet
# Content
nodes.inner_html(*args) # Get the inner html of all contained Node objects
nodes.inner_text # alias nodes.text
# Convenience modifiers
nodes.remove # alias of nodes.unlink # Unlink this NodeSet and all Node objects it contains from their current context.
nodes.wrap("<div class='container'></div>") # wrap new xml around EACH NODE in a Nodeset
nodes.before(datum) # Insert datum before the first Node in this NodeSet # e.g. first.before(datum)
nodes.after(datum) # Insert datum after the last Node in this NodeSet # e.g. last.after(datum)
nodes.attr(key, value) # set the attribute key to value on all Node objects in the NodeSet
nodes.attr(key) { |node| 'value' } # set the attribute key to the result of the block on all Node objects in the NodeSet
# alias nodes.attribute, nodes.set
nodes.remove_attr(name) # removes the attribute from all nodes in the nodeset
nodes.add_class(name) # Append the class attribute name to all Node objects in the NodeSet.
nodes.remove_class(name = nil) # if nil, removes the class attribute from all nodes in the nodeset
# Searching
nodes.search(*paths) # alias nodes / path
nodes.at(*paths) # alias nodes % path
nodes.xpath(*paths)
nodes.at_xpath(*paths)
nodes.css(*rules)
nodes.at_css(*rules)
nodes > selector # Search this NodeSet's nodes' immediate children using CSS selector
# Writing out
nodes.to_a # alias nodes.to_ary # Return this list as an Array
nodes.to_html(*args)
nodes.to_s
nodes.to_xhtml(*args)
nodes.to_xml(*args)
# Rubyisms
nodes == nodes # Two NodeSets are equal if they contain the same number of elements and if each element is equal to the corresponding element in the other NodeSet
nodes.dup # Duplicate this node set
nodes.inspect
nc = Nokogiri::HTML::NamedCharacters # a Nokogiri::HTML::EntityLookup
nc[key] # like nc.get(key).try(:value) # e.g. nc['gt'] (62) or nc['rsquo'] (8217)
nc.get(key) # returns a Nokogiri::HTML::EntityDescription
# e.g. nc.get('rsquo') #=> #<struct Nokogiri::HTML::EntityDescription value=8217, name="rsquo", description="right single quotation mark, U+2019 ISOnum">
# Adding a Processing Instruction (like <?xml-stylesheet?>)
# Nokogiri::XML::ProcessingInstruction https://nokogiri.org/tutorials/modifying_an_html_xml_document.html
pi = Nokogiri::XML::ProcessingInstruction.new(doc, "xml-stylesheet",'type="text/xsl" href="foo.xsl"')
doc.root.add_previous_sibling(pi)
Reader parsers
Reader parsers can be used to parse very large XML documents quickly without the need to load the entire document into memory or write a SAX document parser. The reader makes each node in the XML document available exactly once, only moving forward, like a cursor.
reader = Nokogiri::XML::Reader(string_or_io)
# attrs
# .encoding
# .errors
# .source
# Reading
reader.each {|node| } # node and reader are the same object. shortcut for while(node = self.read) yield(node); end;
reader.read # Move the Reader forward through the XML document.
node.name
node.local_name
# Attributes
node.attribute('src')
node.attribute_at(1)
node.attribute_count
node.attribute_nodes
node.attributes
node.attributes?
# Content
node.empty_element?
node.self_closing?
node.value # Get the text value of the node if present as a utf-8 encoded string. Does NOT advance the reader.
node.value? # Does this node have a text value?
node.inner_xml # Read the contents of the current node, including child nodes and markup into a utf-8 encoded string. Does NOT advance the reader
node.outer_xml # Does NOT advance the reader
node.base_uri # Get the xml:base of the node
node.default? # Was an attribute generated from the default value in the DTD or schema?
node.depth
# Namespaces and the rest
node.namespace_uri # Get the URI defining the namespace associated with the node
node.namespaces # Get a hash of namespaces for this Node
node.prefix # Get the shorthand reference to the namespace associated with the node.
node.xml_version # Get the XML version of the document being read
node.lang # Get the xml:lang scope within which the node resides.
node.node_type
# one of
# TYPE_ATTRIBUTE
# TYPE_CDATA
# TYPE_COMMENT
# TYPE_DOCUMENT
# TYPE_DOCUMENT_FRAGMENT
# TYPE_DOCUMENT_TYPE
# TYPE_ELEMENT
# TYPE_END_ELEMENT
# TYPE_END_ENTITY
# TYPE_ENTITY
# TYPE_ENTITY_REFERENCE
# TYPE_NONE
# TYPE_NOTATION
# TYPE_PROCESSING_INSTRUCTION
# TYPE_SIGNIFICANT_WHITESPACE
# TYPE_TEXT
# TYPE_WHITESPACE
# TYPE_XML_DECLARATION
node.state # Get the state of the reader
XSD XSD::XMLParser XSD::XMLParser::Nokogiri
xsd = Nokogiri::XML::Schema(string_or_io_to_schema_file)
doc = Nokogiri::XML(File.read(PO_XML_FILE))
xsd.valid?(doc) # => true/false
xsd.validate(doc) # returns an array of SyntaxError objects
xsd.validate(doc).each do |syntax_error|
syntax_error.error?
syntax_error.fatal?
syntax_error.none?
syntax_error.to_s
syntax_error.warning?
# undocumented attributes
syntax_error.code # read-only
syntax_error.column # read-only
syntax_error.domain # read-only
syntax_error.file # read-only
syntax_error.int1 # read-only
syntax_error.level # read-only
syntax_error.line # read-only
syntax_error.str1 # read-only
syntax_error.str2 # read-only
syntax_error.str3 # read-only
end
# https://nokogiri.org/rdoc/Nokogiri/XML/Schema.html
# https://nokogiri.org/rdoc/Nokogiri/XML/AttributeDecl.html
# https://nokogiri.org/rdoc/Nokogiri/XML/DTD.html
# https://nokogiri.org/rdoc/Nokogiri/XML/ElementDecl.html
# https://nokogiri.org/rdoc/Nokogiri/XML/ElementContent.html
# https://nokogiri.org/rdoc/Nokogiri/XML/EntityDecl.html
# https://nokogiri.org/rdoc/Nokogiri/XML/EntityReference.html
doc.validate # validate it against its DTD, if it has one
Nokogiri::CSS Nokogiri::CSS::Node Nokogiri::CSS::Parser Nokogiri::CSS::SyntaxError Nokogiri::CSS::Tokenizer Nokogiri::CSS::Tokenizer::ScanError
# https://nokogiri.org/rdoc/Nokogiri/CSS.html
Nokogiri::CSS.parse('selector') # => returns an AST
Nokogiri::CSS.xpath_for('selector', options={})
# https://nokogiri.org/rdoc/Nokogiri/CSS/Node.html
# attr: type, value
#methods
# accept(visitor)
# find_by_type
# new
# preprocess!
# to_a
# to_type
# to_xpath
# https://nokogiri.org/rdoc/Nokogiri/CSS/Parser.html # a Racc generated Parser
Nokogiri::XSLT Nokogiri::XSLT::Stylesheet
doc = Nokogiri::XML(File.read('some_file.xml'))
xslt = Nokogiri::XSLT(File.read('some_transformer.xslt'))
puts xslt.transform(doc) # [, xslt_parameters]
# xslt.serialize(doc) # to an XML string
# xslt.apply_to(doc, params=[]) # equivalent to xslt.serialize(xslt.transform(doc, params))
SAX Parsing
Event-driven XML parsing, appropriate for reading very large XML files without loading the entire document into memory. The best documentation is in this file.
# Document template
# Define any or all of these methods to get their notifications:
# Your document doesn't have to subclass Nokogiri::XML::SAX::Document,
# doing so just saves you from having to define all the sax methods,
# rather than the few you need.
class MyDocument < Nokogiri::XML::SAX::Document
def xmldecl(version, encoding, standalone)
end
def start_document
end
def end_document
end
def start_element(name, attrs = [])
end
def end_element(name)
end
def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = [])
end
def end_element_namespace(name, prefix = nil, uri = nil)
end
def characters(string)
end
def comment(string)
end
def warning(string)
end
def error(string)
end
def cdata_block(string)
end
end
# Standard Parser
parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new) # [, encoding = 'UTF-8']
# A block can be passed to the parse methods to get the ParserContext before parsing, but you probably don't need that
parser.parse(string_or_io)
parser.parse_io(io) # [, encoding = "ASCII"]
parser.parse_file(filename)
parser.parse_memory(string)
# If you want HTML correction features, instantiate this parser instead
parser = Nokogiri::HTML::SAX::Parser.new(MyDocument.new)
(If you're a weirdo,) you can stream the XML manually using Nokogiri::XML::SAX::PushParser. The best documentation is this file.
Slop decorator (Don’t use this)
The ::Slop decorator implements method_missing such that methods may be used instead of CSS or XPath. See the bottom of this page: Nokogiri.Slop, Nokogiri::XML::Document#slop!, Nokogiri::Decorators::Slop.
doc = Nokogiri::Slop(string_or_io)
doc = Nokogiri(string_or_io).slop!
doc = Nokogiri::HTML(string_or_io).slop!
doc = Nokogiri::XML(string_or_io).slop!
doc = Nokogiri::Slop(<<-eohtml)
<html>
<body>
<p>first</p>
<p>second</p>
</body>
</html>
eohtml
assert_equal('second', doc.html.body.p[1].text)
doc = Nokogiri::Slop <<-EOXML
<employees>
<employee status="active">
<fullname>Dean Martin</fullname>
</employee>
<employee status="inactive">
<fullname>Jerry Lewis</fullname>
</employee>
</employees>
EOXML
# navigate!
doc.employees.employee.last.fullname.content # => "Jerry Lewis"
# access node attributes!
doc.employees.employee.first["status"] # => "active"
# use some xpath!
doc.employees.employee("[@status='active']").fullname.content # => "Dean Martin"
doc.employees.employee(:xpath => "@status='active'").fullname.content # => "Dean Martin"
# use some css!
doc.employees.employee("[status='active']").fullname.content # => "Dean Martin"
doc.employees.employee(:css => "[status='active']").fullname.content # => "Dean Martin"