Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML special chars break the document #14

Open
EligiusSantori opened this issue Jul 1, 2016 · 10 comments
Open

XML special chars break the document #14

EligiusSantori opened this issue Jul 1, 2016 · 10 comments

Comments

@EligiusSantori
Copy link

EligiusSantori commented Jul 1, 2016

Hello. I got "Invalid Character Error" exception when i had tried to make XML document from db data because one of records contained "<" symbol. Then i have been required to check every xml node in my code for special chars and wrap it in CDATA block manually. I think the data should be escaped or CDATA-wrapped automatically by default. Thanks!

\FluidXml\fluidxml()->add('root', '12345')->xml(); // ok
\FluidXml\fluidxml()->add('root', '<12345>')->xml(); // error

@FoxxMD
Copy link

FoxxMD commented Jul 8, 2016

I encountered a similar problem when trying to ->add() and the string contained the character &:

DOMElement::__construct(): unterminated entity reference ...

I was able get around it (kind of) by wrapping the string in htmlspecialchars().

I haven't tested it but it might be possible to use the XML Fragment method instead -- @EligiusSantori this might be easier than wrapping with CDATA blocks.

@daniele-orlando , if the XML Fragment works would it be out of scope to ask that fluidxml catches this error and automatically attempts to use the fragment method?

@daniele-orlando
Copy link
Member

daniele-orlando commented Jul 9, 2016

Guys, the ampersand & like <, > and other characters must be escaped in XML. Nothing surprising here.
You can use CDATA blocks (they exist for this) or convert the special characters to a valid XML representation.
XML Fragments are a different thing, they are valid XML, not just a trick to escape invalid characters.

@daniele-orlando
Copy link
Member

I'll evaluate for the next release (2.0) to catch the exception and convert any special character to a valid XML representation.

@Phr33d0m
Copy link

Hi, sorry about being offtopic but the usage of ->cdata() is not clear to me.

How would one go with adding CDATA to multiple elements using chaining methods?

        $xml
            ->add('property', true)
                ->add('id', $property->id)
                ->add('url')->cdata($property->url_web) // NOPE
                ->add('title', true)->cdata($property->title) // NOPE NOPE

Thanks in advance.

@daniele-orlando
Copy link
Member

daniele-orlando commented Jul 19, 2016

If you want to stay fluent:

    $xml->add('property', true)
          ->add('id', $property->id)
          ->add('url', true)
            ->cdata($property->url_web)
          ->query('..')
          ->add('title', true)
            ->cdata($property->title);

Otherwise:

    $prop = $xml->add('property', true);
    $prop->add('id', $property->id);
    $prop->add('url', true)->cdata($property->url_web);
    $prop->add('title', true)->cdata($property->title);

@andzandz
Copy link

andzandz commented Jun 9, 2017

Hi, any update on this? Please escape characters correctly:

// Breaks
$fluid_xml->add([
    'Property' => [
        'ID' => 1234,
        'Test' => 'Hello & World'
    ]
]);

As I'm working from an array like this, I can't easily change this to use your ->cdata() method (the real code is more complicated).

This is quite a basic feature, I'm surprised it's missing. This potential bug could easily be missed and then break in production the first time a user types an & into your system somewhere.

@shoe-diamente
Copy link

I don't see the point of not escaping by default. What use cases does not escaping cover?

@jedlynch
Copy link

jedlynch commented May 9, 2019

Not that what you laid out here will not work, but I think it about the features of the library. I found this library search for a way to convert an array to XML, which the library does well except for CDATA objects. I see this function...

public function addChild($child, ...$optionals) { return $this->chooseContext(function($cx) use ($child, &$optionals) { return $cx->addChild($child, ...$optionals); }); }

...and it seems there is no way to add a CDATA object adding an array to a node. It's not deal breaker, but it would be nice to have.

@FoxxMD
Copy link

FoxxMD commented Aug 20, 2020

So this is still an issue and doesn't seem like it will ever be fixed. I've forked it to fix it (messily)

on addChild for a simple string you can optionally use htmlentities() to encode your string

Usage

<?php

$options = [
	'htmlentities' => $args, // use to optimistically encode (always run value through htmlentities())
	'exceptionHtmlentities' => $args // use to encode and recreate element only if an exception with a message containing 'unterminated entity reference' is thrown
];

// only one option can be used at a time -- if you use htmlentities then exceptionHtmlentities will be ignored

$args = true; // use htmlentities() without args
$args = array(); // spread the array as arguments EX htmlentities($value, ...$args)

// use as defaults
$fluidDoc = new FluidXml(null, ['htmlentities' => $args]);
// use on addChild()
$fluidDoc->addChild('MyNode', 'someValue', ['htmlentities' => $args]);

//EX using as defaults, only on exception, with some htmlentities arguments
$fluidDoc = FluidXml(null, ['exceptionHtmlentities' => [(ENT_NOQUOTES | ENT_SUBSTITUTE)]]);

fork is at https://github.com/Fulfillment-dot-com/fluidxml/tree/specialchars

ADDITIONALLY: my branch moved composer.json to the root of the directory because cloning from git on windows may not respect symlinks and honestly the only reason it isn't already like this is because the author has some vim issue or something -- its a personal preference that's been foisted on anyone trying to develop on this repo.

So to use this fork change your composer.json like so:

{
  "require": {
    ...
    "servo/fluidxml": "dev-specialchars",
  },
  "repositories": [
    ...
    {
      "type": "git",
      "url": "https://github.com/Fulfillment-dot-com/fluidxml.git"
    }
  ],
}

P.S. It's a big bummer that so many members of FluidXml class are all private instead of protected. This could easily have been solved by extending FluidXml to use a modified FluidInsertionHandler but instead I had to do this whole fork just to manage it :/

@dev-guidolin
Copy link

Use htmlspecialchars()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants