Invalid internal use of preg_match_all() #313

Open
chaslain opened this issue Mar 6, 2023 · 2 comments

chaslain commented Mar 6, 2023

PHP Warning 'yii\base\ErrorException' with message 'preg_match_all(): Compilation failed: invalid range in character class at offset 4'

in vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php:91

Code was this:

$file = file_get_contents($this->file_path);
$dom = new Dom;
$dom->loadStr($file, []);

$rows = $dom->find("tr");


php version: PHP 7.3.33 (cli) (built: Mar 18 2022 03:41:41) ( NTS )
Package version: 1.7.0

chaslain commented Mar 6, 2023

The same error occurs when using the library's provided method of loading from a file instead.

FMaz008 commented Jan 20, 2024

I get the same error when trying to use it as follows:


require "../vendor/autoload.php";
use PHPHtmlParser\Dom;
$url = "https://google.com";
$dom = new Dom;
$dom->loadFromUrl($url);

I placed a var_dump just before that line to get some more details:
var_dump($this->pattern, $selector);

Result:

string(103) "/([\w-:\*>]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is"
string(29) "meta[http-equiv=Content-Type]"
Warning: preg_match_all(): Compilation failed: invalid range in character class at offset 4 in /home/fmaz878/vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php on line 92

Note that without the var_dump the error is on line 91.

Specifically, the problem is the unescaped "-" right after \w inside the character class: it is parsed as the start of a range, but \w is not a valid range endpoint. I won't pretend to understand the purpose of that regexp, but escaping the dash seems to resolve that specific issue (and creates a different error):
/([\w\-:\*>]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
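
To isolate the character-class problem from the rest of the selector pattern, here is a minimal stand-alone sketch. It is not code from php-html-parser: the variable names and the reduced patterns are made up for illustration, and only the `[\w-:]` fragment is taken from the dumped pattern above. On a PHP build whose PCRE rejects the unescaped form (as reported in this thread for 7.3.33), the first call should return false; the escaped form should compile either way.

```php
<?php
// Hypothetical reduced reproduction; only the [\w-:] fragment comes from the thread.
$unescaped = '/[\w-:]+/';   // hyphen directly after \w, as in the dumped pattern
$escaped   = '/[\w\-:]+/';  // hyphen escaped, so it is always a literal dash

$subject = 'http-equiv';

// Show which PCRE library this PHP build links against.
echo 'PCRE version: ', PCRE_VERSION, PHP_EOL;

// @ silences the compile warning so the return value can be inspected instead.
$before = @preg_match_all($unescaped, $subject, $m1);
$after  = preg_match_all($escaped, $subject, $m2);

var_dump($before); // false where the pattern fails to compile
var_dump($after);  // number of matches found with the escaped pattern
var_dump($m2[0]);  // the matched substrings
```

Moving the dash to the end of the class (`[\w:-]`) is another common way to keep it literal; whether either change is the right fix for Selector.php is for the maintainers to confirm.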
