Fails to parse --hash= values embedded in requirements.txt #2

Open
aswinnnn opened this issue Jul 5, 2023 · 9 comments

Comments


aswinnnn commented Jul 5, 2023

The contents of requirements.txt used for parsing: https://github.com/anotherbridge/pdfalyzer/blob/master/requirements.txt

It is parsed by the following code: extractor.rs

Here are the Err values:

[
    found ''\\'' at 68..69,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 68..69,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 71..72,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 72..73,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 70..71,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 68..69,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 75..76,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 81..82,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 66..67,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 65..66,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 79..80,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 67..68,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 72..73,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found ''\\'' at 70..71,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..0,
]
Ok(Dependency { name: "lxml", extras: [], spec: None, marker: None })

I added lxml at the end of the file to see whether it was a file quirk; unfortunately it was not. Is this because it's unsupported in the pep-508 crate for now? Wouldn't parsing only the text before the ; be a feasible workaround?

Your crate is one of the core ones I use [issue originally mentioned at aswinnnn/pyscan#16], and I would appreciate it if you could look into it. Thanks.

@aswinnnn aswinnnn changed the title Fails to parse dependencies provided with additional information [v0.3.0] Fails to parse dependencies provided with additional information Jul 6, 2023
Author

aswinnnn commented Jul 6, 2023

So I removed the \ line continuations from requirements.txt, and:

Ok(Dependency { name: "anytree", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "2.8.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "chardet", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "5.0.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "commonmark", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "0.9.1" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })   
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "deprecated", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "1.2.13" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })  
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "pygments", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "2.13.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })    
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "pypdf2", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "2.11.1" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })      
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "python-dotenv", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "0.21.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "rich-argparse-plus", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "0.3.1.4" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "rich", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "12.6.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })        
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "six", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "1.16.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "typing-extensions", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "4.4.0" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("3.10")))) })
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "wrapt", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "1.14.1" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })       
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
Ok(Dependency { name: "yara-python", extras: [], spec: Some(Version([VersionSpec { comparator: Eq, version: "4.2.3" }])), marker: Some(And(Operator(PythonVersion, Comparator(Ge), String("3.9")), Operator(PythonVersion, Comparator(Lt), String("4.0")))) })  
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]
[
    found end of input at 0..1,
]

So it did parse all 14 dependencies, but it treated each --hash value as its own line to parse and failed on those, I presume.
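The behaviour described above suggests the input is being fed to the parser one physical line at a time. A minimal sketch of a pre-processing step, assuming the caller is free to rewrite the file contents before parsing: join lines that end in a backslash into one logical line first. The function name `logical_lines` is hypothetical and not part of the pep-508 crate, and the comment handling is simplified compared to pip.

```rust
/// Joins physical lines that end in a backslash into one logical line.
/// Blank lines and lines starting with `#` are dropped.
/// Hypothetical helper: not part of the pep-508 crate.
fn logical_lines(input: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for raw in input.lines() {
        let line = raw.trim();
        if let Some(head) = line.strip_suffix('\\') {
            // Continuation: keep accumulating, separated by a single space.
            current.push_str(head.trim_end());
            current.push(' ');
            continue;
        }
        current.push_str(line);
        // The logical line is complete; reset the accumulator.
        let finished = std::mem::take(&mut current);
        let finished = finished.trim();
        if !finished.is_empty() && !finished.starts_with('#') {
            out.push(finished.to_string());
        }
    }
    // A backslash on the very last line leaves unfinished text behind.
    let rest = current.trim();
    if !rest.is_empty() && !rest.starts_with('#') {
        out.push(rest.to_string());
    }
    out
}
```

With this step, a dependency and its trailing --hash options arrive as a single string, so the options can be handled together with the requirement they belong to instead of failing as separate lines.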

@aswinnnn
Author

aswinnnn commented Jul 6, 2023

Is pep-508 able to parse --hash= values? As mentioned in:
https://pip.pypa.io/en/stable/topics/secure-installs/

I don't think I saw this in the original PEP 508 spec, and it would be easy to miss.

@aswinnnn
Author

aswinnnn commented Jul 6, 2023

This is a better read: https://pip.pypa.io/en/stable/reference/requirements-file-format/#per-requirement-options

Hash checking is one of pip's per-requirement options: pip checks whether requirements.txt contains --hash= values that it can use. Would this crate be interested in implementing this?

@aswinnnn
Author

aswinnnn commented Jul 6, 2023

Of those options, only --hash seems to expect a value embedded in requirements.txt, so implementing something that parses just that would cover the requirement, I think.
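As a rough sketch of what "parsing just that" could look like, assuming the line has already been joined into one logical line: separate the `--hash=` tokens from the rest, and hand only the rest to the PEP 508 parser. `split_hashes` is a hypothetical name, not an existing pep-508 function. It splits on whitespace, so runs of spaces inside quoted marker strings would be collapsed, and it only recognises the `--hash=value` spelling.

```rust
/// Splits one logical requirements.txt line into the requirement text
/// and the values of any `--hash=` options that follow it.
/// Hypothetical helper: not part of the pep-508 crate.
fn split_hashes(line: &str) -> (String, Vec<String>) {
    let mut requirement = Vec::new();
    let mut hashes = Vec::new();
    for token in line.split_whitespace() {
        match token.strip_prefix("--hash=") {
            // Keep only the part after `--hash=`, e.g. `sha256:...`.
            Some(value) => hashes.push(value.to_string()),
            // Everything else is treated as part of the requirement.
            None => requirement.push(token),
        }
    }
    (requirement.join(" "), hashes)
}
```

The first element of the returned tuple is what would be passed to the existing parser; the second is the list of hash values in the order they appeared.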

@aswinnnn aswinnnn changed the title [v0.3.0] Fails to parse dependencies provided with additional information [v0.3.0] Fails to parse --hash= values embedded in requirements.txt Jul 6, 2023
aswinnnn added a commit to aswinnnn/pep-508 that referenced this issue Jul 6, 2023
@figsoda
Owner

figsoda commented Jul 6, 2023

Looks like the requirements.txt format isn't equivalent to PEP 508. I wonder if it's possible to introduce some sort of flavoring, since pyproject seems to just use PEP 508 directly.

@aswinnnn
Author

aswinnnn commented Jul 6, 2023

Well, your crate is mostly used for parsing requirements/constraints.txt, pyproject.toml, and other places where people use PEP 508. Of those, requirements.txt is the most popular one. Wouldn't you consider solving this supposedly edge case?

Maybe the returned struct could have a field that holds these hashes, or anything else that is "embedded" in the text being parsed?
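To make the idea concrete, here is one possible shape for such a struct. Everything in it is hypothetical: `RequirementLine`, `Hash`, and `parse_hash` do not exist in the pep-508 crate, and the `requirement` field is kept as a plain string here where a real design would presumably hold the parsed dependency. The sketch assumes each hash value has the form `algorithm:digest` with a hexadecimal digest, as in pip's `--hash=sha256:...` examples.

```rust
/// One `--hash=` value, split into its two parts.
#[derive(Debug, PartialEq)]
struct Hash {
    algorithm: String,
    digest: String,
}

/// A requirement plus the extra data embedded on its line.
/// Hypothetical type: not part of the pep-508 crate.
#[derive(Debug, PartialEq)]
struct RequirementLine {
    /// The PEP 508 text, still unparsed in this sketch.
    requirement: String,
    hashes: Vec<Hash>,
}

/// Parses `algorithm:digest`; returns None if either part is empty
/// or the digest is not hexadecimal.
fn parse_hash(value: &str) -> Option<Hash> {
    let (algorithm, digest) = value.split_once(':')?;
    if algorithm.is_empty()
        || digest.is_empty()
        || !digest.chars().all(|c| c.is_ascii_hexdigit())
    {
        return None;
    }
    Some(Hash {
        algorithm: algorithm.to_string(),
        digest: digest.to_string(),
    })
}

impl RequirementLine {
    /// Builds the struct, failing if any hash value is malformed.
    fn new(requirement: &str, hash_values: &[&str]) -> Option<Self> {
        let hashes = hash_values
            .iter()
            .map(|v| parse_hash(v))
            .collect::<Option<Vec<_>>>()?;
        Some(Self {
            requirement: requirement.to_string(),
            hashes,
        })
    }
}
```

Keeping the hashes in a separate field means callers that only care about the PEP 508 part can ignore them, while tools that verify downloads still have access to them.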

@figsoda
Owner

figsoda commented Jul 6, 2023

Wouldn't you consider solving this supposedly edge case?

Yes, that's why I wanted to introduce some sort of flavoring, possibly having a module for requirements.txt

@aswinnnn
Author

aswinnnn commented Jul 7, 2023

Wouldn't you consider solving this supposedly edge case?

Yes, that's why I wanted to introduce some sort of flavoring, possibly having a module for requirements.txt

That sounds fantastic.

@aswinnnn
Author

any updates on this?

@figsoda figsoda changed the title [v0.3.0] Fails to parse --hash= values embedded in requirements.txt Fails to parse --hash= values embedded in requirements.txt Nov 20, 2023

2 participants