Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Conftest can now successfully load files using a file URL (e.g., file:///C:/path/to/data.yaml) on windows #999

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

pckvcode
Copy link

@pckvcode pckvcode commented Sep 3, 2024

Conftest encounters errors on Windows when loading file paths that include drive letters (e.g., C:/path/to/data.yaml). Even when using a file URL (e.g., file:///C:/path/to/data.yaml), we still face issues.

Screenshot 2024-08-30 at 10 31 39 AM

With these code changes, Conftest can now successfully load files using a file URL (e.g., file:///C:/path/to/data.yaml).
Screenshot 2024-09-03 at 5 10 00 PM

We opted for file URLs(e.g., file:///C:/path/to/data.yaml) instead of paths with drive letters (e.g., C:/path/to/data.yaml) because OPA does not support file paths with drive letters. For more details, see this issue comment.

Resolves: #979

@boranx Can you please review this changes? Thank you

    Conftest encounters errors on Windows when loading file paths that include drive letters (e.g., `C:/path/to/data.yaml`).
    Even when using a file URL (e.g., `file:///C:/path/to/data.yaml`), we still face issues.
    With these code changes, Conftest can now successfully load files using a file URL (e.g., `file:///C:/path/to/data.yaml`).
    We opted for file URLs instead of paths with drive letters (e.g., `C:/path/to/data.yaml`) because OPA does not support file paths with drive letters. For more details, see [this issue comment](open-policy-agent/opa#6922 (comment)).

    Resolves: open-policy-agent#979

Signed-off-by: Punith C K <[email protected]>
policy/engine.go Outdated
@@ -134,7 +134,9 @@ func LoadWithData(policyPaths []string, dataPaths []string, capabilities string,
return nil, fmt.Errorf("filter data paths: %w", err)
}

documents, err := loader.NewFileLoader().All(allDocumentPaths)
documents, err := loader.NewFileLoader().WithProcessAnnotation(true).Filtered(dataPaths, func(_ string, info os.FileInfo, _ int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a couple of clarifications might be helpful here:

1- What is the reasoning behind using WithProcessAnnotation(true)
2- Why are we rewriting the same logic as defined in the previous block: https://github.com/open-policy-agent/conftest/blob/db75f9e237ddafb20ef6990eab75b8dea2d557f9/policy/engine.go#L127-#L131

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- What is the reasoning behind using WithProcessAnnotation(true)
I have removed calling the function WithProcessAnnotation(true). It is not needed.

2- Why are we rewriting the same logic as defined in the previous block: https://github.com/open-policy-agent/conftest/blob/db75f9e237ddafb20ef6990eab75b8dea2d557f9/policy/engine.go#L127-#L131

FileLoader supports paths like:

  • /path/from/root/data.yaml
  • file:///C:/path/with/drive/data.yaml

However, it does not support paths with Windows drive letters in the format C:/path/with/drive/data.yaml. This causes an error:

Error: running test: load: filter data paths: 1 error occurred during loading: CreateFile Users/pck/Punith/test_files/data.yaml: The system cannot find the path specified.

According to recommendations, For FileLoader, paths should be passed as file:///C:/path/with/drive/data.yaml to avoid this issue.

In the previous code, the file-paths(allDocumentPaths) extracted from the input(dataPaths)

allDocumentPaths, err := loader.FilteredPaths(dataPaths, func(_ string, info os.FileInfo, _ int) bool {

were passed directly to FileLoader, which caused issues on Windows systems.

documents, err := loader.NewFileLoader().All(allDocumentPaths)

As a solution, I am using FileLoader, which takes input arg dataPaths such as file:///C:/path/with/drive/data.yaml instead of allDocumentPaths and filters the required files(.yaml, .yml and .json files).

https://github.com/pckvcode/conftest-pck/blob/fix/support-load-data-files-from-file-url/policy/engine.go#L137

With this, it supports loading from other drives on Windows without affecting other inputs like /path/from/root/data.yaml on Linux and macOS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boranx
Can you please review this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found some time to look at this today (disclaimer, didn't delve into it deeply), here's my thoughts:
I think it's still not clear why we'd need two-times call loader.Filtered() back to back to get the data files loaded into engine. I wonder if we could just lean on one single loader.NewFileLoader().Filtered as follows?

	documents, err := loader.NewFileLoader().Filtered(dataPaths, func(_ string, info os.FileInfo, _ int) bool {
		if info.IsDir() {
			return false
		}
		return !contains([]string{".yaml", ".yml", ".json"}, filepath.Ext(info.Name()))
	})
	if err != nil {
		return nil, fmt.Errorf("filter data paths: %w", err)
	}

	store, err := documents.Store()
	if err != nil {
		return nil, fmt.Errorf("get documents store: %w", err)
	}

	engine.store = store

The above doesn't break any unit-test/e2e
so what is the point of having engine.docs = documentContents given we already store the data via store, err := documents.Store()
cc: @jalseth I'd love to consult your opinions here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure what this change actually does. allDocumentPaths is already defined using loader.FilteredPaths (L127) using what appears to be the same filter function.

I'll have to locate a Windows machine to dig into this more, I don't have one readily available at the moment.

Copy link
Member

@boranx boranx Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure what this change actually does.

Yeah, me neither.. @pckvcode could you help clarify how adding a second same filter function solves the issues on windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to repro and it seems that the issue is the path returned from loader.FilteredPaths trims the file:// prefix (likely due to https://github.com/open-policy-agent/opa/blob/6bfd4cdf92ec246754894804976918453ade3b74/internal/file/url/url.go#L37). So file:///F:/something.yaml in the CLI inputs is now F:/something.yaml which the loader then does not work with the call to loader.FileLoader.All(), same as if the CLI arg was just F:/something.yaml without the prefix which OPA doesn't handle correctly.

This fix is reasonable, but it raises the question of do we need the allDocumentPaths anymore? I don't think we do as we can retrieve the same info from the returned documents variable from the loader.

@pckvcode Can you update this PR to make that change so we don't have duplicate loading code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • documents.Documents is of type map[string]interface{}, while engine.docs is of type map[string]string.

  • We cannot directly assign documents.Documents to engine.docs due to the type mismatch.

  • Casting documents.Documents to map[string]string is an option, but it could fail if any values in the map are not strings.

  • In this merge request #1007, @boranx tried removing the initialization of engine.docs, but the CI tests failed.

  • As a result, I proceeded with the current code. I modified it slightly to eliminate redundancy.

  • Please suggest @boranx @jalseth

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boranx and @jalseth
Can you please share you suggestion on the above comments?

Removing WithProcessAnnotation(true) which is not needed for loading data files

Signed-off-by: Punith C K <[email protected]>
… file:///C:/path/to/data.yaml) on windows

Removing duplicate code

Signed-off-by: Punith C K <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Accessing Rego Policies from Different Drives in Windows
3 participants