Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Conftest can now successfully load files using a file URL (e.g., file:///C:/path/to/data.yaml) on windows #999

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion policy/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,9 @@ func LoadWithData(policyPaths []string, dataPaths []string, capabilities string,
return nil, fmt.Errorf("filter data paths: %w", err)
}

documents, err := loader.NewFileLoader().All(allDocumentPaths)
documents, err := loader.NewFileLoader().WithProcessAnnotation(true).Filtered(dataPaths, func(_ string, info os.FileInfo, _ int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a couple of clarifications might be helpful here:

1- What is the reasoning behind using WithProcessAnnotation(true)
2- Why are we rewriting the same logic as defined in the previous block: https://github.com/open-policy-agent/conftest/blob/db75f9e237ddafb20ef6990eab75b8dea2d557f9/policy/engine.go#L127-#L131

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- What is the reasoning behind using WithProcessAnnotation(true)
I have removed calling the function WithProcessAnnotation(true). It is not needed.

2- Why are we rewriting the same logic as defined in the previous block: https://github.com/open-policy-agent/conftest/blob/db75f9e237ddafb20ef6990eab75b8dea2d557f9/policy/engine.go#L127-#L131

FileLoader supports paths like:

  • /path/from/root/data.yaml
  • file:///C:/path/with/drive/data.yaml

However, it does not support paths with Windows drive letters in the format C:/path/with/drive/data.yaml. This causes an error:

Error: running test: load: filter data paths: 1 error occurred during loading: CreateFile Users/pck/Punith/test_files/data.yaml: The system cannot find the path specified.

According to recommendations, For FileLoader, paths should be passed as file:///C:/path/with/drive/data.yaml to avoid this issue.

In the previous code, the file-paths(allDocumentPaths) extracted from the input(dataPaths)

allDocumentPaths, err := loader.FilteredPaths(dataPaths, func(_ string, info os.FileInfo, _ int) bool {

were passed directly to FileLoader, which caused issues on Windows systems.

documents, err := loader.NewFileLoader().All(allDocumentPaths)

As a solution, I am using FileLoader, which takes input arg dataPaths such as file:///C:/path/with/drive/data.yaml instead of allDocumentPaths and filters the required files(.yaml, .yml and .json files).

https://github.com/pckvcode/conftest-pck/blob/fix/support-load-data-files-from-file-url/policy/engine.go#L137

With this, it supports loading from other drives on Windows without affecting other inputs like /path/from/root/data.yaml on Linux and macOS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boranx
Can you please review this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found some time to look at this today (disclaimer, didn't delve into it deeply), here's my thoughts:
I think it's still not clear why we'd need two-times call loader.Filtered() back to back to get the data files loaded into engine. I wonder if we could just lean on one single loader.NewFileLoader().Filtered as follows?

	documents, err := loader.NewFileLoader().Filtered(dataPaths, func(_ string, info os.FileInfo, _ int) bool {
		if info.IsDir() {
			return false
		}
		return !contains([]string{".yaml", ".yml", ".json"}, filepath.Ext(info.Name()))
	})
	if err != nil {
		return nil, fmt.Errorf("filter data paths: %w", err)
	}

	store, err := documents.Store()
	if err != nil {
		return nil, fmt.Errorf("get documents store: %w", err)
	}

	engine.store = store

The above doesn't break any unit-test/e2e
so what is the point of having engine.docs = documentContents given we already store the data via store, err := documents.Store()
cc: @jalseth I'd love to consult your opinions here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure what this change actually does. allDocumentPaths is already defined using loader.FilteredPaths (L127) using what appears to be the same filter function.

I'll have to locate a Windows machine to dig into this more, I don't have one readily available at the moment.

Copy link
Member

@boranx boranx Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure what this change actually does.

Yeah, me neither.. @pckvcode could you help clarify how adding a second same filter function solves the issues on windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to repro and it seems that the issue is the path returned from loader.FilteredPaths trims the file:// prefix (likely due to https://github.com/open-policy-agent/opa/blob/6bfd4cdf92ec246754894804976918453ade3b74/internal/file/url/url.go#L37). So file:///F:/something.yaml in the CLI inputs is now F:/something.yaml which the loader then does not work with the call to loader.FileLoader.All(), same as if the CLI arg was just F:/something.yaml without the prefix which OPA doesn't handle correctly.

This fix is reasonable, but it raises the question of do we need the allDocumentPaths anymore? I don't think we do as we can retrieve the same info from the returned documents variable from the loader.

@pckvcode Can you update this PR to make that change so we don't have duplicate loading code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • documents.Documents is of type map[string]interface{}, while engine.docs is of type map[string]string.

  • We cannot directly assign documents.Documents to engine.docs due to the type mismatch.

  • Casting documents.Documents to map[string]string is an option, but it could fail if any values in the map are not strings.

  • In this merge request #1007, @boranx tried removing the initialization of engine.docs, but the CI tests failed.

  • As a result, I proceeded with the current code. I modified it slightly to eliminate redundancy.

  • Please suggest @boranx @jalseth

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@boranx and @jalseth
Can you please share you suggestion on the above comments?

return !info.IsDir() && !contains([]string{".yaml", ".yml", ".json"}, filepath.Ext(info.Name()))
})
if err != nil {
return nil, fmt.Errorf("load documents: %w", err)
}
Expand Down