Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional check to prevent false positive endpoints #93

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hanslovsky
Copy link
Member

Addresses #92

This works because it specifically addresses my use case described in #92 (some string of the form A:/B:C in the arguments). I am not too happy with the solution because it opens the door to just adding more and more special case checks. Maybe a regular expression check would be better in Endpoint.is_endpoint, e.g. something along the lines of

[a-z][a-z0-9-_]+[a-z0-9](\.[a-z][a-z0-9-_]+[a-z0-9])

Unfortunately, maven is not very specific with their naming conventions

@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2023

Codecov Report

Merging #93 (a5b4226) into main (fcaf307) will not change coverage.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##             main      #93   +/-   ##
=======================================
  Coverage   94.51%   94.51%           
=======================================
  Files           6        6           
  Lines         875      875           
=======================================
  Hits          827      827           
  Misses         48       48           
Impacted Files Coverage Δ
src/jgo/jgo.py 91.97% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@ctrueden
Copy link
Member

ctrueden commented Mar 2, 2023

@hanslovsky Thanks for pursuing a solution! But I would have thought the -- separator should be good enough to prevent jgo from matching subsequent arguments as potential endpoint expressions, no? The fact that using -- didn't work for your use case is a bug, isn't it? Or do I misunderstand the issue?

@hanslovsky
Copy link
Member Author

@ctrueden I am not sure that this is a bug and if -- is intended to be used by jgo. I did not find it at all in jgo.py. We can certainly think about adding it here, but we will also need to do some research if it already has some reserved functionality in argparse. I found it in util.main_from_endpoint, though, but the use case is a bit different: It does not need to scan the argv for the endpoint string. It converts the user-provided argv into something that jgo.main can use.

There is also a similar check to the one I added in is_endpoint:

endpoint_elements[0].startswith("-")

going back to 3c57718 (2018 is 5 years ago btw 😱)

There is also a pattern check for the entrypoint candidate:

(.*https?://.*|[a-zA-Z]:\\.*)

Maybe we should change this to

(.*(https?|grpc)://.*|[a-zA-Z]:\\.*)

or maybe even to

(.*://.*|[a-zA-Z]:\\.*)

Unfortunately, the maven naming guides are not very helpful for what we can exclude. // is probably never part of a valid groupId or artifactId. I would like to have a single pattern that we can use confidently rather than updating the check for every problem we run into.

@hanslovsky
Copy link
Member Author

I added grpc to pattern instead of adding a check for / in Endpoint.is_endpoint. This seems cleaner to me. It might make sense to move the pattern check into Endpoint.is_endpoint, too.

@kephale
Copy link

kephale commented Oct 29, 2023

Bumping this, @ctrueden @hanslovsky. I'm running into this when using jgo with arguments that contain s3:// paths. I liked @hanslovsky original proposal of catching things with a URL shape instead of going protocol by protocol.

@hanslovsky
Copy link
Member Author

hanslovsky commented Oct 31, 2023

Maybe the best option would be to change this PR to use

(.*://.*|[a-zA-Z]:\\.*)

and investigate double dash (--) as another option to explicitly separate jgo args from program arguments.

@kephale this could be a temporary workaround: I was able to fix the issue that I had by changing the URL shape input from a positional argument of my Java CLI to an option that I could specify like

--dataset=grpc://host:port

This, of course, works only if you own the code for the CLI that you would like to run, and can change the CLI without breaking anyone else's programs.

@kephale
Copy link

kephale commented Oct 31, 2023

@hanslovsky aha! I tried the generalized regex, but only now realize what you were saying with the --. I'll try that one when I run into this again (which will be soon :D)

@@ -271,7 +271,7 @@ def run_and_combine_outputs(command, *args):

def find_endpoint(argv, shortcuts={}):
# endpoint is first positional argument
pattern = re.compile("(.*https?://.*|[a-zA-Z]:\\.*)")
pattern = re.compile("(.*(https?|grpc)://.*|[a-zA-Z]:\\.*)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this just kicks the can down the road. Another protocol will hit this problem, and be needed to be added to the regex.

For the case mentioned in #92 , the problem is that the -- isnt being treat with sufficient priority. Everything after the -- should be ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants