More speed efficient #3

Open
vaab opened this issue Apr 30, 2013 · 3 comments

Comments

@vaab
Member

vaab commented Apr 30, 2013

The Python implementation is desperately slow.
We might have to rewrite it in C if we want to go faster.

@nowox

nowox commented Nov 19, 2015

Absolutely true. I recently discovered shyaml, but I cannot use it: it is way too slow.

@vaab
Member Author

vaab commented Dec 14, 2018

Versions 0.6.0 and later make sure to use the libyaml C bindings, which might help with speed.
You might want to check shyaml --version (starting from 0.6.1) to double-check that you are using the libyaml-bound version.
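
For reference, a similar check can be made from Python directly. This is a minimal sketch, not shyaml's own code: it relies on PyYAML's `__with_libyaml__` flag, and the helper name `yaml_backend` is made up for illustration.

```python
def yaml_backend():
    """Report which PyYAML backend this interpreter would use.

    `yaml_backend` is a hypothetical helper, not part of shyaml.
    """
    try:
        import yaml
    except ImportError:
        return "PyYAML not installed"
    # PyYAML sets this flag to True when its libyaml C extension imported.
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml (C) bindings"
    return "pure Python"


if __name__ == "__main__":
    print(yaml_backend())
```

If this reports "pure Python", the C loaders are not available in that environment, whatever version of shyaml is installed.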

I would be happy to have an example YAML (or a bunch of them) to benchmark against, so that we can set some actual metrics and know what we are talking about.
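
Until real-world samples are contributed, a synthetic input can at least give repeatable numbers. The generator below is only a sketch: the function name, the key names and the default sizes are arbitrary assumptions, and a flat two-level mapping is not necessarily representative of the files people actually feed to shyaml.

```python
def make_sample_yaml(services=1000, keys_per_service=5):
    """Return a synthetic two-level YAML mapping as text.

    Hypothetical benchmark input: names and sizes are arbitrary.
    """
    lines = []
    for i in range(services):
        lines.append(f"service{i}:")
        for j in range(keys_per_service):
            lines.append(f"  key{j}: value-{i}-{j}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys
    sys.stdout.write(make_sample_yaml())
```

Saved as a script (say `make_sample.py`, a placeholder name), its output can be redirected to a file and then timed with something like `time shyaml get-value service42.key3 < sample.yaml`.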

@vaab
Member Author

vaab commented Dec 17, 2018

There are numerous other ways to get more speed out of shyaml:

The code itself can be a little quicker (although, with the libyaml binding, is there much improvement left to achieve?):

  • compiling the existing Python directly to binary code via nuitka (this works out of the box and could be done for each release/platform).
  • writing intermediary code to also get direct binary code via Cython.

Please note that a binary would have the valuable added benefit of not requiring Python (well, only libpython, but we could also go static...), and would not suffer from any kind of dependency-induced failure, to the point where we could completely forget the Python compatibility testing matrices (versions of Python, versions of dependencies, install tests) and save a lot of time otherwise lost to Python distribution hell. On the other hand, we would get into another hell: managing dependencies across systems and architectures.

Even blazing-fast code will face the cost of process spawning, because we use shyaml from the shell... So:

  • changing the API so that more can be done in a single call of shyaml (rebuilding some more of shyaml, tests, or a small efficient language...). Every call (spawn) that we can save matters, especially in tight loops. Some research might be necessary to see which common manipulations would benefit most from this.
    • This could go through a clever little language (but why introduce a new language?) which could borrow
      a lot of ideas here and there (like XPath). Of course, having a look at jq is mandatory.
    • But this could also leverage an existing language, for instance by evaluating Python. (What would be the real
      performance cost?) It also seems to be way too big a language for such a simple task.
  • A solution based on a daemon and interprocess communication (thinking of sockets) would be much more difficult to grasp for most, but would entirely remove the cost of spawning. With some work, we could probably offer a bash function using only builtins that ensures shyaml is launched in daemon mode, sends it the queries in the proper way, returns the responses, and effectively offers the same interface as the normal CLI.
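
To make the daemon idea a bit more concrete, here is a rough sketch of what the serving side could look like. Everything in it is an assumption made for illustration: the newline-delimited protocol, the `ERROR` reply, and the `lookup` and `serve_connection` names are not part of shyaml. A plain dict stands in for the parsed YAML so that the sketch runs with the standard library alone. The point is that the document is loaded once and any number of key-path queries are answered without spawning a new process, which also covers the "more done in one call" idea.

```python
import socket
import threading


def lookup(data, path):
    """Follow a dot-separated key path through nested dicts and lists."""
    node = data
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def serve_connection(conn, data):
    """Answer newline-delimited key paths on one connection until EOF."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8", newline="\n") as writer:
        for line in reader:
            try:
                reply = str(lookup(data, line.strip()))
            except (KeyError, IndexError, ValueError, TypeError):
                reply = "ERROR"  # hypothetical error convention
            writer.write(reply + "\n")
            writer.flush()


if __name__ == "__main__":
    data = {"a": {"b": [10, 20]}, "name": "demo"}  # stands in for parsed YAML
    server_end, client_end = socket.socketpair()
    worker = threading.Thread(target=serve_connection, args=(server_end, data))
    worker.start()
    with client_end, client_end.makefile("r", encoding="utf-8") as replies:
        for path in ("a.b.1", "name", "missing"):
            client_end.sendall((path + "\n").encode("utf-8"))
            print(path, "->", replies.readline().strip())
        client_end.shutdown(socket.SHUT_WR)  # EOF ends the server loop
    worker.join()
```

A real daemon would listen on a Unix socket, and the bash side would still need a builtin-only way to talk to it; both are left out of this sketch.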

On the road toward better performance, we could think of adding a switch in shyaml that measures the time spent in its own code compared to the time spent in PyYAML. I'm not expecting a surprise here, and I do not think the Python code here matters that much in itself.
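
Such a switch could be built on a small per-phase timer. The sketch below is one possible shape, not existing shyaml code: the `timed` and `run` helpers and the phase names are hypothetical, and `json.loads` is used as a stand-in parser so that the example runs without PyYAML.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # phase name -> accumulated seconds


@contextmanager
def timed(phase):
    """Add the wall-clock time spent in the `with` body to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


def run(text, parse, query):
    """Run one parse-then-query cycle, timing each phase separately."""
    with timed("parser"):
        data = parse(text)
    with timed("own code"):
        return query(data)


if __name__ == "__main__":
    import json  # stand-in parser, swap in a YAML loader for real numbers
    print(run('{"a": {"b": 1}}', json.loads, lambda d: d["a"]["b"]))
    for phase, seconds in timings.items():
        print(f"{phase}: {seconds * 1e3:.3f} ms")
```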

The most important metrics to move forward are:

  • the actual cost of spawning
  • the Python interpreter loading time
  • the time in shyaml's code
  • the time in PyYAML's code (this one depends completely on the input YAML, of course)
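
The first two metrics can be roughly estimated from outside with a few lines of Python. This is a sketch under stated assumptions: `median_runtime` is a hypothetical helper, `json` is imported as a stand-in for PyYAML, and spawn cost and interpreter start-up are measured together, since separating them would need a trivial native binary as a baseline.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(argv, repeat=5):
    """Median wall-clock seconds to spawn `argv` and wait for it to exit."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Spawn plus a bare interpreter start (-S skips the site module).
    bare = median_runtime([sys.executable, "-S", "-c", "pass"])
    # Replace "json" with "yaml" to include PyYAML's import cost.
    loaded = median_runtime([sys.executable, "-c", "import json"])
    print(f"spawn + bare interpreter: {bare * 1e3:.1f} ms")
    print(f"spawn + interpreter + import: {loaded * 1e3:.1f} ms")
```

Comparing these figures with the total run time of a shyaml call on the same machine gives a first idea of how much is left for shyaml's and PyYAML's own code.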
