I have been trying out the CodeQL CLI for quite some time now. While running queries in …. Also, similar kinds of queries show different behavior (w.r.t. the time they take to analyze), for example …. I also observed this for most of the queries where some custom value can be given, for example …. So is there a way of understanding which queries will usually take a long time?

Additional info: …
Replies: 2 comments 2 replies
Generally speaking, resolving call targets in Python isn't simply a matter of finding calls with a particular name, since Python doesn't provide static types for objects, and module members can potentially be redefined on the fly. Instead, we need to track the set of possible definitions for a function or method name at each point in the program, which is a rather expensive computation. The result of that analysis is then cached, so it only needs to be done once for each database. Also, note that in Python 3 the builtin print is a function instead of a keyword, so it has the same potential to be shadowed or replaced as any other function name (and /python/ql/examples/snippets/print.ql does …
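As a hypothetical illustration of that last point (this snippet is not from the thread and is not the query in print.ql), here is a small Python example where a call that is syntactically a call to print never reaches the builtin, which is why a purely name-based search and actual call-target resolution answer different questions:

```python
# Hypothetical sketch: in Python 3, print is an ordinary builtin function,
# so a module can shadow or replace it like any other name. A query that
# only matches calls *named* print and a query that resolves which function
# the call actually targets will therefore disagree on code like this.
import builtins


def print(*args, **kwargs):
    # Shadows the builtin inside this module and forwards to it.
    builtins.print("[log]", *args, **kwargs)


print("hello")  # syntactically a call named "print", but it targets the
                # module-level wrapper above, not builtins.print
```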
Unless those files are quite small, this would be an extremely large codebase, especially for Python.
Are you running on a specific project that's that large, or is it files from multiple projects that you're combining into a single directory? If it's the latter, I expect there's a better way to accomplish your underlying goals. Maybe try analyzing one project at a time and combining the results at the end?
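If it helps, here is a rough, hypothetical sketch (not an official workflow) of driving the CLI one project at a time from Python. It assumes codeql is on your PATH, that each subdirectory of projects/ is an independent Python project, and that the standard python-security-and-quality.qls suite is resolvable in your setup; adjust paths and the query suite as needed:

```python
# Hypothetical sketch: build and analyze one CodeQL database per project
# instead of one huge database over a combined directory, keeping a SARIF
# file per project that can be reviewed or merged afterwards.
import pathlib
import subprocess

pathlib.Path("dbs").mkdir(exist_ok=True)
pathlib.Path("results").mkdir(exist_ok=True)

for project in sorted(pathlib.Path("projects").iterdir()):
    if not project.is_dir():
        continue
    db = f"dbs/{project.name}"
    # Create a Python database for just this project.
    subprocess.run(
        ["codeql", "database", "create", db, "--overwrite",
         "--language=python", f"--source-root={project}"],
        check=True,
    )
    # Run a query suite and write one SARIF file per project.
    subprocess.run(
        ["codeql", "database", "analyze", db,
         "python-security-and-quality.qls",
         "--format=sarif-latest", f"--output=results/{project.name}.sarif"],
        check=True,
    )
```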
While these two examples are similar, the similarity is only skin deep. The main difference is that the first query searches only for integer literals (that is, constants appearing directly in source code), while the second searches for all instances where ….

The part of the Python analysis that keeps track of this is called the "points-to" analysis. Basically, we compute (an approximation of) what a given name can point to at a given point in the source code. As you can imagine, this analysis depends on itself, since in order to figure out the meaning of ….

So, as a rule of thumb, anything that involves the points-to analysis is going to be time-consuming. On the other hand, this analysis is cached, so you only have to pay that price once for a given codebase. Most of our existing queries take advantage of this fact (as the points-to analysis allows for greater precision and generality), so you will find that "most" of the queries are slow (but hopefully only for the first query).
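To make the distinction concrete, here is a small hypothetical Python snippet (the names are made up for illustration). A literal-only query would stop at the return line, while finding the call where that value is actually used requires knowing what the name port can point to:

```python
# Hypothetical sketch: the constant 443 appears exactly once as an integer
# literal, but it reaches the connect() call only through a function return
# and an assignment. A query over literals sees only default_port(); a query
# over all uses of the value needs the points-to analysis to follow it.
def default_port():
    return 443  # the only integer literal in this snippet


def connect(port):
    print("connecting on port", port)


port = default_port()  # points-to: `port` can point to the value 443
connect(port)          # no literal here, yet 443 flows into this call
```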