When a TCP flow packet has not led to app-layer updates, it is useless to run DetectRunTx, as there cannot be new matches. This happens, for instance, when one side sends multiple packets in a row that are not acked (and thus not parsed in IDS mode). Doing so requires moving up the call to AppLayerParserSetTransactionInspectId so that it is only run after DetectRunTx is run, and not in the case where the transaction was not updated. Ticket: 6299
Ticket: OISF#6299 Simply because it is faster (just linear).
If flags are zero, there is nothing to store and remember. Stored signatures will be reused on a later packet, and qsorted (which may be expensive) together with newer match candidates. Not storing them avoids the call to qsort.
Especially sets transactions to complete when we get a response without having seen the request, so that the transactions end up getting cleaned (instead of living/leaking in the state). Also try to set the event on the relevant transaction, instead of creating a new transaction just for the purpose of having the event. Ticket: OISF#6299
WARNING: Pipeline 17437
| } | ||
| } | ||
|
|
||
| static inline RuleMatchCandidateMergeSorted(DetectEngineThreadCtx *det_ctx, uint32_t j, uint32_t k) |
I'm having trouble understanding this. When we get called we have 0-N entries in tx_candidates from prefilter. These are already ordered. The candidates coming from the match_array are merged into this, and the result has to be a list sorted by the candidates' s->num (internal sig id). I guess I'm not seeing how we insert-sort things into the already existing list.
Is this a matter of where this has been called, together with the fact that we still have the qsort called later on, (and we're not taking num into consideration when doing the ordering comparison), or are these factors irrelevant here?
with the fact that we still have the qsort called later on
The point is to avoid the call to qsort (in the case where no match candidates are added afterwards from stored flags), cf. the removal of do_sort = (array_idx > x); // sort if match added anything
I'm having trouble understanding this
Will put more comments.
Quick and dirty answer:
This is basically merging two sorted lists.
The trick is to do it in place, without shifting all the elements.
So we start from the end (if we started from the beginning, and the new element should be first, we would have to shift all j elements already in place).
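The "merge from the end" idea can be sketched as follows. This is only an illustration under assumptions, not the code from this PR: the Candidate type, the MergeSortedFromEnd name, and keeping the second sorted list in a separate array are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a match candidate; only the sort key matters. */
typedef struct {
    uint32_t id; /* internal signature id used as the sort key */
} Candidate;

/*
 * Merge two sorted runs in place, filling the destination from the back.
 *   buf[0..j)   is sorted, and buf has room for j + k entries
 *   extra[0..k) is sorted and lives in a separate array (assumption)
 * After the call, buf[0..j+k) is sorted. The cost is linear in j + k.
 */
static void MergeSortedFromEnd(Candidate *buf, uint32_t j, const Candidate *extra, uint32_t k)
{
    assert(buf != NULL && (k == 0 || extra != NULL));
    uint32_t w = j + k; /* one past the next slot to write */
    while (k > 0) {
        if (j > 0 && buf[j - 1].id > extra[k - 1].id) {
            /* largest remaining element is in buf: move it to the back */
            buf[--w] = buf[--j];
        } else {
            /* largest remaining element is in extra (ties go here) */
            buf[--w] = extra[--k];
        }
    }
    /* once extra is exhausted, buf[0..j) is already where it belongs */
}
```

Because the write position is always at or beyond the read position in buf, no element is overwritten before it has been moved, which is why nothing needs to be shifted.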
| TRACE_SID_TXS(s->id, tx, "no need to store no-match sig, " | ||
| "mpm will revisit it"); | ||
| } else { | ||
| } else if (inspect_flags != 0) { |
we store more than just the flags, we also store file_no_match. Can we have a case where we'd need that stored still?
Very good point from your knowledge and wisdom. Will look that through
Side note, feels weird to see file_no_match as u16 when it is only 0 or 1 up to its use in StoreFileNoMatchCnt as += file_no_match;
The goal of the logic, but I wouldn't be surprised if it is broken, is to stop tracking the files if all sigs that need it definitively failed to match. So it should increment this for each unique sig that fails to match.
The goal of the logic, but I wouldn't be surprised if it is broken, is to stop tracking the files if all sigs that need it definitively failed to match. So it should increment this for each unique sig that fails to match.
That is what I understood from reading the code
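For readers following the file_no_match discussion, the intended logic as described above could look roughly like this. It is a hedged sketch of the stated goal only: the FileTrack struct and both function names are hypothetical and do not reflect the actual Suricata data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transaction bookkeeping for file inspection. */
typedef struct {
    uint16_t file_sigs_total;   /* number of sigs that need file inspection */
    uint16_t file_no_match_cnt; /* sigs that definitively failed to match */
} FileTrack;

/* file_no_match is expected to be 0 or 1 for each unique sig. */
static void FileTrackNoteNoMatch(FileTrack *ft, uint16_t file_no_match)
{
    assert(file_no_match <= 1);
    ft->file_no_match_cnt += file_no_match;
}

/* Tracking can stop once every file sig has definitively failed. */
static bool FileTrackCanStop(const FileTrack *ft)
{
    return ft->file_sigs_total > 0 && ft->file_no_match_cnt >= ft->file_sigs_total;
}
```

With this shape, skipping the store when the inspect flags are zero is only safe if the no-match count is still updated somewhere, which is the concern raised in this thread.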
jufajardini
left a comment
Feels a bit out of my league, but tried to add some comments to understand better...
| @@ -1296,6 +1296,46 @@ static inline void StoreDetectFlags(DetectTransaction *tx, const uint8_t flow_fl | |||
| } | |||
| } | |||
|
|
|||
If we merge this one, could we have a comment indicating what params j and k are?
Good idea, will do. IIRC they are size of both sorted lists to merge into a big one
| const SigIntId id = s->num; | ||
| if (j > 0) { | ||
| const RuleMatchCandidateTx *s0 = &det_ctx->tx_candidates[j - 1]; | ||
| if (s->id > s0->id) { |
Shouldn't this comparison be with num instead of id, since num is what's used for sorting?
id is correct. I do think the names of objects should probably have been consistent in these structs.
Edit: Just realized what you meant, indeed the first item should be id or s->num.
Thinking about renaming Signature::num to Signature::iid (internal id)
cf DetectRunTxSortHelper
Nice catch Juliana
inashivb
left a comment
I think I understand why you call merge sort linear.
Some background for anyone who wants to understand why this kind of makes sense but not all the way (to me).
- Quick Sort is in most cases a better sorting algorithm than Merge Sort. Quick Sort exploits cache locality better, and choosing the right pivot can improve it in many respects, although both share an average time complexity of O(n log n).
- Merge Sort works by breaking the given n-element array down into smaller parts, sorting them and then putting them back together. It is a big advantage for merge sort to get data that is already sorted when we know where the sorted runs are. In the best case, it can then work in O(n) [which is why I think Philippe calls it linear?]
- Now, the above points fit our situation because, as Victor said, tx_candidates entries are already ordered, and after the merge with match_array elements it stays ordered (didn't check this). Hence, this gives quite an advantage to merge sort for this case.
Questions that don't make sense to me all the way:
- The qsort call at the end of this construct still exists. I would have expected that to be replaced by merge sort.
- Some research tells me that Tim Sort is better suited for our use case here. ref: https://en.wikipedia.org/wiki/Timsort What do you think?
inashivb
left a comment
Should the commit ea12eeccf8 say "it is useless to run AppLayerParserSetTransactionInspectId"? Because then I can read the code and commit syncing.
But As we add a case to skip the call of |
First disclaimer: I am not sure I know what you call merge sort :-/
I have no guarantee that this third source comes ordered...
Looks nice, but maybe complex
Thanks for the reviews. Will fix the dummy warning in next iteration as well
hmm I was aware of the quadratic complexity. But, found it can be avoided by choosing the right pivot. I didn't try to check what
Oh. Based on your commit d8dcc8e, I assumed you were referring to the classic merge sort ref: https://en.wikipedia.org/wiki/Merge_sort then I tried to understand why you called it linear.
ok.
ok. Thank you!
Replaced by #10160 |
Link to redmine ticket:
https://redmine.openinfosecfoundation.org/issues/6299
Describe changes:
#10127 with better commit messages and inline helper function