There's absolutely no need for the Boyer-Moore pattern matching algorithm at all; it's a waste of effort, and it's actually wrong in the way you use it.
Effectively, all you are doing is checking that the line starts with a block of characters that is the same size as the boundary you are looking for and uses the same characters, but not necessarily in the same order.
For example, it seems to me that your logic would have these 2 blocks match:
--abcdefghijk
--gekfhdbacij
Obviously they are different, but using your Boyer-Moore logic, they are equivalent.
It's actually even worse than that. As long as the block limits itself to the subset of characters that appear in the most recent boundary marker, it doesn't even have to use all of them. Example:
--aaaaaaaaaaa
That would also match the "--abcdefghijk" boundary.
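To make the failure mode concrete, here is a minimal Python sketch. It is not a copy of your code: `charset_match` is a hypothetical stand-in for the behaviour described above (same length, characters drawn from the boundary's character set), and `exact_match` is the plain prefix comparison that I think is all that's needed.

```python
def charset_match(line: bytes, boundary: bytes) -> bool:
    """Hypothetical stand-in for the flawed check: the prefix has the right
    length and only uses characters that occur in the delimiter."""
    delimiter = b"--" + boundary
    prefix = line[:len(delimiter)]
    return len(prefix) == len(delimiter) and set(prefix) <= set(delimiter)


def exact_match(line: bytes, boundary: bytes) -> bool:
    """Plain byte-for-byte prefix comparison against the full delimiter."""
    return line.startswith(b"--" + boundary)


boundary = b"abcdefghijk"
for candidate in (b"--abcdefghijk", b"--gekfhdbacij", b"--aaaaaaaaaaa"):
    print(candidate, charset_match(candidate, boundary), exact_match(candidate, boundary))
```

The character-set check accepts all three candidates, while the exact comparison accepts only the first one.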
Secondly (and I was bitten by this, too), it is actually legal for there to be any amount of linear whitespace after the boundary token before the CRLF sequence.
In other words, when you have:
Content-Type: multipart/mixed; boundary="my-boundary"
it should match:
--my-boundary<SPACE><TAB><SPACE>\r\n
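A rough Python sketch of a line check that tolerates that trailing whitespace might look like the following. The function name `classify_boundary_line` and its return values are my own invention for illustration, and accepting a bare `\n` line ending is an assumption about how lenient you want to be.

```python
from typing import Optional


def classify_boundary_line(line: bytes, boundary: bytes) -> Optional[str]:
    """Return "part" for a part delimiter, "end" for the closing delimiter,
    or None if the line is not a boundary line for this boundary."""
    delimiter = b"--" + boundary
    if not line.startswith(delimiter):
        return None
    rest = line[len(delimiter):]
    # Drop the line ending (CRLF, or a bare LF as a leniency assumption).
    if rest.endswith(b"\r\n"):
        rest = rest[:-2]
    elif rest.endswith(b"\n"):
        rest = rest[:-1]
    kind = "part"
    if rest.startswith(b"--"):
        kind = "end"
        rest = rest[2:]
    # Only spaces and tabs may remain after the boundary token.
    if rest.strip(b" \t") != b"":
        return None
    return kind
```

With `boundary=b"my-boundary"`, the line `b"--my-boundary \t \r\n"` is classified as a part delimiter, while `b"--my-boundary-extra\r\n"` is rejected because non-whitespace follows the token.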
Thirdly, it is possible for a multipart to be missing its end-boundary marker, so if your parser is going to handle nested multiparts, it will need a boundary stack rather than just keeping track of the most recent boundary. I suspect your use cases don't need nested multiparts, which is why your parser doesn't support them, but I figured I'd throw that out there in case it was useful to you.
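In case it helps, the boundary stack idea can be sketched in Python roughly like this. The names `match_boundary` and `handle_line` are hypothetical, and pushing a new boundary onto the stack when a part's Content-Type header declares one is left to the caller. The key point is that a line is checked against every open boundary, innermost first, and a match on an outer boundary implicitly closes any inner multiparts that never saw their end marker.

```python
from typing import List, Optional, Tuple


def match_boundary(line: bytes, stack: List[bytes]) -> Optional[Tuple[int, bool]]:
    """Return (stack index, is_end_marker) for the innermost open boundary
    that this line matches, or None for an ordinary content line."""
    # Ignore the line ending and any trailing linear whitespace.
    body = line.rstrip(b"\r\n").rstrip(b" \t")
    for index in range(len(stack) - 1, -1, -1):
        delimiter = b"--" + stack[index]
        if body == delimiter:
            return index, False
        if body == delimiter + b"--":
            return index, True
    return None


def handle_line(line: bytes, stack: List[bytes]) -> None:
    """Update the stack of open boundaries for one input line."""
    hit = match_boundary(line, stack)
    if hit is None:
        return  # ordinary content, nothing to update
    index, is_end = hit
    # Multiparts nested deeper than the match are missing their end marker;
    # close them implicitly.
    del stack[index + 1:]
    if is_end:
        stack.pop()  # the matched multipart itself is now finished


stack = [b"outer", b"inner"]
handle_line(b"--outer\r\n", stack)    # inner never closed, outer part begins
print(stack)
handle_line(b"--outer--\r\n", stack)  # outer closes
print(stack)
```

In this walk-through the unterminated `inner` multipart is dropped as soon as an `outer` delimiter appears, which a parser that only tracks the most recent boundary would miss.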