boundary checking #2

jstedfast · 2013-09-15T11:57:53Z

There's absolutely no need for the Boyer-Moore pattern-matching algorithm here at all; it's a waste of effort, and the way you use it is actually wrong.

Effectively, all you are doing is checking that the line starts with a block of characters that is the same length as the boundary you are looking for and contains the same characters, but not necessarily in the same order.

For example, it seems to me that your logic would treat these two blocks as a match:

--abcdefghijk

--gekfhdbacij

Obviously they are different, but under your Boyer-Moore logic they are equivalent.

It's actually even worse than that. As long as the block uses only characters that appear in the most recent boundary marker, it doesn't even have to use all of them. For example:

--aaaaaaaaaaa

That would also match the "--abcdefghijk" boundary.
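A minimal Python sketch of the difference, under stated assumptions: `looks_like_boundary_by_charset` is a hypothetical reconstruction of the behavior described above (each character of the candidate block only has to occur somewhere in the boundary), not the actual code under review, and `is_boundary_exact` is the plain string comparison that can replace it, with no Boyer-Moore tables involved.

```python
def looks_like_boundary_by_charset(line: str, boundary: str) -> bool:
    """Hypothetical reconstruction of the flawed check: only verifies that
    each character of the candidate block occurs somewhere in the boundary."""
    marker = "--" + boundary
    if not line.startswith("--") or len(line) < len(marker):
        return False
    allowed = set(boundary)
    block = line[2:len(marker)]
    return all(ch in allowed for ch in block)


def is_boundary_exact(line: str, boundary: str) -> bool:
    """Exact check: "--" followed by the boundary, character for character.
    This is a prefix check only; what may follow the boundary is not validated."""
    return line.startswith("--" + boundary)


if __name__ == "__main__":
    boundary = "abcdefghijk"
    # The real boundary, a permutation of it, and a line of eleven 'a's.
    for candidate in ("--abcdefghijk", "--gekfhdbacij", "--" + "a" * 11):
        print(candidate,
              looks_like_boundary_by_charset(candidate, boundary),
              is_boundary_exact(candidate, boundary))
```

Both the permuted line and the all-`a` line pass the character-set check but fail the exact comparison, which is the behavior the boundary test actually needs.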

Secondly (and I was bitten by this, too), it is actually legal for there to be any amount of linear whitespace after the boundary token, before the CRLF sequence.

In other words, when you have:

Content-Type: multipart/mixed; boundary="my-boundary"

it should match:

--my-boundary<SPACE><TAB><SPACE>\r\n
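One way to honor that rule, sketched in Python: strip the line terminator, optionally consume the `--` that marks the end-boundary, and then require that whatever remains is only spaces and tabs (the "transport padding" in RFC 2046's grammar). `classify_boundary_line` is a hypothetical name, and accepting a bare LF in addition to CRLF is an extra leniency assumption, not something the rule above requires.

```python
from typing import Optional

# Linear whitespace allowed as transport padding after a boundary: space and tab.
LWSP = b" \t"


def classify_boundary_line(line: bytes, boundary: bytes) -> Optional[str]:
    """Hypothetical helper. Returns "part" for a "--boundary" line, "end" for a
    "--boundary--" line, or None if the line is not a delimiter for `boundary`.
    Trailing spaces/tabs before the line ending are accepted."""
    marker = b"--" + boundary
    if not line.startswith(marker):
        return None
    rest = line[len(marker):]
    # Drop the line terminator (bare LF accepted as an assumed leniency).
    if rest.endswith(b"\r\n"):
        rest = rest[:-2]
    elif rest.endswith(b"\n"):
        rest = rest[:-1]
    kind = "part"
    if rest.startswith(b"--"):
        kind = "end"
        rest = rest[2:]
    # Anything left over must be linear whitespace only.
    if rest.strip(LWSP):
        return None
    return kind
```

Because the remainder must be pure whitespace, a line such as `--my-boundaryX` is rejected rather than being mistaken for a delimiter that merely starts with the right prefix.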

Thirdly, a multipart is not guaranteed to end with its end-boundary marker, so if your parser is going to handle nested multiparts, it will need a stack of boundaries rather than just the most recent one: when a line matches an outer boundary, any inner multipart that is still open has to be closed implicitly. I suspect your use cases don't require nested multiparts, which is why your parser doesn't support them, but I figured I'd mention it in case it's useful to you.

AdmiralCurtiss mentioned this issue Dec 12, 2023

IOS/KD: Implement receiving mail dolphin-emu/dolphin#12385 (Open)