Skip to content
This repository has been archived by the owner on Nov 4, 2022. It is now read-only.

BlockFile.AllPackets() fail when using non-standard blocksize #229

Open
emillynge opened this issue Sep 23, 2020 · 0 comments · May be fixed by #236 or QXIP/stenographer#8
Open

BlockFile.AllPackets() fail when using non-standard blocksize #229

emillynge opened this issue Sep 23, 2020 · 0 comments · May be fixed by #236 or QXIP/stenographer#8

Comments

@emillynge
Copy link
Contributor

emillynge commented Sep 23, 2020

Dear Maintainers

I have a use case where I need the AF_PACKET blocks to be larger than the default of 1Mb (Disk IO performance related)

This however seems to cause issues with queries with only time related filters. This is because base.AllPositions uses a shortcut that sidesteps the index file completely and calls BlockFile.AllPackets.

Unfortunately, allPacketsIter.Next() reads in blockData in chunks hardcoded to 1 << 20 (1Mb) and thus creates an out of bounds exception if the actual block is larger than 1Mb.

Problem is, that the tpacket_hdr_v1 does not provide any information about how large the requested block is.

//if_packet.h

	/* Number of valid bytes (including padding)
	 * blk_len <= tp_block_size
	 */
	__u32	blk_len;

That is, even if we request blocks of 1Mb from the kernel, then the blk_len can be anything from 0-1Mb and thus stenographer cannot infer the offsets between blocks from the headers.
A solution could be to not write the full 1Mb block, but only write a truncated block of size blk_len.

This would both save disk space and make it possible to handle many different block sizes. It would however increase the complexity of some of the code in blockfile.go since we cannot blindly read 1Mb from disk. But that problem more or less exists anyways if some user sets --blocksize_kb=512.

CORRECTION

In the interest of understanding te history of this issue, I will not rewrite the above. But I now realize that the sentence:

blocks of 1Mb from the kernel, then the blk_len can be anything from 0-1Mb and thus stenographer cannot infer the offsets between blocks from the headers.

is entirely incorrect. Indeed, the source code I quoted explicitly contradict this: /* Number of valid bytes (including padding)
i.e blk_len will be 1Mb even if it contains no packet at all.

As such, the proposed solution is perfectly viable, and I have provided a PR to implement it. #236

emillynge added a commit to wehowsky/stenographer that referenced this issue Jun 3, 2021
@emillynge emillynge linked a pull request Jun 3, 2021 that will close this issue
@lmangani lmangani linked a pull request Jul 16, 2021 that will close this issue
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
1 participant