Changes from 58 commits
69 commits
02de760
Added heuristics file content detector that determines the content ba…
Dimi1010 Sep 12, 2025
d2b6339
Merge remote-tracking branch 'upstream/dev' into feature/heuristic-fi…
Dimi1010 Sep 12, 2025
685dd9f
Moved stream checkpoint outside format detector as it is not directly…
Dimi1010 Sep 12, 2025
40dee69
Added a new factory function `createReader` that uses the new heurist…
Dimi1010 Sep 12, 2025
f1e3e18
Add <algorithm> include.
Dimi1010 Sep 12, 2025
8da1790
Added unit tests.
Dimi1010 Sep 12, 2025
3ad51e2
Deprecated old factory function.
Dimi1010 Sep 12, 2025
15c2000
Add byte-swapped zstd magic number.
Dimi1010 Sep 12, 2025
17af8d4
Lint
Dimi1010 Sep 12, 2025
46418ec
Move enum closer to first usage.
Dimi1010 Sep 12, 2025
3d713ab
Added unit tests for file reader device factory.
Dimi1010 Sep 15, 2025
a2391ec
Revert indentation.
Dimi1010 Sep 15, 2025
ea328d7
Fixed StreamCheckpoint to restore original stream state.
Dimi1010 Sep 15, 2025
db86c3e
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Sep 15, 2025
4aed9bd
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Sep 20, 2025
a83ae2b
Moved isStreamSeekable helper to inside `CaptureFileFormatDetector`.
Dimi1010 Sep 20, 2025
916e872
Added pcap magic number for Alexey Kuznetzov's modified pcap format.
Dimi1010 Sep 20, 2025
022529f
Merge remote-tracking branch 'upstream/dev' into feature/heuristic-fi…
Dimi1010 Sep 26, 2025
169fcd2
Split the unit test into multiple smaller tests.
Dimi1010 Sep 26, 2025
db8c848
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Oct 2, 2025
3e74912
Merge remote-tracking branch 'upstream/dev' into feature/heuristic-fi…
Dimi1010 Oct 2, 2025
f1613c4
Added helper to indicate if ZstSupport is enabled for PcapNg devices.
Dimi1010 Oct 2, 2025
bc2bacd
Split pcap microsecond and nanosecond file heuristics tests.
Dimi1010 Oct 2, 2025
58ac45d
Skipping Zst test case if zst is not supported.
Dimi1010 Oct 2, 2025
3b4b5ad
Due to file heuristics returning PcapNG format on Zstd archive, if Zs…
Dimi1010 Oct 2, 2025
18379b4
Lint
Dimi1010 Oct 2, 2025
8a4f6f8
Added invalid device factory to pcap tag.
Dimi1010 Oct 2, 2025
7776e0e
Updated static zst archives to be actual archives.
Dimi1010 Oct 2, 2025
4f52f59
Centralized PTF test name width under a macro.
Dimi1010 Oct 3, 2025
88ebfff
Add Pcap++Test header files to test sources for IDE tooling.
Dimi1010 Oct 3, 2025
41fe188
Fixed test output formatting.
Dimi1010 Oct 3, 2025
c8ae4f8
Lint
Dimi1010 Oct 3, 2025
c7cab2b
Typo fix.
Dimi1010 Oct 3, 2025
6d55077
Merge remote-tracking branch 'upstream/dev' into feature/heuristic-fi…
Dimi1010 Oct 6, 2025
682eeac
Shortened test names.
Dimi1010 Oct 6, 2025
07804da
Simplified invalid file test.
Dimi1010 Oct 6, 2025
9c4fc08
Simplified ZST tests.
Dimi1010 Oct 6, 2025
d975157
Added snoop test.
Dimi1010 Oct 7, 2025
40530df
Expanded granularity of file format detection.
Dimi1010 Oct 7, 2025
96a61b2
Marked `checkSupport` functions as constexpr to enable compile time o…
Dimi1010 Oct 7, 2025
55a6b7a
Exclude json from pre-commit cppcheck as it is slow due to many defin…
Dimi1010 Oct 7, 2025
3ab14e7
Lint
Dimi1010 Oct 7, 2025
5dd9a30
Fix runtime side effects inside constexpr function.
Dimi1010 Oct 7, 2025
45ad769
Added a secondary factory function to separate mixed error handling m…
Dimi1010 Oct 7, 2025
d24a9ad
Revert deprecation message, as doxygen is unhappy.
Dimi1010 Oct 7, 2025
f5ff879
Update tests.
Dimi1010 Oct 7, 2025
2c1b2c4
Update deprecation warning to point to the function closer to the sig…
Dimi1010 Oct 7, 2025
8d1ed1d
Catch general exception instead of runtime error.
Dimi1010 Oct 7, 2025
0ea2da9
Shortened deprecation message due to pre-commit warnings when its is …
Dimi1010 Oct 7, 2025
c209c90
Fix braces.
Dimi1010 Oct 9, 2025
8d77aa0
Simplfy test.
Dimi1010 Oct 9, 2025
af12d2f
Added tests for createReader failures.
Dimi1010 Oct 9, 2025
b357087
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Oct 10, 2025
443c883
Simplified pcap detection to not require to read the entire pcap header.
Dimi1010 Oct 11, 2025
202d5cc
Added const qualifiers to detector methods.
Dimi1010 Oct 11, 2025
e6b2aa9
Added dedicated unit tests for CaptureFileFormatDetector.
Dimi1010 Oct 11, 2025
b8fb635
Added more tests for `createReader`.
Dimi1010 Oct 11, 2025
c6c7720
Add static assert for array indice checks.
Dimi1010 Oct 11, 2025
181a8b4
Updated detectPcap selection.
Dimi1010 Oct 18, 2025
76aa850
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Oct 18, 2025
d8f7419
Extracted capture format detector to remove it from publicly availabl…
Dimi1010 Oct 18, 2025
91a7a0a
Fix includes.
Dimi1010 Oct 18, 2025
e275950
Removed duplicate files from tracking.
Dimi1010 Oct 18, 2025
e7a42b5
Lint
Dimi1010 Oct 18, 2025
54f7bae
Trimmed pcapng sample.
Dimi1010 Oct 18, 2025
93cba3d
Merge branch 'dev' into feature/heuristic-file-selection
Dimi1010 Oct 24, 2025
3643fac
Change PcapNGZst to ZstArchive. Zst to PcapNG branch folding is done …
Dimi1010 Oct 24, 2025
ec5980f
Added separate format value for "modified" pcap to separate the forma…
Dimi1010 Oct 24, 2025
b3639a9
Docs fix.
Dimi1010 Oct 24, 2025
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
exclude: '.*\.(pcap|pcapng|dat)|(PacketExamples|PcapExamples|expected_output|pcap_examples).*\.txt'
exclude: '.*\.(pcap|pcapng|dat)|(PacketExamples|PcapExamples|expected_output|pcap_examples).*\.(txt|zst|zstd)'
fail_fast: false
repos:
- repo: local
@@ -37,6 +37,7 @@ repos:
files: ^(Common\+\+|Packet\+\+|Pcap\+\+|Tests|Examples)/.*\.(cpp|h)$
- id: cppcheck
args: ["--std=c++14", "--language=c++", "--suppressions-list=cppcheckSuppressions.txt", "--inline-suppr", "--force"]
exclude: ^3rdParty/json
- repo: https://github.com/BlankSpruce/gersemi
rev: 0.22.3
hooks:
1 change: 1 addition & 0 deletions Pcap++/CMakeLists.txt
@@ -102,6 +102,7 @@ target_link_libraries(
)

if(LIGHT_PCAPNG_ZSTD)
target_compile_definitions(Pcap++ PRIVATE -DPCPP_PCAPNG_ZSTD_SUPPORT)
target_link_libraries(Pcap++ PRIVATE light_pcapng)
endif()

60 changes: 60 additions & 0 deletions Pcap++/header/PcapFileDevice.h
@@ -19,6 +19,36 @@ namespace pcpp
/// @struct LightPcapNgHandle
/// An opaque struct representing a handle for pcapng files.
struct LightPcapNgHandle;

/// @brief An enumeration representing different capture file formats.
enum class CaptureFileFormat
{
Unknown,
Pcap, // regular pcap with microsecond precision
PcapNano, // regular pcap with nanosecond precision
PcapNG, // uncompressed pcapng
PcapNGZstd, // zstd compressed pcapng
Snoop, // solaris snoop
};

/// @brief Heuristic file format detector that scans the magic number of the file format header.
class CaptureFileFormatDetector
Owner

If I'm not mistaken, this used to be in the .cpp file, right? Is the reason we moved it to the .h file to make it easier to test?

If yes, I think we can test it using createReader(): create a temporary fake file with the data we want to test, and delete it when the test is done.

Collaborator Author

I tried that suggestion initially, but it would have been an extremely fragile unit test. The "pass" conditions would have been checked indirectly.

Also, createReader has multiple return paths for Nano / Zst file formats, which would have caused complications since the format test would have needed to care about the environment it runs in, which it doesn't have to as a standalone test.

Any additional changes to createReader could also break the test, which they really shouldn't. For example, I am thinking of maybe adding additional logic for Zst archive to check if the compressed data is actually a pcapng, and not a random file. This would be a nightmare to make compatible with the "spoofed files" test due to assumptions on the test that createReader doesn't do anything more complicated than check the initial magic number.

So, in the end, you end up with a more complicated unit test to read through that:

  • depends on the environment it runs on.
  • can be broken not just by changes to the format detector but also changes to the createReader factory, too.
  • induces requirements on createReader as it uses its behavior to test detectFormat.
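
To make the standalone-test argument concrete, here is a minimal sketch of a magic-number detector that works on any `std::istream`, so each case can be checked against an in-memory buffer. The `sketch` namespace, the `Format` enum, and the `detectBytes` helper are hypothetical illustrations, not the PR's actual `CaptureFileFormatDetector`; the magic values are the commonly documented ones for pcap, pcapng, snoop, and Zstandard, and the stream is assumed to be seekable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <istream>
#include <sstream>
#include <string>

namespace sketch
{
	// Hypothetical stand-in for the PR's CaptureFileFormat enum.
	enum class Format { Unknown, Pcap, PcapNano, PcapNG, Zstd, Snoop };

	// Classifies a seekable stream by its leading magic number and
	// restores the read position afterwards.
	inline Format detectFormat(std::istream& content)
	{
		std::array<unsigned char, 8> head{};
		const std::istream::pos_type start = content.tellg();
		content.read(reinterpret_cast<char*>(head.data()), static_cast<std::streamsize>(head.size()));
		const std::size_t got = static_cast<std::size_t>(content.gcount());
		content.clear();       // a short read sets eofbit/failbit
		content.seekg(start);  // leave the stream as we found it

		if (got >= 8 && std::memcmp(head.data(), "snoop\0\0\0", 8) == 0)
			return Format::Snoop;
		if (got < 4)
			return Format::Unknown;

		// Assemble the first four bytes in file order, independent of host endianness.
		const std::uint32_t magic = (std::uint32_t{ head[0] } << 24) | (std::uint32_t{ head[1] } << 16) |
		                            (std::uint32_t{ head[2] } << 8) | std::uint32_t{ head[3] };
		switch (magic)
		{
		case 0xa1b2c3d4u:  // pcap, microsecond precision
		case 0xd4c3b2a1u:  // same, written on a host with the opposite byte order
			return Format::Pcap;
		case 0xa1b23c4du:  // pcap, nanosecond precision
		case 0x4d3cb2a1u:
			return Format::PcapNano;
		case 0x0a0d0d0au:  // pcapng section header block type
			return Format::PcapNG;
		case 0x28b52ffdu:  // zstd frame magic as it appears on disk
			return Format::Zstd;
		default:
			return Format::Unknown;
		}
	}

	// Test helper: runs the detector over an in-memory byte sequence.
	inline Format detectBytes(std::initializer_list<unsigned char> bytes)
	{
		std::istringstream in(std::string(bytes.begin(), bytes.end()));
		return detectFormat(in);
	}
}  // namespace sketch
```

Because the input is an in-memory stream, a test per magic number needs no temporary file and does not depend on which optional backends the build enables.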

Owner

@seladb seladb Oct 16, 2025

I understand it's better to test CaptureFileFormatDetector as a standalone class, but it requires exposing it in the .h file, which is not great (even though it's in the internal namespace). Testing createReader is a bit more fragile, but I don't think the difference is that big. Of course, if we add logic to detect more file types or update the existing detection logic some tests might break, but we can easily fix them as needed.

I usually try to avoid the internal namespace where possible because it's still in the .h file and is exposed to users, and we'd like to keep our API as clean as possible

Collaborator Author

> Testing createReader is a bit more fragile, but I don't think the difference is that big. Of course, if we add logic to detect more file types or update the existing detection logic some tests might break, but we can easily fix them as needed.

It is a big difference and it's not always an easy fix. I plan to add the aforementioned Zst checks in another PR after this one, and that would make zst spoofing in createReader impossible, due to zst format automatically being checked for PcapNg or Unknown contents. Therefore you can't rely on the return of createReader to find out what the return of detectFormat was, because nullptr can be returned from several paths from detectFormat return value (Unknown, Nano + unsupported, Zst + unsupported). We have already had issues with tests being silently broken (#1977 comes to mind), so I would prefer to avoid fragile tests if we can.
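
The ambiguity described here can be sketched as follows. Everything in this block (`Reader`, `BuildSupport`, `makeReader`) is a hypothetical model of a factory that folds several detector results into `nullptr`; it is not the PR's `createReader` implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

namespace sketch
{
	// Hypothetical stand-in for the detector's result type.
	enum class Format { Unknown, Pcap, PcapNano, PcapNG, Zstd, Snoop };

	// Hypothetical reader; only records which backend was selected.
	struct Reader
	{
		std::string backend;
	};

	// Hypothetical build-time capabilities of the environment running the test.
	struct BuildSupport
	{
		bool nanoPrecision;
		bool zstd;
	};

	// Models a factory in which three different detector results all end in nullptr.
	inline std::unique_ptr<Reader> makeReader(Format format, BuildSupport support)
	{
		switch (format)
		{
		case Format::Pcap:
			return std::make_unique<Reader>(Reader{ "pcap" });
		case Format::PcapNano:
			if (!support.nanoPrecision)
				return nullptr;  // detected correctly, but unsupported by this build
			return std::make_unique<Reader>(Reader{ "pcap-nano" });
		case Format::PcapNG:
			return std::make_unique<Reader>(Reader{ "pcapng" });
		case Format::Zstd:
			if (!support.zstd)
				return nullptr;  // detected correctly, but unsupported by this build
			return std::make_unique<Reader>(Reader{ "pcapng" });
		case Format::Snoop:
			return std::make_unique<Reader>(Reader{ "snoop" });
		case Format::Unknown:
			break;
		}
		return nullptr;  // detection itself failed
	}
}  // namespace sketch
```

In this model a test that only sees the factory's return value cannot tell a detection failure from a correctly detected but unsupported format, and a Zstd archive is indistinguishable from plain pcapng when zstd is enabled.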

> I usually try to avoid the internal namespace where possible because it's still in the .h file and is exposed to users, and we'd like to keep our API as clean as possible

Fair, it is exposed, but that is the entire reason for having the internal namespace. It is a common convention that external users shouldn't really touch it. If you want to keep the primary public header files clean there are a couple of options:

  • I have seen many libraries have a subfolder internal / detail in their public include folder, where they keep all their internal code headers that need to be exposed. That keeps the "internal" code separate from the "public" code, if users want to read through the headers. This is a common convention used in Boost libraries. "public" headers that depend on internal headers include them from the internal subfolder.
  • In the current case, we have another option. Since the CaptureFileFormatDetector is only needed in the cpp part and not in the header part, we can extract it to a fully internal header, kept with the source files. This would prevent it from being exposed in the public API, but the Test project can be manually set to search for headers from "Pcap++/src" too, to allow it to link in the tests.

Owner

> It is a big difference and it's not always an easy fix. I plan to add the aforementioned Zst checks in another PR after this one, and that would make zst spoofing in createReader impossible, due to zst format automatically being checked for PcapNg or Unknown contents. Therefore you can't rely on the return of createReader to find out what the return of detectFormat was, because nullptr can be returned from several paths from detectFormat return value (Unknown, Nano + unsupported, Zst + unsupported). We have already had issues with tests being silently broken (#1977 comes to mind), so I would prefer to avoid fragile tests if we can.

I'm not sure I understand... if we create fake files we know which type to expect, so all the test needs to do is verify the created file device is of the expected type 🤔

>   • In the current case, we have another option. Since the CaptureFileFormatDetector is only needed in the cpp part and not in the header part, we can extract it to a fully internal header, kept with the source files. This would prevent it from being exposed in the public API, but the Test project can be manually set to search for headers from "Pcap++/src" too, to allow it to link in the tests.

I guess we can do that, but I still don't understand why we can't test it with createReader or tryCreateReader

Collaborator Author

@Dimi1010 Dimi1010 Oct 27, 2025

This is why I labeled the PR as enhancement, not refactor.
Nowhere in those steps do I think we are restructuring existing code for the sole reason of testing existing features.

If it wasn't for the sake of testing - would you include CaptureFileFormatDetector.h as a separate file or have its implementation within PcapFileDevice.cpp? I think I know the answer 🙂 because it was initially inside PcapFileDevice.cpp and we only extracted it to a separate file for the sake of the tests...

With the current iteration of the detector, either works, tbh. The initial implementation was added to the .cpp because a lot of the business logic of the factory was intermixed with the detector (e.g. Zst archive -> PcapNG folding, Pcap and PcapNano being just Pcap). That is no longer the case.

But sure, it was extracted due to the requirements for unit tests for every magic number, which would be much easier to maintain in the long run if done directly on the format detector. For one, it avoids the filesystem, which IMO is always more trouble than it's worth if it can be avoided relatively trivially. For another, it is a technical debt on expanding createReader validation logic.

I agree it's not a large complication, but we almost never do it in PcapPlusPlus, and if we do, we need to have a good reason for it. Testing could be a good reason, but in this case the same test could be run on createReader even though the abstraction is not ideal

The good reason I have is that this unit test through createReader will need to be changed literally in the next PR I plan to make after this one, to keep it running even though I don't plan to touch the format detector code.

The planned changes in validation being:

  • Compressed PcapNG: Unpacking a ZST archive in createReader and checking the format of the archived file. This will essentially brick any spoofed ZST file, as it will not be able to be unpacked, fail factory validation and return nullptr.
  • Have open() / close() be called inside the factory prior to device return to run secondary validation that the reader can actually be opened. File devices can't be retargeted so no point in returning a reader that will just fail to open when the user tries, IMO. This will essentially brick all spoofed files since they can't be opened by the device by definition.

If you insist on having it done through createReader then fine, but that solution opens up more work for future changes that I have planned around with the current one.

Owner

> But sure, it was extracted due to the requirements for unit tests for every magic number, which would be much easier to maintain in the long run if done directly on the format detector. For one, it avoids the filesystem, which IMO is always more trouble than it's worth if it can be avoided relatively trivially. For another, it is a technical debt on expanding createReader validation logic.

I don't think this piece of logic will change much because I don't expect more file types to be added (and even if we will, it only happens rarely), so maintenance in the long run shouldn't be an issue. For the same reason I don't think we'll expand createReader much

>   • Compressed PcapNG: Unpacking a ZST archive in createReader and checking the format of the archived file. This will essentially brick any spoofed ZST file, as it will not be able to be unpacked, fail factory validation and return nullptr.

I don't think we want to unpack the Zstd archive just to see if it's valid. We have the pcapng library that does that. The logic in createReader does an educated guess, not a bullet-proof validation. Otherwise we can argue that checking the magic numbers is not enough - why not validate the entire pcap / pcapng file? Of course we don't want to do that because libpcap is doing it for us. The same should apply for Zstd.

>   • Have open() / close() be called inside the factory prior to device return to run secondary validation that the reader can actually be opened. File devices can't be retargeted so no point in returning a reader that will just fail to open when the user tries, IMO. This will essentially brick all spoofed files since they can't be opened by the device by definition.

As mentioned earlier, I don't think it's the approach we want. createReader should do an educated guess, nothing more

> If you insist on having it done through createReader then fine, but that solution opens up more work for future changes that I have planned around with the current one.

Again, I don't think this logic will change much after this refactoring. Even if it will, I don't think it'll be a huge refactoring

Collaborator Author

@Dimi1010 Dimi1010 Nov 15, 2025

> The logic in createReader does an educated guess, not a bullet-proof validation

Curious why wouldn't you want that?

I would reason that having a better validation sequence inside the factory would make for cleaner UX due to less boilerplate needed by the user?

auto dev = IFileDevice::tryCreateReader("filePath");
if (!dev)
{
  // User has to handle error here.
}

// User has to open device.
bool res = dev->open();
if (!res)
{
  // User also has to handle error here.
  // Zst particularly may fail here if contents are not pcapng.
}

// Use device here.

Having the integrated open / validation would allow a single line before use. I don't see many use cases where you would want to create a device reader and not open it to read from it, no? It also fits with the RAII methodology of avoiding a 2-stage init where possible.

auto dev = IFileDevice::tryCreateReader("filePath");
if(!dev)
{
  // Failed to create device.
  // Note: No need for second boilerplate error handler prior to use.
}

// Use device here.

The live devices need open() because they are created by the runtime at startup.
File devices don't need to have that limitation since they are entirely created by the user.

> I don't think we want to unpack the Zstd archive just to see if it's valid. We have the pcapng library that does that.

Yes, but I am unsure if it gives a precise error message of what went wrong or just a generic failure error.

> Otherwise we can argue that checking the magic numbers is not enough - why not validate the entire pcap / pcapng file?

Which is as simple as calling open() inside the factory function, no? As you said, the backend already does validation, so why not reuse it for the factory validation?

Owner

> Curious why wouldn't you want that?

Extracting the archived file just to verify the format is wasteful and might take a long time, especially if it's called on large files. Also - for these large files it'd mean the file is extracted twice - once in createReader and again when actually reading the file

> Yes, but I am unsure if it gives a precise error message of what went wrong or just a generic failure error.

If this is indeed the case, maybe we need to fix the LightPcapNg code?

> Which is as simple as calling open() inside the factory function, no? As you said, the backend already does validation, so why not reuse it for the factory validation?

Not necessarily - as far as I know open() checks mostly the header and doesn't go over the rest of the file, so a user can open a file with a correct header but corrupted data and reading the file will fail

Collaborator Author

@Dimi1010 Dimi1010 Nov 16, 2025

> Extracting the archived file just to verify the format is wasteful and might take a long time, especially if it's called on large files.

There is no need to extract the entire file. ZST compression works on independent frames allowing frame-by-frame (streaming) decompression. We only need to decompress the first frame to read the magic number to validate that the archive contents appear to be PcapNG. How large the total file is is irrelevant.

Incidentally, frame-by-frame is also how LightPcapNG reads the ZST archive. It decompresses a frame, reads the fully decompressed PcapNG records in it, and decompresses the next frame if needed.

> If this is indeed the case, maybe we need to fix the LightPcapNg code?

Which is C code, and makes it much harder to output a readable error. Not to mention we need to deal with passing that error up the stack.

> Not necessarily - as far as I know open() checks mostly the header and doesn't go over the rest of the file, so a user can open a file with a correct header but corrupted data and reading the file will fail

But it will still have passed open(). My idea isn't that createReader should validate that everything is correct. It is that it should validate just enough to guarantee that the returned device can successfully pass an open() call. The device might even be returned already opened and ready for reading, reducing the user side boilerplate.

There is no reason to return a device that can't even be opened, since the user can't do anything with it. It just adds more boilerplate as the user has to do the error handling twice.

If the records afterwards are corrupted at some point, the read should fail when the corrupted data is reached.
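
The factory-side half of this proposal could look roughly like the sketch below. The `Reader` class and the `openOrDiscard` helper are hypothetical and exist only to illustrate "return the device only if `open()` succeeds"; they are not part of PcapPlusPlus and do not reflect how the PR's `tryCreateReader` behaves.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace sketch
{
	// Hypothetical minimal reader; `openable` simulates whether the backend
	// would accept the underlying file.
	class Reader
	{
	public:
		explicit Reader(bool openable) : m_openable(openable)
		{}

		bool open()
		{
			m_opened = m_openable;
			return m_opened;
		}

		bool isOpened() const
		{
			return m_opened;
		}

	private:
		bool m_openable;
		bool m_opened = false;
	};

	// Secondary validation step: hand back the reader already opened,
	// or nullptr if it could not be created or opened.
	inline std::unique_ptr<Reader> openOrDiscard(std::unique_ptr<Reader> reader)
	{
		if (reader == nullptr || !reader->open())
			return nullptr;
		return reader;
	}
}  // namespace sketch
```

With such a step the caller has a single failure check, and any reader it receives is already open. Corruption further into the file would still surface only when the damaged record is read.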

{
public:
/// @brief Checks a content stream for the magic number and determines the type.
/// @param content A content stream that contains the file content.
/// @return A CaptureFileFormat value with the detected content type.
CaptureFileFormat detectFormat(std::istream& content) const;

private:
CaptureFileFormat detectPcapFile(std::istream& content) const;

bool isPcapNgFile(std::istream& content) const;

bool isSnoopFile(std::istream& content) const;

bool isZstdArchive(std::istream& content) const;
};
} // namespace internal

/// @enum FileTimestampPrecision
@@ -124,7 +154,29 @@ namespace pcpp
/// it returns an instance of PcapFileReaderDevice
/// @param[in] fileName The file name to open
/// @return An instance of the reader to read the file. Notice you should free this instance when done using it
/// @deprecated Prefer `createReader` or `tryCreateReader` due to selection of reader based on file content
/// instead of extension.
PCPP_DEPRECATED("Prefer `tryCreateReader` due to selection of reader based on file content.")
static IFileReaderDevice* getReader(const std::string& fileName);

/// @brief Creates an instance of the reader best fit to read the file.
///
/// The factory function uses heuristics based on the file content to decide the reader.
/// If the file type is known at compile time, it is better to construct a concrete reader instance directly.
///
/// @param[in] fileName The path to the file to open.
/// @return A unique pointer to a reader instance
/// @throws std::runtime_error If the file could not be opened or unsupported.
static std::unique_ptr<IFileReaderDevice> createReader(const std::string& fileName);

/// @brief Tries to create an instance of the reader best fit to read the file.
///
/// The factory function uses heuristics based on the file content to decide the reader.
/// If the file type is known at compile time, it is better to construct a concrete reader instance directly.
///
/// @param fileName The path to the file to open.
/// @return A unique pointer to a reader instance, or nullptr if the file could not be opened or unsupported.
static std::unique_ptr<IFileReaderDevice> tryCreateReader(const std::string& fileName);
};

/// @class IFileWriterDevice
@@ -313,6 +365,10 @@ namespace pcpp
PcapNgFileReaderDevice& operator=(const PcapNgFileReaderDevice& other);

public:
/// @brief A static method that checks if the device was built with zstd compression support
/// @return True if zstd compression is supported, false otherwise.
static bool isZstdSupported();

/// A constructor for this class that gets the pcap-ng full path file name to open. Notice that after calling
/// this constructor the file isn't opened yet, so reading packets will fail. For opening the file call open()
/// @param[in] fileName The full path of the file to read
@@ -397,6 +453,10 @@ namespace pcpp
PcapNgFileWriterDevice& operator=(const PcapNgFileWriterDevice& other);

public:
/// @brief A static method that checks if the device was built with zstd compression support.
/// @return True if zstd compression is supported, false otherwise.
static bool isZstdSupported();

/// A constructor for this class that gets the pcap-ng full path file name to open for writing or create. Notice
/// that after calling this constructor the file isn't opened yet, so writing packets will fail. For opening the
/// file call open()