@LaurenceJJones
Member

@LaurenceJJones LaurenceJJones commented Oct 27, 2025

This PR mainly aims to support looser key-value parsing for UniFi CEF lines. The tests use full log lines, but in practice we will parse out the message first, so the input will be more targeted. Our current key-value implementation only parses:

  • foo=bar
  • bar="foo bar"

Overview

This PR introduces ParseKVLax, a new key-value parser that complements the existing ParseKV function. While ParseKV maintains strict regex-based parsing for backward compatibility, ParseKVLax uses a scanner-based approach to support unquoted values with spaces and other complex log formats.

Changes

New Function: ParseKVLax

  • Scanner-based approach: Finds valid key= patterns and determines value boundaries intelligently
  • Supports multi-word values: UNIFIhost=Express 7 port=443 now correctly parses as {"UNIFIhost": "Express 7", "port": "443"}
  • Robust quote handling: Properly filters out key= patterns inside quoted values
  • Escape sequence support: Handles escaped quotes (\") and backslashes (\\)

Key Features

  • Backward Compatible: Original ParseKV unchanged - all existing code continues to work
  • Opt-in: Use ParseKVLax only where needed for loose parsing
  • Well-tested: 13 comprehensive test cases covering edge cases
  • Production-ready: Handles CEF logs, iptables output, keycloak JSON values, etc.

Use Cases

  • CEF logs: msg=User login successful → captures full phrase
  • iptables: RES=0x00 SYN URGP=0 → captures flags as part of values
  • UniFi: UNIFIhost=Express 7 → captures version as part of value

@github-actions

@LaurenceJJones: There is no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind refactoring
  • /kind fix
  • /kind chore
  • /kind dependencies

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

@github-actions

@LaurenceJJones: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area appsec
  • /area security
  • /area configuration

@codecov

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.71%. Comparing base (e40f284) to head (89218fb).
⚠️ Report is 30 commits behind head on master.

Files with missing lines      Patch %   Lines
pkg/exprhelpers/helpers.go    83.33%    7 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4007      +/-   ##
==========================================
+ Coverage   62.67%   62.71%   +0.04%     
==========================================
  Files         410      410              
  Lines       32943    33003      +60     
==========================================
+ Hits        20646    20698      +52     
- Misses      10173    10177       +4     
- Partials     2124     2128       +4     
Flag           Coverage Δ
bats           46.26% <0.00%> (-0.03%) ⬇️
unit-linux     35.35% <83.33%> (+0.06%) ⬆️
unit-windows   24.82% <83.33%> (+0.11%) ⬆️


- Replace regex-based parsing with scanner approach for better handling of complex key-value pairs
- Add support for unquoted values containing spaces (e.g., UNIFIhost=Express 7)
- Maintain backward compatibility with existing quoted and simple unquoted values
- Add robust filtering to prevent false positives from invalid key patterns
- Improve quote handling and escaping for quoted values
- Add comprehensive test cases covering edge cases and mixed scenarios

Fixes parsing issues with CEF logs and other formats where values contain spaces without quotes.
@LaurenceJJones LaurenceJJones force-pushed the improve-parsekv-broad-kv-pairs branch from 4303a62 to 0f4a627 on October 27, 2025 12:26
@LaurenceJJones LaurenceJJones changed the title from "feat: improve ParseKV function to handle unquoted values with spaces" to "feat: Add ParseKVLax for Flexible Key-Value Parsing" on Oct 27, 2025
@LaurenceJJones
Member Author

Linked to: crowdsecurity/hub#940

@LaurenceJJones LaurenceJJones requested a review from blotus October 28, 2025 10:00
Labels

area/agent kind/enhancement New feature or request
