Add NDJSON support for logging and profiling #3
Add NDJSON Logging Format with Optimized Structured Logging
Summary
Adds NDJSON (Newline Delimited JSON) as a new logging format alongside CSV. Implements direct Value→JSON conversion for efficient structured logging, adds a helper function to reduce boilerplate, and maintains 100% backward compatibility.
Key Features
1. NDJSON Format Support
- `NDJSONLogStorage` class for NDJSON output format
- `SET log_storage = 'ndjson'`
- `SET profiler_format = 'ndjson'`

2. Direct Value→JSON Conversion
Added `WriteLogEntryStructured()` API to the `LogStorage` hierarchy:
- `NDJSONLogStorage` converts DuckDB `Value` objects directly to JSON
- `ValueToJSON()` helper handles STRUCT, LIST, MAP, and all primitive types

3. Structured Message Constructors
Added `ConstructLogMessageValue()` methods to all log types.

4. HTTPFS Stats in JSON/NDJSON Profiling (Infrastructure)
Added infrastructure for structured HTTPFS statistics in JSON/NDJSON profiler output:
- `WriteProfilingInformationToJSON()` virtual method in the `ClientContextState` base class
- Structured stats target the JSON/NDJSON profiler formats (not `query_tree`, `EXPLAIN ANALYZE`)

Note: The HTTPFS extension implementation is in a separate PR: duckdb/duckdb-httpfs#158
When the httpfs PR merges, structured metrics will include:
- `total_bytes_received` - Total bytes downloaded via HTTP(S)
- `total_bytes_sent` - Total bytes uploaded via HTTP(S)
- `head_count` - Number of HTTP HEAD requests
- `get_count` - Number of HTTP GET requests
- `put_count` - Number of HTTP PUT requests
- `post_count` - Number of HTTP POST requests
- `delete_count` - Number of HTTP DELETE requests

Benefits:
Implementation Details
Call Sites Updated (6 locations)
Migrated to the direct structured API using the helper.
Backward Compatibility
✅ 100% backward compatible:
- `DUCKDB_LOG` macro calls work unchanged
- `WriteLogEntryStructured()` falls back to `.ToString()` for CSV

Dual-Path Architecture
Direct Structured API (PhysicalOperator, Checkpoint, HTTP logs)
- Value → JSON → NDJSON

Buffered Path with JSON Parsing (FileSystem, Metrics logs)
- `FlushChunk()` parses JSON strings via yyjson
- Value → ConstructLogMessage() → JSON string → yyjson parse → NDJSON

Testing
All tests pass (274 total assertions):
- `logging_ndjson.test` - 83 assertions (NEW)
- `logging_csv.test` - 182 assertions
- `test_enable_profile.test` - 9 assertions

Coverage includes all log types (FileSystem, Query, Metrics, HTTP, PhysicalOperator, Checkpoint) in both formats.
New Files
- `test/sql/logging/logging_ndjson.test` - Comprehensive NDJSON test suite

Core Infrastructure
- `src/include/duckdb/logging/log_storage.hpp` - WriteLogEntryStructured API
- `src/logging/log_storage.cpp` - ValueToJSON implementation, NDJSONLogStorage
- `src/include/duckdb/logging/log_manager.hpp` - LogStructured helper
- `src/logging/log_manager.cpp` - Helper implementation
- `src/include/duckdb/logging/log_type.hpp` - ConstructLogMessageValue declarations
- `src/logging/log_types.cpp` - ConstructLogMessageValue implementations
- `src/include/duckdb/logging/logger.hpp` - GetLogManager() accessor

Call Sites
- `src/execution/operator/join/physical_hash_join.cpp`
- `src/execution/join_hashtable.cpp`
- `src/storage/table/row_group_collection.cpp`
- `src/main/http/http_util.cpp`

Profiling Support
- `src/main/query_profiler.cpp` - NDJSON format support, extension stats integration
- `src/include/duckdb/common/enums/profiler_format.hpp` - NDJSON enum
- `src/main/settings/custom_settings.cpp` - Settings integration
- `src/include/duckdb/main/client_context_state.hpp` - WriteProfilingInformationToJSON API for extensions

Usage
Enable NDJSON Logging
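A minimal sketch of turning NDJSON logging on from SQL. It assumes DuckDB's existing `enable_logging` pragma; `log_storage = 'ndjson'` is the new value described above.

```sql
-- Turn logging on (existing DuckDB pragma) and select the NDJSON
-- storage format added by this PR.
PRAGMA enable_logging;
SET log_storage = 'ndjson';
```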
NDJSON Profiler Output
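A sketch of requesting NDJSON profiler output with the setting named above. It assumes profiling has already been enabled through DuckDB's usual `enable_profiling` mechanism; the output file name is a placeholder and uses the existing `profiling_output` setting.

```sql
-- profiler_format = 'ndjson' is the new value from this PR;
-- profiling_output is an existing DuckDB setting (placeholder path).
SET profiler_format = 'ndjson';
SET profiling_output = 'query_profile.ndjson';

-- Each profiled query's output is then appended to the file,
-- presumably one JSON object per line.
SELECT count(*) FROM range(1000);
```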
JSON Profiler Output with HTTPFS Stats
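A sketch of the kind of query that would exercise the HTTPFS stats once duckdb/duckdb-httpfs#158 is merged. The URL is a placeholder; `enable_profiling = 'json'` and `profiling_output` are existing DuckDB settings.

```sql
-- JSON profiling of a remote read; httpfs_stats will appear in the
-- profile once the companion httpfs PR lands. The URL is a placeholder.
INSTALL httpfs;
LOAD httpfs;
PRAGMA enable_profiling = 'json';
SET profiling_output = 'query_profile.json';

SELECT * FROM 'https://example.com/data.parquet' LIMIT 1;
```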
Example NDJSON Logging Output
{"timestamp":"2025-01-15T10:30:45.123456","level":"INFO","type":"physical_operator","message":{"operator":"PhysicalHashJoin","event":"Finalize","external":"false"},"scope":"THREAD","connection_id":1,"transaction_id":2,"query_id":3} {"timestamp":"2025-01-15T10:30:45.234567","level":"INFO","type":"checkpoint","message":{"database":"main","table":"my_table","task":"vacuum","segment_idx":0,"merge_count":3},"scope":"DATABASE"} {"timestamp":"2025-01-15T10:30:45.345678","level":"INFO","type":"http","message":{"method":"GET","url":"https://example.com/data","status_code":200,"bytes":1024},"scope":"DATABASE"}Example JSON Profiler Output with httpfs extension stats
Example JSON Profiler Output with httpfs extension stats

Note: The example below shows `httpfs_stats`, which will be available after duckdb-httpfs#158 merges. The infrastructure to support this is included in this PR.

```json
{
  "query_name": "SELECT * FROM 'https://example.com/data.parquet' LIMIT 1;",
  "latency": 0.059454625,
  "rows_returned": 1,
  "total_bytes_read": 10308,
  "httpfs_stats": {
    "total_bytes_received": 10308,
    "total_bytes_sent": 0,
    "head_count": 1,
    "get_count": 1,
    "put_count": 0,
    "post_count": 0,
    "delete_count": 0
  },
  "children": [
    {
      "operator_type": "STREAMING_LIMIT",
      "operator_timing": 0.000001,
      "children": [
        {
          "operator_type": "TABLE_SCAN",
          "operator_name": "PARQUET_SCAN",
          "operator_timing": 0.000051,
          "extra_info": {
            "Function": "PARQUET_SCAN",
            "Total Files Read": "1"
          }
        }
      ]
    }
  ]
}
```