[fix](parquet-reader) Fixed the issue of excessive scanning data in late materialization case of parquet reader (#46121)

### What problem does this PR solve?

Related PR: #40641

Problem Summary:

Fixes excessive data scanning in the late-materialization path of the Parquet
reader. The regression was introduced by #40641 and shows up in scenarios
with particularly high filtering rates.
kaka11chen authored Dec 30, 2024
1 parent d139934 commit 0348b33
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions be/src/vec/exec/format/parquet/vparquet_group_reader.cpp
@@ -522,16 +522,18 @@ Status RowGroupReader::_do_lazy_read(Block* block, size_t batch_size, size_t* re
         Block::erase_useless_column(block, origin_column_num);

         if (!pre_eof) {
-            if (pre_raw_read_rows >= config::doris_scanner_row_num) {
-                break;
-            }
             // If continuous batches are skipped, we can cache them to skip a whole page
             _cached_filtered_rows += pre_read_rows;
+            if (pre_raw_read_rows >= config::doris_scanner_row_num) {
+                *read_rows = 0;
+                _convert_dict_cols_to_string_cols(block);
+                return Status::OK();
+            }
         } else { // pre_eof
             // If filter_map_ptr->filter_all() and pre_eof, we can skip whole row group.
             *read_rows = 0;
             *batch_eof = true;
-            _lazy_read_filtered_rows += pre_read_rows;
+            _lazy_read_filtered_rows += (pre_read_rows + _cached_filtered_rows);
             _convert_dict_cols_to_string_cols(block);
             return Status::OK();
         }
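
The bookkeeping in this hunk can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Doris implementation: `Batch`, `Counters`, `Outcome`, `lazy_read_step`, and `kScannerRowNum` are hypothetical names invented for illustration, and the real reader does considerably more work (reading lazy columns, converting dictionary columns, and so on). The model mirrors only what the diff shows: a fully filtered batch is added to the cached count before the row-threshold check, the threshold check returns an empty batch to the caller, and the end-of-row-group branch adds the cached rows to the filtered-row counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for config::doris_scanner_row_num.
constexpr std::size_t kScannerRowNum = 4096;

// One predicate-column batch in this toy model.
struct Batch {
    std::size_t rows;   // rows read for the predicate columns
    bool all_filtered;  // true when the predicate rejects every row
};

struct Counters {
    std::size_t cached_filtered_rows = 0;     // models _cached_filtered_rows
    std::size_t lazy_read_filtered_rows = 0;  // models _lazy_read_filtered_rows
};

enum class Outcome { EmptyBatch, RowGroupSkipped, HasSurvivors };

// Consumes batches starting at `next` until one of the three outcomes occurs.
Outcome lazy_read_step(const std::vector<Batch>& batches, std::size_t& next,
                       Counters& c) {
    assert(next < batches.size());
    std::size_t raw_read_rows = 0;
    while (true) {
        const Batch& b = batches[next++];
        const bool eof = (next == batches.size());
        raw_read_rows += b.rows;
        if (!b.all_filtered) {
            // Some rows survive; the real reader would now read lazy columns.
            return Outcome::HasSurvivors;
        }
        if (!eof) {
            // Cache first, so this batch is counted before any early return.
            c.cached_filtered_rows += b.rows;
            if (raw_read_rows >= kScannerRowNum) {
                // Hand an empty batch back to the caller instead of breaking out.
                return Outcome::EmptyBatch;
            }
        } else {
            // End of row group: count this batch plus everything cached so far.
            c.lazy_read_filtered_rows += b.rows + c.cached_filtered_rows;
            return Outcome::RowGroupSkipped;
        }
    }
}
```

With ten fully filtered batches of 1024 rows each, the first two calls in this model each return `Outcome::EmptyBatch` after 4096 raw rows, and the third call reaches the end of the row group and reports all 10240 rows as filtered.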