Commit

Added reference with 2022 year
canimus committed Jun 21, 2024
1 parent f842302 commit a70dbcc
Showing 3 changed files with 11 additions and 1 deletion.
10 changes: 10 additions & 0 deletions paper/paper.bib
@@ -103,6 +103,16 @@ @inproceedings{10.1145/2872427.2883029
doi = {10.1145/2872427.2883029}
}

@article{10.3389/fdata.2022.945720,
title={Toward Data Lakes as Central Building Blocks for Data Management and Analysis},
author={Dumke, André R. and Parchmann, Andreas and Schmid, Stefan and Hauswirth, Manfred},
journal={Frontiers in Big Data},
volume={3},
pages={564115},
year={2022},
publisher={Frontiers},
doi={10.3389/fdata.2022.945720}
}

@misc{oreilly2023technology,
title = {Technology Trends for 2023},
2 changes: 1 addition & 1 deletion paper/paper.md
@@ -49,7 +49,7 @@ One last argument in favor of using a quality tool such as `cuallee` is the need


# Data Quality Frameworks
- Data platforms have diversified from file systems and relational databases, to full ecosystems including the concept of data lakes. Modern platforms host a variety of data formats, beyond traditional tabular data, including semi-structured like `JSON` [@10.1145/2872427.2883029] or unstructured like audio or images.
+ Data platforms have diversified from file systems and relational databases, to full ecosystems including the concept of data lakes [@10.3389/fdata.2022.945720]. Modern platforms host a variety of data formats, beyond traditional tabular data, including semi-structured like `JSON` [@10.1145/2872427.2883029] or unstructured like audio or images.

Operating on modern data platforms requires a versatile data processing framework that can handle structured and unstructured data, support data operations in various programming languages, accommodate both imperative and declarative styles of data operations, and do so reliably for data of any size. Apache Spark [@10.1145/2723372.2742797] is an exemplar of such a framework because of its wide range of data processing capabilities (batch processing, real-time streaming, machine learning, and graph processing) within a unified framework that is commended and adopted [@oreilly2023technology] by the data industry.

Binary file modified paper/paper.pdf
Binary file not shown.