Challenge 2 - Design a Data Lake for a High-Volume Trading Exchange

Table of Contents

  • Problem Statement
  • Assumptions
  • Things to consider
  • Solutions from Community

Problem Statement

You are the architect for a new Big Data based Data Lake system for a High-Volume Trading Exchange which processes approximately one million transactions per second. During peak volume it can reach up to five million transactions per second. For ease of use, let's call this system the Exchange Store.

Create the following:

  • an efficient, scalable, fault-tolerant and highly available system for the Exchange Store
  • data is sourced via the following options (see the ingestion sketch after this list):
    • real-time messages (~400k messages per second) via Kafka
    • start-of-day positions via files (~10k files in various formats such as CSV, TXT, JSON and XML)
  • reports are generated at End of Day (EOD) via the following options:
    • a Kafka topic for consumers who need to process EOD positions and trades
    • EOD feed files (~12k feed files) sent to consumers via different mechanisms (SFTP, object store, etc.)
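
As one possible illustration of the real-time ingestion path, the sketch below uses Spark Structured Streaming to land the Kafka trade feed in the lake as partitioned Parquet. The topic name, broker addresses, message schema and storage paths are hypothetical placeholders, not part of the challenge statement.

```python
# Minimal PySpark Structured Streaming sketch for the real-time ingestion path.
# All names (topic, brokers, schema, paths) are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = (SparkSession.builder
         .appName("exchange-store-ingest")
         .getOrCreate())

# Hypothetical trade message schema; the real exchange schema would differ.
trade_schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("quantity", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
       .option("subscribe", "trades")                                   # placeholder topic
       .option("startingOffsets", "latest")
       .load())

trades = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), trade_schema).alias("t"))
          .select("t.*"))

# Land the stream in the data lake, partitioned by symbol, with checkpointing
# so the job can recover from failures without losing or duplicating batches.
query = (trades.writeStream
         .format("parquet")
         .option("path", "s3a://exchange-store/raw/trades/")                 # placeholder path
         .option("checkpointLocation", "s3a://exchange-store/checkpoints/trades/")
         .partitionBy("symbol")
         .trigger(processingTime="30 seconds")
         .start())

query.awaitTermination()
```

At the stated volumes a single job like this would be scaled out across many Kafka partitions and executors; the sketch only shows the shape of the pipeline, not its sizing.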

Use relevant databases, technology stacks, frameworks and tools to create an efficient system which processes these huge volumes without significant delay. Also, if possible, mention why you would choose each tool over the alternatives.

Hint: some options that come to mind are Apache Spark, Apache Storm, Apache Flink, Apache Hadoop, Apache HBase, Apache Hive, Amazon EMR, Azure HDInsight, GCP Dataproc, etc.
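
To make the EOD requirement concrete, here is a hedged sketch of one possible publication step using Spark in batch mode: read the day's curated positions from the lake, write a consolidated feed file, and publish the same records to a Kafka topic for downstream consumers. The paths, topic name and date handling are illustrative assumptions, not prescribed by the challenge.

```python
# Sketch of an EOD publication job; all names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json, struct

spark = SparkSession.builder.appName("exchange-store-eod").getOrCreate()

business_date = "2024-01-31"  # in practice this would be parameterised per run

positions = spark.read.parquet(
    f"s3a://exchange-store/curated/positions/date={business_date}/"  # placeholder path
)

# 1. EOD feed file for SFTP / object-store distribution (one CSV here for simplicity;
#    a real job would fan out ~12k consumer-specific feeds).
(positions.coalesce(1)
 .write.mode("overwrite")
 .option("header", "true")
 .csv(f"s3a://exchange-store/feeds/eod/date={business_date}/"))

# 2. EOD Kafka topic for consumers who process positions programmatically.
(positions.select(to_json(struct(*positions.columns)).alias("value"))
 .write
 .format("kafka")
 .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
 .option("topic", "eod-positions")                    # placeholder topic
 .save())
```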

Assumptions

  • You also need to consider the maintainability and operational aspects of the deployment (observability).

Things to consider

  • You can leverage any non-cloud or cloud platform (e.g., Cloud Foundry, AWS, Azure or GCP) to overlay your deployment diagram and leverage features from these platforms.

Solutions from Community

Name | Solution | Comments
Name | Solution | This architecture uses so and so. This is a sample text.