This directory contains source code that demonstrates the use of the latest Delta Live Tables (DLT) features for cybersecurity use cases. You can find more information in the accompanying blog post.

In general, this project consists of three DLT pipelines that perform data ingestion, normalization to the Open Cybersecurity Schema Framework (OCSF), and rudimentary detections against the normalized data, as shown in the image below (a simplified code sketch follows the list):
- Ingestion of Apache Web and Nginx logs into the `apache_web` table, and then normalizing it into a table corresponding to OCSF's HTTP Activity.
- Ingestion of Zeek data:
  - Zeek HTTP data into the `zeek_http` table, and then normalizing it into an `http` table corresponding to OCSF's HTTP Activity.
  - Zeek Conn data into the `zeek_conn` table, and then normalizing it into a `network` table corresponding to OCSF's Network Activity.
- A detection pipeline that does the following:
  - Matches network connection data from the `network` table against the `iocs` table.
  - Checks HTTP logs from the `http` table for admin-page scans from external parties.
  - All matches are stored in the `detections` table, and optionally pushed to Event Hubs and/or Splunk.
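To make the flow concrete, here is a minimal sketch (not this project's actual source) of what the normalization and detection steps could look like in DLT Python; all column names (`client_ip`, `dst_ip`, `ioc_value`, etc.) are illustrative assumptions:

```python
# Illustrative DLT sketch: OCSF normalization plus IoC matching. Table and
# column names are assumptions, and both steps are collapsed into a single
# pipeline here for brevity (the project splits them across pipelines).
import dlt
from pyspark.sql import functions as F

@dlt.table(name="http", comment="HTTP activity normalized to OCSF")
def http():
    # Rename raw Apache log columns onto OCSF-style HTTP Activity fields.
    return dlt.read_stream("apache_web").select(
        F.col("client_ip").alias("src_endpoint_ip"),
        F.col("request_uri").alias("url_path"),
        F.col("status").cast("int").alias("http_status"),
    )

@dlt.table(name="detections", comment="Network connections matching known IoCs")
def detections():
    # Stream-static join of normalized connections against the IoC watchlist;
    # `network` and `iocs` are assumed to be defined elsewhere in the pipeline.
    network = dlt.read_stream("network")
    iocs = dlt.read("iocs")
    return network.join(iocs, network.dst_ip == iocs.ioc_value, "inner")
```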
> [!IMPORTANT]
> This bundle uses serverless compute, so make sure it is enabled for your workspace. If it is not, you will need to adjust the parameters of the job and the DLT pipelines; see the sketch below.
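For example, a pipeline could be switched from serverless compute to a classic cluster roughly like this (the resource key `demo_detections` and the cluster sizing are hypothetical; check the bundle's actual resource definitions):

```yaml
resources:
  pipelines:
    demo_detections:
      serverless: false        # hypothetical override of the serverless default
      clusters:
        - label: default
          num_workers: 1       # illustrative sizing only
```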
- Install the latest version of the Databricks CLI.
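  On Linux or macOS, one common way is the official install script:

  ```shell
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  ```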
- Authenticate to your Databricks workspace, if you have not done so already:

  ```shell
  databricks configure
  ```
- Set the workspace URL and configure the necessary variables in the `dev` profile of the `databricks.yml` file. You need to specify the following:

  - `catalog_name` - the name of the default UC Catalog used in the configuration.
  - `silver_schema_name` - the name of an existing UC Schema to hold the processed data of individual log sources.
  - `normalized_schema_name` - the name of an existing UC Schema to hold the tables with normalized data, plus the IoCs and Detections tables.
  - `log_files_path` - the path to an existing UC Volume where raw log data will be stored.
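  For illustration, a filled-in `dev` profile could look roughly like this (all values are placeholders, and the exact layout depends on how the bundle declares its variables):

  ```yaml
  targets:
    dev:
      workspace:
        host: https://<your-workspace>.cloud.databricks.com  # your workspace URL
      variables:
        catalog_name: main
        silver_schema_name: cyber_silver
        normalized_schema_name: cyber_normalized
        log_files_path: /Volumes/main/cyber_silver/raw_logs
  ```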
- To deploy a development copy of this project, type:

  ```shell
  databricks bundle deploy
  ```

- Run a job to set up the normalized tables and download sample log files:

  ```shell
  databricks bundle run dlt_cyber_demo_setup
  ```

- Run the DLT pipelines to ingest data:

  ```shell
  databricks bundle run demo_ingest_zeek_data
  databricks bundle run demo_ingest_apache_data
  ```

- Run the DLT pipeline that emulates detections against the normalized data:

  ```shell
  databricks bundle run demo_detections
  ```