dlt_modern_stuff

This directory contains source code that demonstrates the use of the latest Delta Live Tables (DLT) features for cybersecurity use cases. You can find more information in the blog post.

In general, this project consists of three DLT pipelines that perform data ingestion, normalization to the Open Cybersecurity Schema Framework (OCSF), and rudimentary detections against the normalized data, as shown in the image below:

  1. Ingestion of Apache Web and Nginx logs into the apache_web table, which is then normalized into a table corresponding to OCSF's HTTP Activity (see the first sketch below).
  2. Ingestion of Zeek data:
  • Zeek HTTP data into the zeek_http table, which is then normalized into an http table corresponding to OCSF's HTTP Activity.
  • Zeek Conn data into the zeek_conn table, which is then normalized into a network table corresponding to OCSF's Network Activity.
  3. A detection pipeline that does the following (see the second sketch below):
  • Matches network connection data from the network table against the iocs table.
  • Checks HTTP logs from the http table for admin-page scans from external parties.
  • All matches are stored in the detections table, and optionally pushed to Event Hubs and/or Splunk.

[Image: Implemented pipelines]
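For illustration, here is a minimal sketch of what the normalization step for web logs might look like in DLT Python. This is not the repo's actual code: the source column names (ts, src_ip, method, uri, status) and the subset of OCSF HTTP Activity attributes are assumptions.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical sketch: normalize the apache_web table into an
# OCSF HTTP Activity shaped table. Source column names are assumptions.
@dlt.table(name="http", comment="Web logs normalized to OCSF HTTP Activity")
def http():
    return (
        dlt.read("apache_web")
        .select(
            F.col("ts").alias("time"),
            F.col("src_ip").alias("src_endpoint_ip"),
            F.col("method").alias("http_request_http_method"),
            F.col("uri").alias("http_request_url_path"),
            F.col("status").cast("int").alias("http_response_code"),
            F.lit(4002).alias("class_uid"),  # 4002 is OCSF's HTTP Activity class
        )
    )
```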
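And a similarly hedged sketch of the IoC-matching detection: joining normalized network connections against the iocs table and landing the matches in a detections-style table. The table names network, iocs, and detections come from the description above; the join columns (dst_ip, ioc_value) are assumptions.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical sketch: flag network connections whose destination IP
# appears in the iocs table. The join keys are assumptions.
@dlt.table(name="detections", comment="Rudimentary detections over normalized data")
def detections():
    network = dlt.read("network")
    iocs = dlt.read("iocs")
    return (
        network.join(iocs, network["dst_ip"] == iocs["ioc_value"], "inner")
        .select(
            network["*"],
            F.lit("ioc_match").alias("detection_type"),
        )
    )
```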

Setting up & running

Important

This bundle uses Serverless compute, so make sure that it's enabled for your workspace. If it isn't, you will need to adjust the parameters of the job and the DLT pipelines!
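As a rough illustration of the kind of setting involved: in Databricks Asset Bundles, serverless compute for a DLT pipeline is controlled by the serverless flag in the pipeline's resource definition, so a snippet like the one below is what you would adjust. The resource key matches the bundle run commands further down, but this repo's actual pipeline definitions may be laid out differently.

```yaml
resources:
  pipelines:
    demo_ingest_zeek_data:
      name: demo_ingest_zeek_data
      serverless: true  # if serverless is unavailable, disable this and define clusters instead
```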

  1. Install the latest version of the Databricks CLI.

  2. Authenticate to your Databricks workspace, if you have not done so already:

```
databricks configure
```
  3. Set the workspace URL and configure the necessary variables in the dev profile of the databricks.yml file (a hypothetical example is shown after these steps). You need to specify the following:
  • catalog_name - the name of the default UC Catalog used in the configuration.
  • silver_schema_name - the name of an existing UC Schema to hold the processed data of the individual log sources.
  • normalized_schema_name - the name of an existing UC Schema to hold the tables with normalized data, as well as the IoCs and Detections tables.
  • log_files_path - the path to an existing UC Volume where raw log data will be stored.
  4. To deploy a development copy of this project, type:
```
databricks bundle deploy
```
  5. Run a job to set up the normalized tables and download sample log files:
```
databricks bundle run dlt_cyber_demo_setup
```
  6. Run the DLT pipelines to ingest data:
```
databricks bundle run demo_ingest_zeek_data
databricks bundle run demo_ingest_apache_data
```
  7. Run the DLT pipeline that emulates detections against the normalized data:
```
databricks bundle run demo_detections
```
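For step 3 above, a hypothetical dev profile might look roughly like the following. All values are placeholders, and this assumes the variables themselves are declared at the top level of databricks.yml, as is usual for bundle variable overrides; the actual layout of the file in this repo may differ.

```yaml
targets:
  dev:
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
    variables:
      catalog_name: main
      silver_schema_name: cyber_silver
      normalized_schema_name: cyber_normalized
      log_files_path: /Volumes/main/cyber_silver/raw_logs
```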