-
Notifications
You must be signed in to change notification settings - Fork 52
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
chetan thote
authored and
chetan thote
committed
Nov 21, 2024
1 parent
d42a383
commit f12b133
Showing
2 changed files
with
347 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
[meta] | ||
authors=["chetan-thote"] | ||
title="Importing Data from Kafka into SingleStore using Pipelines" | ||
description="""\ | ||
This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from Kafka topic.""" | ||
difficulty="beginner" | ||
tags=["starter", "loaddata", "kafka"] | ||
lesson_areas=["Ingest"] | ||
icon="database" | ||
destinations=["spaces"] | ||
minimum_tier="free-shared" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,336 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"id": "3b02d847", | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<div id=\"singlestore-header\" style=\"display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;\">\n", | ||
" <div id=\"icon-image\" style=\"width: 90px; height: 90px;\">\n", | ||
" <img width=\"100%\" height=\"100%\" src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png\" />\n", | ||
" </div>\n", | ||
" <div id=\"text\" style=\"padding: 5px; margin-left: 10px;\">\n", | ||
" <div id=\"badge\" style=\"display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%\">SingleStore Notebooks</div>\n", | ||
" <h1 style=\"font-weight: 500; margin: 8px 0 0 4px;\">Importing Data from Kafka into SingleStore using Pipelines</h1>\n", | ||
" </div>\n", | ||
"</div>" | ||
] | ||
}, | ||
{ | ||
"id": "1bee474b", | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<div class=\"alert alert-block alert-warning\">\n", | ||
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n", | ||
" <div>\n", | ||
" <p><b>Note</b></p>\n", | ||
" <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.</p>\n", | ||
" </div>\n", | ||
"</div>" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<div class=\"alert alert-block alert-warning\">\n", | ||
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n", | ||
" <div>\n", | ||
" <p><b>Input Credentials</b></p>\n", | ||
" <p>Define the <b>BOOTSTRAP_SERVER</b>, <b>PORT</b>, <b>TOPIC</b>,<b>SASL_USERNAME</b>,<b>SASL_MECHANISM</b>,<b>SECURITY_PROTOCOL</b>, and <b>SASL_PASSWORD</b> variables below for integration, replacing the placeholder values with your own.</p>\n", | ||
" </div>\n", | ||
"</div>" | ||
], | ||
"id": "46a1d738" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"BOOTSTRAP_SERVER = 'bootstrap-server-url'\n", | ||
"PORT = kafka-broker-port\n", | ||
"TOPIC = 'kafka-topic'\n", | ||
"SASL_USERNAME = 'username'\n", | ||
"SASL_MECHANISM = 'sasl-mechanism'\n", | ||
"SECURITY_PROTOCOL = 'security-proptocol'\n", | ||
"SASL_PASSWORD = 'password'" | ||
], | ||
"id": "88b1ac9d" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "64fdd646", | ||
"metadata": {}, | ||
"source": [ | ||
"This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from an Kafka topic, and run queries on the imported data. It is designed for users who want to integrate Kafka data with SingleStore and explore the capabilities of pipelines for efficient data ingestion." | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<h3>Pipeline Flow Illustration</h3>" | ||
], | ||
"id": "c35b30d7" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<img src=https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataKafka.png width=\"100%\" hight=\"50%\"/>" | ||
], | ||
"id": "979e53c2" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating Table in SingleStore\n", | ||
"\n", | ||
"Start by creating a table that will hold the data imported from Kafka." | ||
], | ||
"id": "9f9d6aa2" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"%%sql\n", | ||
"/* Feel free to change table name and schema */\n", | ||
"\n", | ||
"CREATE TABLE IF NOT EXISTS my_table (\n", | ||
" id INT,\n", | ||
" name VARCHAR(255),\n", | ||
" age INT,\n", | ||
" address TEXT,\n", | ||
" created_at TIMESTAMP\n", | ||
");" | ||
], | ||
"id": "82a48dd0" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create a Pipeline to Import Data from Kafka\n", | ||
"\n", | ||
"You'll need to create a pipeline that pulls data from Kafka topic into this table. This example assumes you have a JSON Message in your Kakfa topic.\n", | ||
"\n", | ||
"<i>Ensure that:\n", | ||
"You have access to the Kafka topic.\n", | ||
"Proper IAM roles or access keys are configured in SingleStore.\n", | ||
"The JSON message has a structure that matches the table schema.</i>" | ||
], | ||
"id": "90ad124f" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Using these identifiers and keys, execute the following statement." | ||
], | ||
"id": "6e401f87" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"CREATE OR REPLACE PIPELINE kafka_import_pipeline AS LOAD DATA KAFKA '{{BOOTSTRAP_SERVER}}:{{PORT}}/{{TOPIC}}'\n", | ||
"CONFIG '{\n", | ||
" \"sasl.username\": \"{{SASL_USERNAME}}\",\n", | ||
" \"sasl.mechanism\": \"{{SASL_MECHANISM}}\",\n", | ||
" \"security.protocol\": \"{{SECURITY_PROTOCOL}}\",\n", | ||
" \"ssl.ca.location\": \"/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem\"\n", | ||
"}'\n", | ||
"CREDENTIALS '{\n", | ||
" \"sasl.password\": \"{{SASL_PASSWORD}}\"\n", | ||
"}'\n", | ||
"INTO TABLE my_table\n", | ||
"FORMAT JSON ;" | ||
], | ||
"id": "0b4f42d6" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Start the Pipeline\n", | ||
"\n", | ||
"To start the pipeline and begin importing the data from the S3 bucket:" | ||
], | ||
"id": "1d137801" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"START PIPELINE kafka_import_pipeline;" | ||
], | ||
"id": "e94cff73" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Select Data from the Table\n", | ||
"\n", | ||
"Once the data has been imported, you can run a query to select it:" | ||
], | ||
"id": "094b857c" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"SELECT * FROM my_table LIMIT 10;" | ||
], | ||
"id": "bc0f7b0c" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Check if all data of the data is loaded" | ||
], | ||
"id": "669dac71" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"%%sql\n", | ||
"SELECT count(*) FROM my_table" | ||
], | ||
"id": "a47c2f0f" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Conclusion\n", | ||
"\n", | ||
"We have shown how to insert data from a Amazon S3 using `Pipelines` to SingleStoreDB. These techniques should enable you to\n", | ||
"integrate your Amazon S3 with SingleStoreDB." | ||
], | ||
"id": "91eae728" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Clean up\n", | ||
"\n", | ||
"Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table created." | ||
], | ||
"id": "6dc86514" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Drop Pipeline" | ||
], | ||
"id": "706ccd4c" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"#STOP PIPELINE kafka_import_pipeline;\n", | ||
"\n", | ||
"#DROP PIPELINE kafka_import_pipeline;" | ||
], | ||
"id": "ed7dc33a" | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Drop Data" | ||
], | ||
"id": "b5e15411" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql\n", | ||
"#DROP TABLE my_table;" | ||
], | ||
"id": "f8f3d6ef" | ||
}, | ||
{ | ||
"id": "12d50a52", | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<div id=\"singlestore-footer\" style=\"background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px\"></div>\n", | ||
"<div><img src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png\" style=\"padding: 0px; margin: 0px; height: 24px\"/></div>" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"jupyterlab": { | ||
"notebooks": { | ||
"version_major": 6, | ||
"version_minor": 4 | ||
} | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |