Commit

Added notebooks for load data (#103)
* Create load-CSV-data-S3

* Added notebooks for Load data sections of UI

* Modified with suggested changes

* Modified with suggested changes

* Remove extra header

---------

Co-authored-by: chetan thote <[email protected]>
Co-authored-by: Kevin D Smith <[email protected]>
3 people authored Jul 11, 2024
1 parent e2becae commit 2540ebf
Showing 5 changed files with 791 additions and 0 deletions.
4 changes: 4 additions & 0 deletions authors/chetan-thote.toml
@@ -0,0 +1,4 @@
name="Chetan Thote"
title="Product Team"
image="singlestore"
external=false
11 changes: 11 additions & 0 deletions notebooks/load-csv-data-s3/meta.toml
@@ -0,0 +1,11 @@
[meta]
authors=["chetan-thote"]
title="Sales Data Analysis Dataset From Amazon S3"
description="""\
The Sales Data Analysis use case demonstrates how to utilize SingleStore's powerful querying capabilities to analyze sales data stored in a CSV file."""
difficulty="beginner"
tags=["starter", "loaddata", "s3"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
360 changes: 360 additions & 0 deletions notebooks/load-csv-data-s3/notebook.ipynb
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "97f96c34-81a9-495a-a55d-c565695e87f0",
"metadata": {},
"source": [
"<div id=\"singlestore-header\" style=\"display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;\">\n",
" <div id=\"icon-image\" style=\"width: 90px; height: 90px;\">\n",
" <img width=\"100%\" height=\"100%\" src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png\" />\n",
" </div>\n",
" <div id=\"text\" style=\"padding: 5px; margin-left: 10px;\">\n",
" <div id=\"badge\" style=\"display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%\">SingleStore Notebooks</div>\n",
" <h1 style=\"font-weight: 500; margin: 8px 0 0 4px;\">Sales Data Analysis Dataset From Amazon S3</h1>\n",
" </div>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "612bd378-f145-42f1-b8ce-32557a4c00cd",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Note</b></p>\n",
" <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace, navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.</p>\n",
" </div>\n",
"</div>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "481ce5ae-2ee0-4b63-b3f3-a4b53a5bc381",
"metadata": {},
"source": [
"The Sales Data Analysis use case demonstrates how to utilize SingleStore's powerful querying capabilities to analyze sales data stored in a CSV file. This demo showcases typical operations that businesses perform to gain insights from their sales data, such as calculating total sales, identifying top-selling products, and analyzing sales trends over time. By working through this example, new users will learn how to load CSV data into SingleStore, execute aggregate functions, and perform time-series analysis, which are essential skills for leveraging the full potential of SingleStore in a business intelligence context."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "72fe6854-5b6e-4b79-a2d0-79bda0e18429",
"metadata": {},
"source": [
"<h3>Demo Flow</h3>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5ed26ab8-1217-4fbd-be0c-4e7728314671",
"metadata": {},
"source": [
"<img src=\"https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png\" width=\"100%\" height=\"50%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "46fb95a8-1402-4b97-b04a-560741f96181",
"metadata": {},
"source": [
"## How to use this notebook"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a701cd90-dd42-4a06-b7a1-e0a2132af558",
"metadata": {},
"source": [
"<img src=\"https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/notebookuse.gif\" width=\"75%\" height=\"50%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2d22fd53-2c18-40e5-bb38-6d8ebc06f1b8",
"metadata": {},
"source": [
"## Create a database\n",
"\n",
"We need to create a database to work with in the following examples."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1624ccea-0c15-4048-ab2a-fe2178e5912a",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;\n",
" %sql CREATE DATABASE SalesAnalysis;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "901e6ec1-2530-497a-857e-7973bb9714f1",
"metadata": {},
"source": [
"<h3>Create Table</h3>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ac4285d-0d2d-44ec-8b1e-eef7b4f9358c",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE TABLE `SalesData` (\n",
" `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Store_ID` bigint(20) DEFAULT NULL,\n",
" `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Quantity_Sold` bigint(20) DEFAULT NULL,\n",
" `Price` float DEFAULT NULL,\n",
" `Total_Sales` float DEFAULT NULL\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1de959eb-4f17-45d4-af74-42f45684d67b",
"metadata": {},
"source": [
"<h3>Load Data Using Pipelines</h3>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "84f592b8-a12e-41d8-bff0-fe96175992b9",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE PIPELINE SalesData_Pipeline AS\n",
"LOAD DATA S3 's3://singlestoreloaddata/SalesData/sales_data.csv'\n",
"CONFIG '{ \\\"region\\\": \\\"ap-south-1\\\" }'\n",
"/*\n",
"CREDENTIALS '{\"aws_access_key_id\": \"<access key id>\",\n",
" \"aws_secret_access_key\": \"<access_secret_key>\"}'\n",
" */\n",
"INTO TABLE SalesData\n",
"FIELDS TERMINATED BY ','\n",
"LINES TERMINATED BY '\\r\\n'\n",
"IGNORE 1 LINES;\n",
"\n",
"\n",
"START PIPELINE SalesData_Pipeline;"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "352e340a-a613-4ec5-94a5-c4e1f3565757",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM SalesData LIMIT 10"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4508d431-7683-4ac9-a4e8-d939c47dd1fc",
"metadata": {},
"source": [
"<h3>Sample Queries</h3>\n",
"\n",
"The following cells run a few analytical queries against the loaded data."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "55ac6134-976c-4f27-bc2b-140835b64f13",
"metadata": {},
"source": [
"<b>Top-Selling Products</b>"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d666c04b-ccb0-47cc-a1e7-efaa7a590d27",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData\n",
" GROUP BY product_name ORDER BY total_quantity_sold DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "87c36700-0db8-405f-97c0-e13a6a2ae0cb",
"metadata": {},
"source": [
"<b>Sales Trends Over Time</b>"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b46d72c7-07a3-4e23-8fe4-c238b5517ef6",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY date ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e6c232a1-acce-4d25-aebd-1a89aafba47d",
"metadata": {},
"source": [
"<b>Total Sales by Store</b>"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af571f6c-0145-4466-9ed7-000d37e4738f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT Store_ID, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY Store_ID ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9bf1d7f3-c636-4ac0-b2be-e48eaca747ef",
"metadata": {},
"source": [
"<b>Sales Contribution by Product (Percentage)</b>"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5613b3e8-72d2-48dc-a7ae-47911df24cd2",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage FROM SalesData\n",
" GROUP BY product_name ORDER BY sales_percentage DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "afed201d-d9f2-49cc-8a14-df35103abd4e",
"metadata": {},
"source": [
"<b>Top Days with Highest Sales</b>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7fd8d785-7861-4570-88b3-0185c2c9c298",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
" GROUP BY date ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6738b6e4-5e8b-45db-b3dc-ebcb73bcf629",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Action Required</b></p>\n",
" <p> If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI. </p>\n",
" </div>\n",
"</div>\n",
"\n",
"We have shown how to load data from Amazon S3 into SingleStoreDB using `Pipelines`. These techniques should enable you to\n",
"integrate your own Amazon S3 data with SingleStoreDB."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d5053a52-5579-4fea-9594-5250f6fcc289",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;"
]
},
{
"cell_type": "markdown",
"id": "2dcc585a-43c2-4598-93bf-888143dd5e29",
"metadata": {},
"source": [
"<div id=\"singlestore-footer\" style=\"background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px\"></div>\n",
"<div><img src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png\" style=\"padding: 0px; margin: 0px; height: 24px\"/></div>"
]
}
],
"metadata": {
"jupyterlab": {
"notebooks": {
"version_major": 6,
"version_minor": 4
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
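
Note on notebooks/load-csv-data-s3/notebook.ipynb: the load-then-aggregate flow in the notebook above can be tried locally with a rough sketch like the one below. It is not the notebook's method. It swaps in Python's built-in `sqlite3` as a stand-in for SingleStore and the `csv` module as a stand-in for the pipeline's `FIELDS TERMINATED BY ','`, `LINES TERMINATED BY '\r\n'`, and `IGNORE 1 LINES` settings. The sample rows, `RAW_CSV`, and `load_rows` are hypothetical; the real `sales_data.csv` is not reproduced here.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows using the column order of the SalesData table.
# Lines end in \r\n, matching the pipeline's LINES TERMINATED BY setting.
RAW_CSV = (
    "Date,Store_ID,ProductID,Product_Name,Product_Category,"
    "Quantity_Sold,Price,Total_Sales\r\n"
    "2024-01-01,1,P1,Widget,Tools,3,2.0,6.0\r\n"
    "2024-01-01,2,P2,Gadget,Tools,1,4.0,4.0\r\n"
    "2024-01-02,1,P1,Widget,Tools,5,2.0,10.0\r\n"
)


def load_rows(raw):
    """Parse comma-separated text and skip the header row (like IGNORE 1 LINES)."""
    # newline="" leaves the \r\n terminators for the csv module to handle.
    reader = csv.reader(io.StringIO(raw, newline=""))
    next(reader)  # drop the header line
    return [tuple(row) for row in reader]


# SQLite is an assumption for local experimentation, not SingleStore.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesData ("
    "Date TEXT, Store_ID INTEGER, ProductID TEXT, Product_Name TEXT, "
    "Product_Category TEXT, Quantity_Sold INTEGER, Price REAL, Total_Sales REAL)"
)
conn.executemany(
    "INSERT INTO SalesData VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    load_rows(RAW_CSV),
)

# Same shape as the notebook's "Top-Selling Products" query.
top_products = conn.execute(
    "SELECT Product_Name, SUM(Quantity_Sold) AS total_quantity_sold "
    "FROM SalesData GROUP BY Product_Name "
    "ORDER BY total_quantity_sold DESC LIMIT 5"
).fetchall()

# Same shape as the notebook's "Sales Contribution by Product" query.
sales_share = conn.execute(
    "SELECT Product_Name, "
    "SUM(Total_Sales) * 100.0 / (SELECT SUM(Total_Sales) FROM SalesData) "
    "AS sales_percentage "
    "FROM SalesData GROUP BY Product_Name "
    "ORDER BY sales_percentage DESC LIMIT 5"
).fetchall()

print(top_products)
print(sales_share)
```

Because the scalar subquery computes the grand total once, each product's share is its own sum divided by that total; the per-product percentages therefore add up to 100 when no `LIMIT` cuts the list short.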
12 changes: 12 additions & 0 deletions notebooks/load-data-kakfa/meta.toml
@@ -0,0 +1,12 @@
[meta]
authors=["chetan-thote"]
title="Real-Time Event Monitoring Dataset From Kafka"
description="""\
The Real-Time Event Monitoring use case illustrates how to leverage SingleStore's capabilities to process and analyze streaming data from a Kafka data source.
"""
difficulty="beginner"
tags=["starter", "loaddata", "kafka"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"