Commit 61f53af

add PMTiles notebook (#91)
* add notebook
* make linter happy
* add blog link
* fix capitalization
1 parent 685a5c7 commit 61f53af

File tree

2 files changed: 351 additions, 0 deletions

Lines changed: 350 additions & 0 deletions
@@ -0,0 +1,350 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7de6fe5f-ec16-47f8-94d1-16aa6ca43ac4",
   "metadata": {},
   "source": [
    "![Wherobots Logo](https://raw.githubusercontent.com/wherobots/wherobots-examples/refs/heads/main/assets/img/header-logo.png)\n",
    "\n",
    "# Generate PMTiles using Wherobots\n",
    "\n",
    "This notebook demonstrates how to generate a PMTiles file from the U.S. Census Bureau's TIGER railroad dataset using Wherobots.\n",
    "\n",
    "This notebook is part of a hands-on project that shows you how to generate and visualize PMTiles. It consists of three parts:\n",
    "\n",
    "1. [**Blog Post:**](https://wherobots.com/pmtiles-rendered-in-esri-maps-api/) A quick post that introduces and showcases this capability.\n",
    "2. **Jupyter Notebook (this file):** The practical, step-by-step code for generating the PMTiles file.\n",
    "3. [**Web Visualization Repo:**](https://github.com/wherobots/pmtiles-esri-tile-layer) Contains a tile server and the client-side code using the **Esri JavaScript SDK** to render your PMTiles on a basemap.\n",
    "\n",
    "---\n",
    "### What You'll Do in This Notebook:\n",
    "\n",
    "In the following cells, you will:\n",
    "* Download and prepare the TIGER railroad shapefile, uploading it to your Wherobots Managed Storage.\n",
    "* Filter the nationwide data for a specific region (Texas) using spatial SQL with Sedona.\n",
    "* Generate a PMTiles file with a single command using the Wherobots `vtiles` library.\n",
    "* Visualize the resulting map tiles directly within the notebook.\n",
    "\n",
    "### Cost to generate PMTiles over Texas\n",
    "\n",
    "* Time taken: **1m 18s**\n",
    "* Cost: **$0.16**\n",
    "* Runtime size: **Tiny**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "237e2a97-07ee-4926-9af4-2ba55d1bac22",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "import zipfile\n",
    "import io\n",
    "import boto3\n",
    "import wkls\n",
    "from wherobots import vtiles\n",
    "from urllib.parse import urlparse\n",
    "from sedona.spark import *\n",
    "from pyspark.sql.functions import *"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2f5eae3-6273-4ccb-a65c-461eda3ec589",
   "metadata": {},
   "source": [
    "# Download the railroad dataset from TIGER\n",
    "\n",
    "The next cell defines a helper function that downloads the zipped shapefile, extracts it in memory, and uploads each file to your Managed Storage (S3 bucket).\n",
    "\n",
    "If the Census Bureau's download server is down, we have mirrored the data in our public S3 bucket:\n",
    "\n",
    "`s3://wherobots-examples/data/pmtiles-blog/tl_2024_us_rails/`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4229550-ed59-43b8-a2c2-092dd30f17b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_s3_uri(s3_uri):\n",
    "    \"\"\"\n",
    "    Parses an S3 URI (e.g., 's3://bucket-name/folder/path')\n",
    "    and returns the bucket name and the path.\n",
    "\n",
    "    Args:\n",
    "        s3_uri (str): The S3 URI string.\n",
    "\n",
    "    Returns:\n",
    "        tuple: A tuple containing (bucket_name, folder_path).\n",
    "    \"\"\"\n",
    "    parsed_uri = urlparse(s3_uri)\n",
    "    if parsed_uri.scheme != 's3':\n",
    "        raise ValueError(\"Invalid S3 URI. Must start with 's3://'\")\n",
    "    return parsed_uri.netloc, parsed_uri.path.lstrip('/')\n",
    "\n",
    "def download_and_upload_to_s3(zip_url, s3_uri):\n",
    "    \"\"\"\n",
    "    Downloads a zip file from a URL using requests, extracts its contents,\n",
    "    and uploads each file to an S3 bucket specified by an S3 URI.\n",
    "\n",
    "    Args:\n",
    "        zip_url (str): The URL of the zip file to download.\n",
    "        s3_uri (str): The S3 URI (e.g., 's3://bucket-name/folder/path')\n",
    "            where extracted files will be uploaded.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        # Ignore the InsecureRequestWarning when verify=False\n",
    "        requests.packages.urllib3.disable_warnings(requests.packages.urllib3.exceptions.InsecureRequestWarning)\n",
    "\n",
    "        # 1. Parse the S3 URI\n",
    "        s3_bucket, s3_path_prefix = parse_s3_uri(s3_uri)\n",
    "\n",
    "        # 2. Download the zip file into memory, ignoring SSL certificate errors\n",
    "        print(\"Downloading zip file...\")\n",
    "        response = requests.get(zip_url, verify=False)\n",
    "        response.raise_for_status()\n",
    "\n",
    "        # 3. Extract and upload each file to S3\n",
    "        zip_buffer = io.BytesIO(response.content)\n",
    "        s3_client = boto3.client('s3')\n",
    "        with zipfile.ZipFile(zip_buffer, 'r') as zip_file:\n",
    "            file_list = zip_file.namelist()\n",
    "            print(f\"Found {len(file_list)} files in the zip.\")\n",
    "            for filename in file_list:\n",
    "                if not filename.endswith('/'):\n",
    "                    with zip_file.open(filename, 'r') as file_in_zip:\n",
    "                        file_buffer = io.BytesIO(file_in_zip.read())\n",
    "\n",
    "                        # Build the destination key from the prefix and the entry name\n",
    "                        s3_key = f\"{s3_path_prefix}/{filename}\".lstrip('/')\n",
    "\n",
    "                        # Upload the file from memory to S3\n",
    "                        print(f\"Uploading {s3_key} to {s3_bucket}...\")\n",
    "                        s3_client.upload_fileobj(file_buffer, s3_bucket, s3_key)\n",
    "\n",
    "        print(\"All files extracted and uploaded to S3 successfully!\")\n",
    "\n",
    "    except requests.exceptions.RequestException as e:\n",
    "        print(f\"HTTP Request failed: {e}\")\n",
    "    except zipfile.BadZipFile:\n",
    "        print(\"The downloaded file is not a valid zip file.\")\n",
    "    except ValueError as e:\n",
    "        print(f\"Input error: {e}\")\n",
    "    except Exception as e:\n",
    "        print(f\"An error occurred: {e}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90660339-7d6e-4467-854a-625ddccd32b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "zip_url = 'https://www2.census.gov/geo/tiger/TIGER2024/RAILS/tl_2024_us_rails.zip'\n",
    "base_s3_uri = f'{os.getenv(\"USER_S3_PATH\")}PMTiles-example'\n",
    "s3_destination_uri = f'{base_s3_uri}/data'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf7c9ebb-f5f8-460f-90c4-b24f07c007de",
   "metadata": {},
   "outputs": [],
   "source": [
    "download_and_upload_to_s3(zip_url, s3_destination_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cbc07c0-7e69-4685-b8f3-edd2df9d3857",
   "metadata": {},
   "source": [
    "## Getting WherobotsDB started\n",
    "\n",
    "This gives you access to WherobotsDB and the PMTiles generator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd6f9a02-0ec9-45d4-86f7-407f484feda3",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = SedonaContext.builder().getOrCreate()\n",
    "\n",
    "sedona = SedonaContext.create(config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45f42618-2156-485c-a35e-61afd3a65f29",
   "metadata": {},
   "source": [
    "## Read in the files that we downloaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83ceac98-c680-4af8-be13-f1ca286ec6cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rail = sedona.read.format(\"shapeFile\").load(s3_destination_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38152e6e-c4b8-4945-8187-6dd9b555f604",
   "metadata": {},
   "source": [
    "## Filter by Texas boundary\n",
    "\n",
    "Feel free to alter this to some other US state, or remove the filter entirely to get the same experience as the blog post.\n",
    "\n",
    "To generate PMTiles on the entire dataset, replace the filter cell below with:\n",
    "\n",
    "```python\n",
    "df_rail = df_rail.withColumn(\"layer\", lit(\"railroads\"))\n",
    "```\n",
    "\n",
    "[Click here to learn how to select another state using the `wkls` library.](https://github.com/wherobots/wkls?tab=readme-ov-file#quick-start)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66fd0b5d-0c4d-4ee6-82d3-2c3dfa592e80",
   "metadata": {},
   "outputs": [],
   "source": [
    "texas_wkt = wkls.us.tx.wkt()\n",
    "\n",
    "df_rail = df_rail \\\n",
    "    .where(f\"ST_Intersects(geometry, ST_GeomFromWKT('{texas_wkt}'))\") \\\n",
    "    .withColumn(\"layer\", lit(\"railroads\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2057fa35-90ee-488f-b35a-c058891674fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rail.printSchema()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee060b06-bb09-475c-a129-6b287f4163f9",
   "metadata": {},
   "source": [
    "## FYI about the data\n",
    "\n",
    "MTFCC stands for MAF/TIGER Feature Class Code, a code assigned by the U.S. Census Bureau to classify and describe geographic features such as roads, rivers, and railroad tracks. The MTFCC code `R1011` means a railroad feature (main, spur, or yard).\n",
    "\n",
    "LINEARID is a Linear Feature Identifier: a unique ID used in U.S. Census Bureau TIGER (Topologically Integrated Geographic Encoding and Referencing) data to associate a street or feature name with its location, such as an edge or address range in the spatial data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2a61c3d-e299-4d68-b5d9-7dee892c9898",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rail.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0517ea33-d386-411b-9bc6-9328ec6e22d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rail.select(\"LINEARID\").distinct().count() == df_rail.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6018d5ab-59a9-45d1-ac1f-baf21c2a5fff",
   "metadata": {},
   "source": [
    "## Generating the PMTiles\n",
    "\n",
    "A single line of code generates the PMTiles file from the processed DataFrame and saves it directly to your S3 bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb42252d-c977-4224-a2a8-308a66037c3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_rail.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ffbdaa2a-d15f-481a-a779-ef0dc56e6736",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_full_path = f\"{base_s3_uri}/pmtiles/railroads.pmtiles\"\n",
    "\n",
    "vtiles.generate_pmtiles(df_rail, s3_full_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11f22174-e707-4777-ac76-15aba426a28b",
   "metadata": {},
   "source": [
    "Alternatively, you can load the PMTiles file into the [Wherobots hosted PMTiles viewer](https://tile-viewer.wherobots.com/) to visualize it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68bbb9c4-29cf-45b0-8fe0-01fa28fdad38",
   "metadata": {},
   "outputs": [],
   "source": [
    "vtiles.show_pmtiles(s3_full_path)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
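The notebook's `parse_s3_uri` helper relies only on the standard library, so it can be tried outside Wherobots as well. A minimal standalone sketch of the same logic (the helper name matches the notebook; the bucket and prefix in the example are made up):

```python
from urllib.parse import urlparse

def parse_s3_uri(s3_uri):
    """Split an S3 URI into (bucket_name, folder_path), as in the notebook."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError("Invalid S3 URI. Must start with 's3://'")
    # netloc is the bucket; the path keeps its internal slashes but drops the leading one
    return parsed.netloc, parsed.path.lstrip("/")

# Example with a hypothetical bucket and prefix:
bucket, prefix = parse_s3_uri("s3://example-bucket/PMTiles-example/data")
print(bucket, prefix)  # → example-bucket PMTiles-example/data
```

`urlparse` handles the scheme and netloc split, so no manual string slicing is needed.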

README.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ will pass as the pre-commit hooks will fix the issues it finds.
 | |-- K_Nearest_Neighbor_Join.ipynb
 | |-- Local_Outlier_Factor.ipynb
 | |-- Object_Detection.ipynb
+| |-- PMTiles-railroad.ipynb
 | |-- Raster_Classification.ipynb
 | |-- Raster_Segmentation.ipynb
 | |-- Raster_Text_To_Segments_Airplanes.ipynb

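The notebook's `download_and_upload_to_s3` helper streams each zip entry through memory rather than touching the local disk. That extraction pattern can be sketched self-contained with only the standard library, using a synthetic in-memory archive in place of the download (all file names here are made up):

```python
import io
import zipfile

# Build a small zip archive in memory to stand in for the downloaded file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tl_demo/rails.shp", b"shape bytes")
    zf.writestr("tl_demo/rails.dbf", b"attr bytes")
    zf.writestr("tl_demo/", b"")  # directory entry, skipped below

# Extract each regular file into its own in-memory buffer,
# mirroring the notebook's per-file upload loop.
extracted = {}
with zipfile.ZipFile(io.BytesIO(buf.getvalue()), "r") as zf:
    for name in zf.namelist():
        if not name.endswith("/"):  # skip directory entries
            with zf.open(name, "r") as f:
                extracted[name] = io.BytesIO(f.read()).getvalue()

print(sorted(extracted))  # → ['tl_demo/rails.dbf', 'tl_demo/rails.shp']
```

In the notebook, each `io.BytesIO` buffer is handed to `boto3`'s `upload_fileobj` instead of being collected in a dict.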