feat: reference server setup (#1)
* feat: implement delta sharing protocol up to reading data from table

* docs: add PR template

* refactor: extract delta sharing to router

* refactor: extract forward request logic

* fix: handle case if body is null
Kenneth Domingo authored Nov 13, 2023
1 parent b9210f1 commit fdd8131
Showing 22 changed files with 1,728 additions and 205 deletions.
39 changes: 39 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,39 @@
## What type of PR is this?

- `build`: Commits that affect build components like build tool, dependencies, project
version
- `chore`: Miscellaneous commits (e.g. modifying `.gitignore`)
- `ci`: Commits are special `build` commits that affect the CI/CD pipeline
- `docs`: Commits that affect documentation only
- `feat`: Commits that add a new feature
- `fix`: Commits that fix a bug
- `perf`: Commits are special `refactor` commits that improve performance
- `refactor`: Commits that rewrite/restructure your code but do not change any
  behaviour
- `revert`: Commits that revert another commit/PR, usually can be autogenerated on
GitHub or using `git revert`
- `style`: Commits are special `refactor` commits that edit the code to comply with a
code style, linter, or formatter
- `test`: Commits that add missing tests or correct existing tests

## Summary

What does this PR do

## How to test

1. Instructions on how to test
2. Specify which files to review
3. etc.

## Link to Jira/Asana/Airtable task (if applicable)

_placeholder_

## Wireframe screenshot/screencap (if applicable)

_placeholder_

## Implementation screenshot/screencap (if applicable)

_placeholder_
1 change: 1 addition & 0 deletions .gitignore
@@ -160,4 +160,5 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/

 .task/
+conf/
19 changes: 19 additions & 0 deletions Dockerfile
@@ -0,0 +1,19 @@
FROM python:3.11

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ARG POETRY_VERSION=1.6.1

RUN pip install "poetry==$POETRY_VERSION" && \
    poetry config virtualenvs.create false && \
    poetry config installer.max-workers 4

WORKDIR /tmp

COPY pyproject.toml poetry.lock ./

RUN poetry install

WORKDIR /app

CMD [ "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000", "--reload" ]
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
-# Delta Sharing POC
+# Giga Data Sharing

 ## Prerequisites

13 changes: 10 additions & 3 deletions Taskfile.yml
@@ -4,13 +4,14 @@ dotenv:
   - .env

 vars:
-  NAME: giga-dataops-platform_data-sharing
+  NAME: giga-dataops_data-sharing

 tasks:
   default:
     desc: Build and start Docker containers
     cmds:
       - task: config
+      - docker compose --project-name giga-dataops-platform --file docker-compose-network.yaml up --detach --build --remove-orphans {{.CLI_ARGS}}
       - docker compose --project-name {{.NAME}} up --detach --build --remove-orphans {{.CLI_ARGS}}

   setup:
@@ -23,13 +24,18 @@ tasks:

   config:
     desc: Generate config files
+    sources:
+      - ./conf-template/*
+      - ./.env
+    generates:
+      - ./conf/*
     cmds:
       - mkdir -p conf
       - >
-        sed 's!{{`{{.SAS_TOKEN}}`}}!{{.SAS_TOKEN}}!g'
+        sed 's|{{`{{.STORAGE_ACCESS_KEY}}`}}|{{.STORAGE_ACCESS_KEY}}|'
         conf-template/core-site.xml > conf/core-site.xml
       - >
-        sed 's!{{`{{.DELTA_BEARER_TOKEN}}`}}!{{.DELTA_BEARER_TOKEN}}!g'
+        sed 's!{{`{{.DELTA_BEARER_TOKEN}}`}}!{{.DELTA_BEARER_TOKEN}}!'
         conf-template/delta-sharing-server-config.yml > conf/delta-sharing-server-config.yml
   logs:
@@ -57,3 +63,4 @@ tasks:
     desc: Remove containers
     cmds:
       - docker compose --project-name {{.NAME}} down --volumes --remove-orphans {{.CLI_ARGS}}
+      - docker compose --project-name giga-dataops-platform --file docker-compose-network.yaml down --volumes --remove-orphans {{.CLI_ARGS}}
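
The `config` task in this Taskfile fills placeholders such as `{{.DELTA_BEARER_TOKEN}}` in the files under `conf-template/` and writes the results to `conf/`. As a rough illustration only, the same substitution could be sketched in Python. The helper `render_template` and the sample value are hypothetical and not part of this commit.

```python
import re

# Hypothetical helper, not part of the commit: mimics what the `sed`
# commands in the `config` task do for `{{.KEY}}`-style placeholders.
PLACEHOLDER = re.compile(r"\{\{\.([A-Z_][A-Z0-9_]*)\}\}")


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace every `{{.KEY}}` with values[KEY]; unknown keys are left untouched."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return values.get(key, match.group(0))

    return PLACEHOLDER.sub(substitute, text)


template = "<value>{{.STORAGE_ACCESS_KEY}}</value>"
rendered = render_template(template, {"STORAGE_ACCESS_KEY": "example-key"})
print(rendered)
```

One difference worth noting: a `sed` substitution without the `g` flag replaces only the first match on each line, whereas this sketch replaces every match.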
8 changes: 2 additions & 6 deletions conf-template/core-site.xml
@@ -2,11 +2,7 @@
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
   <property>
-    <name>fs.azure.account.auth.type.tmgigasandboxadlsg2.dfs.core.windows.net</name>
-    <value>SharedKey</value>
-  </property>
-  <property>
-    <name>fs.azure.account.key.tmgigasandboxadlsg2.dfs.core.windows.net</name>
-    <value>{{.SAS_TOKEN}}</value>
+    <name>fs.azure.account.key.tmgigasandboxadlsg2.blob.core.windows.net</name>
+    <value>{{.STORAGE_ACCESS_KEY}}</value>
   </property>
 </configuration>
20 changes: 11 additions & 9 deletions conf-template/delta-sharing-server-config.yml
@@ -2,22 +2,24 @@
 version: 1
 # Config shares/schemas/tables to share
 shares:
-- name: "qos"
+- name: "gold"
+  id: "4bfa74c2-f1af-44e8-bc42-205d4d92dd15"
   schemas:
-  - name: "proco-api"
+  - name: "fake-school-geolocation"
     tables:
-    - name: "uz"
+    - name: "delta-test"
       # Azure Blob Storage. See https://github.com/delta-io/delta-sharing#azure-blob-storage for how to config the credentials
-      location: "wasbs://[email protected]/qos/proco-api/2023-10-23_Proco-API_QoS"
-      id: "458dc4db-666e-41e6-8c12-3f025b060a9b"
+      location: "wasbs://[email protected]/fake-gold/delta-spark"
+      id: "093f2fe7-fdc8-4438-85dd-6da7610d0e74"
+      cdfEnabled: true

 # Set the host name that the server will use
 host: "localhost"
 # Set the port that the server will listen on. Note: using ports below 1024
 # may require a privileged user in some operating systems.
 port: 8890
 # Set the url prefix for the REST APIs
-endpoint: "/delta-sharing"
+endpoint: "/sharing"
 # Set the timeout of S3 presigned url in seconds
 preSignedUrlTimeoutSeconds: 3600
 # How many tables to cache in the server
@@ -26,11 +28,11 @@ deltaTableCacheSize: 10
 # static tables that will never be changed.
 stalenessAcceptable: false
 # Whether to evaluate user provided `predicateHints`
-evaluatePredicateHints: false
+evaluatePredicateHints: true
 # Whether to evaluate user provided `jsonPredicateHints`
-evaluateJsonPredicateHints: false
+evaluateJsonPredicateHints: true
 # Whether to evaluate user provided `jsonPredicateHints` for V2 predicates.
-evaluateJsonPredicateHintsV2: false
+evaluateJsonPredicateHintsV2: true
 # The maximum page size permitted by queryTable/queryTableChanges API.
 queryTablePageSizeLimit: 10000
 # The TTL of the page token generated in queryTable/queryTableChanges API (in milliseconds).
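
With `host`, `port`, and the new `endpoint` value in this config, a client talking to the Delta Sharing server directly would presumably reach it at `http://localhost:8890/sharing`. The following is a minimal sketch of the profile file and table URL such a client might use, assuming the standard Delta Sharing profile format. The token placeholder, the profile file name `config.share`, and the helpers `build_profile` and `table_url` are illustrative and not taken from this commit.

```python
import json


def build_profile(endpoint: str, bearer_token: str) -> str:
    """Return the JSON text of a Delta Sharing profile file."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    return json.dumps(profile, indent=2)


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the `<profile>#<share>.<schema>.<table>` URL used by Delta Sharing clients."""
    return f"{profile_path}#{share}.{schema}.{table}"


# The token below is a placeholder; the real value comes from DELTA_BEARER_TOKEN.
profile_json = build_profile("http://localhost:8890/sharing", "<DELTA_BEARER_TOKEN>")
url = table_url("config.share", "gold", "fake-school-geolocation", "delta-test")
print(profile_json)
print(url)
```

The share, schema, and table names in the URL mirror the `shares` section of the config, so they would need to change whenever that section does.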
31 changes: 31 additions & 0 deletions data_sharing/app.py
@@ -0,0 +1,31 @@
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import ORJSONResponse

from data_sharing.constants import __version__
from data_sharing.routers import delta_sharing
from data_sharing.settings import settings

app = FastAPI(
    title="Giga Data Sharing",
    version=__version__,
    docs_url="/",
    redoc_url="/redoc",
    default_response_class=ORJSONResponse,
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.CORS_ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("", tags=["core"])
async def health_check():
    return {"status": "ok"}


app.include_router(delta_sharing.router)
25 changes: 25 additions & 0 deletions data_sharing/constants.py
@@ -0,0 +1,25 @@
import tomllib
from functools import lru_cache

from pydantic_settings import BaseSettings

from data_sharing.settings import settings


class Constants(BaseSettings):
    pass


@lru_cache
def get_constants():
    return Constants()


@lru_cache
def get_app_version():
    with open(settings.BASE_DIR / "pyproject.toml", "rb") as f:
        return tomllib.load(f)["tool"]["poetry"]["version"]


constants = get_constants()
__version__ = get_app_version()
13 changes: 13 additions & 0 deletions data_sharing/permissions.py
@@ -0,0 +1,13 @@
from secrets import compare_digest

from fastapi import Depends, HTTPException, status
from fastapi.security.api_key import APIKeyHeader

from data_sharing.settings import settings

header_scheme = APIKeyHeader(name="Authorization", scheme_name="Bearer")


def is_authenticated(token=Depends(header_scheme)):
    if not compare_digest(token, settings.DELTA_BEARER_TOKEN):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
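
`APIKeyHeader` hands the raw `Authorization` header value to the dependency, so if clients send `Bearer <token>` the prefix may need to be accounted for before comparing. The following is a stdlib-only sketch of that comparison, under the assumption that `DELTA_BEARER_TOKEN` holds the bare token. `token_matches` is a hypothetical helper, not code from this commit.

```python
from secrets import compare_digest


def token_matches(header_value: str, expected_token: str) -> bool:
    """Compare an Authorization header value against the expected token in constant time."""
    scheme, _, credentials = header_value.partition(" ")
    # Accept "Bearer <token>"; otherwise treat the whole header value as the token.
    if scheme.lower() == "bearer" and credentials:
        candidate = credentials
    else:
        candidate = header_value
    # Encode to bytes so that non-ASCII input does not raise a TypeError.
    return compare_digest(candidate.encode(), expected_token.encode())


print(token_matches("Bearer s3cret", "s3cret"))
print(token_matches("Bearer wrong", "s3cret"))
```

Whether the prefix should be stripped depends on how `DELTA_BEARER_TOKEN` is stored, which this diff does not show.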
Empty file.