Commit 64251f0

Merge branch 'main' into release-0.3.0
2 parents: 6f7ae86 + 2a44cbf

File tree

4 files changed: +14 -6 lines


Architecture-Diagram.jpg

39 KB

Dockerfile

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,7 @@ ARG HADOOP_VERSION=3.2.4
 ARG AWS_SDK_VERSION=1.11.901
 ARG PYSPARK_VERSION=3.3.0
 
-#FRAMEWORK will passed during the Docker build
+#FRAMEWORK will be passed during the Docker build. For Apache Iceberg, downgrading PYSPARK_VERSION to 3.2.0 may be needed in some cases.
 ARG FRAMEWORK
 ARG DELTA_FRAMEWORK_VERSION=2.2.0
 ARG HUDI_FRAMEWORK_VERSION=0.12.2
@@ -23,8 +23,10 @@ RUN yum update -y && \
     yum -y install yum-plugin-versionlock && \
     yum -y versionlock add java-1.8.0-openjdk-1.8.0.362.b08-0.amzn2.0.1.x86_64 && \
     yum -y install java-1.8.0-openjdk && \
+
     pip install --upgrade pip && \
     pip install pyspark==$PYSPARK_VERSION boto3==1.28.27 && \
+
     yum clean all
 
 # Install pydeequ if FRAMEWORK is DEEQU
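
For context, FRAMEWORK (and, per the updated comment, a downgraded PYSPARK_VERSION for Apache Iceberg) is supplied as a build argument at image build time. A minimal sketch of such a build command follows; the FRAMEWORK value and image tag are illustrative assumptions, not taken from this commit:

docker build \
    --build-arg FRAMEWORK=ICEBERG \
    --build-arg PYSPARK_VERSION=3.2.0 \
    -t spark-on-lambda-iceberg .   # tag name is an assumption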

README.md

Lines changed: 3 additions & 0 deletions
@@ -19,7 +19,9 @@ Once the container is deployed on AWS Lambda, it remains the same until the func
 
 The Spark logs will be part of the AWS Lambda logs stored in AWS Cloudwatch.
 
+
 ![Architecture](https://github.com/aws-samples/spark-on-aws-lambda/blob/release-0.3.0/images/SoAL-Architecture.jpg)
+
 ***
 
 ### Current Challenge
@@ -63,6 +65,7 @@ This script is invoked in AWS Lambda when an event is triggered. The script down
 
 Here is a summary of the main steps in the script:
 
+
 1. **Entry Point**: The `lambda_handler` function is the entry point for the Lambda function. It receives an event object and a context object as parameters.
 2. **S3 Script Location**: The `s3_bucket_script` and `input_script` variables are used to specify the Amazon S3 bucket and object key where the Spark script is located.
 3. **Download Script**: The `boto3` module is used to download the Spark script from Amazon S3 to a temporary file on the Lambda function's file system.
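
For readers following along, here is a minimal sketch of the handler flow those three steps describe. The event keys SCRIPT_BUCKET and SPARK_SCRIPT are illustrative assumptions; the real names live in sparkLambdaHandler.py, not here:

import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # 1. Entry point: Lambda invokes this function with the triggering event.
    s3_bucket_script = event['SCRIPT_BUCKET']   # assumed event key
    input_script = event['SPARK_SCRIPT']        # assumed event key

    # 2. + 3. Download the Spark script from S3 to Lambda's writable /tmp filesystem.
    local_path = '/tmp/spark_script.py'
    boto3.client('s3').download_file(s3_bucket_script, input_script, local_path)
    logger.info(f'Downloaded s3://{s3_bucket_script}/{input_script} to {local_path}')

    # Hand-off to spark-submit happens next:
    # spark_submit(s3_bucket_script, input_script, event)  # defined in sparkLambdaHandler.py (see diff below)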

sparkLambdaHandler.py

Lines changed: 8 additions & 5 deletions
@@ -34,14 +34,17 @@ def spark_submit(s3_bucket_script: str, input_script: str, event: dict) -> None:
     Submits a local Spark script using spark-submit.
     """
     # Set the environment variables for the Spark application
-    pyspark_submit_args = event.get('PYSPARK_SUBMIT_ARGS', '')
-    # Source input and output if available in event
-    input_path = event.get('INPUT_PATH', '')
-    output_path = event.get('OUTPUT_PATH', '')
+    # pyspark_submit_args = event.get('PYSPARK_SUBMIT_ARGS', '')
+    # # Source input and output if available in event
+    # input_path = event.get('INPUT_PATH', '')
+    # output_path = event.get('OUTPUT_PATH', '')
+
+    for key, value in event.items():
+        os.environ[key] = value
     # Run the spark-submit command on the local copy of the script
     try:
         logger.info(f'Spark-Submitting the Spark script {input_script} from {s3_bucket_script}')
-        subprocess.run(["spark-submit", "/tmp/spark_script.py", "--event", json.dumps(event)], check=True)
+        subprocess.run(["spark-submit", "/tmp/spark_script.py", "--event", json.dumps(event)], check=True, env=os.environ)
     except Exception as e:
         logger.error(f'Error Spark-Submit with exception: {e}')
         raise e
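
The effect of this change is that every key/value pair in the invocation event is exported into os.environ and inherited by the spark-submit child process, so the Spark script can read its configuration from environment variables rather than parsed arguments. A hedged sketch of both sides, reusing the INPUT_PATH/OUTPUT_PATH names from the commented-out code (note that os.environ requires string values, so this assumes a flat string-to-string event payload):

# Example invocation event (all values must be strings):
#   {"INPUT_PATH": "s3a://my-bucket/raw/", "OUTPUT_PATH": "s3a://my-bucket/curated/"}

# Inside the downloaded /tmp/spark_script.py, the same values can be read back:
import os

input_path = os.environ.get('INPUT_PATH', '')     # exported by spark_submit() above
output_path = os.environ.get('OUTPUT_PATH', '')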
