
Commit dae28e4

Add example instructions on using Lambda
1 parent 209e8ab commit dae28e4

File tree: 7 files changed (+316 -9 lines)


README.md

+4 -1

```diff
@@ -58,12 +58,14 @@ To use dumb-pypi, you need two things:
 
 My recommended high-availability (but still quite simple) deployment is:
 
-* Store all of the packages in S3.
+* Store all of the packages in an S3 bucket.
 
 * Have a cronjob (or equivalent) which rebuilds the index based on the packages
   in S3. This is incredibly fast—it would not be unreasonable to do it every
   sixty seconds. After building the index, sync it into a separate S3 bucket.
 
+  (You can also use AWS Lambda for this step; [instructions here!][lambda])
+
 * Have a webserver (or set of webservers behind a load balancer) running nginx
   (with the config provided below), with the source being that second S3
   bucket.
@@ -172,6 +174,7 @@ To run the tests, call `make test`. To run an individual test, you can do
 `py.test -k name_of_test tests` (with the virtualenv activated).
 
 
+[lambda]: https://github.com/chriskuehl/dumb-pypi/blob/master/lambda/README.md
 [rationale]: https://github.com/chriskuehl/dumb-pypi/blob/master/RATIONALE.md
 [pep503]: https://www.python.org/dev/peps/pep-0503/#normalized-names
 [s3-metadata]: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#UserMetadata
```
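
For readers not using Lambda, the cronjob approach mentioned in the hunk above could look roughly like the sketch below. The bucket names, temp paths, and schedule are placeholders; the dumb-pypi flags shown (`--package-list`, `--packages-url`, `--output-dir`) mirror the ones already used by the project's README and the Lambda handler in this commit.

```bash
#!/bin/sh
# rebuild-index.sh -- hypothetical cron target, e.g.:  * * * * * /opt/dumb-pypi/rebuild-index.sh
set -eu

# List every package currently in the source bucket, one filename per line.
aws s3 ls s3://dumb-pypi-source/ | awk '{print $NF}' > /tmp/package-list.txt

# Regenerate the static index from that listing.
dumb-pypi \
    --package-list /tmp/package-list.txt \
    --packages-url https://dumb-pypi-source.s3.amazonaws.com/ \
    --output-dir /tmp/index

# Sync the freshly built index into the bucket the webservers serve from.
aws s3 sync --delete /tmp/index s3://dumb-pypi-output/
```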

lambda/.gitignore

+2

```
/bundle
/bundle.zip
```

lambda/Makefile

+13

```make
bundle: config.json handler.py ../setup.py
	rm -rf $@
	mkdir $@
	cp handler.py config.json $@/
	pip install .. -t $@/

bundle.zip: bundle
	rm -f $@
	cd bundle && zip -r ../bundle.zip .

.PHONY: clean
clean:
	rm -rf bundle bundle.zip
```
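
Assuming `pip` and `zip` are available on the PATH, building and sanity-checking the bundle from this Makefile might look like the following (the listed contents follow from the recipe above):

```bash
cd lambda
make bundle.zip
# The archive root should contain handler.py, config.json, and the installed
# dumb_pypi package along with its dependencies.
unzip -l bundle.zip | head
```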

lambda/README.md

+206

# Integrating dumb-pypi with AWS Lambda

[AWS Lambda][lambda] is a way to run code ("functions") in response to triggers
(like a change in an S3 bucket) without running any servers yourself.

dumb-pypi works very well with Lambda; you only need to regenerate the index
when your list of packages changes (relatively rare), and you can serve the
generated index without involving dumb-pypi at all.

The steps below walk you through an example AWS Lambda setup where a change in
a "source" bucket (containing all your packages) automatically triggers
dumb-pypi to regenerate the index and store it in the "output" bucket.

Depending on whether you need to support old pip versions, you may even be able
to serve your index directly from S3, avoiding running any servers entirely.


## Initial deployment

These instructions use the sample code in this directory as the base for the
Lambda handler. The specifics of your buckets will likely vary; it's expected
that you may need to adjust configuration options or the code itself to match
your deployment.

1. Create two S3 buckets, e.g. `dumb-pypi-source` and `dumb-pypi-output`.

   The source bucket is where you'll drop Python packages (tarballs, wheels,
   etc.) in a flat listing (all objects at the root of the bucket).

   The output bucket will contain the generated index (HTML files) which pip
   uses.
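
   For example, the two buckets could be created with the AWS CLI (the names
   and region here just match the rest of these instructions; substitute your
   own):

   ```bash
   aws s3 mb s3://dumb-pypi-source --region us-west-1
   aws s3 mb s3://dumb-pypi-output --region us-west-1
   ```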

2. Create an IAM role which allows reading from the source bucket and
   reading/writing to the output bucket. Select "Lambda" as the AWS resource
   the role applies to during creation.

   Here's an example policy (adjust as needed):

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowReadToSourceBucket",
               "Effect": "Allow",
               "Action": [
                   "s3:List*",
                   "s3:Get*"
               ],
               "Resource": [
                   "arn:aws:s3:::dumb-pypi-source/*",
                   "arn:aws:s3:::dumb-pypi-source"
               ]
           },
           {
               "Sid": "AllowReadWriteToOutputBucket",
               "Effect": "Allow",
               "Action": [
                   "s3:List*",
                   "s3:Get*",
                   "s3:PutObject",
                   "s3:DeleteObject"
               ],
               "Resource": [
                   "arn:aws:s3:::dumb-pypi-output/*",
                   "arn:aws:s3:::dumb-pypi-output"
               ]
           }
       ]
   }
   ```

3. Adjust `config.json` in this directory as necessary (e.g. update the
   source/output buckets and the arguments). You can easily change this later.

4. Build the first deployment bundle to upload to Lambda. From this directory,
   just run `make bundle.zip`.

5. Create the function. For example, here's how you might do it with the AWS cli:

   ```bash
   aws lambda create-function \
       --region us-west-1 \
       --function-name dumb-pypi \
       --runtime python3.6 \
       --role arn:aws:iam::XXXXXXXXXXXX:role/dumb-pypi \
       --handler handler.main \
       --zip-file fileb://bundle.zip
   ```

   (Replace the role, region, etc. to match your setup.)
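
   If you want to confirm the function was created with the code and handler
   you expect, one way (assuming the same function name and region as above)
   is:

   ```bash
   aws lambda get-function \
       --region us-west-1 \
       --function-name dumb-pypi
   ```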

6. [Give your S3 source bucket permission][s3-allow-trigger] to trigger your new
   Lambda function. For example:

   ```bash
   aws lambda add-permission \
       --region us-west-1 \
       --function-name dumb-pypi \
       --statement-id AllowSourceBucketToTriggerDumbPyPI \
       --action lambda:InvokeFunction \
       --principal s3.amazonaws.com \
       --source-arn arn:aws:s3:::dumb-pypi-source \
       --source-account XXXXXXXXXXXX
   ```

7. Set up a trigger so that changes to the source bucket cause the `dumb-pypi`
   function to run and regenerate the index.

   The AWS cli is very awkward here; the easiest way to do this is to make a
   file like `policy.json` with contents like:

   ```json
   {
       "LambdaFunctionConfigurations": [
           {
               "Id": "NotifyDumbPyPI",
               "LambdaFunctionArn": "arn:aws:lambda:us-west-1:XXXXXXXXXXXX:function:dumb-pypi",
               "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
           }
       ]
   }
   ```

   (Again, replacing the function's ARN as appropriate for your account.)

   Then, using the AWS cli, add a "notification configuration" to the source
   bucket:

   ```bash
   aws s3api put-bucket-notification-configuration \
       --bucket dumb-pypi-source \
       --notification-configuration "$(< policy.json)"
   ```
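
At this point, adding or removing a package in the source bucket should rebuild the index automatically. To kick off a build by hand (for instance, to generate the very first index before any uploads), you can invoke the function directly; this sketch assumes the function name and region used throughout these examples:

```bash
aws lambda invoke \
    --region us-west-1 \
    --function-name dumb-pypi \
    /tmp/dumb-pypi-invoke-output.json
```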


## Serving from the S3 buckets directly

The whole point of Lambda is to avoid running your own servers, so you might as
well serve directly from S3 :)

Keep in mind that if you need to support old pip versions, you [can't yet serve
directly from S3][rationale] because these old versions rely on the PyPI server
to do package name normalization; see [the README][README] for suggestions on
how to use nginx to do this normalization.

If you **do** want to serve from S3 directly, it's pretty easy:

1. Enable read access to your source bucket. You can open this up to the
   public, restrict it to your company's IPs, etc.

   Here's an example policy which grants read access to everyone:

   ```json
   {
       "Version": "2008-10-17",
       "Id": "AllowReadOnlyAccess",
       "Statement": [
           {
               "Sid": "AllowReadOnlyAccess",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "*"
               },
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::dumb-pypi-source/*"
           }
       ]
   }
   ```

   This will make your source bucket available at a URL like
   `https://dumb-pypi-source.s3.amazonaws.com`.

2. Enable read access to your output bucket. Again, it's up to you who you
   allow; you can use the same example policy from above (just adjust the
   bucket name).

3. Enable static website hosting for your output bucket, and set `index.html`
   as your "Index document". This appears to be the only way to get
   `index.html` to show up when accessing the root of a "directory" in S3.

   This will make your output bucket available at a URL like
   `http://dumb-pypi-output.s3-website-us-west-1.amazonaws.com/`.
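
Static website hosting can also be turned on from the CLI rather than the console. The sketch below assumes the bucket and region from these examples; the final `pip` line assumes the generated index exposes its PEP 503 listing under `simple/`, and `some-package` is a placeholder:

```bash
# Turn on static website hosting with index.html as the index document.
aws s3 website s3://dumb-pypi-output/ --index-document index.html

# Then point pip at the generated index, e.g.:
pip install \
    --index-url http://dumb-pypi-output.s3-website-us-west-1.amazonaws.com/simple/ \
    some-package
```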


## Updating the code or config

Any time you update the code or config, you need to re-deploy the bundle to
Lambda.

1. Run `make bundle.zip` to build a new deployment bundle.

2. Use the AWS cli to update the code for the function:

   ```bash
   aws lambda update-function-code \
       --function-name dumb-pypi \
       --zip-file fileb://bundle.zip
   ```

[lambda]: https://aws.amazon.com/lambda/
[rationale]: https://github.com/chriskuehl/dumb-pypi/blob/master/RATIONALE.md
[s3-allow-trigger]: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#grant-destinations-permissions-to-s3
[README]: https://github.com/chriskuehl/dumb-pypi/blob/master/README.md

lambda/config.json

+8

```json
{
    "source-bucket": "dumb-pypi-source",
    "output-bucket": "dumb-pypi-output",
    "args": [
        "--packages-url", "https://dumb-pypi-source.s3.amazonaws.com/",
        "--title", "My Cool PyPI on S3"
    ]
}
```

lambda/handler.py

+71

```python
import json
import mimetypes
import os
import os.path
import tempfile
import time

import boto3

import dumb_pypi.main


def _load_config():
    # config.json is bundled alongside this handler by the Makefile.
    with open(os.path.join(os.path.dirname(__file__), 'config.json')) as f:
        return json.load(f)


def _list_bucket(bucket):
    # Yield one JSON line per package, in the format expected by
    # dumb-pypi's --package-list-json option.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        yield from (
            json.dumps(
                {
                    'filename': package['Key'],
                    'upload_timestamp': time.mktime(package['LastModified'].timetuple()),
                },
                sort_keys=True,
            )
            for package in page.get('Contents', ())
        )


def _sync_bucket(localdir, bucket_name):
    # Upload every generated file, preserving the directory layout as S3 keys.
    # TODO: should also delete removed files
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    for dirpath, _, filenames in os.walk(localdir):
        for filename in filenames:
            path_on_disk = os.path.join(dirpath, filename)
            key = os.path.relpath(path_on_disk, localdir)
            print(f'Uploading {path_on_disk} => s3://{bucket_name}/{key}')
            with open(path_on_disk, 'rb') as f:
                bucket.put_object(
                    Key=key,
                    Body=f,
                    ContentType=mimetypes.guess_type(filename)[0]
                )


def main(event, context):
    config = _load_config()

    with tempfile.TemporaryDirectory() as td:
        with tempfile.NamedTemporaryFile(mode='w') as tf:
            # Write the package list for the source bucket, then build the
            # index into the temporary directory.
            for line in _list_bucket(config['source-bucket']):
                tf.write(line + '\n')
            tf.flush()

            dumb_pypi.main.main((
                '--package-list-json', tf.name,
                '--output-dir', td,
                *config['args'],
            ))

        _sync_bucket(td, config['output-bucket'])


# Strictly for testing; we don't look at the event or context anyway.
if __name__ == '__main__':
    exit(main(None, None))
```
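
Because the `__main__` block above ignores the event and context, the handler can also be exercised locally against real buckets, assuming `dumb-pypi` and `boto3` are installed in the current environment and boto3 can find AWS credentials; the profile name here is a placeholder:

```bash
cd lambda
AWS_PROFILE=your-profile python handler.py
```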

vendor/venv-update

+12 -8

```diff
@@ -49,7 +49,7 @@ See https://pip.readthedocs.org/en/stable/user_guide/#environment-variables
 For example:
     PIP_INDEX_URL=https://pypi.example.com/simple venv-update
 
-Please send issues to: https://github.com/yelp/pip-faster
+Please send issues to: https://github.com/yelp/venv-update
 '''
 from __future__ import absolute_import
 from __future__ import print_function
@@ -59,16 +59,16 @@ from os.path import exists
 from os.path import join
 from subprocess import CalledProcessError
 
-__version__ = '2.0.0'
+__version__ = '3.0.0'
 DEFAULT_VIRTUALENV_PATH = 'venv'
 DEFAULT_OPTION_VALUES = {
     'venv=': (DEFAULT_VIRTUALENV_PATH,),
     'install=': ('-r', 'requirements.txt',),
     'pip-command=': ('pip-faster', 'install', '--upgrade', '--prune'),
     'bootstrap-deps=': ('venv-update==' + __version__,),
 }
-__doc__ = __doc__.format(  # pylint:disable=redefined-builtin
-    **dict((key, ' '.join(val)) for key, val in DEFAULT_OPTION_VALUES.items())
+__doc__ = __doc__.format(
+    **{key: ' '.join(val) for key, val in DEFAULT_OPTION_VALUES.items()}
 )
 
 # This script must not rely on anything other than
@@ -89,10 +89,10 @@ def parseargs(argv):
         else:
             options[key] += (arg,)
 
-    if set(args) & set(('-h', '--help')):
+    if set(args) & {'-h', '--help'}:
         print(__doc__, end='')
         exit(0)
-    elif set(args) & set(('-V', '--version')):
+    elif set(args) & {'-V', '--version'}:
         print(__version__)
         exit(0)
     elif args:
@@ -169,7 +169,7 @@ def exec_(argv): # never returns
     # in python3, sys.exitfunc has gone away, and atexit._run_exitfuncs seems to be the only pubic-ish interface
     # https://hg.python.org/cpython/file/3.4/Modules/atexitmodule.c#l289
     import atexit
-    atexit._run_exitfuncs()  # pylint:disable=protected-access
+    atexit._run_exitfuncs()
 
     from os import execv
     execv(argv[0], argv)
@@ -307,6 +307,10 @@ def ensure_virtualenv(args, return_values):
         argv[:] = ('virtualenv',) + args
         info(colorize(argv))
         raise_on_failure(virtualenv.main)
+    # There might not be a venv_path if doing something like "venv= --version"
+    # and not actually asking virtualenv to make a venv.
+    if return_values.venv_path is not None:
+        run(('rm', '-rf', join(return_values.venv_path, 'local')))
 
 
 def wait_for_all_subprocesses():
@@ -398,7 +402,7 @@ def venv_update(
 def execfile_(filename):
     with open(filename) as code:
         code = compile(code.read(), filename, 'exec')
-        exec(code, {'__file__': filename})  # pylint:disable=exec-used
+        exec(code, {'__file__': filename})
 
 
 def pip_faster(venv_path, pip_command, install, bootstrap_deps):
```
