Add example instructions on using Lambda #27

Closed

chriskuehl
Owner

Fixes #23

Still a little WIP, but I think most of the content is here.

I intentionally kept the lambda bits in a separate directory not distributed with the main package, as I don't want dumb-pypi to grow a dependency on boto or anything like that.

The code here is pretty simple... most of the complication is configuring all the permissions for AWS.

Testing this out, it is pretty cool to drop a package into the bucket and a couple seconds later see the updated index :)

)


def _sync_bucket(localdir, bucket_name):
Collaborator

I was looking at how aws s3 sync works and it basically does a (mtime, size) comparison against the remote objects -- going to try and implement something like that as well :)
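
For reference, a minimal sketch of what a (mtime, size) comparison could look like. The rule used here (upload when the key is missing, the size differs, or the local file is newer) is an assumption about how such a check might work, not a copy of what `aws s3 sync` does internally. `RemoteObject` and `files_to_upload` are hypothetical names, and the `remote` mapping would have to be built from a bucket listing, which is not shown.

```python
import os
from typing import Dict, List, NamedTuple


class RemoteObject(NamedTuple):
    """Hypothetical summary of one remote object, taken from a bucket listing."""
    size: int
    mtime: float  # seconds since the epoch


def files_to_upload(localdir: str, remote: Dict[str, RemoteObject]) -> List[str]:
    """Return the keys under localdir that look new or changed."""
    changed = []
    for root, _, names in os.walk(localdir):
        for name in names:
            path = os.path.join(root, name)
            # Keys always use forward slashes, regardless of the local OS.
            key = os.path.relpath(path, localdir).replace(os.sep, '/')
            st = os.stat(path)
            obj = remote.get(key)
            # Assumed rule: missing remotely, different size, or newer locally.
            if obj is None or obj.size != st.st_size or st.st_mtime > obj.mtime:
                changed.append(key)
    return sorted(changed)
```

Only the keys returned would need to be uploaded; everything else can be skipped.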

@asottile
Collaborator

Also probably want to note something about debouncing here. Uploading 100 packages causes the lambda to fire 100 times (I've noticed that for anything above ~100 packages ~25% of the invocations fail due to list bucket rate limiting)

Some ideas for debouncing:

  • reduce lambda concurrency
  • short circuit if too long after upload timestamp (I assume this can be retrieved from the event metadata?)
  • introduce lambda -> kinesis -> lambda and use a large batch size
  • write a timestamp file to the destination bucket and short circuit on that?
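
To make the timestamp idea concrete, here is a rough in-memory sketch. `Debouncer` is a hypothetical name, and the `_last_sync` attribute stands in for a timestamp file in the destination bucket. It processes events one at a time, so it only illustrates the skip rule; with concurrent invocations a few rebuilds would probably still overlap.

```python
import datetime
from typing import Callable, Optional


class Debouncer:
    """Skip rebuilds for uploads that an earlier rebuild already covered."""

    def __init__(self, clock: Callable[[], datetime.datetime]) -> None:
        self._clock = clock
        # Stand-in for the timestamp file in the destination bucket.
        self._last_sync: Optional[datetime.datetime] = None

    def handle(
            self,
            event_time: datetime.datetime,
            rebuild: Callable[[], None],
    ) -> bool:
        """Run rebuild unless a sync started after this upload; return whether it ran."""
        if self._last_sync is not None and self._last_sync > event_time:
            return False
        # Record the start time, not the end time, so uploads that arrive
        # while the rebuild is running still trigger another rebuild.
        started = self._clock()
        rebuild()
        self._last_sync = started
        return True
```

With 100 uploads landing within a second and a rebuild that starts just afterwards, the first event rebuilds and the other 99 are skipped.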

@asottile
Collaborator

The upload timestamp check worked for me!

then code that's roughly like this:

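import datetime
from typing import Any, Dict

import boto3
from botocore.exceptions import ClientError

# SYNC_KEY is not shown here; it is assumed to expose `.bucket` and `.prefix`
# identifying the S3 object that stores the last-sync timestamp.


def _iso_8601(dt: datetime.datetime) -> str:
    # Always emit microseconds so that the '%f' in _parse_dt matches even
    # when dt.microsecond == 0 (plain isoformat() drops the fraction then).
    return dt.isoformat(timespec='microseconds') + 'Z'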


def _parse_dt(s: str) -> datetime.datetime:
    return datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%fZ')


def _get_sync_dt() -> datetime.datetime:
    s3 = boto3.client('s3')
    try:
        resp = s3.get_object(Bucket=SYNC_KEY.bucket, Key=SYNC_KEY.prefix)
    except ClientError:
        return datetime.datetime.min
    else:
        return _parse_dt(resp['Body'].read().decode().strip())


def _set_sync_dt(dt: datetime.datetime) -> None:
    now = _iso_8601(dt).encode()
    s3 = boto3.client('s3')
    s3.put_object(
        Body=now, Bucket=SYNC_KEY.bucket, Key=SYNC_KEY.prefix,
        ServerSideEncryption='AES256',
    )


def main(event: Dict[str, Any], context: Any) -> int:
    event_dt = _parse_dt(event['Records'][0]['eventTime'])
    sync_dt = _get_sync_dt()
    now_dt = datetime.datetime.utcnow()
    if sync_dt > event_dt:
        print(f'skipping (synced at {sync_dt}, event at {event_dt})')
        return 0

    # (snip...) creation and sync here

    _set_sync_dt(now_dt)
    return 0


# Strictly for testing
if __name__ == '__main__':
    now = _iso_8601(datetime.datetime.utcnow())
    exit(main({'Records': [{'eventTime': now}]}, None))

[Screenshot: 2018-06-21, 10:59:01 AM]

@chriskuehl chriskuehl closed this May 3, 2023
Linked issue (would be closed by merging this pull request): AWS lambda support