Bypass kinesis #111

Open
wants to merge 2 commits into
base: development

Conversation

czirker
Contributor

@czirker czirker commented Dec 8, 2020

Adds the ability to bypass Kinesis and write directly to the bus. This is in response to the November 2020 AWS Kinesis outage.

Usage:
`leo.write` and `leo.load` now accept `option.disable_kinesis: true`, or you can set the environment variable `DISABLE_KINESIS=true`.
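
As a rough illustration of how the two switches could be combined, here is a minimal sketch. The helper name `shouldBypassKinesis` is hypothetical and not part of the SDK, and the rule that either switch alone enables the bypass is an assumption rather than something this PR states.

```javascript
// Hypothetical helper, not part of leo-sdk: decides whether a write should
// skip Kinesis and go directly to the bus.
// Assumption: either the option or the environment variable enables the bypass.
function shouldBypassKinesis(options = {}, env = process.env) {
  if (options.disable_kinesis === true) {
    return true;
  }
  // Environment variables are always strings, so compare against "true".
  return String(env.DISABLE_KINESIS || "").toLowerCase() === "true";
}

// Example: the option alone is enough, even with an empty environment.
console.log(shouldBypassKinesis({ disable_kinesis: true }, {}));
```

Passing the environment as a parameter keeps the check easy to test without mutating `process.env`.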

Note:
Bypassing Kinesis can create a race condition if 2 bots are writing to the same queue. The race happens in the bot reading from that queue. If the bot reads too quickly, it will checkpoint and potentially skip events that were created by the second source bot but are still being written to the DB.
A workaround is to not read all the way to the current timestamp, as this allows the in-flight events to land in the correct place before being read.
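
The workaround above can be sketched as a read ceiling that trails the current time. This is only an illustration: the function names, the `timestamp` field on events, and the 5 second default lag are all assumptions, not values taken from the SDK.

```javascript
// Hypothetical sketch of the "do not read up to now" workaround.
// lagMs is an assumed safety margin; tune it to how long writes take to land.
function safeReadCeiling(nowMs, lagMs = 5000) {
  return nowMs - lagMs;
}

// Returns only the events old enough that concurrent writers should have
// finished landing their records. Events are assumed to carry a numeric
// `timestamp` in milliseconds.
function readableEvents(events, nowMs, lagMs = 5000) {
  const ceiling = safeReadCeiling(nowMs, lagMs);
  return events.filter((event) => event.timestamp <= ceiling);
}

// Example: with now = 10000 and a 5000 ms lag, only the first event is read.
const sample = [{ timestamp: 1000 }, { timestamp: 9000 }];
console.log(readableEvents(sample, 10000).length);
```

The reader would checkpoint no further than the ceiling, so events still in flight are picked up on a later pass instead of being skipped.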

Also, when there are 2 source bots, there is the possibility that they generate the same eid. This would happen when both bots begin the Kinesis stream processor code on the same millisecond. There is currently no workaround for this. However, ideas include giving each writing bot its own block of the record range for the eid, extending the eid to include a partition/writer identifier between the timestamp and record number, or making queue partitions a first-class feature.
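
To make the second idea concrete, here is a minimal sketch of an eid that carries a writer identifier between the timestamp and the record number. The `timestamp-writer-record` layout, the 7-digit padding, and the `buildEid` name are assumptions for illustration only; they are not the SDK's actual eid format.

```javascript
// Hypothetical eid builder: places a writer identifier between the timestamp
// and the record number so two writers starting on the same millisecond
// cannot produce the same eid.
// Assumption: the layout and the 7-digit record padding are illustrative.
function buildEid(timestampMs, writerId, recordNumber) {
  const record = String(recordNumber).padStart(7, "0");
  return `${timestampMs}-${writerId}-${record}`;
}

// Example: same millisecond, same record number, different writers.
const eidA = buildEid(1607458140000, "a", 1);
const eidB = buildEid(1607458140000, "b", 1);
console.log(eidA !== eidB);
```

Because the timestamp stays first, eids still sort by time before the writer identifier breaks ties.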

This does NOT solve the Firehose case. Firehose is used to collect events over the course of 1 minute, after which it emits an event to Kinesis. A workaround for Firehose is to stop sending to Firehose and bypass Kinesis.

Clint Zirker added 2 commits December 8, 2020 12:55
@czirker czirker requested a review from jgrantr December 8, 2020 20:09