R/W ops can get out of sync #140

Open
szarnyasg opened this issue Nov 3, 2021 · 3 comments

Comments

@szarnyasg
Member

szarnyasg commented Nov 3, 2021

Moving a discussion on Slack here:

  • read ops and write ops are sent to different threads: read threads come from a pool, while write threads are long-lived threads pinned to op streams
  • so reads and/or writes can outpace one another if the compression ratio is set too ambitiously…
  • passing all ops via a pool is a no-go, because it’s too slow
  • but what might make sense is to use those long-lived threads for all ops
  • and assign reads to those long-lived threads before the run even starts, so there is no need to feed ops to threads via a queue; rather, each thread would get an iterator that it simply exhausts
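A minimal sketch of the proposal above, assuming a hypothetical `Op` record with a kind, an op stream and a scheduled time (the driver's real operation classes are different). Every op is assigned to a long-lived thread before the run starts: writes stay pinned to a thread chosen by their op stream, reads are dealt round-robin (an illustrative choice, not something decided in this thread), and each thread simply exhausts its own iterator instead of pulling from a shared queue. `assign_upfront`, `worker` and `run` are hypothetical names.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Op:
    """Hypothetical operation record (not the driver's real operation type)."""
    kind: str         # "read" or "write"
    stream: int       # op stream the operation belongs to
    scheduled: float  # scheduled start time, ms since workload start


def assign_upfront(ops: List[Op], n_threads: int) -> List[List[Op]]:
    """Assign every op to a long-lived thread before the run starts.

    Writes stay pinned to a thread chosen by their op stream; reads are dealt
    round-robin. Each per-thread list is then sorted by scheduled time.
    """
    buckets: List[List[Op]] = [[] for _ in range(n_threads)]
    read_slot = itertools.cycle(range(n_threads))
    for op in ops:
        if op.kind == "write":
            buckets[op.stream % n_threads].append(op)
        else:
            buckets[next(read_slot)].append(op)
    for bucket in buckets:
        bucket.sort(key=lambda o: o.scheduled)
    return buckets


def worker(ops: Iterator[Op], executed: List[Op]) -> None:
    # No shared queue: the thread simply exhausts its own iterator.
    for op in ops:
        # Placeholder: a real driver would wait until op.scheduled, then execute.
        executed.append(op)


def run(ops: List[Op], n_threads: int) -> List[List[Op]]:
    buckets = assign_upfront(ops, n_threads)
    executed: List[List[Op]] = [[] for _ in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(iter(bucket), executed[i]))
        for i, bucket in enumerate(buckets)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed
```

Because reads and writes on the same thread are executed in scheduled order from a single iterator, they can no longer outpace one another there; the trade-off is that a slow read now delays the writes queued behind it on that thread.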
@jackwaudby
Member

As mentioned in Slack, generating the complete workload upfront and distributing it across long-lived threads feels like a good idea and would reduce both code and runtime complexity.

My only concern is how we would generate the input parameters for the short reads, which are currently fed at runtime by the results of complex reads.

@jackwaudby
Member

Relevant section from the docs:
For each complex read instance, a sequence of short reads is planned. There are two types of short read sequences: Person centric and Message centric. Depending on the type of the complex read, one of them is chosen. Each sequence consists of a set of short reads which are issued in a row. The issue time assigned to each short read in the sequence is determined at run time, and is based on the completion time of the complex read it depends on. The substitution parameters for short reads are taken from the results of previously executed complex reads and short reads. Once a short read sequence is issued (and provided that sufficient substitution parameters exist), there is a probability that another short read sequence is issued. This probability decreases for each new sequence issued.

@jackwaudby
Member

AFAIK the idea behind using a pool of complex read results to feed short reads is to simulate a user who runs some complex queries and then explores the area around the results or looks them up.
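A minimal sketch of such a result pool, assuming a hypothetical bounded `ResultPool` class (not the driver's actual implementation). Reads add the IDs they return, and short reads draw their substitution parameter from the pool. This is also why a workload generated entirely upfront cannot simply precompute short-read parameters: the IDs only exist once earlier reads have completed.

```python
import random
from collections import deque
from typing import Deque, Iterable, Optional


class ResultPool:
    """Hypothetical bounded pool of entity IDs harvested from read results.

    Complex reads (and short reads) add the IDs they return; short reads draw
    their substitution parameter from the pool, so their inputs only exist
    once earlier reads have completed at run time.
    """

    def __init__(self, capacity: int, rng: random.Random) -> None:
        self._ids: Deque[int] = deque(maxlen=capacity)
        self._rng = rng

    def add_results(self, ids: Iterable[int]) -> None:
        # Once capacity is reached, the oldest IDs are evicted automatically.
        self._ids.extend(ids)

    def draw(self) -> Optional[int]:
        # None signals "not enough substitution parameters yet".
        if not self._ids:
            return None
        return self._ids[self._rng.randrange(len(self._ids))]

    def __len__(self) -> int:
        return len(self._ids)
```

The bounded `deque` keeps the pool biased towards recent results, so short reads tend to explore around what was just returned; the capacity is an illustrative parameter.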

@szarnyasg added the dependencies label on Dec 12, 2021
@szarnyasg added the enhancement label and removed the dependencies label on May 22, 2022