Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dht.put() backoff algorithm #135

Open
lmatteis opened this issue Jul 30, 2016 · 20 comments
Open

Implement dht.put() backoff algorithm #135

lmatteis opened this issue Jul 30, 2016 · 20 comments

Comments

@lmatteis
Copy link

To keep things alive in the DHT store (get/put), it's suggested an algorithm such as (http://libtorrent.org/dht_rss.html#re-announcing) to avoid overload of the network:

  1. pick one random item (i) from the local repository (except
    items already announced this round)
  2. If all items in the local repository have been announced
    2.1 terminate
  3. look up item i in the DHT
  4. If fewer than 8 nodes returned the item
    4.1 announce i to the DHT
    4.2 goto 1

However, I can't seem to find a way to see how many nodes returned the item after a get request. Is such functionality exposed?

@lmatteis lmatteis changed the title DHT.get - number of nodes that returned an item DHT.get() - number of nodes that returned an item Jul 30, 2016
@lmatteis
Copy link
Author

lmatteis commented Aug 1, 2016

Actually, this might be implemented as a dht.keepAlive(opts) method. From BEP44 (quite recent addition): https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration

I'm willing to implement such functionality however I'm unsure how it would look like as an API. Some ideas:

// every 2 hours runs .put() based on above algorithm and executes callback?
dht.keepAlive(opts, (err, res) => console.log(err, res)) 

// perhaps the interval should exist at the app level (not inside the module)?
setInterval(() => dht.backoff(opts, () => dht.put(opts)), 7200000)

// or perhaps the functionality should be inside .put() itself?
// put() might not actually write to the DHT if the backoff algorithm doesn't allow it
setInterval(() => dht.put(opts), 7200000)

@substack @mafintosh @feross suggestions?

Also please let me know what you think of this recent BEP suggestion for updating torrents based on DHT mutable items: bittorrent/bittorrent.org#35

@the8472
Copy link

the8472 commented Aug 2, 2016

http://libtorrent.org/dht_rss.html#re-announcing is obsolete and should be ignored.

You can find the current algorithm at https://github.com/bittorrent/bittorrent.org/blob/master/beps/bep_0044.rst#expiration

@feross
Copy link
Member

feross commented Aug 4, 2016

The third option seems the most usable to me, and it doesn't require new APIs to be added, or additional state to be tracked inside the DHT client. I like that it does the right thing by default.

We can add a { force: true } option to force a put to happen even when the backoff algorithm doesn't allow it.

@lmatteis
Copy link
Author

lmatteis commented Aug 4, 2016

@feross what about an opposite { backoff: true } so we don't break existing tests and use cases where users don't need multiple roundtrips?

@feross
Copy link
Member

feross commented Aug 4, 2016

@lmatteis I still slightly prefer { force: true } since that way we do the right thing by default. We can bump the major version. But, I'm fine with { backoff: true } as well. @mafintosh - do you have thoughts?

@lmatteis
Copy link
Author

lmatteis commented Aug 5, 2016

I started implementing this here: https://github.com/lmatteis/bittorrent-dht/tree/backoff-algorithm

I'm a really bad coder 😄 so I'd appreciate feedback. Specifically, I decided to only alter _preput() and put the backoff logic in there. It seems to run a get based on the target to populate the local routing table. So I'm taking advantage of those round trips to also store the values: https://github.com/lmatteis/bittorrent-dht/blob/backoff-algorithm/client.js#L209

@feross
Copy link
Member

feross commented Aug 6, 2016

Cool -- thanks for giving it a shot. I'm probably not the best person to review this, tbh as I didn't write this part of the codebase. I think @mafintosh or @substack ought to take a look before we merge this.

Few things I noticed in your work so far:

  • I don't think the test you wrote is actually testing that the backoff is working correctly. I think you need to verify that no put actually happened.
  • MAX_COPIES should not be changed to 0.

@lmatteis
Copy link
Author

lmatteis commented Aug 6, 2016

@feross yeah it's actually quite hard to test this behavior. I'd have to instantiate N nodes and find a way that I can control how many copies are being stored, in order to test for < 8 and > 8 copies.

And yeah MAX_COPIES is a typo should be 8.

@mafintosh
Copy link
Contributor

Does this mean that .put will re-announce in the background? Neither .announce/.lookup does this now so I'm unsure if that would be confusing. The feature seems useful though!

@lmatteis
Copy link
Author

lmatteis commented Aug 6, 2016

@mafintosh if we just alter .put (option 3), the re-announce happens at the app level (not inside the module). So we opted to go with option 3 by either doing force : true or backoff: true.

This is the actual algorithm: https://github.com/bittorrent/bittorrent.org/pull/38/files

I'm simply taking advantage of the _preput() to implement this. But am unsure about how it's implemented. I think I'm missing point 2 of the algorithm: 2. The 8 nodes closest to the target key which are eligible for a store all have indicated they have the data, either by returning it or through the``seq``number.

@the8472
Copy link

the8472 commented Aug 6, 2016

From skimming this implementation it looks like put and get are totally separate lookups.

In my own implementation a put is a non-lookup action that takes a finished get as an argument, that way the responses of the get can be reused.

@feross
Copy link
Member

feross commented Aug 6, 2016

@lmatteis Re-announce should happen at the app level (option 3) to remain consistent with .announce/.lookup.

@feross feross changed the title DHT.get() - number of nodes that returned an item Implement dht.put() backoff algorithm Aug 6, 2016
@lmatteis
Copy link
Author

lmatteis commented Aug 8, 2016

@substack any ideas how we can add tests for this?

@feross
Copy link
Member

feross commented Aug 8, 2016

@lmatteis Can't you just create 8 + 1 clients, then have one of them do two put() calls in a row? Only the first put() call should actually be sent to the clients.

To make this easier to test, you could make the constant 8 configurable, so you can set it to something smaller, like 2, for the tests.

@lmatteis
Copy link
Author

lmatteis commented Aug 8, 2016

@feross right but how do I test whether that second put() isn't sent to the clients?

@feross
Copy link
Member

feross commented Aug 8, 2016

@lmatteis You can spy on calls to the dht._onput() function. Not super elegant, but we don't have an event for when a put happens. So this works:

var onput = dht._onput // save the onput function
dht._onput = function () {
  t.pass('got `put` from remote peer')
  onput.apply(dht, arguments) // call the original onput function
}

@lmatteis
Copy link
Author

lmatteis commented Aug 8, 2016

Ok makes sense. I just didn't want to screw things up since I didn't see tests for private methods in the other tests.

@lmatteis
Copy link
Author

lmatteis commented Aug 9, 2016

Ok tests seem to pass. I'm just a little weary about this big change: master...lmatteis:backoff-algorithm#diff-215b25d5bf5c0b4623ad1a2b62456f60L544

I'm essentially saving the value v and seq as well in the table for a target, since I need them to see whether we got multiple copies.

@feross
Copy link
Member

feross commented Aug 10, 2016

@lmatteis Can you send a PR?

@mafintosh I'm gonna need your help to review this one 👍 I really haven't dug into the BEP44 stuff very deeply.

@lmatteis
Copy link
Author

@feross ok, i also just added the second point of the algorithm. Test still pass but we need more for the backoff. For instance, I'm not storing the key/signature in the table for a key, hence can't check whether a copy is valid or not. I'll send a PR so we can better review where to store this stuff , as it could increase memory footprint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants