feat: baseline spadev0 integration #28

jcace · 2023-09-04T07:12:18Z

Closes #24

Adds an integration to the RetrievalBot codebase that pulls CIDs from Spade and adds them to the task queue.

integration/spadev0/main.go

integration/spadev0/util.go

jcace · 2023-09-07T02:20:18Z

integration/spadev0/util.go

+	for _, document := range replicas {
+		tasks = append(tasks, task.Task{
+			Requester: requester,
+			Module:    task.HTTP, // TODO: Bitswap


@xinaxu, do you think we should have an entirely separate Module here, e.g. task.Spade? The reason I ask is that I don't see the retrieve_type metadata used anywhere (maybe I'm not looking in the right place, though). Is it OK to use that metadata for control flow?

xinaxu · 2023-09-08T19:56:23Z

The reason it is not used is that the code currently assumes there is only one retrieve_type. I'm expecting that, as part of this PR or the next one, you will fork the logic: if retrieve_type is spade, do the Spade retrieval; otherwise, do the regular one.

Also, metadata will be part of the result payload, so you will be able to query the result collection with certain filtering on metadata.
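
The fork described above could look roughly like the following. This is a minimal sketch, not the RetrievalBot code: the `Task` struct is a trimmed, hypothetical stand-in for `task.Task`, the metadata is assumed to be a `map[string]string`, and `retrieveKind` is an illustrative helper. Missing or unrecognised `retrieve_type` values fall through to the regular path instead of causing a panic.

```go
package main

import "fmt"

// Task is a hypothetical, trimmed-down stand-in for task.Task.
// Only the fields needed for this illustration are included.
type Task struct {
	Module   string
	Metadata map[string]string // assumed shape of the task metadata
}

// retrieveKind selects a code path from the retrieve_type metadata value.
// Reading from a nil map is safe in Go and yields the empty string, so a
// task without metadata falls back to the regular path.
func retrieveKind(t Task) string {
	switch t.Metadata["retrieve_type"] {
	case "spade":
		return "spade"
	default:
		return "regular"
	}
}

func main() {
	tasks := []Task{
		{Module: "bitswap", Metadata: map[string]string{"retrieve_type": "spade"}},
		{Module: "http", Metadata: map[string]string{"retrieve_type": "something-else"}},
		{Module: "http"}, // no metadata at all
	}
	for _, t := range tasks {
		fmt.Printf("module=%s path=%s\n", t.Module, retrieveKind(t))
	}
}
```

Because the metadata is carried through unchanged, results could still be filtered on `retrieve_type` later, whichever path handled the task.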

xinaxu · 2023-09-08T19:59:57Z

Meanwhile, please double-check the current retrieval logic to make sure that it will not panic or crash on an unexpected metadata/retrieve_type value. You should be fine just by looking at your code.

xinaxu · 2023-09-08T20:03:37Z

Finally, have you checked the logic here? This is where the retrieval worker polls the task queue for tasks. If we end up filling the task queue but no one is pulling from it, the queue will fill up and no new tasks will be added. Judging by your logic I think you're fine, but I want to make sure that you're aware.

RetrievalBot/pkg/task/task_worker.go

Line 98 in 342f350

if strings.HasPrefix(t.acceptedCountries, "!") {

jcace · Sep 11, 2023

> Meanwhile, please double-check the current retrieval logic to make sure that it will not panic or crash on an unexpected metadata/retrieve_type value. You should be fine just by looking at your code.

Yup, confirmed: the retrieve_type metadata is not actually read anywhere in the codebase, so it should not cause any issues.

> Finally, have you checked the logic here? This is where the retrieval worker polls the task queue for tasks. If we end up filling the task queue but no one is pulling from it, the queue will fill up and no new tasks will be added. Judging by your logic I think you're fine, but I want to make sure that you're aware.

Should we add some kind of back-pressure check so that the integration does not run if the queue is too full?

integration/spadev0/main.go

ribasushi

This is a very decent first pass! The method is solid; I left a few comments regarding implementation errors (nothing architecturally bad, though).

integration/spadev0/main.go

.golangci.yml

xinaxu

Please take a look at the comments before merging

integration/spadev0/util.go


jcace mentioned this pull request Sep 4, 2023

v0 Spade CIDs - Retrieval Bot Integration #24

Closed

jcace requested a review from xinaxu September 6, 2023 05:41

xinaxu reviewed Sep 6, 2023

integration/spadev0/main.go (outdated, resolved)

integration/spadev0/util.go (outdated, resolved)

integration/spadev0/util.go (outdated, resolved)

jcace commented Sep 7, 2023

integration/spadev0/util.go (outdated, resolved)

jcace commented Sep 7, 2023

integration/spadev0/main.go (outdated, resolved)

jcace marked this pull request as ready for review September 7, 2023 02:37

jcace requested a review from xinaxu September 7, 2023 02:39

jcace self-assigned this Sep 7, 2023

ribasushi reviewed Sep 7, 2023

integration/spadev0/main.go (resolved)

integration/spadev0/main.go (outdated, resolved)

integration/spadev0/main.go (outdated, resolved)

Jason Cihelka added 12 commits September 7, 2023 14:33

fd44e1a feat: baseline spadev0 integration
5c02bcd feat: spade test top-levle CID selection logic
8b29fbf feat: AddSpadeTasks
5285f42 feat: prepareSpadeTasks
bdf510e feat: cleanup
96147e3 feat: address pr comments
5393c0e feat: randomize top level CID selection
b21de31 feat: better logging
d755854 cleanup
fc47e43 cleanup
450988e chore: fix comment and logging
12bbdf6 feat: address PR comments

jcace force-pushed the spade-v0-integration branch from bfc54bc to 12bbdf6 Compare September 7, 2023 05:33

Jason Cihelka added 2 commits September 7, 2023 14:58

4eaf7a5 fix: lint errors
bda00d8 fix: lint errors

jcace commented Sep 7, 2023

.golangci.yml (resolved)

xinaxu approved these changes Sep 8, 2023


Jason Cihelka added 2 commits September 11, 2023 10:24

d0643ac fix: bitswap instead of http
51a3187 feat: address.NewIDAddress

jcace merged commit 1d59f77 into main Sep 11, 2023
1 check failed

jcace deleted the spade-v0-integration branch September 11, 2023 02:43

jcace mentioned this pull request Sep 15, 2023

Retrieval Bot - Run Spade Integrations for Spade RB data-preservation-programs/spade#38

Closed
