Release 3.3.1 #3963

Merged: 80 commits, Aug 9, 2024


karol-kokoszka
Collaborator

Prep for release 3.3.1

Contains:
#3905
#3906
#3910
#3898
#3917
#3923
#3924
#3926
#3938
#3934
#3918
#3885
#3942
#3954
#3931
#3941
#3943
#3955
#3957
#3960


Please make sure that:

  • Code is split into commits that each address a single change
  • Commit messages are informative
  • Commit titles have a module prefix
  • Commit titles have an issue number suffix

karol-kokoszka and others added 30 commits August 9, 2024 23:16
Since scylladb/scylladb@1577aa8 (5.2), the replace_address and replace_address_first_boot options are deprecated: their use is discouraged, and a warning is emitted when they are used.
Instead, users should set replace_node_first_boot, which identifies the replaced node by its host_id rather than its IP address.
…onfig value

Since scylladb/scylladb@1577aa8 (5.2), the replace_address and replace_address_first_boot options are deprecated: their use is discouraged, and a warning is emitted when they are used.
Instead, users should set replace_node_first_boot, which identifies the replaced node by its host_id rather than its IP address.
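A minimal sketch of the switch from an IP-based to a host-ID-based replacement, as a test setup might render it. Only the replace_node_first_boot option name comes from the commit message; replaceNodeConfig and the example host ID are hypothetical.

```go
package main

import "fmt"

// replaceNodeConfig renders a hypothetical scylla.yaml fragment that tells a
// fresh node to replace the dead node identified by hostID. It uses
// replace_node_first_boot instead of the deprecated replace_address /
// replace_address_first_boot options, which take the dead node's IP address.
func replaceNodeConfig(hostID string) string {
	return fmt.Sprintf("replace_node_first_boot: %s\n", hostID)
}

func main() {
	// Made-up host ID, for illustration only.
	fmt.Print(replaceNodeConfig("8d5ed9f4-7764-4dbd-bad8-43fddce94b7c"))
}
```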
Tablets have moved from an experimental feature to a regular feature controlled by enable_tablets. This commit updates the test environment setup accordingly. It also removes things that are no longer needed, such as:
- a separate scylla.yaml and .properties for the other cluster
- controlling raft topology, as it is always enabled in 6.0 and we don't need it in previous versions
This commit adds a script ('.github/cfg/main.go') which generates workflows and prints GitHub badges based on './integration-test-cfg.yaml' and './integration-test-core.yaml'.
…sIntegration

This commit:
- extends the graceful stop timeout so that it catches all repair jobs that finish after ctx is canceled
- clears the tablet ranges of a table that was not fully repaired before starting a new repair (tablet tables always resume repair from scratch)
- ensures that no redundant ranges are tolerated
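A minimal sketch of what "no redundant ranges are tolerated" can mean in a test: report every token range that was repaired more than once, and fail if anything is reported. The tokenRange type and redundantRanges function are hypothetical, not taken from the Scylla Manager code base.

```go
package main

import "fmt"

// tokenRange is a hypothetical representation of a repaired token range.
type tokenRange struct {
	StartToken, EndToken int64
}

// redundantRanges returns every range reported as repaired more than once,
// listing each such range a single time, in order of its first duplicate.
func redundantRanges(repaired []tokenRange) []tokenRange {
	seen := make(map[tokenRange]int, len(repaired))
	var dup []tokenRange
	for _, r := range repaired {
		seen[r]++
		if seen[r] == 2 { // report each duplicated range only once
			dup = append(dup, r)
		}
	}
	return dup
}

func main() {
	repaired := []tokenRange{{-100, 0}, {0, 100}, {-100, 0}}
	fmt.Println(redundantRanges(repaired)) // [{-100 0}]
}
```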
This section is generated by the .github/cfg/main.go script (except for the limitations, which should be added manually).

Fixes #3872
Even though taking the state and appending ranges are done under a mutex, it is still possible that (from SM's point of view) the first insert reaches the SM DB later than the second one. This would overwrite some successfully repaired ranges, which would then be repaired again in the next run.

Fixes #3919
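One way to rule out this reordering is to perform the DB write while still holding the mutex, so writes land in the same order as the state snapshots they carry. This is a hypothetical sketch (progress, store and markRepaired are made-up names, and store only stands in for the SM DB); it is not necessarily how the commit fixes the issue.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenRange is a hypothetical repaired token range.
type tokenRange struct{ StartToken, EndToken int64 }

// store stands in for the SM DB: each insert overwrites the stored row with
// the full list of ranges it carries.
type store struct{ row []tokenRange }

func (s *store) insert(ranges []tokenRange) { s.row = ranges }

// progress keeps the in-memory list of repaired ranges.
type progress struct {
	mu     sync.Mutex
	ranges []tokenRange
	db     *store
}

// markRepaired appends r and persists the new snapshot before releasing the
// lock, so an older, shorter snapshot can never overwrite a newer one.
func (p *progress) markRepaired(r tokenRange) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.ranges = append(p.ranges, r)
	snapshot := append([]tokenRange(nil), p.ranges...) // copy: the row must not alias p.ranges
	p.db.insert(snapshot)
}

func main() {
	p := &progress{db: &store{}}
	var wg sync.WaitGroup
	for i := int64(0); i < 10; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			p.markRepaired(tokenRange{i * 10, i*10 + 10})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(p.db.row)) // 10: the last write always holds every range
}
```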
Test case "repair temporary network outage" could fail when there were only 3 replica sets to repair (possible with 4 initial tablets), because the repair finished before the network outage started.
This fixes flakiness like:
2024-07-10T16:09:57.1335875Z === RUN   TestRcloneStoppingTransferIntegration
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/core/bwlimit", "duration": "1ms", "status": 200, "bytes": 93}
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/sync/copydir?_async=true", "duration": "0ms", "status": 200, "bytes": 21}
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/job/info", "duration": "1000ms", "status": 200, "bytes": 556}
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/job/stop", "duration": "0ms", "status": 200, "bytes": 3}
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/job/info", "duration": "1000ms", "status": 200, "bytes": 549}
{"host": "[2001:0DB9:200::11]:10001", "method": "POST", "uri": "/agent/rclone/core/bwlimit", "duration": "9ms", "status": 200, "bytes": 79}
2024-07-10T16:10:02.5696715Z --- FAIL: TestRcloneStoppingTransferIntegration (5.44s)
2024-07-10T16:10:02.5716548Z panic: runtime error: index out of range [0] with length 0 [recovered]
Using errors.Errorf("%s", err) makes it impossible to later check the nested error cause with errors.Is.
fmt.Errorf("%w", err), on the other hand, works fine.

Fixes #3925
Since it's no longer supported, remove it.

Signed-off-by: Yaniv Kaul <[email protected]>
karol-kokoszka and others added 27 commits August 9, 2024 23:30
Previously, the test scenario was:
- backup src
- restore to dst
- backup dst
- restore to *dst*

It was changed to (as "roundtrip" suggests):
- backup src
- restore to dst
- backup dst
- restore to *src*

Also, the src schema is dropped right after the backup, so the drop should have propagated to all nodes by the time the last restore takes place.

Fixes #3939
Even though the schema stage does not need indexed snapshot dirs, they are required by the next stage, deduplicate.
The purpose of this endpoint is to delete files in batches instead of one by one. It should improve the performance of the purge and deduplication backup stages.
Previously it was only possible to delete files one by one.
Now it's possible to delete many files in a single API call (RcloneDeletePaths),
or in batches of a given size (RcloneDeletePathsInBatches).
Tests were sometimes failing because GitHub Actions machines had less disk space than the default amount.
…status

This commit adds dedicated errors which can be used to check task status.
Previously this was done by checking the error returned by the task,
but it was impossible to tell whether a returned context.Canceled
originated from pausing the task or from the task execution itself.

Fixes #3884
karol-kokoszka merged commit 932a84b into branch-3.3 Aug 9, 2024
50 of 51 checks passed
karol-kokoszka deleted the release-3.3.1 branch August 9, 2024 22:49

3 participants