
[Meta] Validate plugins compatibility with segment replication #8211

Closed
36 of 37 tasks
dreamer-89 opened this issue Jun 22, 2023 · 13 comments

dreamer-89 commented Jun 22, 2023

Summary

With the 2.9.0 release, a number of enhancements are going into the segment replication feature [1][2] (which went GA in 2.7.0), and we need to ensure that the different plugins are compatible with the current state of this feature. Previously, we ran tests on plugin repos to verify this compatibility, but we want plugin owners to be aware of these changes so that required updates (if any) can be made. With the 2.10.0 release, the remote store feature is going GA; it internally uses the SEGMENT replication strategy only, i.e. it enforces SEGMENT replication for all indices. It is therefore important to validate that plugins are compatible with the segment replication feature.

What changed

1. Refresh policy behavior

  1. RefreshPolicy.IMMEDIATE will only refresh primary shards immediately, not replica shards. Instead, after the refresh, the primary will start a round of segment replication to update the replica shard copies, leading to eventual consistency.
  2. RefreshPolicy.WAIT_UNTIL normally ensures the indexing operation is searchable in your cluster, i.e. a read-after-write (RAW) guarantee. With segment replication, this guarantee is not promised because replica shard updates arrive through asynchronous background refreshes (see the sketch below).
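
To make point 2 above concrete, here is a minimal Java sketch (not taken from any plugin) of an indexing call using WAIT_UNTIL; the index name, document, and the pre-existing client variable are hypothetical. With segment replication, the acknowledgement only means the document is searchable on the primary shard, while replicas may still be catching up.

import java.util.Map;

import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.index.IndexResponse;
import org.opensearch.action.support.WriteRequest;
import org.opensearch.client.Client;

// Hypothetical index and document, for illustration only; "client" is an existing Client instance.
IndexRequest indexRequest = new IndexRequest("my-index")
    .source(Map.of("field", "value"))
    // Blocks until the next refresh on the primary; with segment replication,
    // replica copies may still be syncing when this call returns.
    .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
IndexResponse indexResponse = client.index(indexRequest).actionGet();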

2. Refresh lag on replicas

With segment replication, there is an inherent delay before documents become searchable on replica shard copies. This is because replica shards copy the segment (data) files over from the primary. Thus, compared to document replication, there will be, on average, an increase in the amount of time it takes for replica shards to become consistent with their primaries.

3. System/hidden indices support

With #8200, system and hidden indices are now supported with the SEGMENT replication strategy. We need to ensure there are no bottlenecks that prevent system/hidden indices from working with segment replication.

Next steps

With segment replication, strong reads are not guaranteed. Thus, if a plugin needs strong read guarantees, especially as an alternative to the changed refresh policy behavior and the lag on replicas (points 1 and 2 above), its search requests need to be updated to target the primary shard only. With #7375, core now supports primary-shard-only search. Please follow the documentation for examples and details.
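
For illustration, here is a minimal Java sketch of a search request that targets primary shards via the preference parameter. The index name and the pre-existing client variable are hypothetical, and the "_primary" preference value is an assumption based on the primary-shard-only search support referenced above; please confirm the exact value against #7375 and the linked documentation.

import org.opensearch.action.search.SearchRequest;
import org.opensearch.action.search.SearchResponse;
import org.opensearch.client.Client;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

// Hypothetical index name; "client" is an existing Client instance.
SearchRequest searchRequest = new SearchRequest("my-index")
    // Assumption: "_primary" restricts the search to primary shard copies only.
    .preference("_primary")
    .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
SearchResponse searchResponse = client.search(searchRequest).actionGet();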

Open questions

In case of any questions or issues, please post them on the core repository.

References

[1] Design
[2] Documentation

OpenSearch plugins

OpenSearch Dashboards plugins


mch2 commented Jun 22, 2023

Related: remote store compatibility with system indices is broken until #8193 is resolved.

dreamer-89 (Member, Author) commented:

@opensearch-project/admin: Request to add v2.9.0 labels to the above plugin issues.


minalsha commented Jul 6, 2023

@dreamer-89, @Rishikesh1159 can you please share a sample test that we can refer to, and would need to add, to create system indices and verify that system/hidden indices in a given plugin work correctly with segment replication?


dreamer-89 commented Jul 6, 2023

@dreamer-89, @Rishikesh1159 can you please share a sample test that we can refer to, and would need to add, to create system indices and verify that system/hidden indices in a given plugin work correctly with segment replication?

Thank you @minalsha for your question. Sharing details below on how to create segment replication indices and add tests. Please have a look and let me know if you have more questions.

Create segment replication indices

There are two ways you can enable segment replication for an index, as described below. The documentation also covers this.

Cluster level setting

Use the cluster level setting to create all indices with segment replication. This setting accepts 1) DOCUMENT (default) and 2) SEGMENT as values.

cluster.indices.replication.strategy: SEGMENT

Index level setting (overrides cluster level setting)

To enable segment replication at the index level, pass replication.type as SEGMENT. Please note that it overrides the cluster level setting.

curl -X PUT "$host/{index_name}" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1, 
      "replication.type": "SEGMENT"
    }
  }
}'
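
For plugins that create indices programmatically rather than over REST, the same index level setting can be supplied through the index settings. The sketch below is an assumed Java equivalent of the curl request above; the index name and the pre-existing client variable are hypothetical.

import org.opensearch.action.admin.indices.create.CreateIndexRequest;
import org.opensearch.client.Client;
import org.opensearch.common.settings.Settings;

// Hypothetical index name; "client" is an existing Client instance.
Settings indexSettings = Settings.builder()
    .put("index.number_of_shards", 1)
    .put("index.number_of_replicas", 1)
    .put("index.replication.type", "SEGMENT")   // overrides the cluster level setting
    .build();
CreateIndexRequest createIndexRequest = new CreateIndexRequest("my-index").settings(indexSettings);
client.admin().indices().create(createIndexRequest).actionGet();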

Though it is easier to simply use DOCUMENT replication for special indices (system/hidden indices), this is not recommended, in favour of future-proofing, for the two reasons below.

  1. Segment replication can be enabled at the cluster level, which means all indices would be on segrep.
  2. The remote store feature (going GA in 2.10.0) enforces the segment replication strategy. Thus, to ensure compatibility, we need to have plugins running with segment replication.

Adding tests

There are two options for adding tests. In either case, please ensure your test uses more than 0 replica shards (the number_of_replicas setting).

  1. Add new integration tests. Examples on core for integ tests and bwc tests.

  2. Update existing integration tests to randomly use SEGMENT as the replication strategy. This is better because it covers all the existing tests and thus the entire functionality (see the sketch below).
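
As a rough sketch of option 2, assuming a test class that extends OpenSearchIntegTestCase (so that the randomBoolean() and createIndex() helpers are available), the replication type could be randomized along these lines; the index name is hypothetical.

import org.opensearch.common.settings.Settings;

// Randomly exercise SEGMENT replication in an existing integration test.
String replicationType = randomBoolean() ? "SEGMENT" : "DOCUMENT";
Settings indexSettings = Settings.builder()
    .put("index.number_of_replicas", 1)                 // must be > 0 to exercise replication
    .put("index.replication.type", replicationType)
    .build();
createIndex("test-index", indexSettings);               // OpenSearchIntegTestCase helper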

Validations

The exact validation may vary across plugins, and I will leave it to plugin owners to decide. At a minimum, you need to verify that your shards (both primary and replica) have the same number of documents and are searchable.
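
One possible spot check, sketched with the low-level REST client and the _cat/shards API: list each shard copy's role and document count and compare the primary against the replicas after a refresh. The index name and the pre-existing restClient variable are hypothetical.

import org.apache.http.util.EntityUtils;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

// Hypothetical index name; "restClient" is an existing low-level RestClient.
Request catShards = new Request("GET", "/_cat/shards/my-index");
catShards.addParameter("h", "prirep,docs");   // shard role (p/r) and doc count per shard copy
catShards.addParameter("format", "json");
Response response = restClient.performRequest(catShards);
String body = EntityUtils.toString(response.getEntity());
// Parse "body" and assert that primary and replica copies report the same doc count.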


dbwiddis commented Jul 6, 2023

To enable segment replication at the index level, pass replication.type as SEGMENT. Please note that it overrides the cluster level setting.


Hey @dreamer-89, I'm trying to use this programmatically when creating a system index to make sure the ReplicationType is always DOCUMENT, and want to make sure I'm reading this correctly.

The current code uses CreateIndexRequest:

CreateIndexRequest request = new CreateIndexRequest(LOCK_INDEX_NAME) ...

So, in my case, if I create a settings object to pass the setting "replication.type" (which is probably a constant somewhere) with a value of "DOCUMENT" (which is an enum), this now becomes:

CreateIndexRequest request = new CreateIndexRequest(LOCK_INDEX_NAME, replicationSettings) ...

this would always have that index set to DOCUMENT even if the user chose SEGMENT for their cluster, correct?

dreamer-89 (Member, Author) commented:

this would always have that index set to DOCUMENT even if the user chose SEGMENT for their cluster, correct?

Thanks @dbwiddis for your comment. Yes, the index level replication.type setting overrides the cluster level setting. However, the ask here is to verify plugin compatibility with segment replication, not to work around it by using DOCUMENT as the replication strategy. Going forward, SEGMENT will be the only replication strategy supported for certain configurations (e.g. remote store at the cluster level). Thus, there is no alternative to validating indices that are actually created with SEGMENT replication.


dbwiddis commented Jul 7, 2023

However, the ask here is to verify plugin compatibility with segment replication, not to work around it by using DOCUMENT as the replication strategy.

@dreamer-89 Understood. To be clear:

  1. I'm tracking the ask to verify compatibility.
  2. I'm asking about the best way to avoid the known performance regression that this is introducing. There has been no testing to measure how much the performance will degrade, just a statement "System indices are generally small and should copy out quickly, so I wouldn't expect this to be of major concern."

In Job Scheduler, a system index is used for locking in highly parallel, performance-sensitive use cases. We absolutely need a way for users in these use cases to change this. And while the "change the index setting" approach works for other indices, changing settings on system indices has more safeguards.

I take your comment above "SEGMENT will be the only replication strategy supported for certain configurations" to mean that my proposed hard-coded forcing of this setting is wrong.

What, then, is the correct action to take here for a performance-sensitive application that does not want WAIT_UNTIL on its system index?


stephen-crawford commented Jul 7, 2023

To echo @dbwiddis, I am concerned about the effect these changes will have on other sensitive system index use cases. With the current security model, we make use of a system index to store important user information such as internal role mappings. With opensearch-project/security#2903, we appear to handle the consistency issue that arises with replication. However, we still don't account for the performance impact of copying the system index repeatedly. We already run into performance concerns due to reloading the nodes every time there is a configuration change related to Security: opensearch-project/security#2443. With this change, I am concerned we would be significantly increasing that cost, as we would now need to update the duplicated system index frequently.

Perhaps I am conflating the processes at hand here, but I do think we should aim for verifiable performance metrics before making such a significant change.

seankao-az commented:

I have a question on this:

With segment replication, strong reads are not guaranteed. Thus, if a plugin needs strong read guarantees, especially as an alternative to the changed refresh policy behavior and the lag on replicas (points 1 and 2 above), its search requests need to be updated to target the primary shard only.

In case of primary shard failure, does a replica get all the updates before it gets promoted to primary? Otherwise, strong reads aren't guaranteed even if we read only from the primary.

anasalkouz (Member) commented:

@seankao-az
Recovery scenarios are covered with SegRep, and you shouldn't face any data loss. We guarantee data durability by storing all operations in the translog, which means that in case of primary shard failure, the replica can replay all remaining operations before being promoted to primary.


mch2 commented Jul 7, 2023

Today, with document replication, the only way to get strong reads is with a GET request by ID (which reads from the translog, independent of the refresh cycle) or by using RefreshPolicy.WAIT_UNTIL during ingest followed by issuing a search request.

Segment replication does not currently support either of these mechanisms. With WAIT_UNTIL, the write would be ack'd only after the primary has made the docs searchable, while the replicas would still be in the process of syncing. To achieve a strong read after this request is made, the search would also need to prefer the primary.

There is no reason (that I'm currently aware of) why we can't support get by id. This is a miss - I made an issue here to support this.

In case of primary shard failure, does a replica get all the updates before it gets promoted to primary? Otherwise, strong reads aren't guaranteed even if we read only from the primary.

Yes, in the failover case the new primary catches up before serving reads or new writes.

dreamer-89 (Member, Author) commented:

What, then, is the correct action to take here for a performance-sensitive application that does not want WAIT_UNTIL on its system index?

Hi @dbwiddis,
Thanks for sharing your use case, and apologies for the delay on my end.

Yes, if primary-shard-based searching does not work for you, then your plan as mentioned in the job-scheduler issue sounds right to me.

dreamer-89 (Member, Author) commented:

Closing this issue, as all plugins except alerting have made the necessary changes for compatibility with the SEGMENT replication feature. For alerting, a separate issue was created on core to fix this (#9669), and a documentation issue was created to mark this as a limitation (opensearch-project/documentation-website#4967).

CC @anasalkouz @mch2
