Compatibility with segment replication #1029

Closed
Tracked by #8211
dreamer-89 opened this issue Jun 29, 2023 · 9 comments
Labels
enhancement New feature or request v2.9.0


@dreamer-89
Member

dreamer-89 commented Jun 29, 2023

Summary

The 2.9.0 release brings a lot of enhancements to the segment replication feature [1][2], which went GA in 2.7.0, so we need to ensure that plugins are compatible with the current state of this feature. We previously ran tests on plugin repos to verify this compatibility, but we want plugin owners to be aware of these changes so that any required updates can be made. With the 2.10.0 release, the remote store feature is going GA; it internally uses only the SEGMENT replication strategy, i.e. it enforces the SEGMENT replication strategy on all indices. So it is important to validate that plugins are compatible with segment replication.

What changed

1. Refresh policy behavior

  1. RefreshPolicy.IMMEDIATE refreshes only the primary shards immediately, not the replica shards. Instead, after the refresh, the primary starts a round of segment replication to update the replica shard copies, leading to eventual consistency.
  2. RefreshPolicy.WAIT_UNTIL ensures that the indexing operation is searchable in your cluster before the request returns, i.e. RAW (read-after-write guarantee). With segment replication, this guarantee is not promised because replica shard updates are delayed by asynchronous background refreshes.

2. Refresh lag on replicas

With segment replication, there is an inherent delay before documents become searchable on replica shard copies, because each replica copies the data (segment) files over from the primary. Thus, compared to document replication, replica shards will on average take longer to become consistent with their primaries.

3. System/hidden indices support

With opensearch-project/OpenSearch#8200, system and hidden indices are now supported with the SEGMENT replication strategy. We need to ensure there are no blockers that prevent system/hidden indices from using segment replication.

Next steps

With segment replication, strong reads are not guaranteed. Thus, if a plugin needs strong read guarantees, especially to compensate for the changed refresh policy behavior and the lag on replicas (points 1 and 2 above), its search requests need to be updated to target primary shards only. With opensearch-project/OpenSearch#7375, core now supports searching primary shards only. Please follow the documentation for examples and details.
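Points 1 and 2, and the reason for routing strong reads to primaries, can be illustrated with a toy in-memory model. This is a minimal sketch, not OpenSearch code: `ShardCopy` and `search` are hypothetical names, and a real cluster copies segment files asynchronously in the background rather than through a direct method call.

```kotlin
// Toy model of document visibility under segment replication.
// ShardCopy and search are hypothetical illustration names, not OpenSearch APIs.
class ShardCopy(val isPrimary: Boolean) {
    // Documents indexed on this copy but not yet visible to search.
    private val buffered = mutableListOf<String>()

    // Documents currently visible to search on this copy.
    val searchable = mutableListOf<String>()

    // With segment replication only the primary indexes documents.
    fun index(doc: String) {
        require(isPrimary) { "only the primary indexes documents under segment replication" }
        buffered += doc
    }

    // A refresh (e.g. triggered by RefreshPolicy.IMMEDIATE) makes buffered
    // documents searchable on this copy only; replicas are not refreshed.
    fun refresh() {
        searchable += buffered
        buffered.clear()
    }

    // A replica catches up only when it copies the primary's refreshed
    // segments; in a real cluster this round happens asynchronously.
    fun copySegmentsFrom(primary: ShardCopy) {
        require(primary.isPrimary && !isPrimary) { "segments flow from primary to replica" }
        searchable.clear()
        searchable += primary.searchable
    }
}

// Strong reads target the primary; other reads may be served by the replica
// and can miss documents that have not been replicated yet.
fun search(primary: ShardCopy, replica: ShardCopy, strongRead: Boolean, doc: String): Boolean =
    (if (strongRead) primary else replica).searchable.contains(doc)
```

Indexing a document and refreshing the primary makes it visible to a strong read at once, while a replica read misses it until `copySegmentsFrom` runs, which mirrors the eventual consistency described in point 1.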

Open questions

In case of any questions or issues, please post them in the core issue.

Reference

[1] Design

[2] Documentation

@dreamer-89
Member Author

Requesting owners to add the v2.9.0 label to this issue.

@ankitkala
Member

Let's validate 2 things:

  • A manual round of CCR sanity testing with both leader and follower on segrep. Also verify that replication between primary and replica happens without any issues.
  • Enable SegRep by default for all indices and run the integration test suite, ensuring a clean, successful run. Replication metadata is currently forced to use docrep, so let's remove that limitation (and verify) as well while testing.

@dreamer-89
Member Author

Hi Plugin Owners,
Gentle reminder to look into this issue, as the code freeze date for the 2.9.0 release is near (July 11th).

@monusingh-1
Collaborator

monusingh-1 commented Jul 11, 2023

Published OpenSearch 2.x to the local Maven repository:

./gradlew publishToMavenLocal -Dbuild.snapshot=true

Updated CCR to use SEGMENT replication:

    return Settings.builder()
        .put(IndexMetadata.INDEX_NUMBER_OF_SHARDS_SETTING.key, 1)
        .put(IndexMetadata.INDEX_AUTO_EXPAND_REPLICAS_SETTING.key, "0-1")
        .put(IndexMetadata.INDEX_PRIORITY_SETTING.key, Int.MAX_VALUE)
        .put(IndexMetadata.INDEX_HIDDEN_SETTING.key, true)
        .put(IndexMetadata.INDEX_REPLICATION_TYPE_SETTING.key, ReplicationType.SEGMENT)
        .build()
}

Running integ tests

@monusingh-1
Collaborator

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 23m 39s

@monusingh-1
Collaborator

Integration tests are passing, and I completed manual testing with segment replication:

 curl -k -u 'admin:admin' -X PUT "http://localhost:9200/remote-index?pretty" -H 'Content-Type: application/json' -d'
  {
    "settings": {
      "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "replication.type": "SEGMENT"
      }
    }
  }
  '
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "remote-index"
}
❯ curl -k -u 'admin:admin' -X PUT "http://localhost:9201/_plugins/_replication/remote-index/_start?pretty" -H 'Content-Type: application/json' -d'
{
  "leader_alias": "leader-cluster",
  "leader_index": "remote-index",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}
'
{
  "acknowledged" : true
}
❯ curl -k -u 'admin:admin' "http://localhost:9200/_cat/indices?v"
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   remote-index VCLPM0PnS_mOEO_yA4vFpg   1   0          0            0       230b           230b
yellow open   fruit-1      gS_LYuU6RvGAeItyNT8DhQ   1   1          1            0      3.7kb          3.7kb
❯ curl -k -u 'admin:admin' "http://localhost:9201/_cat/indices?v"
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   remote-index FyGGjob7Q16BZca0apCFpg   1   0          0            0       230b           230b
yellow open   fruit-1      A1q757X0RfWKh_l7Lhc0zQ   1   1          1            0      3.7kb          3.7kb
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "remote-index",
  "follower_index" : "remote-index",
  "syncing_details" : {
    "leader_checkpoint" : -1,
    "follower_checkpoint" : -1,
    "seq_no" : 0
  }
}
❯ curl -k -u 'admin:admin' -X POST "http://localhost:9200/remote-index/_doc/?pretty" -H 'Content-Type: application/json' -d'
  {
      "user" : "kimchy",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch5"
  }
  '
{
  "_index" : "remote-index",
  "_id" : "4g0qRIkBQVdTOfb4SM6v",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "remote-index",
  "follower_index" : "remote-index",
  "syncing_details" : {
    "leader_checkpoint" : 0,
    "follower_checkpoint" : -1,
    "seq_no" : 0
  }
}
❯ curl -k -u 'admin:admin' "http://localhost:9200/_plugins/_replication/leader_stats?pretty"
{
  "num_replicated_indices" : 2,
  "operations_read" : 2,
  "translog_size_bytes" : 365,
  "operations_read_lucene" : 0,
  "operations_read_translog" : 2,
  "total_read_time_lucene_millis" : 0,
  "total_read_time_translog_millis" : 4,
  "bytes_read" : 250,
  "index_stats" : {
    "fruit-1" : {
      "operations_read" : 1,
      "translog_size_bytes" : 120,
      "operations_read_lucene" : 0,
      "operations_read_translog" : 1,
      "total_read_time_lucene_millis" : 0,
      "total_read_time_translog_millis" : 3,
      "bytes_read" : 53
    },
    "remote-index" : {
      "operations_read" : 1,
      "translog_size_bytes" : 245,
      "operations_read_lucene" : 0,
      "operations_read_translog" : 1,
      "total_read_time_lucene_millis" : 0,
      "total_read_time_translog_millis" : 1,
      "bytes_read" : 197
    }
  }
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/follower_stats?pretty"
{
  "num_syncing_indices" : 2,
  "num_bootstrapping_indices" : 0,
  "num_paused_indices" : 0,
  "num_failed_indices" : 0,
  "num_shard_tasks" : 2,
  "num_index_tasks" : 2,
  "operations_written" : 2,
  "operations_read" : 2,
  "failed_read_requests" : 0,
  "throttled_read_requests" : 0,
  "failed_write_requests" : 0,
  "throttled_write_requests" : 0,
  "follower_checkpoint" : 0,
  "leader_checkpoint" : 0,
  "total_write_time_millis" : 119,
  "index_stats" : {
    "fruit-1" : {
      "operations_written" : 1,
      "operations_read" : 1,
      "failed_read_requests" : 0,
      "throttled_read_requests" : 0,
      "failed_write_requests" : 0,
      "throttled_write_requests" : 0,
      "follower_checkpoint" : 0,
      "leader_checkpoint" : 0,
      "total_write_time_millis" : 24
    },
    "remote-index" : {
      "operations_written" : 1,
      "operations_read" : 1,
      "failed_read_requests" : 0,
      "throttled_read_requests" : 0,
      "failed_write_requests" : 0,
      "throttled_write_requests" : 0,
      "follower_checkpoint" : 0,
      "leader_checkpoint" : 0,
      "total_write_time_millis" : 95
    }
  }
}
❯ curl -k -u 'admin:admin' -X POST "http://localhost:9201/_plugins/_replication/remote-index/_pause?pretty" -H 'Content-Type: application/json' -d '{}'
{
  "acknowledged" : true
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "PAUSED",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "remote-index",
  "follower_index" : "remote-index"
}
❯ curl -k -u 'admin:admin' -X POST "http://localhost:9201/_plugins/_replication/remote-index/_resume?pretty" -H 'Content-Type: application/json' -d '{}'
{
  "acknowledged" : true
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "remote-index",
  "follower_index" : "remote-index",
  "syncing_details" : {
    "leader_checkpoint" : 0,
    "follower_checkpoint" : 0,
    "seq_no" : 1
  }
}
❯ curl -k -u 'admin:admin' -X POST "http://localhost:9200/remote-index/_doc/?pretty" -H 'Content-Type: application/json' -d'
  {
      "user" : "kimchy2",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch5"
  }
  '
{
  "_index" : "remote-index",
  "_id" : "5A0rRIkBQVdTOfb4GM4Z",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "leader-cluster",
  "leader_index" : "remote-index",
  "follower_index" : "remote-index",
  "syncing_details" : {
    "leader_checkpoint" : 1,
    "follower_checkpoint" : 1,
    "seq_no" : 2
  }
}
❯ curl -k -u 'admin:admin' -X POST "http://localhost:9201/_plugins/_replication/remote-index/_stop?pretty" -H 'Content-Type: application/json' -d '{}'
{
  "acknowledged" : true
}
❯ curl -k -u 'admin:admin' "http://localhost:9201/_plugins/_replication/remote-index/_status?pretty"
{
  "status" : "REPLICATION NOT IN PROGRESS"
}
❯  curl -k -u 'admin:admin' -X POST "http://localhost:9201/remote-index/_doc/?pretty" -H 'Content-Type: application/json' -d'
  {
      "user" : "kimchy3",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch5"
  }
  '
{
  "_index" : "remote-index",
  "_id" : "NwsrRIkB5zqszzZaeItm",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 4
}
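In the run above, right after a write the `_status` output shows `leader_checkpoint` ahead of `follower_checkpoint` (0 vs -1), and the two later converge (0/0, then 1/1). A minimal sketch for interpreting those fields, assuming the follower is caught up once its checkpoint reaches the leader's; the `SyncingDetails`, `isCaughtUp`, and `pendingOps` names are hypothetical and not part of the CCR plugin API.

```kotlin
// Hypothetical mirror of the "syncing_details" object returned by
// GET _plugins/_replication/<index>/_status (field names from the output above).
data class SyncingDetails(
    val leaderCheckpoint: Long,
    val followerCheckpoint: Long,
    val seqNo: Long,
)

// Assumption: the follower has applied everything the leader has checkpointed
// once its checkpoint reaches the leader's; -1 means no operations yet.
fun isCaughtUp(d: SyncingDetails): Boolean = d.followerCheckpoint >= d.leaderCheckpoint

// Number of operations the follower still has to apply (0 when caught up).
fun pendingOps(d: SyncingDetails): Long = maxOf(0L, d.leaderCheckpoint - d.followerCheckpoint)
```

Applied to the first status after indexing (leader 0, follower -1) this reports one pending operation; the later 1/1 status reports the follower as caught up.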

@monusingh-1 monusingh-1 self-assigned this Jul 11, 2023
@dreamer-89
Member Author

@ankitkala @monusingh-1: Thanks for working on this issue. I understand from the work here that no change is needed in the CCR plugin with respect to the ask in this issue. Please confirm.

@monusingh-1
Collaborator

Hi @dreamer-89, no change needed.

@dreamer-89
Member Author

Thanks @monusingh-1 for the confirmation. I also pinged @ankitkala offline, and they mentioned no change is needed.


3 participants