
[BUG] Bidirectional cross-cluster replication fails when an alias with writeIndex=true is used for both reads and writes #1511


Description

@atulsri

What is the bug?
When you set up bidirectional cross-cluster replication and then configure an alias for both reads and writes (writeIndex=true), replication fails with an exception. With writeIndex set to true, writing through the alias fails because the alias ends up with more than one write index.
But if writeIndex is not enabled, it fails because no write index exists for the alias.

So bidirectional replication through an alias never works, either way.

I followed the Elasticsearch documentation (https://www.elastic.co/blog/bi-directional-replication-with-elasticsearch-cross-cluster-replication-ccr). I understand that OpenSearch differs in a few places, which I accounted for, i.e. I used auto-follow rules to set up the bidirectional replication.

java.lang.IllegalStateException: alias [cooker] has more than one write index [cooker-dc2,cooker-dc1]
    at org.opensearch.cluster.metadata.IndexAbstraction$Alias.computeAndValidateAliasProperties(IndexAbstraction.java:305) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.metadata.Metadata$Builder.lambda$buildIndicesLookup$13(Metadata.java:1804) ~[opensearch-2.16.0.jar:2.16.0]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[?:?]
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
    at java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(TreeMap.java:3250) ~[?:?]
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:?]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:?]
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[?:?]
    at org.opensearch.cluster.metadata.Metadata$Builder.buildIndicesLookup(Metadata.java:1804) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.metadata.Metadata$Builder.buildMetadataWithRecomputedIndicesLookups(Metadata.java:1712) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.metadata.Metadata$Builder.build(Metadata.java:1572) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.ClusterState$Builder.metadata(ClusterState.java:655) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.snapshots.RestoreService$1.execute(RestoreService.java:663) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:912) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:464) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:324) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:228) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:882) [opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) ~[opensearch-2.16.0.jar:2.16.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) ~[opensearch-2.16.0.jar:2.16.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

[2025-03-09T11:14:00,311][DEBUG][o.o.r.t.i.IndexReplicationTask] [opensearch-node2] [cooker-dc1] Cluster metadata listener invoked on index task...
[2025-03-09T11:14:00,312][DEBUG][o.o.r.t.i.IndexReplicationTask] [opensearch-node2] [cooker-dc1] {REPLICATION_LAST_KNOWN_OVERALL_STATE=RUNNING} from the cluster state
[2025-03-09T11:14:00,314][ERROR][o.o.r.t.i.IndexReplicationTask] [opensearch-node2] [cooker-dc1] Moving replication[IndexReplicationTask:370][reason=Unable to initiate restore call for cooker-dc1 from my-connection-alias:cooker-dc1] to failed state
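
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the invariant that IndexAbstraction$Alias.computeAndValidateAliasProperties enforces in the trace above (simplified placeholder types, not the actual OpenSearch classes): an alias may point at many indices, but at most one of them may be flagged as the write index. In this setup both the local index and the restored follower carry the flag, so the check throws.

    import java.util.List;
    import java.util.Map;

    public class WriteIndexConflictSketch {
        public static void main(String[] args) {
            // index name -> is this index flagged as the write index for alias "cooker"?
            Map<String, Boolean> aliasedIndices = Map.of(
                "cooker-dc2", true,  // local index on this cluster, writeIndex=true
                "cooker-dc1", true   // follower restored from the leader, writeIndex copied over
            );

            // Collect every index that claims to be the write index for the alias.
            List<String> writeIndices = aliasedIndices.entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .toList();

            // Same invariant as the cluster-state validation in the trace above.
            if (writeIndices.size() > 1) {
                throw new IllegalStateException(
                    "alias [cooker] has more than one write index " + writeIndices);
            }
        }
    }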

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Follow the steps mentioned here https://www.elastic.co/blog/bi-directional-replication-with-elasticsearch-cross-cluster-replication-ccr
  2. Then create autofollow patterns for replication.

During replication, it throws an error that more than one write index is enabled for the alias.
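
For reference, this is roughly how the alias is configured on each cluster before replication kicks in (a hedged sketch, not an exact transcript of my setup: the endpoint, index names, and alias follow this report, and the mirror image of this request is issued on the other cluster for cooker-dc2). Once the auto-follow rule restores the remote index locally, its alias arrives with is_write_index=true copied from the leader, and the cluster then has two write indices for "cooker".

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AliasSetupSketch {
        public static void main(String[] args) throws Exception {
            // Standard _aliases request: mark the local index as the write index for "cooker".
            String body = """
                {
                  "actions": [
                    { "add": { "index": "cooker-dc1", "alias": "cooker", "is_write_index": true } }
                  ]
                }
                """;

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_aliases")) // assumed dc1 endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }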

What is the expected behavior?
When indices are replicated, the writeIndex metadata should not be copied from the leader cluster. The replicated (follower) index is always read-only, so copying the writeIndex attribute is not correct: whether or not writeIndex is set on the replicated index, that index remains read-only.
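
To spell out the expected state (illustrative values only, not actual cluster output): after the fix, the follower keeps the alias but never the writeIndex flag, so resolving the write index for the alias yields exactly one index and writes through the alias succeed.

    import java.util.Map;

    public class ExpectedWriteIndexResolution {
        public static void main(String[] args) {
            // Expected alias state on cluster dc2 after replication (illustrative):
            // the local index stays the write index, the restored follower does not inherit the flag.
            Map<String, Boolean> expected = Map.of(
                "cooker-dc2", true,   // local index, writes through the alias land here
                "cooker-dc1", false   // follower restored from dc1, always read-only
            );

            // With at most one flagged index, write resolution is unambiguous.
            String writeIndex = expected.entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .reduce((a, b) -> { throw new IllegalStateException("more than one write index"); })
                .orElseThrow();
            System.out.println("alias [cooker] writes resolve to " + writeIndex); // cooker-dc2
        }
    }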

What is your host/environment?

  • OS: Windows
  • Version: 2.16.0
  • Plugins: cross-cluster-replication

My analysis
I tried changing the plugin code to always set writeIndex to false. But before that, at the first step during snapshot restore, the RestoreService.java class in the OpenSearch repository copies all the aliases by default. That is why the aliases are always copied along with the writeIndex attribute when it is present. So even if I force writeIndex to false in the cross-cluster-replication plugin during sync, it does not work. We need a change in the OpenSearch repository to fix this, and also in the cross-cluster-replication plugin during the monitoring and other phases.
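
As a sketch of the direction I have in mind (hypothetical helper name and simplified placeholder types, not the actual RestoreService/AliasMetadata code): wherever the follower's index metadata is built from the leader's, the aliases would be copied with the writeIndex flag dropped, and the same filtering would need to happen in the plugin's later sync/monitoring phases, not just in the initial restore.

    import java.util.List;

    public class FollowerAliasSanitizerSketch {

        // Simplified placeholder for alias metadata; the real class is
        // org.opensearch.cluster.metadata.AliasMetadata.
        record AliasInfo(String alias, boolean writeIndex) {}

        // Hypothetical helper: copy the leader's aliases for the follower index,
        // forcing writeIndex to false because the follower is always read-only.
        static List<AliasInfo> aliasesForFollower(List<AliasInfo> leaderAliases) {
            return leaderAliases.stream()
                .map(a -> new AliasInfo(a.alias(), false))
                .toList();
        }

        public static void main(String[] args) {
            // Leader (dc1): cooker-dc1 is the write index for alias "cooker".
            List<AliasInfo> leaderAliases = List.of(new AliasInfo("cooker", true));
            // Follower (dc2): same alias, never a write index, so it no longer
            // clashes with the local cooker-dc2 write index.
            System.out.println(aliasesForFollower(leaderAliases));
        }
    }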
