[BUGFIX] Add cancellation_time_millis to resolve Strict Dynamic Mapping issue in .tasks index #16201

Open
wants to merge 1 commit into
base: main

Conversation

inpink

@inpink inpink commented Oct 5, 2024

Related Issues

Resolves #16060

Description

[The result of the execution]

| Before | After |
| --- | --- |
| (screenshot: before) | (screenshot: after) |
| StrictDynamicMappingException occurred | StrictDynamicMappingException not occurred |
  • After an auto-follow rule is applied and then deleted in the CCR (cross-cluster-replication) plugin, the .tasks index is now updated without a StrictDynamicMappingException.

[Background]

  • OpenSearch lets a follower cluster replicate indexes from a leader cluster through the OpenSearch CCR plugin.
  • However, cancelling an auto-follow rule in the CCR plugin previously raised a StrictDynamicMappingException:
[2024-09-28T12:01:16,819][WARN ][o.o.r.t.a.AutoFollowTask ] [5460424aac92][my-connection-alias] Error storing result StrictDynamicMappingException[mapping set to strict, dynamic introduction of [cancellation_time_millis] within [task] is not allowed]
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:876)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:722)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:461)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:419)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:524)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:545)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:447)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:419)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:138)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:93)
2024-09-28 21:01:16     at org.opensearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:256)
2024-09-28 21:01:16     at org.opensearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1178)
2024-09-28 21:01:16     at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1135)
2024-09-28 21:01:16     at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1052)
2024-09-28 21:01:16     at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625)
2024-09-28 21:01:16     at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471)
2024-09-28 21:01:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
2024-09-28 21:01:16     at org.opensearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:535)
2024-09-28 21:01:16     at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:416)
2024-09-28 21:01:16     at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:125)
2024-09-28 21:01:16     at org.opensearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:275)
2024-09-28 21:01:16     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005)
2024-09-28 21:01:16     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
2024-09-28 21:01:16     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-09-28 21:01:16     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
2024-09-28 21:01:16     at java.base/java.lang.Thread.run(Thread.java:1583)
  • When a task is cancelled, the .tasks index should be updated with the task result.
  • The .tasks index uses strict dynamic mapping, but the cancellation_time_millis field was missing from task-index-mapping.json. Writing a cancelled task's result therefore introduced an unmapped field, which caused the StrictDynamicMappingException.
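
The failure mode described above can be illustrated with a small, self-contained Python sketch. It is not OpenSearch's implementation: `check_strict`, `StrictMappingError`, the two sample mappings, and the sample document values are all hypothetical, and the mappings are a tiny illustrative subset rather than the real task-index-mapping.json. The sketch only shows why a document carrying an undeclared field is rejected under a strict mapping, and why declaring the field is enough to make the same document acceptable.

```python
class StrictMappingError(Exception):
    """Illustrative stand-in for OpenSearch's StrictDynamicMappingException."""


def check_strict(mapping, doc, parent="_doc"):
    """Raise if `doc` contains a field that `mapping` does not declare.

    Simplification: nested objects are only checked when the mapping
    declares `properties` for them.
    """
    properties = mapping.get("properties", {})
    for field, value in doc.items():
        if field not in properties:
            raise StrictMappingError(
                f"mapping set to strict, dynamic introduction of [{field}] "
                f"within [{parent}] is not allowed"
            )
        if isinstance(value, dict) and "properties" in properties[field]:
            check_strict(properties[field], value, parent=field)


def is_accepted(mapping, doc):
    """Return True if the document passes the strict check, else False."""
    try:
        check_strict(mapping, doc)
    except StrictMappingError:
        return False
    return True


# Illustrative subset of a task mapping, NOT the real task-index-mapping.json.
OLD_MAPPING = {"properties": {"task": {"properties": {
    "id": {"type": "long"},
    "cancellable": {"type": "boolean"},
}}}}

# The same subset with the field this PR adds.
NEW_MAPPING = {"properties": {"task": {"properties": {
    "id": {"type": "long"},
    "cancellable": {"type": "boolean"},
    "cancellation_time_millis": {"type": "long"},
}}}}

# Hypothetical result document for a cancelled task.
CANCELLED_TASK_DOC = {"task": {
    "id": 97,
    "cancellable": True,
    "cancellation_time_millis": 1727524876819,
}}

print(is_accepted(OLD_MAPPING, CANCELLED_TASK_DOC))  # False
print(is_accepted(NEW_MAPPING, CANCELLED_TASK_DOC))  # True
```

Calling `check_strict(OLD_MAPPING, CANCELLED_TASK_DOC)` directly raises an error whose message has the same shape as the one in the log above.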

[PR contents]

  • Added the cancellation_time_millis field to task-index-mapping.json:
...
          "cancellation_time_millis": {
            "type": "long"
          },
...
  • This prevents the StrictDynamicMappingException from occurring.
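
A quick way to confirm that a mapping file declares the new field is to load it and walk the `properties` tree. The sketch below is a minimal example under stated assumptions: the helper names `load_root_mapping` and `find_field` are hypothetical, the `_doc` wrapper key is an assumption about the file layout, and `SAMPLE` is a trimmed stand-in rather than the real contents of task-index-mapping.json.

```python
import json


def load_root_mapping(text):
    """Parse mapping JSON and unwrap an optional "_doc" type wrapper.

    Assumption: the file is either {"properties": ...} or {"_doc": {"properties": ...}}.
    """
    parsed = json.loads(text)
    if "properties" not in parsed and "_doc" in parsed:
        parsed = parsed["_doc"]
    return parsed


def find_field(mapping, dotted_path):
    """Return the definition of a dotted field path, or None if it is not declared."""
    node = mapping
    for part in dotted_path.split("."):
        props = node.get("properties") if isinstance(node, dict) else None
        if not props or part not in props:
            return None
        node = props[part]
    return node


# Trimmed stand-in for the mapping file; only the field from this PR is shown.
SAMPLE = """
{
  "_doc": {
    "dynamic": "strict",
    "properties": {
      "task": {
        "properties": {
          "cancellation_time_millis": {"type": "long"}
        }
      }
    }
  }
}
"""

root = load_root_mapping(SAMPLE)
print(find_field(root, "task.cancellation_time_millis"))  # {'type': 'long'}
print(find_field(root, "task.not_declared"))              # None
```

To check a real checkout, the text of the mapping file can be read from disk and passed to `load_root_mapping` in place of `SAMPLE`.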

[Test]

  • Reproducing this bug requires two custom-built OpenSearch 3.0.0 clusters with the CCR plugin 3.0.0 installed. Because of the complexity of that environment, writing a test directly in the OpenSearch repository was not feasible.
  • Setting up two CCR-enabled clusters inside the OpenSearch repository proved difficult: existing cross-cluster tests such as CrossClusterSearchIT and CrossClusterSearchUnavailableClusterIT seem to mock virtual clusters rather than deploy real ones.
  • Instead, I built a local test environment with Docker. If you know of a better way to test this, I will gladly incorporate your suggestions.
  1. I set up two clusters with the CCR Plugin installed. I cloned the OpenSearch GitHub repository and modified the task-index-mapping.json.
  2. After modifying OpenSearch, I assembled it, which generated the opensearch-min-3.0.0-SNAPSHOT-linux-arm64.tar.gz file.
  3. I then cloned and assembled the OpenSearch CCR GitHub repository, which generated opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.zip.
  4. Using a Dockerfile, I built a Docker image. In the same directory as the Dockerfile, I included the following files:
    opensearch-min-3.0.0-SNAPSHOT-linux-arm64.tar.gz, opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.zip, opensearch.yml, opensearch-docker-entrypoint.sh, opensearch-onetime-setup.sh
  5. I used Docker Compose to create two clusters.
  6. Finally, I installed the CCR plugin on each cluster:
docker exec -it [container] /bin/bash
/usr/share/opensearch/bin/opensearch-plugin install file:/usr/share/opensearch/opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.zip
  7. I registered an auto-follow rule and then deleted it to verify the behavior. The commands correspond to the "Set up a cross-cluster connection" and "get-started-with-auto-follow" references:
curl -XPUT -k -H 'Content-Type: application/json'  'http://localhost:9200/_cluster/settings?pretty' -d '
{
  "persistent": {
    "cluster": {
      "remote": {
        "my-connection-alias": {
          "seeds": ["localhost:9300"]
        }
      }
    }
  }
}'
curl -XPUT -k -H 'Content-Type: application/json'  'http://localhost:9201/leader-01?pretty'
curl -XPOST -k -H 'Content-Type: application/json' -u 'admin:Yhj99!009' 'https://localhost:9200/_plugins/_replication/_autofollow?pretty' -d '
{
   "leader_alias" : "my-connection-alias",
   "name": "my-replication-rule",
   "pattern": "movies*",
   "use_roles":{
      "leader_cluster_role": "all_access",
      "follower_cluster_role": "all_access"
   }
}'
curl -XDELETE -k -H 'Content-Type: application/json'  'http://localhost:9200/_plugins/_replication/_autofollow?pretty' -d '
{
   "leader_alias" : "my-connection-alias",
   "name": "my-replication-rule"
}'
  8. I checked the logs to confirm whether the StrictDynamicMappingException still occurred. With cancellation_time_millis added to task-index-mapping.json, the exception no longer appeared and the task status was persisted successfully:
replication-node22-orin: {"type": "server", "timestamp": "2024-10-05T04:37:02,764Z", "level": "INFO", "component": "o.o.r.t.a.AutoFollowTask", "cluster.name": "follower-cluster", "node.name": "6b9ce94a6ee6", "message": "[my-connection-alias] Going to mark AutoFollowTask:97 task as completed", "cluster.uuid": "6afNBEthSgaKVY55dxNk8Q", "node.id": "0Zrg1aZFS_Sov4YAJjCDAg"  }
replication-node22-orin: {"type": "server", "timestamp": "2024-10-05T04:37:02,766Z", "level": "INFO", "component": "o.o.r.t.a.AutoFollowTask", "cluster.name": "follower-cluster", "node.name": "6b9ce94a6ee6", "message": "[my-connection-alias] Completed the task with id:97", "cluster.uuid": "6afNBEthSgaKVY55dxNk8Q", "node.id": "0Zrg1aZFS_Sov4YAJjCDAg"  }
replication-node22-orin: {"type": "server", "timestamp": "2024-10-05T04:37:02,770Z", "level": "INFO", "component": "o.o.r.t.a.AutoFollowTask", "cluster.name": "follower-cluster", "node.name": "6b9ce94a6ee6", "message": "[my-connection-alias] Successfully persisted task status", "cluster.uuid": "6afNBEthSgaKVY55dxNk8Q", "node.id": "0Zrg1aZFS_Sov4YAJjCDAg"  }
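
The manual log check in the last step could be automated along the following lines. This is only a sketch: it assumes Docker Compose output in the `<container>: {json}` shape shown above, the function names `parse_log_line` and `summarize` are hypothetical, and the sample lines are shortened copies of the log lines in this description.

```python
import json


def parse_log_line(line):
    """Split '<container>: {json}' into (container, record); None if no JSON part."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return line[:start].rstrip(": ").strip(), record


def summarize(lines):
    """Report whether the task status was persisted and whether the strict error appeared."""
    persisted = False
    strict_error = False
    for line in lines:
        if "StrictDynamicMappingException" in line:
            strict_error = True
        parsed = parse_log_line(line)
        if parsed and parsed[1].get("message", "").endswith(
            "Successfully persisted task status"
        ):
            persisted = True
    return {"persisted": persisted, "strict_error": strict_error}


# Shortened copy of a log line produced after the fix.
AFTER_FIX = [
    'replication-node22-orin: {"type": "server", "level": "INFO", '
    '"component": "o.o.r.t.a.AutoFollowTask", '
    '"message": "[my-connection-alias] Successfully persisted task status"  }',
]

# Shortened copy of the warning logged before the fix.
BEFORE_FIX = [
    "[2024-09-28T12:01:16,819][WARN ][o.o.r.t.a.AutoFollowTask ] "
    "Error storing result StrictDynamicMappingException[mapping set to strict, "
    "dynamic introduction of [cancellation_time_millis] within [task] is not allowed]",
]

print(summarize(AFTER_FIX))   # {'persisted': True, 'strict_error': False}
print(summarize(BEFORE_FIX))  # {'persisted': False, 'strict_error': True}
```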

Below are the files I used for testing, along with their sources:

| Name | File | Source |
| --- | --- | --- |
| Dockerfile | link | "opensearch-build" al2023.dockerfile |
| docker-compose | link | "opensearch" distribution docker-compose |
| log4j2.properties | link | "opensearch-build" log4j2.properties |
| opensearch-docker-entrypoint-default.x.sh | link | "opensearch-build" entrypoint |
| opensearch-onetime-setup.sh | link | "opensearch-build" onetime |
| opensearch.yml | link | "opensearch" distribution yml |
| opensearch-min-3.0.0-SNAPSHOT-linux-arm64.tar.gz | link | Generated by cloning the OpenSearch repository and assembling the project. |
| opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.zip | link | Generated by cloning the OpenSearch CCR Plugin repository and assembling the project. |

My test environment was macOS on an Apple M2 (arm64). If you are using a different operating system or architecture, replace the .tar.gz file with the build that matches your system.

I am happy to provide the Docker image used in my test on request.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Oct 5, 2024

❌ Gradle check result for 839a1bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Oct 7, 2024

❌ Gradle check result for 7946ad9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…g issue in .tasks index

- Fixed issue where `.tasks` index failed to update due to StrictDynamicMappingException when a task was cancelled.
- Added missing `cancellation_time_millis` field to `task-index-mapping.json`.
- Ensured proper handling of task cancellation events in Cross-Cluster Replication (CCR) by updating the mappings.
- Verified by creating and deleting an auto follow rule without StrictDynamicMappingException.

Signed-off-by: inpink <[email protected]>
Contributor

github-actions bot commented Oct 8, 2024

❌ Gradle check result for c15c90a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Labels
bug Something isn't working Other
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cancellation_time_millis is not getting added in .tasks index
1 participant