Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/elastic store update #5210

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

dylanturn
Copy link

@dylanturn dylanturn commented Apr 5, 2022

Description

Bumped the version of elastic4s to 7.10.3 and updated the various classes that implemented this library so that new versions of Opensearch and Elasticseach can be used as activation stores.

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@dylanturn
Copy link
Author

For the record, I'm trying to work out how to sign the CLA.

@rabbah
Copy link
Member

rabbah commented Apr 5, 2022

Thanks @dylanturn for the contribution 🎉

Copy link
Member

@rabbah rabbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the GHA path.

@@ -0,0 +1,36 @@
# This workflow uses actions that are not certified by GitHub.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal here to run GHA instead of using Travis?
In any case this change seems orthogonal to the rest of the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created that because I don't have a Travis or Jenkins environment to build this in and my personal laptop has an M1 chip in it. Additionally, I wanted to be sure that I had fixed the issue before I submitted the PR but left it in because I felt it could be beneficial to others who might want to build the project but don't have much Jenkins, Travis, or Gradle experience.

The action only runs when triggered manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some unittests for es in tests/src/test/scala/org/apache/openwhisk/core/database/elasticsearch/ElasticSearchActivationStoreTests.scala which is using testcontainers, so I think you can use it to run tests on your laptop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It failed when trying to download one of the dependencies. I'll try again so I can get the dependency name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick update, I tried to build the project locally again and got the following error:

Execution failed for task ':core:scheduler:generateProto'.
> Could not resolve all files for configuration ':core:scheduler:protobufToolsLocator_protoc'.
   > Could not find protoc-osx-aarch_64.exe (com.google.protobuf:protoc:3.4.0).
     Searched in the following locations:
         https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/3.4.0/protoc-3.4.0-osx-aarch_64.exe

I must admit, this is my first time with Scala and Gradle, and while I think I might know what happened, I'm still trying to work out how to go about fixing it.

@dylanturn
Copy link
Author

dylanturn commented Apr 5, 2022

For the record, I'm trying to work out how to sign the CLA.

Quick update. I've submitted a signed CLA.

Update Edit: The CLA has been received and filed!! 🎉

@@ -89,8 +89,8 @@ dependencies {
compile "io.reactivex:rxjava:1.3.8"
compile "io.reactivex:rxjava-reactive-streams:1.2.1"
compile "com.microsoft.azure:azure-cosmosdb:2.6.2"
compile "com.sksamuel.elastic4s:elastic4s-client-esjava_${gradle.scala.depVersion}:7.10.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this compatible with elasticsearch 6.x version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that 6.x is not compatible with this PR because 6.x requires a type parameter for index and deleteByQuery method while 7.x doesn't want it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the license.
I didn't look into it deeply but from a certain version of ES-7.x they would no longer support it with the Apache license.
Not sure we can upgrade the ES version.

Copy link
Author

@dylanturn dylanturn Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look into it deeply but from a certain version of ES-7.x they would no longer support it with the Apache license.

What wouldn't they support? Elastic changed the license model sometime after 7.10.x, so this is still entirely covered Apache 2.0.

It's also worth mentioning we don't have to use Elasticsearch. This update makes it possible to use Opensearch, which is covered entirely by Apache 2.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new scheduler relies on multiple filtered aliases against one ES index.
I am unclear ES 7.x support this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new scheduler relies on multiple filtered aliases against one ES index. I am unclear ES 7.x support this.

I'm not sure what to tell you, I've been using OpenWhisk with my ES 7.10 (Opensearch) cluster for the last week without any issues...

For what it's worth I'm working on updating the tests so they work correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using FPCScheduler too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I don't think I am.

That said, I can't find anything in the Elasticsearch documentation or release notes that suggests "multiple filtered aliases against one ES index" would have been removed.

Was it something you read that gave you cause for concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@style95 I think new scheduler doen't rely on multiple filtered aliase, and even not on elasticsearch

we are using multiple filtered aliase to organize activations by namespaces in our downstream, we may need to consider changing that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm..
If we don't use multiple filtered aliases, it would create a huge number of indices for all namespaces even if they rarely invoke any actions.
It is necessary to have shared indices for namespaces without less invocation.

@codecov-commenter
Copy link

Codecov Report

Merging #5210 (87b3dbd) into master (829e734) will decrease coverage by 39.69%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #5210       +/-   ##
==========================================
- Coverage   44.24%   4.55%   -39.70%     
==========================================
  Files         238     238               
  Lines       13957   13956        -1     
  Branches      570     565        -5     
==========================================
- Hits         6175     635     -5540     
- Misses       7782   13321     +5539     
Impacted Files Coverage Δ
...e/elasticsearch/ElasticSearchActivationStore.scala 0.00% <0.00%> (-58.83%) ⬇️
...scheduler/queue/ElasticSearchDurationChecker.scala 0.00% <0.00%> (-80.77%) ⬇️
.../main/scala/org/apache/openwhisk/core/WarmUp.scala 0.00% <0.00%> (-100.00%) ⬇️
...ain/scala/org/apache/openwhisk/spi/SpiLoader.scala 0.00% <0.00%> (-100.00%) ⬇️
...scala/org/apache/openwhisk/core/FeatureFlags.scala 0.00% <0.00%> (-100.00%) ⬇️
...la/org/apache/openwhisk/http/BasicRasService.scala 0.00% <0.00%> (-100.00%) ⬇️
...pache/openwhisk/http/LenientSprayJsonSupport.scala 0.00% <0.00%> (-100.00%) ⬇️
...ache/openwhisk/core/database/DocumentFactory.scala 0.00% <0.00%> (-100.00%) ⬇️
...che/openwhisk/core/invoker/LogStoreCollector.scala 0.00% <0.00%> (-100.00%) ⬇️
...he/openwhisk/core/invoker/NamespaceBlacklist.scala 0.00% <0.00%> (-100.00%) ⬇️
... and 120 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 829e734...87b3dbd. Read the comment docs.

@dylanturn
Copy link
Author

Looking at the latest Travis CI build results:

Failed Tests
20219.2 Unit Tests | 20219.4 Multi-Runtime Tests | 20219.7 Scheduler Tests
I checked every play execution on each test and they all looked like they ran just fine. The error seems happen when trying to upload the results.
Error Message: "File upload failed."

Errored Tests
20219.3 System Tests
All the tests (including :core:scheduler:processScoverageResources and :core:scheduler:scoverageClasses) seem to run fine up to :tests:testCoverageLean which failed with a very long error message. Here is an excerpt:

system.basic.WskConductorTests > Whisk conductor actions should invoke a conductor action with a continuation FAILED

    org.scalatest.exceptions.TestFailedException: error waiting for activation f4847a3c88384251847a3c8838925132 for 120 seconds: No activation record for'f4847a3c88384251847a3c8838925132'

It's not immediately obvious why the error was thrown, I'll have to do some more investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants