chore(tracking): Test demos for 25.3.0 #187

Closed · 13 tasks done · Tracked by #686
NickLarsenNZ opened this issue Mar 25, 2025 · 13 comments

NickLarsenNZ commented Mar 25, 2025

Release Demo Testing

Part of stackabletech/issues#686

Tip

This step is mostly to check that images pull.

This is testing that the new release demos work as documented from scratch.

Note

Record any issues or anomalies during the process in a comment on this issue.
E.g.:

:green_circle: **airflow-scheduled-job**

The CRD had been updated and I needed to change the following in the manifest:
...

Replace the items in the task lists below with the applicable Pull Requests (if any).

Note

At this point, the new release docs are still versioned as nightly.

25.3 from Scratch Testing Instructions

These instructions are for deploying and completing the 25.3 demo from scratch.

Tip

Be sure to select the nightly docs version on https://docs.stackable.tech/home/nightly/demos/.

stackablectl demo install <DEMO_NAME> --release 25.3

# --- IMPORTANT ---
# Run through the nightly demo instructions (refer to the tasklist above).

# If you need to fix anything, you will need to clone the repo and use local files:
# git checkout release-25.3
# git pull
# stackablectl --stack-file=stacks/stacks-v2.yaml --demo-file=demos/demos-v2.yaml demo install <DEMO_NAME>
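
Since this step is mostly about checking that the images pull, here is a minimal verification sketch (assumptions: the demo runs in the current kubecontext, and your stackablectl version has the stacklet list subcommand):

# Look for pods stuck pulling images
kubectl get pods --all-namespaces | grep -E 'ImagePullBackOff|ErrImagePull' || echo "no image pull failures"

# Check the overall stacklet health reported by stackablectl
stackablectl stacklet list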
NickLarsenNZ commented:

I removed the upgrade testing, but here is the original content:

Outgoing Stable to new 25.3 Upgrade Testing Instructions

These instructions are for deploying and completing the outgoing stable demo, and then
upgrading the operators, CRDs, and products to the nightly versions.

Tip
Be sure to select the stable docs version on https://docs.stackable.tech/home/stable/demos/.

stackablectl demo install <DEMO_NAME> --release 24.11

# --- IMPORTANT ---
# Run through the stable demo instructions (refer to the tasklist above).

# Get a list of installed operators
stackablectl operator installed --output=plain

# --- OPTIONAL ---
# Sometimes it is necessary to upgrade Helm charts. Look for other Helm Charts
# which might need updating.

# First, see which charts are installed. You can ignore the stackable-operator
# charts, or anything that might have been installed outside of this demo.
helm list

# Next, add the applicable Helm Chart repositories. For example:
helm repo add minio https://charts.min.io/
helm repo add bitnami https://charts.bitnami.com/bitnami

# Finally, upgrade the Charts to what is defined in `main`.
# For example:
helm upgrade minio minio/minio --version x.x.x
helm upgrade postgresql-hive bitnami/postgresql --version x.x.x
# --- OPTIONAL END ---

# Uninstall operators for the stable release (OUTGOING_STABLE)
stackablectl release uninstall 24.11

# At this point, we assume release.yml has been updated with the new 25.3 release.
# If it hasn't, you will need to point stackablectl at a locally updated file using --release-file.

# Update CRDs to nightly version (on release-25.3)
# Repeat this for every operator used by the demo (use the list from the earlier step before deleting the operators)
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-25.3/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/...-operator/release-25.3/deploy/helm/...-operator/crds/crds.yaml

# Install new release operators (use the list from the earlier step before deleting the operators)
stackablectl operator install commons=25.3.0 ...

# Optionally update the product versions in the CRDs (to the latest non-experimental version for the new release), e.g.:
kubectl patch hbaseclusters/hbase --type='json' -p='[{"op": "replace", "path": "/spec/image/productVersion", "value":"x.x.x"}]' # changed
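
# To confirm the upgrade actually took effect, a hedged follow-up sketch
# (the hbase resource name mirrors the patch example above; the CRD name is an assumption):

# Confirm the new release operators are installed
stackablectl operator installed --output=plain

# Spot-check which API versions the replaced CRD now serves (CRD name is an assumption)
kubectl get crd hbaseclusters.hbase.stackable.tech -o jsonpath='{.spec.versions[*].name}'

# Confirm the product version patch landed
kubectl get hbaseclusters/hbase -o jsonpath='{.spec.image.productVersion}'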

NickLarsenNZ changed the title from "chore(tracking): Test demos upgrades to 25.3.0" to "chore(tracking): Test demos for 25.3.0" on Mar 25, 2025
Techassi commented:

🟢 logging

Already tested as part of local release.yaml testing. Works without any issues.

maltesander commented:

🟢 airflow-scheduled-job

No issues.

Techassi commented Mar 25, 2025

🟢 signal-processing

All pods come up and all jobs succeed, but the Grafana dashboards don't display any data. I encountered a bunch of errors when running the notebook. This is due to the data not being available yet through NiFi.

The demo works fine after waiting a couple of minutes, but this delay was not present when previously testing this demo.

maltesander commented Mar 25, 2025

🟢 end-to-end-security

Needed to downgrade Hive to 4.0.0 (#189, cherry-picked to main in #190), otherwise the Spark job fails.
No issues otherwise; row filter / column masking / authz etc. still work.

adwk67 commented Mar 25, 2025

🟢 hbase-hdfs-load-cycling-data

No issues.

adwk67 commented Mar 25, 2025

🟢 jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data

No issues.

maltesander commented:

🟢 trino-iceberg

No issues.

NickLarsenNZ commented Mar 25, 2025

🟢 trino-taxi-data

Got errors when running the SQL statement in Superset, but it worked fine after rerunning it.

(screenshot omitted)

NickLarsenNZ commented Mar 25, 2025

🟢 spark-k8s-anomaly-detection-taxi-data

Hive errors
spark 2025-03-25T16:05:36,457 INFO [Thread-4] org.apache.hadoop.hive.conf.HiveConf - Found configuration file null
spark 2025-03-25T16:05:36,617 INFO [Thread-4] hive.metastore - Trying to connect to metastore with URI thrift://hive-iceberg:9083
spark 2025-03-25T16:05:36,632 INFO [Thread-4] hive.metastore - Opened a connection to metastore, current connections: 1
spark 2025-03-25T16:05:36,668 INFO [Thread-4] hive.metastore - Connected to metastore.
spark Traceback (most recent call last):
spark   File "/spark-scripts/spark-ad.py", line 12, in <module>
spark     spark.sql("CREATE SCHEMA IF NOT EXISTS prediction.ad LOCATION 's3a://prediction/anomaly-detection'")
spark   File "/stackable/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1631, in sql
spark   File "/stackable/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
spark   File "/stackable/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 179, in deco
spark   File "/stackable/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
spark py4j.protocol.Py4JJavaError: An error occurred while calling o73.sql.
spark : java.lang.RuntimeException: Failed to create namespace ad in Hive Metastore
spark     at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:499)
spark     at org.apache.iceberg.spark.SparkCatalog.createNamespace(SparkCatalog.java:481)
spark     at org.apache.spark.sql.execution.datasources.v2.CreateNamespaceExec.run(CreateNamespaceExec.scala:47)
spark     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
spark     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
spark     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
spark     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
spark     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
spark     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
spark     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
spark     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
spark     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
spark     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
spark     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
spark     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
spark     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
spark     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
spark     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
spark     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
spark     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
spark     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
spark     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
spark     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
spark     at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
spark     at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
spark     at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
spark     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
spark     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
spark     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
spark     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
spark     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:638)
spark     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
spark     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:629)
spark     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:659)
spark     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
spark     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
spark     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
spark     at java.base/java.lang.reflect.Method.invoke(Unknown Source)
spark     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
spark     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
spark     at py4j.Gateway.invoke(Gateway.java:282)
spark     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
spark     at py4j.commands.CallCommand.execute(CallCommand.java:79)
spark     at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
spark     at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
spark     at java.base/java.lang.Thread.run(Unknown Source)
spark Caused by: MetaException(message:Failed to create external path s3a://prediction/anomaly-detection for database ad. This may result in access not being allowed if the StorageBasedAuthorizationProvider is enabled: null)
spark     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26660)
spark     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26628)
spark     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:26562)
spark     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
spark     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_database(ThriftHiveMetastore.java:753)
spark     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_database(ThriftHiveMetastore.java:740)
spark     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:725)
spark     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
spark     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
spark     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
spark     at java.base/java.lang.reflect.Method.invoke(Unknown Source)
spark     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
spark     at jdk.proxy2/jdk.proxy2.$Proxy50.createDatabase(Unknown Source)
spark     at org.apache.iceberg.hive.HiveCatalog.lambda$createNamespace$12(HiveCatalog.java:488)
spark     at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:72)
spark     at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:65)
spark     at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
spark     at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:486)
spark     ... 45 more
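
For anyone digging into this locally, a hedged debugging sketch (the hive-iceberg service name comes from the log above; the metastore pod name and the Spark pod naming are assumptions):

# Inspect the Hive metastore logs for the underlying S3 error (pod name is an assumption)
kubectl logs hive-iceberg-metastore-default-0 | grep -iE 'prediction|s3a' | tail -n 20

# Find the Spark driver pod to re-check the job output after any change
kubectl get pods | grep spark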

Fixes:

adwk67 commented Mar 25, 2025

🟢 jupyterhub-keycloak

No issues.

maltesander commented Mar 25, 2025

🟢 nifi-kafka-druid-earthquake-data

No issues when running.

Documentation needs updating (@dervoeti volunteered :D):

xeniape commented Mar 25, 2025

🟢 nifi-kafka-druid-water-level-data

  • The NiFi flow changed compared to the documentation: it now starts with a Process Group, which you have to enter first to see the flow shown in the documentation screenshot. Maybe worth mentioning that.
  • There is also a warning in the bulletin board:
    • 16:11:17 UTC WARNING nifi-node-default-0.nifi-node-default.default.svc.cluster.local:8443 Unable to write flowfile content to content repository container default due to archive file size constraints; waiting for archive cleanup. Total number of files currently archived = 19
    • and a warning on the process group (see screenshot):
      (screenshot omitted)
    • Not sure if those are real problems, but they might confuse people using the demo.
  • Same as nifi-kafka-druid-earthquake-data: the screenshots are from NiFi 1.x.
  • Some UI elements in Druid changed (Supervisor->Statistics is now Supervisor->Task stats); maybe worth updating the screenshots and documentation.
  • Maps are not visible in Superset; I think @sbernauer already mentioned the reason somewhere in regard to the mapboxApiKey (a hedged sketch of one way to supply the key follows below).

All of these issues (assuming the NiFi warnings are not a real problem) were mostly visual; the functionality of the demo worked fine. It might still be important to fix the maps, since they make the Superset part look faulty, and looking at the map charts is a significant part of the demo documentation.
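
For the Mapbox maps, a minimal sketch of one way to supply the key, assuming the demo's Superset credentials Secret is named superset-credentials, that it accepts a connections.mapboxApiKey entry, and that the Superset StatefulSet is named superset-node-default (all of these names are assumptions, not verified against the stack definition):

# Hypothetical: add a Mapbox API key to the assumed Superset credentials Secret
kubectl patch secret superset-credentials --type='json' \
  -p='[{"op": "add", "path": "/data/connections.mapboxApiKey", "value": "'"$(echo -n "$MAPBOX_API_KEY" | base64)"'"}]'

# Restart Superset so it picks up the new key (StatefulSet name is an assumption)
kubectl rollout restart statefulset/superset-node-default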

