
AWS access keys are referenced in exported pipeline file #3164

Closed
harshad16 opened this issue Jun 21, 2023 · 2 comments
Labels
component:pipeline-editor, component:pipeline-runtime, status:Needs Triage

Comments

@harshad16 (Contributor)

Describe the issue

Exporting a pipeline from the Elyra notebook pipeline editor as Python DSL or YAML reads S3 credentials from the cluster and saves them in plain text in the generated output.

https://elyra.readthedocs.io/en/v3.9.0/user_guide/pipelines.html#exporting-pipelines

The exported file contains an AWS S3 credential. Could this be masked, or could an option be provided to set it from the pipeline instead of exposing it? That would help users publish their pipelines publicly on Git.

Alternatively, referencing the credential from a Kubernetes secret could be another possible solution.

To Reproduce
Steps to reproduce the behavior:

  1. In Elyra, open the pipeline export dialog
  2. Export the file as YAML
  3. Scroll down in the YAML and view the AWS keys in plain text

Screenshots or log output
In the exported YAML:

spec:
  pipelineSpec:
    tasks:
    - name: data-prep
      taskSpec:
        steps:
        - name: main
          env:
          - name: AWS_SECRET_ACCESS_KEY
            value: <plain-text secret value>

Expected behavior
In the exported YAML, the credential should instead be referenced from a secret:

spec:
  pipelineSpec:
    tasks:
    - name: data-prep
      taskSpec:
        steps:
        - name: main
          env:
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: secret
                key: AWS_SECRET_ACCESS_KEY
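
For illustration, a minimal sketch of the secret such a secretKeyRef would point at (the secret name and key here are placeholders, not something Elyra generates):

apiVersion: v1
kind: Secret
metadata:
  name: secret                                # matches secretKeyRef.name above
type: Opaque
stringData:
  AWS_SECRET_ACCESS_KEY: <secret-access-key>  # matches secretKeyRef.key above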

Deployment information
Describe what you've deployed and how:

  • Elyra version: 3.15.0
  • Operating system: linux (ubi9)
  • Installation source: PyPI
  • Deployment type: Open Data Hub

Pipeline runtime environment
If the issue is related to pipeline execution, identify the environment where the pipeline is executed:

  • Kubeflow Pipelines

@harshad16 added the component:pipeline-editor, component:pipeline-runtime, and status:Needs Triage labels on Jun 21, 2023
@harshad16 (Contributor, Author)

Looked at the documentation:
https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html#cloud-object-storage-authentication-type-cos-auth-type

Using the cos_auth_type KUBERNETES_SECRET fixes this issue.
Closing the issue; thanks for the help.
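
For anyone else hitting this, the relevant runtime-configuration fields look roughly like this (a sketch; the secret name is a placeholder, and per the linked docs the named secret must already exist in the namespace where pipelines run and must define the keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY):

"cos_auth_type": "KUBERNETES_SECRET",
"cos_endpoint": "https://<cos-endpoint>/",
"cos_bucket": "<bucket>",
"cos_secret": "my-cos-secret"

With this in place, exported pipelines reference the secret instead of embedding the credential values.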

@shalberd (Contributor) commented Mar 5, 2024

@harshad16 yes, on the one hand, this way the credentials do not appear in the pipeline processor output, be it in Airflow Git or in exported pipelines; instead they are referenced via a secret, if it exists:

https://github.com/elyra-ai/elyra/blob/main/elyra/templates/kubeflow/v1/python_dsl_template.jinja2#L56
https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/processor_airflow.py#L298

But the thing is: for sending Elyra info to S3, a username and password are still required, which kind of defeats the purpose of using secrets in the first place.

@kevin-bates I guess I am trying to pre-configure as much as possible in the form of a runtime JSON, with Git project access tokens and non-personal (or even personal) S3 credentials in a K8s secret, so the user does not even have to explicitly specify those.

In both Airflow and KFP it is good to use secrets, even more so in Airflow, because there the username and password would appear in GitLab, which is to be avoided. But even in KFP, storing the COS username and password directly in the env var, instead of getting the env var from the secret, is bad practice, yet currently possible. Context:

https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/kfp/processor_kfp.py#L817
https://github.com/elyra-ai/elyra/blob/main/elyra/templates/kubeflow/v1/python_dsl_template.jinja2#L78

https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/airflow/processor_airflow.py#L298

[Screenshot, 2024-03-05 18:55:55]

Also, the username and password are visible in the runtime JSON of the running Jupyter container at .local/share/jupyter/metadata/runtimes, which might not always be desirable, whether non-personal or personal credentials are used for S3 access.

In other words, it would be nice if the K8s secret could be used not only by name in the target runtime environment, but also by the current Jupyter environment itself.

current logic:

#2354

My ideal solution would be for Elyra to use the COS credentials only from the K8s/OpenShift secret, but then that secret would of course be needed not only in the pipeline execution namespace (user namespace) but also in the JupyterLab namespace. Thankfully, when cos_secret is defined, it is currently not used in the target pipeline:

https://github.com/elyra-ai/elyra/blob/main/elyra/pipeline/processor.py#L483

But my point is: the credentials should never be used verbatim as an env value, always from a secret.

envFrom with secretRef (https://gist.github.com/troyharvey/4506472732157221e04c6b15e3b3f094) would be one option for Elyra communicating with S3.

Possibly the best way in a K8s context would be for the creator of the JupyterLab pod environment to add the env variables cos_username and cos_password from a K8s secret, injecting them from the secret into the container, so Elyra can make use of those two values from env vars. Alternatively, mount the whole runtime config as a file via a volume from a ConfigMap or secret; see the sketch further below. I don't like the values of cos_username and cos_password sitting verbatim in a runtime JSON in the user directory:

{
  "display_name": "Airflow TST Instanz",
  "metadata": {
    "tags": [],
    "description": "Airflow Instanz im Zusammenspiel with Namespaces Workbenches TST",
    "display_name": "Airflow TST Instanz",
    "user_namespace": "my-airflow-namespace",
    "git_type": "GITLAB",
    "github_api_endpoint": "https://gitlab.mycompany.com/",
    "api_endpoint": "https://my-airflow-airflow-namespace.apps.tst.mycompany.com/,",
    "github_repo": "airflowstuff",
    "github_branch": "dags",
    "github_repo_token": "",
    "cos_auth_type": "KUBERNETES_SECRET",
    "cos_endpoint": "https://mys3.mycompany.com/",
    "cos_bucket": "my-elyra-bucket",
    "cos_secret": "my-cos-secret",
    "cos_username": "mycosusername",
    "cos_password": "mycospassword",
    "runtime_type": "APACHE_AIRFLOW"
  },
  "schema_name": "airflow"
}

These values are used by Elyra when communicating with S3: https://github.com/elyra-ai/elyra/blob/main/elyra/util/cos.py#L68

opendatahub-io/odh-dashboard#2271

https://www.jeffgeerling.com/blog/2019/mounting-kubernetes-secret-single-file-inside-pod
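
To make that concrete, here is a sketch of both injection options for the JupyterLab pod (all names, the image, and the mount path are illustrative assumptions; Elyra reading cos_username and cos_password from pod env vars would require the enhancement discussed above):

apiVersion: v1
kind: Pod
metadata:
  name: jupyterlab
spec:
  containers:
  - name: notebook
    image: <jupyterlab-image>
    # Option 1: inject credentials as env vars from an existing secret
    envFrom:
    - secretRef:
        name: elyra-cos-credentials    # hypothetical secret holding cos_username / cos_password
    # Option 2: mount the runtime config itself from a secret, instead of
    # keeping it verbatim in the user directory
    volumeMounts:
    - name: elyra-runtime
      mountPath: /opt/app-root/src/.local/share/jupyter/metadata/runtimes
      readOnly: true
  volumes:
  - name: elyra-runtime
    secret:
      secretName: elyra-runtime-config # hypothetical secret containing the runtime JSON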

Related thought:

@lresende this current behavior, with username and password verbatim, probably has to do with supporting non-K8s runtime environments, doesn't it?
