Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition configuration of data service to env vars / .env file / docker secrets #568

Merged
merged 7 commits into from
Apr 16, 2024

Conversation

aaraney
Copy link
Member

@aaraney aaraney commented Apr 3, 2024

Drop dataservice CLI argument parsing and replace with a mechanism for configuring the service with environment variables, a .env file (does not need to be exported), docker secrets, or a combination of the three. The primary goal of this change is to simplify service configuration and ultimately deployment. As an added benefit, the use of pydantic to validate and coerce configuration options will also slightly harden the service.

See README.md in change set for a listing of configuration options, their types, and descriptions.

Additions

  • Service is now configured using either, environment variables, a .env file, docker style secrets (stored at /run/secrets/), or a combination of the three. Input variable names are case insensitive and are sourced in the following priority:
    1. environment variables
    2. variables loaded from a dotenv (.env) file.
    3. variables loaded from the secrets directory.
    4. default field values

Removals

  • Remove support for providing service configuration as CLI arguments.
  • FilesystemDatasetManager support has been dropped. The feature was only partially functional and can easily be added in the future once in feature parity with the existing ObjectStoreDatasetManager.

Changes

  • Docker deployment and entrypoint script updated to function with feature set.

Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction, but there are a few things that I at least want to chat about.

- PORT=${DOCKER_DATASERVICE_CONTAINER_PORT:-3015}
- CERT_PATH=/ssl/data-service/certificate.pem
- KEY_PATH=/ssl/data-service/privkey.pem
- SSL_DIR=/ssl/data-service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gotten all the way through this yet, but if we're pulling apart the CERT_PATH and KEY_PATH into explicit values, do we still need SSL_DIR? And if we do, but we're separating those components, does that also need to be changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! I was thinking the same thing when I was refactoring this. I was not sure (and didn't lookup) if you can provide root and intermediate x509 certs in separate files along side the private and public key? Meaning, I didn't know if we dropped support for the SSL_DIR if this would required bundling the root and intermediate certs with the public key cert and if there were reasons we wouldnt want to do that.

If this isn't an issue, I would love get rid of it!

#- REDIS_PORT=
#- REDIS_USER=
- OBJECT_STORE_USERNAME=object_store_exec_user_name
- OBJECT_STORE_PASSWD=object_store_exec_user_passwd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about how these were renamed. Now the contents aren't what the variable names describe, which are the Docker secret names. Again, still haven't gotten all the way through, so maybe that ends up being more clear in context than I expect, but it catches my eye here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that is totally fair. Thanks for mentioning this! Are you thinking adding exec into the configuration field names (e.g. OBJECT_STORE_EXEC_USERNAME)?

working_dir: /code
ports:
- "${DOCKER_DATASERVICE_HOST_PORT:-3015}:${DOCKER_DATASERVICE_CONTAINER_PORT:-3015}"
secrets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets is already configured before environment. Also you aren't using consistent secret names. I'll comment more on that below where the secrets are declared, but one whatever you decide needs to match with what's set within the applicable environment variables for the data service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

docker/main/docker-deploy.yml Outdated Show resolved Hide resolved
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s,%(msecs)d %(levelname)s: %(message)s",
datefmt="%H:%M:%S"
datefmt="%H:%M:%S",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't datefmt get parsed as a tuple here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Trailing commas in argument lists are totally fine. I use the python formatter black and that is how it formats multiline sequence of arguments.


# TODO: consider making this a pydantic.SecretStr
object_store_username: str
"""Set name of the Docker secret containing the object store user access key"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't engineer something at this level in a way that is quite so tailored to our use of Docker. I suggest we rename this and the other similar items to suggest that they are really the names of files (at least as far as the app is concerned), not only to be more clear, but so we can have separate attributes for the values. We may not always acquire the value in this manner in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't engineer something at this level in a way that is quite so tailored to our use of Docker

Are you primarily suggesting that I should change the doc string to remove the reference to Docker? I did not make it clear in the source documentation that any defined field can be sourced programmatically (when initializing the object), from an environment variable, from a .env file, or from a file with the fields name stored in /run/secrets (following Docker's convention). I don't think we should do this now, but if we need to configure where secrets are sourced, there is a straight forward pathway to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring was part of it. My general concern was that if object_store_username/object_store_exec_user_name stored the name of a Docker secret, then using that attribute name would be misleading unless we assumed certain things about having Docker in the deployment.

But from your response, it sounds like somehow ServiceSettings gets the content of secret files automatically as appropriate. I'm guessing that it knows to puts those contents into object_store_exec_user_name, instead of the value directly provided, when the provided value is actually something in the secrets dir. Alternatively, if the value within an .env file for object_store_exec_user_name is just a literal value for the user name, the ServiceSettings object can tell the difference and just use the literal value. I just don't quite see how that works.

As long as that is how it works, then this is fine (apart from the elsewhere discussed exec stuff for the name). And that would explain why the environment variables themselves aren't named to reflect that they are Docker secret names. Though the way things all fit together is perhaps a bit obfuscated and could use some clarifications (certainly it was with the previous docstring, though I see you've already changed that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you nailed it. That is pretty much exactly how it works. I tried to emphasize that in the README. Im sure ive missed the mark somewhere. Can you skim it and let me know holes i've left? I want to make sure it is clear how to configure the service.

@aaraney
Copy link
Member Author

aaraney commented Apr 10, 2024

@robertbartel, I think ive addressed all of you comments and this is ready for re-review. I would like reorder / squash the commits Ive added to address your comments before we merge, but for review sake I did not want to do that yet since it might make reviewing more difficult.

robertbartel
robertbartel previously approved these changes Apr 12, 2024
Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably fine, but I did have a couple of minor inline suggestions. We can discuss further, skip, or commit and re-approve.

My original issue was that it's not hard to accidentally conflate "configuration variable" with "environment variable" if you aren't paying close attention AND I got thrown off by the inclusion of object-store-related, secret-backed config variables as entries in the service config environment section in docker-deploy.yml. With the latter out of the way, I think things are made clear enough.

python/services/dataservice/README.md Outdated Show resolved Hide resolved

Provide listed service configuration variables as either environment variables, in a '.env' file, as a `secret` in `/run/secrets`, or as a mixture of the aforementioned options.

Configuration option names are case insensitive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be helpful, but not necessary:

Suggested change
Configuration option names are case insensitive.
Configuration variable names are case insensitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I went back and forth when writing this. We usually say a CLI tools command line options, not its command line variables. I think it depends on your background which term you prefer. I am going to leave it as is. Thanks for the careful read and feedback!

@aaraney
Copy link
Member Author

aaraney commented Apr 12, 2024

Thanks for the careful read, @robertbartel! I made several of the changes you suggested. I am going to squash / reorder some of the commits and force push just to make this clearer and request a re-review. None of the content from this point forward will change, just the git metadata.

aaraney and others added 7 commits April 12, 2024 16:26
- settings can be populated using env vars, a .env file, or "secrets"
  files stored at `/run/secrets`.
- add methods for printing out usage information.
- don't include support for file backed data store as it is not
  currently functional.
@aaraney aaraney force-pushed the move-dataservice-to-env-vars branch from 1d4a3d8 to fda3d95 Compare April 12, 2024 20:26
@aaraney aaraney requested a review from robertbartel April 12, 2024 20:27
@aaraney aaraney merged commit ec2396e into NOAA-OWP:master Apr 16, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants