Make root directory (common prefix of all base_dirs) customizable during setup
#346
Comments
I suggest storing the ROOT_DIR for IN2P3/CC and Cori in the I think the number of installations we're trying to support is limited to a finite number, so we can make significant progress at minimal user effort. It wouldn't have to be explicitly set during package installation. It just would be read at package initialization with a machine name lookup. If someone wanted to support their own location installation of |
I have similar thoughts about redefining to be a dict with 'NERSC' and 'IN2P3/CC' as keys. |
NERSC has asked that we move off projecta and completely onto the new CFS soon, as in the next few weeks (exact schedule TBD). Hence, this issue now has higher priority. |
I can look into this. |
I'm wondering how this would work for a docker image. At install time, I have no idea where the image might be run, though currently I can safely assume it is at NERSC. I'd need an environment variable that can be set to adjust the ROOT_DIR at runtime. |
We need to assume that it will have to be deployed at both NERSC and CC, whether with an image or otherwise. I think investigating an env variable setup is the best way forward. |
Yes, an environment variable defined at installation time is probably the best way to go. |
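A minimal sketch of the environment-variable approach discussed above, so that a docker image can adjust the root directory at run time. The variable name GCR_CATALOGS_ROOT_DIR and the function get_root_dir are hypothetical (the thread does not fix either); the default is the NERSC path quoted in the issue description.

```python
import os

# Hypothetical variable name; the thread does not specify one.
ROOT_DIR_ENV_VAR = "GCR_CATALOGS_ROOT_DIR"

# NERSC location quoted in the issue description, used here as the fallback.
DEFAULT_ROOT_DIR = "/global/projecta/projectdirs/lsst/production/catalogs"


def get_root_dir(environ=None):
    """Return the root directory, preferring the environment variable if set."""
    environ = os.environ if environ is None else environ
    value = environ.get(ROOT_DIR_ENV_VAR, "").strip()
    # An unset or empty variable falls back to the default.
    return value or DEFAULT_ROOT_DIR
```

Reading the variable inside a function, rather than once at import, lets an image set it at launch without reinstalling the package.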
My plan is
|
@JoanneBogart many thanks for posting your plan here! This is really helpful. I think the general direction sounds very reasonable, but I have a few specific suggestions to reduce the complexity. In particular, I would suggest (1) reducing the need for environment variables, and (2) resolving paths at run time rather than at installation time. I will post my suggestions in detail by Monday at the latest (sorry, it has been difficult to find time). |
@yymao Simplification sounds good to me! I look forward to your comments. One clarification: I'm not suggesting that the paths be resolved at installation time, only that information about the site be cached at that time, so that the typical user doesn't have to specify anything (via environment variable or any other means) at run time. |
@JoanneBogart Here are my specific suggestions (some are the same as yours):
Here are the principles I am trying to adhere to when writing the above:
|
@yymao I think we're converging but are not quite of one mind yet. I agree it's better to have register resolve the path, not the readers. I agree that users should be able to programmatically overwrite both ROOT_DIR and individual paths. I'm not sure that is sufficient for all use cases, though. We need a way to set up a sensible default. That means getting information about the site to the package. (It does not mean resolving paths at that time; I never suggested that.) An advantage of caching the site during installation is that anyone running off of a provided installation won't have to think about it at all unless they need to override. The alternative is to always require this information at run time. If the site could be reliably determined from existing system variables I'd be in favor of that, but I don't think it can be. HOSTNAME wouldn't qualify. The pool of hosts is not under our control. If it changes, whatever parses the value could give the wrong answer. |
@JoanneBogart It sounds like we are in good agreement except for when and how to obtain site information.
On "when": my proposal is to obtain site information every time GCRCatalogs is imported, while yours is to obtain site information at installation time and cache it. I think in both cases, For users who want to overwrite My preference for obtaining site information at import time comes from the idea that it seems more consistent (i.e., requires less explanation to new devs). Site information is only needed when setting The counter argument that I can think of is that if site information cannot be obtained reliably every time GCRCatalogs is imported, then of course we should cache it at installation time. So this gets us to "how"...
On "how": my proposal is to obtain site information using
>>> import socket
>>> socket.getfqdn()
'cori03.nersc.gov'
One thing I am not sure about is whether this works fine in docker/shifter images. If it does, I think this can be used instead of reading a user-set environment variable. |
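A rough sketch of how the socket.getfqdn() result could be mapped to a site at import time. The function name guess_site and the keyword table are assumptions for illustration only; the site labels reuse the 'NERSC' and 'IN2P3/CC' keys mentioned earlier in the thread, and the "in2p3" substring is a guess at what the CC hostnames contain.

```python
import socket

# Assumed mapping from a hostname substring to a site label.
SITE_KEYWORDS = {
    "nersc": "NERSC",
    "in2p3": "IN2P3/CC",
}


def guess_site(fqdn=None):
    """Guess the site from the fully qualified domain name, or return None."""
    fqdn = socket.getfqdn() if fqdn is None else fqdn
    fqdn = fqdn.lower()
    for keyword, site in SITE_KEYWORDS.items():
        if keyword in fqdn:
            return site
    # Unknown host (for example inside a container): let the caller decide.
    return None
```

Returning None for an unrecognized host leaves room for a fallback such as a user-supplied root directory, which matters if the lookup turns out not to work inside docker/shifter images.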
@yymao I don't care that much about the "when" part. The only reason for attempting to determine site during installation is 1) if a significant fraction of (probably more casual) users do not do their own installation - I don't know if that's the case or not - and 2) if determination of site is not automatic. |
@johannct does the above means to determine site (via response to |
Yes, it seems OK for CC as well; the target word would be in2p3, I guess. Can you guys show me what a catalog config would look like? I got lost in the discussion. Then I would suggest implementing the minimal solution, and we'll need to use it to judge it concretely. |
I think in this proposal that @JoanneBogart and I converge on, the config files will not have much change except for updating all the paths to relative. Then we will need to revise the code so that it
|
What @yymao said. That is also my understanding. |
ok in my imagination I was contemplating the env variable to be written in the config files, so that |
I've considered that too, but later feel that's unnecessary, especially because |
right but it is less portable. I do not want to interfere, go ahead you thought this over much more than I did. |
I'm not sure whether this is a good thing or not but -- |
I guess it could be convenient when working with a new dataset which is not yet in the standard place. |
@johannct can you elaborate on the "less portable" point? I am not sure I get it. @JoanneBogart Indeed. And in fact that's exactly the behavior of |
Once we move to the new scheme, config files with absolute paths should not be allowed in tagged versions of gcr-catalogs. |
@yymao If you don't have objections I will try to implement this. |
@JoanneBogart sounds good. If you open a draft PR even while you're still working on it, I'll be happy to provide early feedback. |
@yymao I pushed my branch, then realized I was supposed to do this via a fork. Sorry.
I haven't confirmed that all the possible path-like keywords are recognized and handled, but I believe it behaves correctly for the cases I've tested. |
@JoanneBogart using branches is perfectly fine! Thanks! I'll take a look and comment. Do you want to open a draft PR? This way it's clear that it is work in progress (and people won't be notified when you push), but it allows people to comment in the PR thread. |
In an effort to make gcr-catalogs more easily installable outside NERSC (#271), we need to allow the root directory (the common prefix of all base_dirs, after #345 is done) to be configurable during package setup time.
There are a few ways to do this:
1. Require base_dir to always be a relative path. At run time, the reader prepends root directory path (which is set during package installation) to base_dir.
2. Require base_dir to be in the form of <ROOT_DIR>/relative/path. At run time, the reader replaces <ROOT_DIR> with the actual root directory path (which is set during package installation).
3. Require base_dir to be in the form of /global/projecta/projectdirs/lsst/production/catalogs/relative/path. At run time, the reader replaces /global/projecta/projectdirs/lsst/production/catalogs with the actual root directory path (which is set during package installation).
(1) appears more elegant and less hack-y. (2) is more explicit to the user (i.e., the user would know what will happen). (3) has the benefit that for NERSC users, they can easily find the files without needing to find out what ROOT_DIR is.
Comments on this are welcome.
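To make the comparison concrete, here is a minimal sketch of how each of the three options listed above could resolve base_dir at run time. The function names are hypothetical and not part of gcr-catalogs; posixpath is used because all the paths in question are POSIX paths.

```python
import posixpath

# NERSC prefix and placeholder exactly as written in the options above.
NERSC_ROOT = "/global/projecta/projectdirs/lsst/production/catalogs"
PLACEHOLDER = "<ROOT_DIR>"


def resolve_relative(base_dir, root_dir):
    """Option (1): base_dir is relative; prepend the root directory."""
    if posixpath.isabs(base_dir):
        raise ValueError("base_dir must be a relative path")
    return posixpath.join(root_dir, base_dir)


def _swap_prefix(base_dir, prefix, root_dir):
    """Replace a leading 'prefix/' in base_dir with root_dir."""
    prefix = prefix + "/"
    if not base_dir.startswith(prefix):
        raise ValueError("base_dir must start with " + prefix)
    return posixpath.join(root_dir, base_dir[len(prefix):])


def resolve_placeholder(base_dir, root_dir):
    """Option (2): base_dir starts with the literal <ROOT_DIR> placeholder."""
    return _swap_prefix(base_dir, PLACEHOLDER, root_dir)


def resolve_nersc_prefix(base_dir, root_dir):
    """Option (3): base_dir starts with the NERSC path, which is swapped out."""
    return _swap_prefix(base_dir, NERSC_ROOT, root_dir)
```

All three produce the same resolved path for equivalent inputs; they differ only in what the config file has to contain, which is the trade-off described above.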