Logs mis-displaying processing of SLURM variables #115

Open
matheuscteo opened this issue Oct 26, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@matheuscteo
Member

CC @bermudei

Related to #100: when reprocessing with e.g. --match (slurm variable), the logs do not indicate that the job has started to run. Instead, they suggest that no match was found.

This might lead to the user running several jobs by mistake.

@matheuscteo added the bug label on Oct 26, 2023
@takluyver
Member

This is what I see when doing this:

$ amore-proto reprocess --match zimuth 148
2023-10-26 16:11:25 INFO     extra_data.read_machinery              Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.038 s
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0017 s
INFO:__main__:Using 0 variables (of 22) from context file
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0016 s
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0018 s
INFO:__main__:Writing 1 variables to 2 datasets in extracted_data/p4507_r148.h5
INFO:__main__:Writing 1 variables to 1 datasets in /tmp/tmpd02jlmxi/reduced.h5
2023-10-26 16:11:59 INFO     damnit.backend.extract_data            Reduced data has 1 fields
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Adding p4507 r148 to database, with 1 columns
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Sent Kafka update to topic 'amore-db-1959f8c5f91fae45c425b1dbb19d42c0e91d36b0'
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Launched Slurm job 3855048
 to calculate cluster variables

That doesn't seem so bad? Admittedly it says 'Using 0 variables', but the last line has 'Launched Slurm job'.

@JamesWrigley
Member

I think the problem is that it doesn't specify that it's selecting 0 variables to process locally, and there might be more used in the slurm job. We could probably improve the logs in general by printing the names of all the variables that were selected, which would be particularly useful with --match.
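As a rough illustration of that suggestion, here is a minimal sketch that splits the selected variables into local and cluster groups and builds log lines naming each one. It is not the project's actual code: `select_variables`, `describe_selection`, the dict of cluster flags, and the case-insensitive substring matching are all assumptions made for the example.

```python
import logging

log = logging.getLogger(__name__)


def select_variables(variables, match=()):
    """Split matching variable names into (local, cluster) lists.

    `variables` is a hypothetical mapping of variable name -> bool, where
    True means the variable is computed in a Slurm job. An empty `match`
    selects everything; otherwise a name is kept if any pattern is a
    case-insensitive substring of it (an assumption, not confirmed here).
    """
    selected = [
        name for name in variables
        if not match or any(m.lower() in name.lower() for m in match)
    ]
    local = sorted(n for n in selected if not variables[n])
    cluster = sorted(n for n in selected if variables[n])
    return local, cluster


def describe_selection(local, cluster, total):
    """Build log lines that name the variables in each group."""
    return [
        f"Selected {len(local) + len(cluster)} variables (of {total}) from context file",
        f"  {len(local)} to process locally: {', '.join(local) or '(none)'}",
        f"  {len(cluster)} to process in Slurm job: {', '.join(cluster) or '(none)'}",
    ]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical context file contents: name -> runs on cluster?
    example = {"azimuthal_int": True, "xgm_intensity": False, "n_trains": False}
    local, cluster = select_variables(example, match=["zimuth"])
    for line in describe_selection(local, cluster, total=len(example)):
        log.info(line)
```

With output along these lines, a run that matches only cluster variables would still show their names, rather than just a count of 0 local variables.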

@matheuscteo
Member Author

Yes, the phrase 'Using 0 variables' was what generated the confusion today.

@takluyver
Member

OK, got it. I'm hoping to work on logs today (if other stuff doesn't get in the way, as it often does).

@takluyver
Member

This should now show up as:

INFO:__main__:Using 0 variables (of 22) from context file (cluster variables will be processed later)

Are we happy enough with that message to close this issue?

@bermudei

bermudei commented Nov 9, 2023

As James commented, is it possible to print the names of the variables being used, both locally and in Slurm?
I know it might be too much, especially when "all variables" are being processed, but I think it would be very useful.
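One possible way to keep such a list readable when many variables are selected is to print only the first few names and summarise the rest. The sketch below assumes a hypothetical helper, `format_names`, with an arbitrary default limit of 5. Neither comes from the project.

```python
def format_names(names, limit=5):
    """Join variable names for a log line, abbreviating long lists.

    Shows at most `limit` names, then a count of how many were left out.
    Both the helper and the default limit are illustrative assumptions.
    """
    names = list(names)
    if len(names) <= limit:
        return ", ".join(names)
    shown = ", ".join(names[:limit])
    return f"{shown}, ... (+{len(names) - limit} more)"
```

The full list could still be written at debug level, so that the info-level output stays short when "all variables" are processed.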
