
Quick-question board. #23

Open
jreniel opened this issue May 12, 2021 · 25 comments

Comments

@jreniel
Member

jreniel commented May 12, 2021

Please use this issue tracker to post "quick" questions about pyschism.


For bug reports please open a new issue.

@jreniel jreniel pinned this issue May 12, 2021
@ivicajan

Hi Jaime,
thanks for putting all the nice functionality needed for the SCHISM model into Python.

I am testing some of it for my very fine-scale local (small geographic region) grid and am not sure how to properly specify the ERA5 subsetting bounding box. If I use the standard approach, bbox = hgrid.get_bbox('EPSG:4326', output_type='bbox'), only one ERA5 longitude stripe is saved in the sflux file. However, I believe we need at least two for proper convex-hull interpolation.

I tried changing the condition in era.py to use predefined xoffset/yoffset values (set to the ERA5 grid resolution of 0.25 deg):

```python
'area': [self._bbox.ymax+yoffset, self._bbox.xmin-xoffset, self._bbox.ymin-yoffset, self._bbox.xmax+xoffset],
```

as well as this (`>=` replaced with `>`, etc.):

```python
import numpy as np


def _modified_bbox_indexes(self):
    lat_idxs = np.where((self.lat > self._bbox.ymin)
                        & (self.lat < self._bbox.ymax))[0]
    lon_idxs = np.where((self.lon > self._bbox.xmin)
                        & (self.lon < self._bbox.xmax))[0]
    return lon_idxs, lat_idxs
```
With this modification, the result in the extracted er structure looks OK, but somehow it is still not written to the forcing files (which still contain only one longitude stripe):

```
er.inventory.lon, er.inventory.lat
(masked_array(data=[115.335 , 115.585335, 115.83567 , 116.086 ],
             mask=False,
       fill_value=1e+20,
            dtype=float32),
 masked_array(data=[-31.621, -31.871, -32.121, -32.371],
             mask=False,
       fill_value=1e+20,
            dtype=float32))
```

but the netCDF file contains:

```
data:

 lon =
  115.8357,
  115.8357,
  115.8357 ;

 lat =
  -31.871,
  -32.121,
  -32.371 ;
}
```

Cheers,
Ivica
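A minimal sketch of the padding idea described above, in plain Python. It assumes a regular grid with 0.25° spacing (the ERA5 resolution mentioned above), and the helper names `padded_area` and `indexes_in_range` are hypothetical, not part of pyschism. A closed interval at least two grid cells wide always contains at least two grid points, so padding each side by one cell and using inclusive comparisons (rather than the strict `>`/`<` above) keeps at least two columns and two rows whenever the box lies inside the grid. The `area` list follows the CDS API's [North, West, South, East] order, matching the `'area'` line above.

```python
"""Sketch: pad a lon/lat bounding box by one grid cell before subsetting.

Hypothetical helpers, not pyschism API. Assumes a regular grid with
spacing `res` degrees (0.25 for ERA5).
"""

ERA5_RES = 0.25  # ERA5 grid spacing in degrees


def padded_area(xmin, ymin, xmax, ymax, res=ERA5_RES):
    """Return a CDS-style area list [North, West, South, East] padded by `res`."""
    return [ymax + res, xmin - res, ymin - res, xmax + res]


def indexes_in_range(coords, lo, hi, pad=0.0):
    """Indexes of `coords` inside the closed interval [lo - pad, hi + pad].

    Works for ascending or descending coordinates (ERA5 latitudes run
    north to south).
    """
    return [i for i, c in enumerate(coords) if lo - pad <= c <= hi + pad]


if __name__ == "__main__":
    # Longitudes taken from the er.inventory.lon output above.
    lon = [115.335, 115.585335, 115.83567, 116.086]
    # Made-up bounds for a tiny domain that sits between ERA5 columns.
    xmin, xmax = 115.80, 115.85
    print(indexes_in_range(lon, xmin, xmax))                # -> [2]
    print(indexes_in_range(lon, xmin, xmax, pad=ERA5_RES))  # -> [1, 2, 3]
```

The unpadded selection reproduces the single-stripe symptom; the padded one keeps three columns for interpolation.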

@SorooshMani-NOAA
Contributor

SorooshMani-NOAA commented Jun 1, 2022

Are there any plans to publish pyschism on PyPI more regularly? Having it only as a repository makes it harder for packages that rely on it to declare it as a dependency, because PyPI does not allow packages uploaded to it to use direct (i.e. URL-based) references for their dependencies.

@SorooshMani-NOAA
Contributor

@cuill if you don't have plans to publish on PyPI, can I create a fork and publish it under a name such as pyschism-unofficial?

@josephzhang8
Member

josephzhang8 commented Jun 2, 2022 via email

@TPCollings

Hi there, thanks very much for releasing this package; it's incredibly useful. Are there any plans to release more up-to-date and complete documentation in the near future?

Cheers,

Tom

@TPCollings

Also, the example scripts are incomplete and outdated. Does anyone have an example script they could share that simply shows how to set up and run the model using pyschism?

Thanks

@cuill
Member

cuill commented Sep 12, 2022

@TPCollings
Here are the most updated examples:
https://pyschism.readthedocs.io/en/latest/api/index.html

@TPCollings

@cuill Ah, thanks, I had already found those; they are really useful. I'm looking more for examples of how to launch the model once the input files have been generated.

@cuill
Member

cuill commented Sep 12, 2022

@TPCollings
The SCHISM online manual probably can answer your question:
https://schism-dev.github.io/schism/master/getting-started/running-model.html

@SorooshMani-NOAA
Contributor

@cuill I see that gfs2.GFS and hrrr3.HRRR only generate a single .nc file. How can I get valid SCHISM sflux files from this? Should I use a custom script, or is there a function in pyschism that generates valid sflux files for me?

@cuill
Member

cuill commented Sep 16, 2022

@SorooshMani-NOAA
Contributor

@cuill It just returns a hrrr_<datestamp>.nc file, not sflux_air_1.[XXXX].nc, sflux_prc_1.[XXXX].nc and sflux_rad_1.[XXXX].nc. Also, GFS only gives me an error. This is my test code:

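```python
from datetime import datetime, timedelta
from matplotlib.transforms import Bbox
from pyschism.forcing.nws.nws2 import gfs2, hrrr3

hrrr3.HRRR(start_date=datetime.now()-timedelta(days=15), pscr='/tmp/testhrrr3', rnday=2, record=1, bbox=Bbox([[-80, 20], [-75, 35]]))
```

and

```python
gfs2.GFS(start_date=datetime.now()-timedelta(days=15), pscr='/tmp/testgfs2', rnday=2, record=1, bbox=Bbox([[-80, 20], [-75, 35]]))
```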

For GFS I get:

```
...
File ~/workarea/sandbox/pyschism/pyschism/forcing/nws/nws2/gfs2.py:116, in gen_sflux()
    114 def gen_sflux(self, date, record, pscr):
--> 116     inventory = AWSGrib2Inventory(date, record, pscr)
    117     grbfiles = inventory.files
    118     cycle = date.hour

File ~/workarea/sandbox/pyschism/pyschism/forcing/nws/nws2/gfs2.py:49, in __init__()
     47 for page in pages:
     48     pprint(page)
---> 49     for obj in page['Contents']:
     50         pprint(obj)
     51         data.append(obj)

KeyError: 'Contents'
```

Should I create a ticket for it?
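The `KeyError: 'Contents'` above is consistent with how Amazon S3's ListObjectsV2 responds: when no objects match the requested prefix, the response page simply has no `Contents` key. That can happen, for instance, if the date/cycle used to build the prefix has no published data. A defensive sketch of the listing loop, assuming pages shaped like boto3 paginator output; `collect_objects` is a hypothetical name, not the code in gfs2.py.

```python
"""Sketch: collect S3 listing results without assuming 'Contents' exists."""


def collect_objects(pages, prefix=""):
    """Gather object records from ListObjectsV2-style pages.

    S3 omits 'Contents' from a page when nothing matches the prefix,
    so use .get() instead of page['Contents'].
    """
    data = []
    for page in pages:
        data.extend(page.get("Contents", []))
    if not data:
        # Fail with a clearer message than a bare KeyError.
        raise FileNotFoundError(f"No objects found under prefix {prefix!r}")
    return data


if __name__ == "__main__":
    # Fake pages mimicking paginator output; real code would get them from
    # boto3's client.get_paginator("list_objects_v2").paginate(...).
    pages = [{"Contents": [{"Key": "a.grib2"}, {"Key": "b.grib2"}]}, {"KeyCount": 0}]
    print([obj["Key"] for obj in collect_objects(pages)])  # -> ['a.grib2', 'b.grib2']
```

Raising a `FileNotFoundError` that names the prefix is a design choice: it still stops the run, but points at the missing data rather than at a dictionary key.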

@cuill
Member

cuill commented Sep 16, 2022

The file name hrrr_.nc follows Dan's workflow, so that he doesn't need to change his shell scripts. Users can link the needed files to the run directory.

I don't have issues running gfs2.GFS. Are you using the latest version of pyschism? Line 116 should be this:

```python
path = pathlib.Path(date.strftime("%Y%m%d"))
```

@SorooshMani-NOAA
Contributor

@cuill Yes, I'm using the latest version. The line numbers differ because I added some print statements to see what is going on. So you can currently run the gfs2 line I wrote above without any issues? That's strange!

About the hrrr_.nc files: is there a script to convert them to sflux air, prc and rad files? Could Dan's script be shared in the repo as an example, if it isn't already?

@cuill
Member

cuill commented Sep 16, 2022

@SorooshMani-NOAA

No, I was using the example script.

I think the issue with your script is the start_date format.

@cuill
Member

cuill commented Sep 16, 2022

@SorooshMani-NOAA
In your case, you can create start_date as follows:

```python
from datetime import datetime, timedelta

date = datetime.now() - timedelta(days=1)
start_date = datetime(date.year, date.month, date.day)
```
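Why the time of day can matter: the traceback above shows `gen_sflux` setting `cycle = date.hour`, so a start_date taken straight from datetime.now() carries an arbitrary hour. GFS runs four times a day (00, 06, 12 and 18 UTC), so only those hours name a real cycle. A small sketch, assuming you want the most recent cycle at or before a given time; `floor_to_cycle` is a hypothetical helper, not part of pyschism. Note that cycle hours are UTC, whereas datetime.now() returns local time unless given a timezone.

```python
from datetime import datetime, timezone

GFS_CYCLE_HOURS = (0, 6, 12, 18)  # GFS runs four times per day, in UTC


def floor_to_cycle(dt: datetime, cycle_hours=GFS_CYCLE_HOURS) -> datetime:
    """Truncate dt to the latest cycle hour at or before it.

    cycle_hours must include 0 so that every hour of the day has a floor.
    """
    hour = max(h for h in cycle_hours if h <= dt.hour)
    return dt.replace(hour=hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    print(floor_to_cycle(datetime(2022, 9, 16, 14, 37)))  # -> 2022-09-16 12:00:00
    # Use UTC explicitly, since GFS cycle hours are UTC.
    print(floor_to_cycle(datetime.now(timezone.utc)))
```

Truncating to midnight, as in the snippet above, is the special case of always picking the 00 cycle.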

@SorooshMani-NOAA
Contributor

@cuill Thanks for your help. By the way, I just tested the same gfs2 code from above on a different machine and it worked without any changes to the start_date argument. I guess on my own machine there's a package inconsistency causing some of the issues.

In any case, would it be possible to share the script that Dan is using to convert .nc files to proper sflux format?

@SorooshMani-NOAA
Contributor

Actually, it was pure lucky timing. But the easier way is to just pass a date object, i.e. datetime.now().date(), as start_date.

@cuill
Member

cuill commented Sep 16, 2022

@SorooshMani-NOAA
Here is the script to link sflux to the run directory:
https://github.com/schism-dev/pyschism/blob/main/examples/examples_sflux/link_sflux.py

@SorooshMani-NOAA
Contributor

@cuill Thank you. So it's OK if the sflux files contain additional variables, as long as they have the required ones for each of rad, prc and air, right? I ask because the script just links the same file for all three variables for a given time.

@cuill
Member

cuill commented Sep 16, 2022

@SorooshMani-NOAA
Yes.
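For readers who want the idea without opening link_sflux.py: SCHISM reads atmospheric forcing from an `sflux/` directory whose files are named `sflux_air_1.NNNN.nc`, `sflux_prc_1.NNNN.nc` and `sflux_rad_1.NNNN.nc`, with the four-digit index (the `[XXXX]` quoted earlier) starting at 0001. Since extra variables are acceptable, as confirmed above, one combined file per period can be linked under all three names. This is a minimal sketch under those assumptions, not the repository's script; `link_sflux_files` and the stand-in file names are hypothetical, and the inputs are assumed to be sorted in time.

```python
"""Sketch: expose one combined forcing file per period under SCHISM sflux names.

Hypothetical helper, not the repository's link_sflux.py.
"""
import tempfile
from pathlib import Path

SFLUX_KINDS = ("air", "prc", "rad")


def link_sflux_files(src_files, sflux_dir, level=1):
    """Symlink each source file as sflux_{air,prc,rad}_<level>.<NNNN>.nc.

    src_files must already be sorted in time; numbering starts at 0001.
    Returns the list of created link paths.
    """
    sflux_dir = Path(sflux_dir)
    sflux_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for i, src in enumerate(src_files, start=1):
        target = Path(src).resolve()
        for kind in SFLUX_KINDS:
            link = sflux_dir / f"sflux_{kind}_{level}.{i:04d}.nc"
            if link.is_symlink() or link.exists():
                link.unlink()  # replace links left over from a previous run
            link.symlink_to(target)
            links.append(link)
    return links


if __name__ == "__main__":
    # Demo with two empty stand-in files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        srcs = [Path(tmp) / n for n in ("hrrr_20220915.nc", "hrrr_20220916.nc")]
        for s in srcs:
            s.touch()
        for link in link_sflux_files(srcs, Path(tmp) / "sflux"):
            print(link.name, "->", link.resolve().name)
```

Symlinks keep a single copy of each file on disk; copying would work as well but triples the storage.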

@SorooshMani-NOAA
Contributor

@cuill I'm trying to use the original HRRR implementation for a 2-day forecast setup and I'm running into an issue. Without getting into the details of my issue, I was wondering why the following lines were modified. Was TemporaryDirectory causing any issues?

```python
#self._tmpdir = tempfile.TemporaryDirectory(
#    prefix=appdirs.user_cache_dir()
#)
self._tmpdir = pathlib.Path(f"./{self.start_date.strftime('%Y%m%d')}")
self._tmpdir.mkdir(exist_ok=True, parents=True)
```

My issue would be resolved if the system temporary directory (i.e. /tmp) were used for creating _tmpdir (e.g. via tempfile.TemporaryDirectory) instead of the current directory (i.e. ./{self.start_date.strftime('%Y%m%d')}).
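A note on the two approaches: `tempfile.TemporaryDirectory` deletes its directory when the object is cleaned up (on leaving a `with` block, on an explicit `cleanup()`, or when it is garbage-collected), whereas a fixed `./YYYYMMDD` directory depends on the current working directory and is shared by every run with the same start date. A minimal sketch of a helper that supports both, assuming a unique per-run directory under the system temp location is acceptable; `make_scratch_dir` and its `use_tmp` flag are hypothetical, not pyschism API.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_scratch_dir(start_date: datetime, use_tmp: bool = True) -> Path:
    """Create a scratch directory for downloaded files.

    use_tmp=True: a new, uniquely named directory under the system temp
    location (e.g. /tmp); the caller is responsible for deleting it.
    use_tmp=False: ./YYYYMMDD relative to the current working directory,
    reused by every run with the same start date.
    """
    stamp = start_date.strftime("%Y%m%d")
    if use_tmp:
        return Path(tempfile.mkdtemp(prefix=f"pyschism_{stamp}_"))
    path = Path(f"./{stamp}")
    path.mkdir(parents=True, exist_ok=True)
    return path
```

`mkdtemp` is used here instead of `TemporaryDirectory` so the directory survives until the caller removes it (for example with `shutil.rmtree`), which avoids it disappearing if the owning object is garbage-collected early.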

@josephzhang8
Member

josephzhang8 commented Oct 11, 2022 via email

@jreniel
Member Author

jreniel commented Oct 11, 2022

Hi @josephzhang8,
There is nothing inherently unsafe about PyPI. There have been occasions when users uploaded malicious packages to PyPI, but to be affected by that, you must have installed those packages. The problem, then, is not PyPI itself, but rather the specific packages that you choose to use from PyPI.

PyPI hosts over 350,000 packages, and of those, security firms have identified about 20 containing malicious code. [Edit]: See the comment below about a relatively recent PyPI purge of ~3000 typosquatting packages.

A PyPI package that was initially published by a trusted author cannot be hijacked and changed by an untrusted source. This is because every package on PyPI has a unique name, and once a name has been registered, no one except the holder of the credentials for that name can change the content hosted on PyPI. So if you trust the authors behind the "numpy" PyPI name registration, you can rest assured that you will always get packages published exclusively by those authors.

Having said this, one way people inject malicious code is by giving a package a name similar to a popular one, but it is very unlikely that you would fall into that trap, for two reasons. First, packages like pyschism have a setup.py in which the authors explicitly declare the dependencies, so end users don't have to install any other package manually. Second, in the unlikely event that users do need to install something manually, they would have to make a typo that matches the name of some malicious package and then import/execute the package with the typo. This is very unlikely to happen.

So the bottom line is that anyone can, in principle, publish anything to PyPI, including malware, but not under names that have already been registered, so trusted authors (PyPI names) can remain trusted indefinitely. As with every other piece of software, common sense is your first line of defense.

@jreniel
Member Author

jreniel commented Oct 12, 2022

For example, this report, https://www.theregister.com/2021/03/02/python_pypi_purges/, discusses a malicious package named cupy-cuda112 (which has already been removed, because there is some level of auditing on PyPI, contrary to what is stated in the 8-year-old Reddit link you posted in your question). But that name is not the same as cupy, which is the original package you would use if you were interested in GPU computing with NumPy-style arrays.

This technique of using similar but different names to deploy malicious code is a form of phishing known as typosquatting.
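To make the typosquatting risk concrete, here is a toy sketch that uses Python's standard-library `difflib` to flag a requested package name that is close to, but not identical to, a name you already trust. It is an illustration only, not a security tool; the trusted list, the `lookalikes` helper and the 0.75 similarity cutoff are arbitrary assumptions.

```python
import difflib


def lookalikes(name, trusted, cutoff=0.75):
    """Return trusted names that `name` closely resembles without matching exactly.

    `trusted` is assumed to contain lowercase names.
    """
    name = name.lower()
    if name in trusted:
        return []  # exact match: this is the package you meant
    return difflib.get_close_matches(name, trusted, n=3, cutoff=cutoff)


if __name__ == "__main__":
    trusted = ["numpy", "scipy", "pyschism"]  # illustrative list
    print(lookalikes("nunpy", trusted))  # -> ['numpy']
    print(lookalikes("numpy", trusted))  # -> []
```

A plain similarity ratio catches single-character typos well but can miss look-alikes built by appending suffixes, which is one reason pinning dependencies in setup.py remains the stronger defense.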

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants