Download size with MB #85

Open
alexisahedo opened this issue Apr 7, 2022 · 9 comments

Comments

@alexisahedo

alexisahedo commented Apr 7, 2022

Hello everyone,
Thanks a lot for all your hard work developing this tool, it has been very helpful to me in my work.

I’d like to report a problem with the method “download_size” in “Project.py” in the “ost” directory. My query returns a series of Sentinel-1 images that are smaller than 1 GB each, and the method cannot convert the str to a float because “MB” is not removed from the str, so it raises an error (or that's what I understand) and I’m not able to find out the total download size of my search.


Terminal error:

Traceback (most recent call last):
  File "path/ost_prueba_01/ost_download_test01.py", line 120, in <module>
    ost_s1.download_size(ost_s1.refine_inventory())
  File "path/envs/virtualenv/lib/python3.6/site-packages/ost/Project.py", line 330, in download_size
    self.inventory["size"].str.replace(" GB", "").astype("float32").sum()
  File "path/envs/virtualenv/lib/python3.6/site-packages/pandas/core/generic.py", line 5546, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "path/envs/virtualenv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 595, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "path/envs/virtualenv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 406, in apply
    applied = getattr(b, f)(**kwargs)
  File "path/envs/virtualenv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "path/envs/virtualenv/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 995, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: '887.4 MB'


Script of "download_size":

def download_size(self, inventory_df=None):
    """Function to get the total size of all products when extracted in GB
    :param inventory_df:
    :return:
    """
    if inventory_df is None:
        download_size = (
            self.inventory["size"].str.replace(" GB", "").astype("float32").sum()
        )
    else:
        download_size = (
            inventory_df["size"].str.replace(" GB", "").astype("float32").sum()
        )

    logger.info(f"There are about {download_size} GB need to be downloaded.")

I’m not a programmer so I’m not able to give more information about the issue.
Thanks a lot.

Alexis A.
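
The failure reported above can be reproduced without the full inventory. This is a minimal sketch in plain Python, assuming size strings shaped like the one in the traceback; the list contents and the helper name `strip_gb_and_convert` are hypothetical and only mimic the " GB" stripping step, not the pandas call itself.

```python
# Hypothetical inventory sizes; only '887.4 MB' is taken from the traceback.
sizes = ["1.2 GB", "887.4 MB"]


def strip_gb_and_convert(value):
    """Mimic the reported logic: drop ' GB', then convert the rest to float."""
    return float(value.replace(" GB", ""))


errors = []
for value in sizes:
    try:
        strip_gb_and_convert(value)
    except ValueError as exc:
        # The ' MB' suffix is still present, so float() rejects the string.
        errors.append(str(exc))

print(errors)
```

Only the entry that still carries the " MB" suffix fails, which matches the ValueError at the end of the traceback.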

@KBodolai
Contributor

Hey Alexis,

That output is enough. We just need to make the method a bit more flexible, so that it strips whatever unit suffix is at the end. I'd be happy to help you do a pull request, or to work on it myself in the coming weeks. Happy to point you in the right direction if you're feeling brave :)

Quick fix would be to add another .replace(" MB", "") after the one for GB, but perhaps something more flexible using, for instance, regex would be clearer.

Cheers,
Kristian.

@alexisahedo
Author

Hello dear KBodolai,

Thanks a lot for the advice and for your willingness to work on it in the future. I’ll give it a shot and try an “if else” approach that rounds the images with an MB suffix up to 1 GB, so that the final sum is as close as possible. I’ll also check out RegEx and try to apply it if it’s not too time-consuming.

This is very important for me because I’ll soon start downloading a whole year of Sentinel-1 images for a national project, and disk space is a big deal.

Saludos.

Alexis A.

@BuddyVolly
Collaborator

Dear Alexis,

I am implementing the fix suggested by KBodolai, so it won't throw an error.
However, the calculated total download size will be wrong, as it will treat those MB values as GB.
I'll let you know when a new version containing this fix is available.
Saludos, BV
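
To illustrate the caveat above: stripping both suffixes avoids the exception but adds MB values as if they were GB. This is a small sketch with hypothetical sizes; the helper names are made up for illustration, and the factor of 1024 MB per GB is an assumption that follows the convention used later in this thread.

```python
# Hypothetical sizes; only '887.4 MB' appears in the original report.
sizes = ["1.2 GB", "887.4 MB"]


def strip_units(value):
    """Quick fix: remove either suffix and convert, ignoring the unit."""
    return float(value.replace(" GB", "").replace(" MB", ""))


def to_gb(value):
    """Unit-aware variant: scale MB values down to GB (assumes 1024 MB per GB)."""
    number, unit = value.split()
    return float(number) / 1024 if unit == "MB" else float(number)


naive_total = sum(strip_units(v) for v in sizes)   # counts 887.4 MB as 887.4 GB
scaled_total = sum(to_gb(v) for v in sizes)        # counts 887.4 MB as under 1 GB

print(round(naive_total, 3), round(scaled_total, 3))
```

With these two entries the naive total is several hundred times larger than the scaled one, so the quick fix is only safe when MB-sized products are rare or the estimate can be rough.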

@alexisahedo
Author

alexisahedo commented Apr 28, 2022

Dear @BuddyVolly,

As always, I’m really thankful for all the work and effort everyone invests in this code; it really helps me a lot.
As @KBodolai suggested, I tried really hard to implement a more flexible option using RegEx and came up with a possible solution that converts MB to GB and computes the proper sum. I’ll start a pull request (my first time ever, so hopefully I do it correctly). The modified script is below:

    def download_size(self, inventory_df=None):
        """Function to get the total size of all products when extracted in GB and MB

        :param inventory_df:
        :return:
        """
        download_size = 0.0
        if inventory_df is None:
            inventory_size = self.inventory["size"]
            for size in inventory_size:
                size_gb = re.search(' GB', str(size))
                if size_gb:
                    size_gb_n = re.sub(' GB', '' , (size))
                    download_size = (float(size_gb_n)) + download_size
                else:
                    size_mb_n = re.sub(' MB', '', (size))
                    size_mb_togb = float(size_mb_n)/1024
                    download_size = size_mb_togb + download_size
        else:
            inventory_size = inventory_df["size"]
            for size in inventory_size:
                size_gb = re.search(' GB', str(size))
                if size_gb:
                    size_gb_n = re.sub(' GB', '' , (size))
                    download_size = (float(size_gb_n)) + download_size
                else:
                    size_mb_n = re.sub(' MB', '', (size))
                    size_mb_togb = float(size_mb_n)/1024
                    download_size = size_mb_togb + download_size

        logger.info(f"There are about {round(download_size, 3)} GB need to be downloaded.")

Thanks both of you for your kind help.

Saludos cordiales.
Alexis A.

@BuddyVolly
Collaborator

Hi Alexis, no need for a PR. I think it can be simplified, but that piece of code already helps me.

@BuddyVolly
Collaborator

BuddyVolly commented Apr 28, 2022

Can you check if this works for you? I only have GBs in my inventory.

    def download_size(self, inventory_df=None):
        """Function to get the total size of all products when extracted in GB and MB

        :param inventory_df:
        :return:
        """
           
        if inventory_df is None:
            size = self.inventory["size"]
        else:
            size = inventory_df["size"]
            
        size = size.apply(
            lambda x: re.sub(' GB', '' , (x)) if re.search(' GB', str(x)) else re.sub(' MB', '' , (x))/1024
        ).astype('float32')
                                                      
        print(f"There are about {round(size.sum(), 3)} GB need to be downloaded.")

@alexisahedo
Author

alexisahedo commented Apr 28, 2022

I'm sorry, I already opened a PR. Should I close it?

When I ran my working script, it showed the following error:

Traceback (most recent call last):
  File "/api_pruebas_01/ost_prueba_02/ost_Sentinel1Scene_test.py", line 112, in <module>
    ost_s1.download_size(ost_s1.refine_inventory())
  File "/anaconda3/envs/venv_ost/lib/python3.8/site-packages/ost/Project.py", line 359, in download_size
    size = size.apply(
  File "/python3.8/site-packages/pandas/core/series.py", line 4433, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/python3.8/site-packages/pandas/core/apply.py", line 1082, in apply
    return self.apply_standard()
  File "/python3.8/site-packages/pandas/core/apply.py", line 1137, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "/anaconda3/envs/venv_ost/lib/python3.8/site-packages/ost/Project.py", line 361, in <lambda>
    lambda x: re.sub(' GB', '' , (x)) if re.search(' GB', str(x)) else re.sub(' MB', '' , (x))/1024
TypeError: unsupported operand type(s) for /: 'str' and 'int'

@12rambau
Collaborator

That's expected: you're trying to divide a "str" by 1024. The result of re.sub is a string, so it needs to be cast to a number first (float rather than int, because values like "887.4 MB" have a decimal part). Also, why use re.sub instead of str.replace?

    size = size.apply(
        lambda x: x.replace("GB", "") if "GB" in x else float(x.replace("MB", "")) / 1024
    ).astype('float32')

@KBodolai
Contributor

Yes, my idea with regex was to avoid the if / else statement by just pattern matching on the string part, but I hadn't thought about the scale factor, so it ends up not providing much of an advantage. In that case the plain string method is probably best. Sorry for the confusion!
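
For comparison, the regex idea can still handle the scale factor if the pattern captures the unit and a lookup table supplies the conversion. This is a sketch only, not code from the project: `size_to_gb`, `TO_GB` and `SIZE_PATTERN` are hypothetical names, the KB and TB entries are assumed extras, and the factor of 1024 follows the convention used earlier in this thread.

```python
import re

# Conversion factors to GB. 1024 follows the thread; KB and TB are assumed extras.
TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}

# A number (optionally with a decimal part) followed by a unit, e.g. '887.4 MB'.
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]B)\s*$")


def size_to_gb(text):
    """Convert a size string such as '887.4 MB' to a float number of GB."""
    match = SIZE_PATTERN.match(str(text))
    if match is None:
        raise ValueError(f"unrecognised size string: {text!r}")
    number, unit = match.groups()
    return float(number) * TO_GB[unit]


# Hypothetical sizes; only '887.4 MB' appears in the original report.
sizes = ["1.2 GB", "887.4 MB"]
total_gb = sum(size_to_gb(s) for s in sizes)
print(f"There are about {round(total_gb, 3)} GB to be downloaded.")
```

With a pandas Series, the same function could be passed to `Series.apply` before summing. Compared with the str.replace version, the lookup table avoids the if / else branch and fails loudly on an unexpected unit, at the cost of a little more code.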
