how to implement makeCopy of shared file #53

Closed
SebastianMoyano opened this issue Aug 23, 2020 · 17 comments
@SebastianMoyano

SebastianMoyano commented Aug 23, 2020

I want to implement the equivalent of the makeCopy function that is available in Google Apps Script. I tried this:

file2 = drive.CreateFile({'id': '1ybbyhC2BhypcMnttQHFcHA1pCCbLLLLv'})
file2.Upload() # Files.insert()

But it didn't work. Also, how can I upload a specific file?
Thanks in advance!

@shcheklein
Member

@SebastianMoyano just to clarify - you want to copy a file? (I'm not sure if makeCopy or shared are important here).

Unfortunately, I don't see an implementation of https://developers.google.com/drive/api/v2/reference/files/copy at the moment. We can make this ticket a feature request; it doesn't look that hard to add, and I can guide someone through the PR process.

It's possible to move the file though (just in case that would be enough).

@SebastianMoyano
Author

Yes, sorry for the late response. What I was looking for is a way to make a copy of a Drive file that I have access to. Shared files sometimes hit a download limit, and the way around that is to make a copy in your own Drive. At least for this specific situation, a feature like that would help.
Cheers!

@abubelinha

You should take a look at the code, comments and links in this issue of the original PyDrive project.
That implementation works well.

I take the opportunity to ask whether there is at least a simple documentation page that describes the improvements of PyDrive2 over the original PyDrive. It looks like PyDrive2 is not documented at all.
I have some old scripts based on PyDrive and I wonder whether PyDrive2 is backwards compatible. I am a bit lost.

@shcheklein
Member

@abubelinha it should be almost 100% backward compatible; we tried to keep the same API. I would say PyDrive2 for now is about making PyDrive stable and performant: a lot of bug fixes, fixed and improved tests that run regularly on CI, and better performance (e.g. download previously read the whole file into memory, which is crazy and unacceptable for real use cases).

@ikarlmarx

ikarlmarx commented Jan 3, 2022

    file_id = source_file_id
    try:
        file1 = drive.CreateFile({"id": file_id})
        file1.FetchMetadata(fields="*")
        # If the source is a shortcut, resolve it to the file it points at.
        if file1.get("mimeType") == MIME_SHORTCUT_TYPE:
            file_id = file1.get("shortcutDetails").get("targetId")
            file1 = drive.CreateFile({"id": file_id})
            file1.FetchMetadata(fields="*")
        if file1.get("mimeType") == MIME_FOLDER_TYPE:
            text = "This is a folder and cannot be copied."
        else:
            body = {
                "parents": [{"kind": "drive#fileLink",
                             "id": target_folder_id}],
                "title": newtitle if newtitle else file1.get("title"),
            }
            # Copy the resolved file (not the shortcut) on the server side.
            return drive.auth.service.files().copy(
                fileId=file_id, supportsAllDrives=True, body=body
            ).execute()
    except ApiRequestError as e:
        if e.error.get("code") == 404:
            text = f"Source file {file_id} does not exist."
        else:
            text = e
    except errors.HttpError as e:
        # HttpError has no .error attribute; the HTTP status is on e.resp.
        if e.resp.status == 404:
            text = f"Source file {file_id} does not exist."
        else:
            text = e
    except Exception as e:
        text = e

    print(text)
    return None

@ikarlmarx

from googleapiclient import errors
from pydrive2.files import ApiRequestError

MIME_SHORTCUT_TYPE = 'application/vnd.google-apps.shortcut'
MIME_FOLDER_TYPE = 'application/vnd.google-apps.folder'

def copy_file_remote(drive, source_file_id, target_folder_id, newtitle=None):

@shcheklein
Member

Fixed by #188
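
For readers landing here: a minimal sketch of how the resulting GoogleDriveFile.Copy() method (its signature is quoted further down this thread) might be used to copy a shared file into your own Drive. The helper name copy_shared_file and its parameters are hypothetical, and `drive` is assumed to behave like an authenticated pydrive2.drive.GoogleDrive instance.

```python
def copy_shared_file(drive, source_file_id, target_folder_id=None, new_title=None):
    """Copy a Drive file server-side via GoogleDriveFile.Copy().

    Hypothetical helper: `drive` is assumed to be an authenticated
    pydrive2.drive.GoogleDrive (or anything with the same CreateFile API).
    """
    source = drive.CreateFile({"id": source_file_id})
    # Copy() expects the target folder as a file object, not a bare ID,
    # so wrap the folder ID the same way as the source.
    target = None
    if target_folder_id is not None:
        target = drive.CreateFile({"id": target_folder_id})
    return source.Copy(target_folder=target, new_title=new_title)


# Usage (needs real credentials, so it is left commented out):
# copy = copy_shared_file(drive, "<source file id>", "<my folder id>", "my copy")
# print(copy["id"])
```

Because the copy happens on Google's side, nothing is downloaded to the local machine.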

@abubelinha

abubelinha commented Jul 25, 2022

Thanks a lot to @simone-viozzi for Copy()!
I tried this with a simple file and it worked.
I have some related questions after looking at the new function:

def Copy(self, target_folder=None, new_title=None, param=None):
        """Creates a copy of this file. Folders cannot be copied.
        :param target_folder: Folder where the file will be copied.
        :type target_folder: GoogleDriveFile, optional
        :param new_title: Name of the new file.
        :type new_title: str, optional
        :param param: addition parameters to pass.
        :type param: dict, optional
        :raises ApiRequestError
        :return: the copied file
        :rtype: GoogleDriveFile
        """

Where it says "folders cannot be copied", is that a limitation of this function's simple implementation?
Or is that a limitation imposed by Google API? (so it is not possible to copy folders using a simple API call, just as it is not possible to do it using the web interface).

Anyway, to copy a whole folder, I figure the workaround would be to write my own function that creates an empty target folder, lists the files inside the source folder, and makes successive calls to this Copy function to copy them all.

So, here are my questions:

(1) Do I have to take care of hitting Drive limits (because of too many successive api calls), or does PyDrive2 somehow take care of this itself? If not, how long a delay do you suggest I implement between Copy() calls?

(2) Google Drive web interface permits to download a whole folder content as a zipped file (which contains all files and subfolders, so you can recreate them locally when unzipping). Should I open a new feature request issue, or is this already possible using PyDrive2? (I haven't found any examples or mentions of this when searching the issues.)

(3) What about the opposite? (reconstructing a file structure in Gdrive, starting from a local zipped file) I guess it's not possible (this is the closest example I found on Google, but it uses a Colab notebook, not a local Python terminal).

My objective is to find the best PyDrive2 approach to copy structured contents (folders and files inside) from one Gdrive place to another. I am guessing the zip download (2) could be faster, but only if zip reupload (3) and unzip is also possible. Otherwise, option (1) is the only alternative I can figure out.

Many thanks in advance for your suggestions.

@shcheklein
Member

Thanks a lot! I tried this and it worked.

Thanks to @simone-viozzi :)

Or is that a limitation imposed by Google API?

I think it is a limitation in the API + it's not even available in the UI.

Do I have to take care of hitting Drive limits (because of too many successive api calls), or does PyDrive2 somehow take care of this itself?

PyDrive2 doesn't take care of this. But there is a simple wrapper that you could use:

https://github.com/iterative/PyDrive2/blob/main/pydrive2/test/test_util.py#L46-L54
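
As an illustration of the general idea (this is a generic sketch, not the helper behind the link above), a retry wrapper with exponential backoff could look roughly like this. The name call_with_backoff and all of its parameters are hypothetical. In real use you would probably narrow retry_on to pydrive2.files.ApiRequestError and only retry on rate-limit responses.

```python
import random
import time


def call_with_backoff(func, *args, retries=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper. `retry_on` is the tuple of exception types that
    trigger a retry; `sleep` is injectable so the delays can be tested.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds plus up to 1 s of jitter.
            sleep(base_delay * 2 ** attempt + random.random())


# Usage (commented out, needs a real GoogleDriveFile):
# copy = call_with_backoff(source_file.Copy, target_folder=folder)
```

The jitter keeps several clients from retrying at exactly the same moment after a shared rate-limit error.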

Google Drive web interface permits to download a whole folder content as a zipped file

Hmm, from Googling it a bit, this is the best I could find: https://github.com/tanaikech/ZipFolder. But I'm not sure there is a built-in API of that kind. I can't guarantee it, and I would personally do a bit more research.

My objective is to find the best PyDrive2 approach to copy structured contents (folders and files inside) from one Gdrive place to another. I am guessing the zip download (2) could be faster, but only if zip reupload (3) and unzip is also possible. Otherwise, option (1) is the only alternative I can figure out.

My feeling is that using the fsspec interface of PyDrive2 should be the easiest way.

https://github.com/iterative/PyDrive2#fsspec-filesystem
https://docs.iterative.ai/PyDrive2/fsspec/

It handles retries also.

But unfortunately, copy() is still implemented naively there (with an in-memory copy first), and it does not use the newly implemented Copy() by @simone-viozzi.

https://github.com/iterative/PyDrive2/blob/main/pydrive2/fs/spec.py#L536-L543

And it doesn't implement the recursive logic. So for now you would have to do something like os.walk and copy files one by one. Still, it can be more convenient. Give it a try, and we might even implement more things if we see that it fits your workload.
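
A rough sketch of that walk-and-copy loop, assuming `fs` exposes the generic fsspec methods walk, makedirs and cp_file. Whether PyDrive2's filesystem supports each of them in your installed version is something to verify, and copy_tree is a hypothetical name. Note that with the cp_file implementation discussed in this thread, each file still passes through local memory.

```python
import posixpath


def copy_tree(fs, src, dst):
    """Recreate the tree under `src` at `dst`, copying files one by one.

    Hypothetical helper; `fs` is assumed to follow the fsspec interface,
    where walk() yields (root, dirs, files) with plain names in the lists.
    """
    src = src.rstrip("/")
    copied = []
    for root, _dirs, files in fs.walk(src):
        # Path of the current directory relative to the source root.
        rel = root[len(src):].lstrip("/")
        target_dir = posixpath.join(dst, rel) if rel else dst
        fs.makedirs(target_dir, exist_ok=True)
        for name in files:
            source_path = posixpath.join(root, name)
            target_path = posixpath.join(target_dir, name)
            fs.cp_file(source_path, target_path)
            copied.append((source_path, target_path))
    return copied


# Usage (commented out, needs an authenticated filesystem object):
# copy_tree(fs, "root/source_folder", "root/target_folder")
```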

@abubelinha

Thanks a lot @shcheklein for all your comments.
I'll see what I can do. fsspec is completely new to me, but if I understand you correctly, with the new PyDrive2 Copy() function all the copying work is done on Google's servers, whereas fsspec.copy() downloads the file and uses my machine's memory?

@simone-viozzi
Contributor

Yes, you are correct.

I plan on doing a PR to let the user choose whether to do a server-side copy (with GoogleDriveFile.Copy) or to download and re-upload the file (the current behavior of fsspec.copy).

@shcheklein
Member

@simone-viozzi

thanks!

or download and re-upload the file

What would be the reason to support this?

@simone-viozzi
Contributor

@shcheklein it's the current behavior of fs.spec.cp_file:

def cp_file(self, lpath, rpath, **kwargs):
    """In-memory streamed copy"""
    with self.open(lpath) as stream:
        # IterStream objects doesn't support full-length
        # seek() calls, so we have to wrap the data with
        # an external buffer.
        buffer = io.BytesIO(stream.read())
        self.upload_fobj(buffer, rpath)

it's equivalent to download and re-upload.

@shcheklein
Member

Yep, I know, but we don't need to preserve it - this should be an implementation detail.

@abubelinha

abubelinha commented Jul 30, 2022

Not sure if my last question was misunderstood.
I want to copy a whole folder but I want to avoid usage of my machine memory as much as possible.

So if inner files have to be copied one by one, the server-side Copy() behaviour is indeed what I want for my use case (let the Gdrive server make each file copy by itself, without downloading anything).

I would only be interested in downloading and re-uploading if the Gdrive API allowed doing it this way:

  1. an API method to download a zipped folder (containing all the structured, recursive contents of that folder)
  2. an API method to upload one of those zip files and expand it in a different Gdrive location (so that Gdrive reconstructs the original tree structure and contents in that destination).

But I am afraid neither of those is possible.
So I think for now the best option is to design a new CopyFolder function that accepts source and destination folders and walks all inner files and folders recursively, creating the destination folder and subfolders and copying all files with the new Copy method.

I think it would be interesting to have such a function available as a PyDrive2 method, ideally with some built-in control to avoid hitting Gdrive API usage limits (e.g. a parameter with a default time lapse between Copy method calls).

def CopyFolder(srcFolderId, destParentId, destFolderName, timeLapse)
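
To make the idea concrete, here is a minimal sketch of such a function under a few assumptions: `drive` behaves like pydrive2.drive.GoogleDrive, folders are created with CreateFile(...) plus Upload(), and children are listed with the usual "in parents" query. The snake_case name copy_folder, the 0.5 s default and the injectable sleep argument are my own choices, not part of PyDrive2.

```python
import time

FOLDER_MIME = "application/vnd.google-apps.folder"


def copy_folder(drive, src_folder_id, dest_parent_id, dest_folder_name,
                time_lapse=0.5, sleep=time.sleep):
    """Recursively copy a Drive folder server-side; returns the new folder.

    Hypothetical helper, not a PyDrive2 method.
    """
    # Create the (empty) destination folder first.
    dest = drive.CreateFile({
        "title": dest_folder_name,
        "mimeType": FOLDER_MIME,
        "parents": [{"id": dest_parent_id}],
    })
    dest.Upload()

    query = f"'{src_folder_id}' in parents and trashed=false"
    for item in drive.ListFile({"q": query}).GetList():
        if item["mimeType"] == FOLDER_MIME:
            # Folders cannot be copied directly, so recurse into them.
            copy_folder(drive, item["id"], dest["id"], item["title"],
                        time_lapse, sleep)
        else:
            item.Copy(target_folder=dest, new_title=item["title"])
            sleep(time_lapse)  # crude throttle between Copy() calls
    return dest


# Usage (commented out, needs real credentials):
# copy_folder(drive, "<source folder id>", "root", "Copy of my folder")
```

A fixed delay is the simplest throttle. Retrying failed calls with a growing delay would likely cope better with actual rate-limit errors.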

Unfortunately I am totally git-illiterate, so I cannot contribute it myself (PRs and all that).
But if you think it's useful, I can open an issue requesting this feature.

@elisevansbbfc

Can I make Copy() overwrite or raise exceptions in cases where the destination file path already exists?

For example, suppose I am trying to copy a file file_A.csv (keeping the same name) within a folder dir_A to a folder dir_B, but there already exists such a file dir_B/file_A.csv. Currently Copy just writes a 2nd file dir_B/file_A.csv so that if I go to Google Drive UI I will see 2 instances of this file.

But can I use Copy so that it overwrites that file? Or so that it gives an error in such a case?

@shcheklein
Member

@elisevansbbfc I don't think it's possible. You would have to run a query first, e.g. fs.exists from the fsspec implementation, which is an alternative high-level API for working with Google Drive in this package.

Copy() itself is implemented using https://developers.google.com/drive/api/reference/rest/v3/files/copy. And I don't see any flags one can pass to avoid duplicate titles.
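
One way to approximate "overwrite or raise" on the client side is to query the target folder before copying. The sketch below is only an illustration: copy_unless_exists and its overwrite flag are hypothetical, and it assumes `drive` behaves like pydrive2.drive.GoogleDrive and that files expose Trash() and Copy(). The check and the copy are two separate API calls, so this is not atomic.

```python
def copy_unless_exists(drive, source_file, target_folder, new_title=None,
                       overwrite=False):
    """Copy source_file into target_folder, refusing or replacing duplicates.

    Hypothetical helper, not part of PyDrive2.
    """
    title = new_title or source_file["title"]
    # Escape backslashes and single quotes for the Drive query syntax.
    escaped = title.replace("\\", "\\\\").replace("'", "\\'")
    query = (
        f"title = '{escaped}' and '{target_folder['id']}' in parents "
        "and trashed=false"
    )
    existing = drive.ListFile({"q": query}).GetList()
    if existing and not overwrite:
        raise FileExistsError(f"{title!r} already exists in the target folder")
    for duplicate in existing:
        duplicate.Trash()  # moved to the trash, so it stays recoverable
    return source_file.Copy(target_folder=target_folder, new_title=title)


# Usage (commented out, needs real GoogleDriveFile objects):
# copy_unless_exists(drive, file_a, dir_b)                  # raises if present
# copy_unless_exists(drive, file_a, dir_b, overwrite=True)  # trashes old copy
```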
