Handling Google Documents (V2) #1

Open
cjnaz opened this issue Jul 15, 2018 · 32 comments
Labels
enhancement New feature or request

Comments

@cjnaz
Owner

cjnaz commented Jul 15, 2018

This is a copy from https://github.com/cjnaz/RCloneSync/issues/22 to move the issue onto the V2 baseline.

Google document files are created via the web interface to Google Drive, and appear without any file
extension on the web interface. They are not stored as standard files, and thus cannot be natively
copied off of and back onto the Google Drive directory tree. A native Google document file can only
exist on the Google Drive. Instead, they may be exported in various formats that are supported by
third party tools, such as Microsoft Excel (.xlsx), or OpenDocument format (.odt). Within the
web-based Google document tool (such as Google Sheets) the document may be saved to a normal file
in various formats via File > Download as > …, which creates a real file in the browser's download
directory.

In an rclone lsl listing of the Google Drive, a native Google document shows up with an extension
appropriate for the document type, such as .xlsx, .docx, etc. (based on the --drive-formats switch),
while the Google Drive web interface shows no file extension. Also, the Google document shows
no size on the web interface, while rclone lsl returns a size of -1. An rclone copy of a Google
document effectively exports the document to a normal file, again based on the --drive-formats
switch. The exported file has a real size and the exact same date-stamp as the original
Google document file.

The problem comes up if we attempt to upload a modified exported document, with the same name,
back to Google Drive: "Failed to copy: can't update a google document". Similarly, deleting the
local copy and then running rclone sync from local to Drive results in "Couldn't delete: can't
delete a google document".

Proposed behavior for rclonesync:

A new --export-google-docs switch will export (copy to Local) every found Google document as
_export.. The final rclone sync will put a copy of the exported document on Google
Drive as well. The date-stamp on the file will match the original Google document.

Later, with --export-google-docs, if the native Google document datestamp is newer, and the
existing exported file is unchanged (relative to the prior rclonesync), then the document will
be exported again, replacing the current _export file.

If the exported file is changed (newer) then it will be synchronized as usual/normal, but the
native Google document cannot be updated. Effectively, the file version and the native version
are out of sync.

If the exported file had previously been changed, and now the native Google document is changed,
then --export-google-docs will blast/replace the file version. User beware. There is no reliable
way to detect that the file version and the native Google document version have both been
modified. The user is advised to rename the modified file version if he/she plans to subsequently
edit the native Google document via the web interface.

If --export-google-docs switch is not set then all Google document files (as indicated by
file size -1) will be ignored. Previously exported documents (which are now regular files)
will be synchronized as usual.
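
The proposed rules can be summarized as a small decision function. This is only a sketch of the logic described above, not rclonesync code: the function name, the `Action` values, and the parameters are hypothetical, and timestamps are assumed to be plain comparable numbers.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    IGNORE = "ignore the Google doc"
    EXPORT = "export the Google doc, replacing any existing export"
    SYNC_EXPORT = "sync the changed export; the native doc is not updated"
    NOTHING = "nothing to do"


def plan_gdoc_action(export_google_docs: bool,
                     native_mtime: float,
                     export_mtime: Optional[float],
                     export_changed: bool) -> Action:
    """Pick an action for one native Google doc (hypothetical helper).

    export_mtime is None when no export exists yet; export_changed says
    whether the exported file was modified since the prior sync.
    """
    if not export_google_docs:
        return Action.IGNORE
    # A newer native doc always wins, even if the export was edited too:
    # the two-sided edit cannot be detected reliably.
    if export_mtime is None or native_mtime > export_mtime:
        return Action.EXPORT
    if export_changed:
        return Action.SYNC_EXPORT
    return Action.NOTHING
```

Note that the "user beware" case falls out of the second check: an edited export is replaced as soon as the native document carries a newer date-stamp.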

DOES THIS LOOK LIKE THE BEST / PROPER HANDLING? COMMENTS PLEASE.

@cjnaz
Owner Author

cjnaz commented Jul 15, 2018

@Fabian42 's response:

The official Google Drive client for Windows just creates a link to the Google Document with the correct icon (doc, sheet, etc.). I would prefer this, because I use Google Docs for document creation and editing (I don't even have a program installed for that). For offline copies of Google documents, the Chrome extension by Google can be used that makes Google Docs work offline almost as if you had a connection. It's officially supported and already handles edit conflicts.

Remark about the importance of this issue: Currently RCloneSync does not work for me at all because of this, because --firstSync fails. I don't know how it behaves for others, it should be the same for everyone who has a Google document on his account.

@cjnaz
Owner Author

cjnaz commented Jul 15, 2018

Using the mechanisms provided by rclone, I don't have any better ideas.

@ncw ... What do you think?

@cjnaz
Owner Author

cjnaz commented Jul 25, 2018

@Fabian42, please open a new issue at https://github.com/ncw/rclone. This feature is best handled by rclone.

@Fabian42

Fabian42 commented Aug 2, 2018

They replied: rclone/rclone#1349 (comment)
Could RCloneSync maybe just use the --drive-skip-gdocs option always? Or is it supposed to be as similar to the official client on Windows as possible?

@cjnaz
Owner Author

cjnaz commented Aug 2, 2018

Since rclonesync cannot actually sync a Google Doc, V2 currently outputs a noisy log message and skips/ignores the Google Doc. The noisy log message serves to make the user aware that files have been skipped. Rather than hard coding --drive-skip-gdocs into rclonesync, I recommend that the user create an environment var to directly modify rclone's behavior. That way the user is explicitly skipping docs, rather than rclonesync hiding the fact that these files are not synced.
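
As a sketch of the environment-variable approach: rclone reads a long option from an environment variable named `RCLONE_` plus the option name in upper case with dashes replaced by underscores, so `--drive-skip-gdocs` corresponds to `RCLONE_DRIVE_SKIP_GDOCS`. The helper name below is hypothetical, and the command and paths in the trailing comment are placeholders rather than anything taken from this thread.

```python
import os
from typing import Mapping, Optional


def env_with_rclone_flag(flag: str, value: str = "true",
                         base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with one rclone option set.

    "--drive-skip-gdocs" becomes the variable RCLONE_DRIVE_SKIP_GDOCS.
    """
    env = dict(os.environ if base is None else base)
    name = "RCLONE_" + flag.lstrip("-").replace("-", "_").upper()
    env[name] = value
    return env


env = env_with_rclone_flag("--drive-skip-gdocs")
print(env["RCLONE_DRIVE_SKIP_GDOCS"])  # prints: true

# The dict could then be passed to a child process, for example:
#   subprocess.run(["rclonesync.py", "<Path1>", "<Path2>"], env=env)
# (placeholder command and paths)
```

Setting the variable in the shell profile has the same effect without any wrapper code.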

@Fabian42

Fabian42 commented Aug 2, 2018

But isn't that not very user-friendly? You wouldn't expect to have to set an environmental variable to make the program behave properly. And I don't see a reason not to have the program use it, since it literally does not work without it. The alternatives are:

  • use --drive-skip-gdocs
  • fail syncing or building index
  • implement own solution

Of course the user could be notified, but just failing is not good behaviour.

@cjnaz
Owner Author

cjnaz commented Aug 2, 2018

With the latest commit the sync will not fail. It's simply a question of whether the user wants to see the warnings in the output log. If not, set the environment var. Note that a pending enhancement for rclone is to allow remote-specific switches, such as --drive-skip-gdocs, to be placed in the rclone config file instead of setting environment vars.

If rclonesync simply always asserted --drive-skip-gdocs then the user wouldn't have any idea that certain files are not being synced.

So since the sync does not fail (the log is a bit noisy) do you need to implement your own solution?

@Fabian42

Fabian42 commented Aug 3, 2018

It does fail:
2018/08/03 19:29:41 ERROR : documents/google_docs/youtube_script.xlsx: Couldn't delete: can't delete a google document
And at the end:

2018/08/03 19:45:49 ERROR : Google drive root '': not deleting directories as there were IO errors
2018/08/03 19:45:49 ERROR : Attempt 3/3 failed with 8 errors and: failed to delete 8 files
2018/08/03 19:45:49 Failed to sync: failed to delete 8 files
2018-08-03 19:45:49,534/:    WARNING  rclone sync try 2 failed.           - /home/fabian/drive/
2018-08-03 19:45:49,535/:    ERROR    rclone sync failed.  (Line 384)     - /home/fabian/drive/
2018-08-03 19:45:49,560/:  ***** Critical Error Abort - Must run --FirstSync to recover.  See README.md *****
2018-08-03 19:45:49,567/:  >>>>> All done.

@cjnaz
Owner Author

cjnaz commented Aug 3, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 3, 2018

One possible scenario that may fail... Had you previously exported the youtube_script Google sheet to an .xlsx file, and then synced to your local filesystem, then deleted it on the local filesystem? I think rclone sync would try to delete the Google doc on the Drive:, which would fail as shown.

If you are running V2.1, try eliminating all exported Google Doc files on your local filesystem (or renaming them or putting them in a different local directory) so that they do not conflict with the Google Docs, then do a --first-sync.
Another experiment is to replace the ['--min-size', '0'] on line 445 with ['--drive-skip-gdocs'] to see if the results are any different.

@Fabian42

Fabian42 commented Aug 4, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 5, 2018

> I did delete all Google Docs locally, then used rclone drive->local
> with --drive-skip-gdocs to set my local filesystem to a copy of the cloud
> except for Google docs, then used RCloneSync.

Effectively, that's what an rclonesync --first-sync does, so you didn't need to manually copy the files before running the --first-sync.

> The output says "2018-08-04 07:23:23,749: >>>>> --first-sync copying any
> unique Path2 files to Path1". Does that mean that it copies everything from
> the second path that I enter to the first one? That's unintuitive and the
> opposite of what rclone does.

  • ">>>>> --first-sync copying any unique Path2 files to Path1" - Only unique files are copied to Path1. Since you had manually made the two filesystems match, no files were copied at this step. In all cases, rclonesync only copies file changes.
  • "That's unintuitive and the opposite of what rclone does." - Please see the NOTE ON CHANGING TO VERSION 2.1 in the README.md. You are correct that rclone copy src dest has a fixed direction from the left path to the right path. However, in the case of rclonesync, both the left path and the right path will be made to have the same content. rclonesync does this by making Path1 correct (by copying or deleting files based on the changes on Path2), then using rclone sync to make Path2 match the updated Path1. Per the NOTE ON CHANGING TO VERSION 2.1, it may be more efficient if Path1 is the local path, but the Path1/Path2 order functionally does not matter and the result will be the same. Please read the README.md carefully.
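
The two-step flow described above can be illustrated with plain dictionaries standing in for the two filesystems. This is a simplified model under stated assumptions, not rclonesync's implementation: listings are `{path: mtime}` dicts, `prior_path2` is the Path2 listing from the previous run, the function name is hypothetical, and conflicts (a file changed on both sides) are ignored.

```python
from typing import Dict, Tuple

Listing = Dict[str, int]


def sync_two_way(path1: Listing, path2: Listing,
                 prior_path2: Listing) -> Tuple[Listing, Listing]:
    """Apply Path2's changes to Path1, then mirror Path1 onto Path2."""
    path1 = dict(path1)  # work on a copy

    # Step 1: make Path1 correct based on what changed on Path2.
    for name, mtime in path2.items():
        if prior_path2.get(name) != mtime:   # new or modified on Path2
            path1[name] = mtime
    for name in prior_path2:
        if name not in path2:                # deleted on Path2
            path1.pop(name, None)

    # Step 2: the equivalent of a one-way sync from Path1 to Path2.
    return path1, dict(path1)


prior = {"a.txt": 1, "b.txt": 1}
local = {"a.txt": 1, "b.txt": 1, "c.txt": 5}   # c.txt is new on Path1
drive = {"a.txt": 2}                           # a.txt changed, b.txt deleted on Path2
new_local, new_drive = sync_two_way(local, drive, prior)
```

After the call both listings hold the modified `a.txt` and the new `c.txt`, and `b.txt` is gone from both.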

> Now I have to try some other cases. If it runs as a cron job regularly, it
> can often happen that the system shuts down while it's running, was that
> tested? What if it happens in the middle of a download?
> What if I make changes to Google Drive or my local copy while the script is
> running? I have some program data on Google Drive that can likely be
> modified on my Windows PC with the official Google Drive client and on my
> laptop with RCloneSync, sometimes even at the same time.

Lots of cases here. I'll speak to each:

  1. System shutdown (assuming you do not mean sleep, where the process would resume later) - If you shut down during the sync then the run will be incomplete. The LSL files will be left in a state as if the run did not happen, and the next run should run correctly.
  2. System sleep with resume later - Later, when the run continues, the state of files may not be the same. If a file that rclonesync wants to copy was deleted in the interim then rclonesync will have a Critical Error abort, which requires a --first-sync to recover from.
  3. System shutdown or sleep during a file copy or rclone sync - The file being transferred may be corrupted on the destination. This is not an rclonesync issue. On a later run the corrupted file would be copied again, so permanent damage is low risk.
  4. Changes made on Path2 up to V2.1 - If the file was identified as changed at the beginning of the rclonesync run (the Initial LSL files) and the file was changed again before it was copied to the other Path1, then the newer file will be copied. If the file was not identified as changed before the beginning of the run then the change will be overwritten when the final rclone sync of Path1 to Path2 is done at the end of the run. This is a problem.
  5. Changes made on Path1 up to V2.1 - Files changed on Path1 will always be pushed to Path2 by the rclone sync at the end of the run.

> It would be really nice if it was possible to actually listen for local and
> remote changes and only apply them. How does the official client on Windows
> do that?

Understood. The official clients have file change listener agents/processes and can react immediately on change. rclone does not have a monitoring function (that I'm aware of).

> It would save loads of bandwidth and power and make syncing much
> faster. I already have this problem on my phone where FolderSync used 3.3GB
> this week and that's only from syncing certain folders.
> ...
> Alternatively, couldn't RCloneSync only check files that were changed after
> the last sync

rclonesync only transfers changed files. No bandwidth (aside from getting the LSL) is used if there are no file changes.
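
Detecting changes by comparing two listings might look like the following sketch. The four buckets mirror the "new, newer, older, deleted" counts in the rclonesync log line quoted elsewhere in this thread; the function name and the `{path: mtime}` representation are assumptions made for illustration, not rclonesync internals.

```python
from typing import Dict, List


def classify_changes(prior: Dict[str, int],
                     current: Dict[str, int]) -> Dict[str, List[str]]:
    """Bucket the differences between a prior and a current listing."""
    changes: Dict[str, List[str]] = {
        "new": [], "newer": [], "older": [], "deleted": []}
    for name, mtime in current.items():
        if name not in prior:
            changes["new"].append(name)
        elif mtime > prior[name]:
            changes["newer"].append(name)
        elif mtime < prior[name]:
            changes["older"].append(name)
    changes["deleted"] = [name for name in prior if name not in current]
    return changes


prior = {"a": 1, "b": 1, "c": 3}
current = {"a": 2, "c": 1, "d": 1}
print(classify_changes(prior, current))
# {'new': ['d'], 'newer': ['a'], 'older': ['c'], 'deleted': ['b']}
```

When every bucket is empty, nothing needs to be transferred, so the only cost of a run is producing the listings themselves.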

> 2018/08/04 08:27:53 ERROR :
> programs/.minecraft/resourcepacks/1.9/assets/minecraft/structures/endcity:
> error listing: couldn't list directory: googleapi: Error 403: Rate Limit
> Exceeded, rateLimitExceeded

This is a limit imposed by Google. Search the rclone forums for this topic. rclone addresses this with retries. See TROUBLESHOOTING.md.

> Also, it took nearly half an hour from the second to last line "2018-08-04
> 08:46:26,316: 2 file change(s) on Path1: 1 new, 0 newer, 0
> older, 1 deleted" to the last line "2018-08-04 09:15:08,172: >>>>>
> Successful run. All done.". What did it do in that time?

Turn on --verbose to see each operation. I'm guessing that the file is big, and your upload bandwidth is limited. In V2.1, the two Path1 changes are pushed to Path2 using an rclone sync command. Turn on rclone's verbose output (using the rclonesync --rc-verbose switch) to see its operation log.

Lastly, please read and understand the README.md and TROUBLESHOOTING.md documentation. I have put a lot of work into trying to make it clear how rclonesync works, and what you need to know. RTFM. You'll ask more insightful questions.

PS: For handling file changes during the rclonesync run, I am defining some changes for how the tool works, which I'll open a separate issue for.

@Fabian42

Fabian42 commented Aug 5, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 5, 2018

Try rclone lsl Drive: > drivelist.txt. How long does it take, and how big is the output file?

@Fabian42

Fabian42 commented Aug 7, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 7, 2018 via email

@Fabian42

Fabian42 commented Aug 7, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 7, 2018 via email

@Fabian42

Fabian42 commented Aug 7, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 7, 2018

One thing puzzled me. You say you have ~14900 files, but the lsl file has ~21400 lines. These two numbers should match. Can you identify what the extra ~7000 lines are? Perhaps you are seeing line wraps in your line counting.

I'll check with @ncw on how rclone sync gets the source and destination files lists, and if there is a better way to do this than lsl.

I stumbled into https://syncthing.net/, which might be what you are looking for.

@Fabian42

Fabian42 commented Aug 7, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 7, 2018 via email

@Fabian42

Fabian42 commented Aug 8, 2018 via email

@cjnaz
Owner Author

cjnaz commented Aug 8, 2018 via email

@Fabian42

Fabian42 commented Aug 8, 2018 via email

@Fabian42

Fabian42 commented Sep 2, 2018

There were a lot of changes in rclone: rclone/rclone#2479
Does that fix the handling of Google Docs also for RCloneSync?

@ncw

ncw commented Sep 4, 2018

rclone/rclone#2479 isn't merged yet - it needs testers :-)

@Fabian42

Fabian42 commented Oct 4, 2018

It has been merged now. I haven't tested it myself so far.

@cjnaz
Owner Author

cjnaz commented Oct 5, 2018

I see it in the beta, and quickly read the documentation for --drive-import-formats. It will need some experimentation to figure out how it will play with rclonesync. It does note that the conversion can be lossy.

@adempewolff

I'm not sure I understand exactly what is going on, but currently rclone copy and rclone sync appear to create msoffice extension versions of the gdocs to sync locally. Is it possible to turn off rclonesyncv2's skipping behavior to take advantage of this?

@mlaverdiere

Same question as above: since current rclone default really does a good job of copying/syncing google documents (converting them to docs documents), is there a way to modify the rclonesync script to just rely on this default rclone behaviour? That's the only thing I miss right now to have a complete satisfying syncing solution with rclonesync... (btw: thanks for this great work).

@pinpins

pinpins commented Jan 9, 2021

I think it requires a change to the line of code below:

LINE_FORMAT = re.compile(r'\s*([0-9]+) ([\d\-]+) ([\d:]+).([\d]+) (.*)')

where the regexp should be fixed to also capture a size of -1.

Secondly, just use the latest rclone, which takes care of extensions for gDocs.
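
A minimal sketch of the suggested regexp change, assuming the listing layout of size, date, time with fractional seconds, and path. The only difference from the line quoted above is an optional leading minus sign in the size group. The name `NEW_LINE_FORMAT` and the sample lines are made up for illustration, and this has not been tested against rclonesync itself.

```python
import re

# Pattern as quoted above: the size group accepts digits only.
OLD_LINE_FORMAT = re.compile(r'\s*([0-9]+) ([\d\-]+) ([\d:]+).([\d]+) (.*)')
# Hypothetical fix: an optional '-' lets a size of -1 be captured as well.
NEW_LINE_FORMAT = re.compile(r'\s*(-?[0-9]+) ([\d\-]+) ([\d:]+).([\d]+) (.*)')

# Illustrative listing lines (not real output).
regular = "    12345 2018-08-03 19:29:41.000000000 documents/report.pdf"
gdoc = ("       -1 2018-08-03 19:29:41.000000000 "
        "documents/google_docs/youtube_script.xlsx")

# The old pattern rejects the Google doc line entirely.
old_match = OLD_LINE_FORMAT.match(gdoc)

# The new pattern parses both kinds of line.
gdoc_size, gdoc_date, gdoc_time, gdoc_frac, gdoc_path = \
    NEW_LINE_FORMAT.match(gdoc).groups()
regular_size = NEW_LINE_FORMAT.match(regular).group(1)

is_google_doc = int(gdoc_size) == -1
```

Capturing the -1 instead of failing to match would let the caller decide explicitly whether to skip, warn about, or export such entries.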


6 participants