Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sabre\DAV\Exception\NotFound exception in PROPFIND response when file does not have a read permission bit #40460

Open
mdusher opened this issue Oct 31, 2022 · 19 comments
Assignees
Labels

Comments

@mdusher
Copy link
Contributor

mdusher commented Oct 31, 2022

I have been unable to find out how the affected file ends up with no read permission bit set on it to start with, but have been able to replicate it by manually removing the read permission that is set in the oc_filecache.

Steps to reproduce

  1. Create a directory
  2. Upload a file (multiple is better)
  3. Remove the read permission from the file
  4. Perform a PROPFIND on the directory

Example using owncloud/server:10.11 docker image

# start the owncloud container and wait for it to start
docker run --rm --name ocpropfind -d -e OWNCLOUD_DOMAIN=localhost:8080 -p8080:8080 owncloud/server:10.11
# run a propfind so that the user's skeleton storage get's created
docker exec ocpropfind curl -s --user admin:admin http://127.0.0.1:8080/remote.php/webdav/Photos -X PROPFIND >/dev/null 2>&1
# remove the read permission from a file
docker exec ocpropfind sqlite3 /mnt/data/files/owncloud.db 'update oc_filecache set permissions=26 where path="files/Photos/Lake-Constance.jpg";'
# perform a propfind to get an exception in the response
docker exec ocpropfind curl -s --user admin:admin http://127.0.0.1:8080/remote.php/webdav/Photos -X PROPFIND

Expected behaviour

There should be no exception in the PROPFIND response.

Actual behaviour

A Sabre\DAV\Exception\NotFound exception is printed mid-PROPFIND response making the response invalid.

The behaviour of the client varies.

  • Chrome will list files up to the exception.
  • Firefox does not list the parent directory at all.
  • ownCloud sync client presents an "Unknown error" and does not continue syncing files.
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns"><d:response><d:href>/remote.php/webdav/Photos/</d:href><d:propstat><d:prop><d:getlastmodified>Mon, 31 Oct 2022 01:11:44 GMT</d:getlastmodified><d:resourcetype><d:collection/></d:resourcetype><d:quota-used-bytes>1011464</d:quota-used-bytes><d:quota-available-bytes>-3</d:quota-available-bytes><d:getetag>&quot;635f20d04519e&quot;</d:getetag></d:prop><d:status>HTTP/1.1 200 OK</d:status></d:propstat></d:response><?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Sabre\DAV\Exception\NotFound</s:exception>
  <s:message/>
</d:error>

Server configuration

Example above uses a fresh owncloud/server:10.11 docker image with no changes.

Logs

Web server error log

[Mon Oct 31 01:11:42.398393 2022] [mpm_prefork:notice] [pid 161] AH00163: Apache/2.4.41 (Ubuntu) configured -- resuming normal operations
[Mon Oct 31 01:11:42.398437 2022] [core:notice] [pid 161] AH00094: Command line: '/usr/sbin/apache2 -f /etc/apache2/apache2.conf -D FOREGROUND'
127.0.0.1 - admin [31/Oct/2022:01:11:43 +0000] "PROPFIND /remote.php/webdav/Photos HTTP/1.1" 207 2810 "-" "curl/7.68.0"
127.0.0.1 - admin [31/Oct/2022:01:12:00 +0000] "PROPFIND /remote.php/webdav/Photos HTTP/1.1" 207 1750 "-" "curl/7.68.0"

ownCloud log (data/owncloud.log)

{"reqId":"fKqktKk5TSB79JpvZAZw","level":3,"time":"2022-10-31T01:12:00+00:00","remoteAddr":"127.0.0.1","user":"admin","app":"PHP","method":"PROPFIND","url":"\/remote.php\/webdav\/Photos","message":"Cannot modify header information - headers already sent by (output started at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Message.php:117) at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Sapi.php#66"}
{"reqId":"fKqktKk5TSB79JpvZAZw","level":3,"time":"2022-10-31T01:12:00+00:00","remoteAddr":"127.0.0.1","user":"admin","app":"PHP","method":"PROPFIND","url":"\/remote.php\/webdav\/Photos","message":"Cannot modify header information - headers already sent by (output started at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Message.php:117) at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Sapi.php#70"}
{"reqId":"fKqktKk5TSB79JpvZAZw","level":3,"time":"2022-10-31T01:12:00+00:00","remoteAddr":"127.0.0.1","user":"admin","app":"PHP","method":"PROPFIND","url":"\/remote.php\/webdav\/Photos","message":"Cannot modify header information - headers already sent by (output started at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Message.php:117) at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Sapi.php#70"}
{"reqId":"fKqktKk5TSB79JpvZAZw","level":3,"time":"2022-10-31T01:12:00+00:00","remoteAddr":"127.0.0.1","user":"admin","app":"PHP","method":"PROPFIND","url":"\/remote.php\/webdav\/Photos","message":"Cannot modify header information - headers already sent by (output started at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Message.php:117) at \/var\/www\/owncloud\/lib\/composer\/sabre\/http\/lib\/Sapi.php#70"}
@mdusher
Copy link
Contributor Author

mdusher commented Oct 31, 2022

I have been able to mitigate the issue by skipping the file in the PROPFIND response if it is unreadable, but I'm not sure if this causes any other issues.

Commit: mdusher@3e971a6

@pako81
Copy link

pako81 commented Nov 4, 2022

@jvillafanez what do you think?

@cdamken
Copy link
Contributor

cdamken commented Nov 4, 2022

2. Upload a file (multiple is better)

Uploading over the WebUI or over the sync client?

manually removing the read permission that is set in the oc_filecache.
3. Remove the read permission from the file

Why do you have to remove the read permissions? - This is not a normal procedure and no normal user has the possibility to access the DB to modify the permissions.
Why would you like to upload a file and remove the read permits?

Those are not normal steps to reproduce. why this file is uploaded and changed the behavior to read-only "automagically"
Is the folder shared with other users?

Commit: mdusher@3e971a6

The pull request just ignores files that exist and won't be shown in the WebUI and the sync clients, this will generate appearing and disappearing files without a reason when you run a occ files:scan
The user won't see the missing file and will try to upload again and oC will say the file exists already.

@cdamken
Copy link
Contributor

cdamken commented Nov 4, 2022

I checked in an instance, I have arrond 500K files with other permisions that 27:

root@snoopy4:~# occ_db "select distinct(permissions) from oc_filecache where permissions <> 27"
+-------------+
| permissions |
+-------------+
|          23 |
|          31 |
|          21 |
|          17 |
|           1 |
|           5 |
|           3 |
|           7 |
|          15 |
|          11 |
|          25 |
|           9 |
+-------------+

But with 26 I have none in about 3 million files


root@snoopy4:~# occ_db "select count(*) from oc_filecache where permissions = 26"
+----------+
| count(*) |
+----------+
|        0 |
+----------+
root@snoopy4:~# 

@mdusher Could you please check how many files have permissions with 26?

@cdamken
Copy link
Contributor

cdamken commented Nov 4, 2022

Could it be race condition?

1.- You upload a lot of files
2.- The files are being uploaded in the EOS storage, and at the moment they are being replicated are marked as read-only and so are saved in the oc_filecache
3.- The permits from the storage have been changed after the replication.


Step 4- Running an occ files:scan reads the files again and repairs the permits as they should be.

@jvillafanez
Copy link
Member

streaming vs no streaming

I think ownCloud is streaming the responses, at least for common webdav endpoints. This leads to faster response time because the data is sent as soon as it's available. However, as shown in this ticket, errors aren't properly handled because part of the data has already been sent. Response headers are already sent with a 207 status, and some data has reached the client. You can't change the status code to a 404 and ignore all the data sent.

There should be already an config option to set the streaming option off (I'm not sure if we have an option or it's hardcoded). If the streaming is off, all the data needs to be available in the server before sending it to the client. This leads to a very slow response time for large responses, specially with an infinity-depth propfind because the server need to check all the files. However, if an error happens, since no data has been sent yet, a proper error response could be sent.

I think the sabreDav should have a plan on what to do with the errors that happens in the response if streaming is active. I quite convinced that we depend on what sabreDav provides to deal with this issue, if it provides anything.

read permission missing

As far as I know, that comes from the FS. There is no reason for ownCloud to remove the permission, specially when no user can fiddle with permissions except for shares (which have their own permissions).

Without valid steps to reproduce the missing permission I don't think we can do anything. Manually changing the permission in the DB is fine to reach a similar state in order to fix a different issue, which would be the streaming one, but the streaming issue is much bigger and needs a different approach. What's more, fixing one issue won't fix the other.

about the provided patch

I don't think it's a good idea. The actual storage implementation should be the one deciding whether the file should be skipped entirely or not. Skipping a file in the sabre connector seems a bad choice because any other component below the connector, and in particular the ownCloud FS, will still consider the file as valid but without read permission.
Note that the storage implementation should be the lowest piece of the dependency tree, so if the storage implementation says the file doesn't exist, then it doesn't exist for the whole ownCloud.

The sabre connector might need to deal with the error somehow. The main responsibility should be to transform the average FS view and actions into a sabreDav tree node. There should be no room to manipulate the data in a different way other than presentation


From my side, I think there are 2 problems: the "NotFound" exception happening (which should be handled somehow), and the exception (in fact any exception) breaking the XML format and causing issues to the clients.

@mdusher
Copy link
Contributor Author

mdusher commented Nov 4, 2022

@jvillafanez's summary covers what I would like to see resolved

From my side, I think there are 2 problems: the "NotFound" exception happening (which should be handled somehow), and the exception (in fact any exception) breaking the XML format and causing issues to the clients.

And thank you for taking the time to give feedback on the work around. I do agree that the sabre connector should just be for presentation - my line of thinking was mostly about reducing the blast radius for the user (ie. only thinking 1 file is missing, rather than multiple). Users tend to panic when they can't see their data :)


@cdamken To answer your questions:

Why do you have to remove the read permissions?

The reason for removing the read permissions is that it is the only way I know of to replicate the NotFound exception. I do not have an understanding of how these permission are initially set, when they may be modified or what might modify them so this was the quickest way.

Uploading over the WebUI or over the sync client?

The user who reported the issue to me was using the sync client (2.11.1).

Is the folder shared with other users?

Yes, it is part of a shared folder.

Could you please check how many files have permissions with 26?

26 is just an example - there is about 240 files in the filecache that do not have a read permission bit set out of ~460 million files.

MariaDB [owncloud]> select permissions, count(permissions) from filecache where permissions > 0 and BIN(permissions) & BIN(1) = 0 group by permissions;
+-------------+--------------------+
| permissions | count(permissions) |
+-------------+--------------------+
|           4 |                  2 |
|           8 |                  2 |
|          10 |                  3 |
|          14 |                  1 |
|          16 |                  7 |
|          18 |                  2 |
|          24 |                 20 |
|          26 |                201 |
|          28 |                  1 |
|          30 |                  1 |
+-------------+--------------------+

Could it be race condition?

It is entirely possible it is a race condition. I do not think it would be due to the replication as my understanding is that this is done asynchronously after the file is created. There has also been instances where we have run occ files:scan over a problematic file for the problem to reoccur at a later time (not sure on how much later, as we only become aware of it once a user reports it).

@IljaN
Copy link
Contributor

IljaN commented Nov 7, 2022

I think the sabreDav should have a plan on what to do with the errors that happens in the response if streaming is active. I quite convinced that we depend on what sabreDav provides to deal with this issue, if it provides anything.

@DeepDiver1975 any hints or ideas? See #40460 (comment)

@cdamken
Copy link
Contributor

cdamken commented Nov 7, 2022

@mdusher could you please check which kind of files are all that are in the list?

Knowing that:

1.- The files should be reachable by the apache user (are added and removed only over the owncloud)
2.- the storage is EOS and the only error happens over the database,
3.- There is no reason that the file should be not readable for the apache user (permission 26)

A "quick" test o workaround only for @mdusher could be that if the permit is 26 we force as 27 when the file is added.

@IljaN @jvillafanez But what I can see is that all these files can't be read:

> +-------------+--------------------+
> | permissions | count(permissions) |
> +-------------+--------------------+
> |           4 |                  2 |
> |           8 |                  2 |
> |          10 |                  3 |
> |          14 |                  1 |
> |          16 |                  7 |
> |          18 |                  2 |
> |          24 |                 20 |
> |          26 |                201 |
> |          28 |                  1 |
> |          30 |                  1 |
> +-------------+--------------------+

They have the 1 (First bit) in 0 and if you compare with #40460 (comment)

@mdusher
Copy link
Contributor Author

mdusher commented Nov 8, 2022

@cdamken These are the mimetypes of the files:

MariaDB [owncloud]> select t.mimetype, count(*) from (select (select mimetypes.mimetype from mimetypes where id=filecache.mimetype) as mimetype from filecache where permissions > 0 and BIN(permissions) & BIN(1) = 0) t group by t.mimetype;
+-------------------------------------------------------------------------+----------+
| mimetype                                                                | count(*) |
+-------------------------------------------------------------------------+----------+
|                                                                         |        2 |
| application/dicom                                                       |       65 |
| application/javascript                                                  |        1 |
| application/json                                                        |        2 |
| application/msword                                                      |        2 |
| application/octet-stream                                                |       85 |
| application/pdf                                                         |        9 |
| application/postscript                                                  |        2 |
| application/vnd.ms-excel.sheet.macroEnabled.12                          |        1 |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document |        3 |
| application/x-gzip                                                      |        1 |
| application/x-tar                                                       |        1 |
| application/x-tex                                                       |        4 |
| application/xml                                                         |        2 |
| httpd/unix-directory                                                    |       22 |
| image/bmp                                                               |        1 |
| image/jpeg                                                              |        5 |
| image/png                                                               |       13 |
| image/tiff                                                              |        2 |
| text/csv                                                                |        2 |
| text/html                                                               |        4 |
| text/plain                                                              |        4 |
| text/x-java-source                                                      |        3 |
| text/x-python                                                           |        1 |
| text/x-shellscript                                                      |        1 |
| video/quicktime                                                         |        1 |
+-------------------------------------------------------------------------+----------+

I checked the permissions and owner of all the files and all but 8 of the files are readable by the apache user, 7 of those no longer exist and 1 is a file I was using to attempt to replicate this issue so is not in a normal state.
All files that existed were owned by the apache user.

@cdamken
Copy link
Contributor

cdamken commented Nov 9, 2022

@jvillafanez could you build in the patch #40480 some logging where we can see: the owner of the file or folder, the user who accessed it and the name of the file or folder?

@mdusher

| | 2 |

This looks like an error or the mime types are not defined.

| httpd/unix-directory | 22 |

are those folders in a subfolder from a share or is the share?

but 8 of the files are readable by the apache use

This looks like an EOS problem, or who could put files directly with a different user that is not apache? - workaround change the permits to owner apache, group apache and 750 to the file or folder.

7 of those no longer exist

We can clean with an occ files:scan or remove directly the entry in the DB (not always recommended)

and could you try the patch, and see if the problem does not happen again?

@jvillafanez
Copy link
Member

@jvillafanez could you build in the patch #40480 some logging where we can see: the owner of the file or folder, the user who accessed it and the name of the file or folder?

I don't think it's possible. The storage doesn't know who is accessing to it. The owner of the storage is unreliable because the external storages don't have owner or the owner is considered to be he active user (so it could change when another user access the storage); I think we'll end up showing misleading information.
For the path. I don't think it's an useful information. You should know where you're accessing already. In addition, adding a log there could spam the log too much.

@cdamken
Copy link
Contributor

cdamken commented Nov 9, 2022

Ok, then we need that @mdusher tests our patch

@mdusher
Copy link
Contributor Author

mdusher commented Nov 11, 2022

@cdamken

We are hesitant to apply this patch as it seems to have a bit of risk associated with it. As per @jvillafanez's note on the pull request:

Note that file metadata such as comments and tags will be either removed or disconnected because the file will be removed. If the file regains the read permission at a later point, it will be considered a new file with a new fileid.

If I understand this correctly, with the #40480 patch applied - if the storage was to become temporarily unavailable and this condition was met, the metadata would be removed from the filecache. This would result in shares becoming disconnected from the file and eventually being removed which would be a huge support overhead for us should we be required to restore them.

As I've mentioned previously, the problem we would like to see resolved is the XML format becoming invalid due to exceptions being thrown in the streamed response and this still occurs with the patch applied.

@DeepDiver1975
Copy link
Member

There should be already an config option to set the streaming option off (I'm not sure if we have an option or it's hardcoded). If the streaming is off, all the data needs to be available in the server before sending it to the client. This leads to a very slow response time for large responses, specially with an infinity-depth propfind because the server need to check all the files. However, if an error happens, since no data has been sent yet, a proper error response could be sent.

I think the sabreDav should have a plan on what to do with the errors that happens in the response if streaming is active. I quite convinced that we depend on what sabreDav provides to deal with this issue, if it provides anything.

It is quite known that in error cases the streamed propfind response is useless. Has bee discussed quite often and agreed on 'being okay' as the clients will fall back to non-infinite-depth propfind mode.
Not the coolest solution I ever built. In an ideal world this should be much nicer.

What would need to be done is that errors are caught and added to the propfind response with the correct http status code.
And yes this might be something we need to add to sabre itself.

Needs some analysis .....

@IljaN
Copy link
Contributor

IljaN commented Nov 14, 2022

Did some digging in sabre/dav. Somehow I can't step in to this method:
https://github.com/sabre-io/dav/blob/7cbc3156c9d9f29b38a6d844bd5ee88e047bcf58/lib/DAV/CorePlugin.php#L327

Also receiving "Can not traverse a closed generator" exception with a debugger attached which leads to a 500. Maybe XDebug evaluates generators accidentally?

@jvillafanez
Copy link
Member

It seems we need to deal with the exception inside the generator (inside https://github.com/sabre-io/dav/blob/ee802da2ad998be6c6773ddcb7d4700cc7dc27a7/lib/DAV/Server.php#L959).

As as simple example:

<?php

function gen2() {
  for ($i=0;$i<10;$i++) {
    if ($i === 5) {
      throw new Exception('ooppseee!!!!');
    } else {
      yield $i;
    }
  }
}

$ff = gen2();
$ff->rewind();
while($ff->valid()) {
  $key = $ff->key();
  $value = $ff->current();

  echo "$key -> $value\n";
  try {
    $ff->next();
  } catch (Exception $e) {
    echo "ignoring {$e->getMessage()}";
  }
}

the code runs until the number 5 (echoing the exception message) but it doesn't proceed with the rest of the numbers, so when the generator leaks the exception it seems it's automatically closed and the valid method returns false.
So if the generator throws an exception, then it's finished and there aren't any additional result to get from the generator.

Note that even jumping to the next item by calling the next method twice or more doesn't change anything because the valid method will still return false.

I've also checked that there is no value returned from the key and current methods after the exception is triggered (in case we bypass the valid check somehow and try to get more data from the generator).

Taking this limitation into account, a "classic" foreach loop (as it's already done) will be better because we can't handle the exception and the loop would break anyway.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@github-actions
Copy link

This issue has been automatically closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants