
Slow LIST performance with mountpoint #945

Open
bradyforcier1 opened this issue Jul 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


bradyforcier1 commented Jul 17, 2024


Background

Testing with the latest version, v1.7.2, I've noticed that LIST performance through Mountpoint is significantly slower than listing the same prefix with other tools.

Test Setup

The test recursively lists a prefix hierarchy containing ~16,000 objects in total; a small script reproducing the comparison is sketched after the timings below.

Mountpoint command: sudo mount-s3 --read-only --allow-other --max-cache-size 50000 --cache /tmp/mtpt_cache --metadata-ttl 300 $BUCKET /tmp/mtpt_test
goofys command: sudo /usr/local/bin/goofys --type-cache-ttl 60s --stat-cache-ttl 60s --file-mode 0555 --dir-mode 0555 -o ro -o allow_other $BUCKET /tmp/goofys_test

  • awscli
    time aws s3 ls --recursive s3://$BUCKET/$PREFIX
    real    0m4.577s
    user    0m3.085s
    sys     0m0.128s
  • mountpoint (caching is enabled, but it seems like LIST responses aren't cached, so subsequent lists are still slow)
    time find /tmp/mtpt_test/$PREFIX -type f
    real    0m38.283s
    user    0m0.011s
    sys     0m0.079s
  • goofys
    time find /tmp/goofys_test/$PREFIX -type f
    real    0m2.333s
    user    0m0.006s
    sys     0m0.053s
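For reference, here is a rough sketch of a script that re-runs the same comparison in one go; BUCKET, PREFIX, and the mount paths are placeholders matching the commands above, so adjust them for your environment:

    #!/usr/bin/env bash
    # Rough sketch: times the three listing methods compared above.
    # BUCKET/PREFIX and the mount paths are placeholders.
    set -euo pipefail

    BUCKET=my-test-bucket
    PREFIX=my/test/prefix

    echo "== awscli recursive list =="
    time aws s3 ls --recursive "s3://${BUCKET}/${PREFIX}" > /dev/null

    echo "== mountpoint (find) =="
    time find "/tmp/mtpt_test/${PREFIX}" -type f > /dev/null

    echo "== goofys (find) =="
    time find "/tmp/goofys_test/${PREFIX}" -type f > /dev/null

Running the Mountpoint find twice also makes it easy to check whether a second, "warm" listing is any faster than the first.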
bradyforcier1 added the enhancement (New feature or request) label on Jul 17, 2024
@monthonk (Contributor)

Hey, I can confirm that Mountpoint doesn't cache any LIST responses today, so every readdir operation goes directly to S3. The metadata cache is mainly used for lookup operations. I didn't expect the result to be this much worse though, since we should only need to do readdir once per directory while traversing the tree.

It would be really helpful to understand the access pattern of the find command, so we will need debug logs from your test. Also, I would like to understand more about the structure of your bucket, like how many levels of subdirectories are under the test prefix. Could you share more information about that?
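If it helps, a mount command along these lines should produce the debug logs; the --debug and --log-directory flags are taken from mount-s3 --help on a recent version, so please double-check them against yours:

    # Sketch only: remount with debug logging, re-run the slow listing,
    # then grab the log files. Verify the flag names with `mount-s3 --help`
    # for your version.
    sudo mkdir -p /tmp/mtpt_logs
    sudo mount-s3 --read-only --allow-other \
        --max-cache-size 50000 --cache /tmp/mtpt_cache --metadata-ttl 300 \
        --debug --log-directory /tmp/mtpt_logs \
        "$BUCKET" /tmp/mtpt_test

    time find /tmp/mtpt_test/$PREFIX -type f > /dev/null
    sudo umount /tmp/mtpt_test
    # Per-request details (e.g. the ListObjectsV2 call issued for each
    # directory) should show up in the files under /tmp/mtpt_logs.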


bradyforcier1 commented Jul 24, 2024

> so we will need debug logs from your test

I would not be comfortable sharing the debug logs, as they will contain the bucket names/paths, which may include sensitive data. But the command is just the standard find utility reporting all files recursively underneath a prefix.

> Also, I would like to understand more about the structure of your bucket, like how many levels of subdirectories are under the test prefix. Could you share more information about that?

In this case, the content of the root prefix we're recursively listing looks like:

    ├── prefix1
    │   ├── a
    │   ├── b
    │   └── prefix1.1
    │       ├── a
    │       ├── b
    │       ├── c
    │       ├── d
    │       ├── e
    │       ├── f
    │       └── g
    ├── prefix2
    ...

There are ~100 top-level prefixes, and each prefix contains ~145 objects spread across its subdirectories. In this test, a total of ~1,900 prefixes were traversed.
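To give a rough sense of why that shape is expensive for a readdir-per-directory traversal, here is a sketch (bucket and prefix names are placeholders) comparing one recursive listing against a sequential, delimiter-based LIST per directory, which is roughly the pattern a FUSE walk produces:

    #!/usr/bin/env bash
    # Sketch: one recursive LIST vs. one delimiter-based LIST per directory.
    # BUCKET/PREFIX are placeholders; PREFIX should end with a slash.
    BUCKET=my-test-bucket
    PREFIX=my/test/prefix/

    # Single recursive, paginated listing (what `aws s3 ls --recursive` does).
    time aws s3 ls --recursive "s3://${BUCKET}/${PREFIX}" > /dev/null

    # One LIST request per directory, issued sequentially.
    list_dir() {
        local prefix="$1"
        aws s3api list-objects-v2 --bucket "$BUCKET" --prefix "$prefix" \
            --delimiter '/' --query 'CommonPrefixes[].Prefix' --output text |
            tr '\t' '\n' |
            while read -r sub; do
                case "$sub" in ""|None) continue ;; esac
                list_dir "$sub"
            done
    }
    time list_dir "$PREFIX"

With ~1,900 prefixes, the second form makes roughly 1,900 sequential round trips (each one a separate CLI invocation here, so it overstates the latency, but the shape of the cost is the same), which is consistent with the gap between the awscli and Mountpoint timings above.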

@monthonk (Contributor)

Thanks for sharing the structure. It seems the problem only shows up when there are a lot of prefixes to traverse, since I didn't see the same issue when trying to reproduce it with just a few subdirectories. I will bring this back to the team so we can figure out how to test it and make directory listing more performant.
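For what it's worth, a layout like the one described above could be recreated in a scratch bucket with something along these lines (bucket and prefix names are placeholders; it writes ~100 top-level prefixes with ~1,900 sub-prefixes and ~13,000 empty objects):

    #!/usr/bin/env bash
    # Sketch: populate a scratch bucket with a prefix hierarchy similar to
    # the one described in this issue. Names are placeholders.
    set -euo pipefail
    BUCKET=my-scratch-bucket
    PREFIX=listing-test

    for p in $(seq 1 100); do
        for s in $(seq 1 19); do
            for f in a b c d e f g; do
                # Empty object; only the key matters for LIST performance.
                aws s3api put-object \
                    --bucket "$BUCKET" \
                    --key "${PREFIX}/prefix${p}/sub${s}/${f}" > /dev/null
            done
        done
    done

The loop is naive and sequential, so it takes a while; parallelizing the inner loop (for example with xargs -P) would speed it up considerably.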
