34 changes: 24 additions & 10 deletions .github/bin/free-disk-space.sh
@@ -10,6 +10,7 @@ function free_up_disk_space_ubuntu()
'google-chrome-*'
'libmono-*'
'llvm-*'
'man-db'
'mysql-server-core-*'
'powershell*')

@@ -22,16 +23,29 @@ function free_up_disk_space_ubuntu()
sudo apt-get autoclean -y

echo "Removing toolchains"
sudo rm -rf \
/usr/local/graalvm \
/usr/local/lib/android/ \
/usr/share/dotnet/ \
/opt/ghc/ \
/usr/local/share/boost/ \
"${AGENT_TOOLSDIRECTORY}"

echo "Prune docker images"
sudo docker system prune --all -f
directories_to_be_removed=(
"/usr/local/graalvm/"
"/usr/local/lib/android/"
"/usr/share/dotnet/"
"/opt/ghc/"
"/usr/local/share/boost/"
"${AGENT_TOOLSDIRECTORY}")

delete_directories_with_rsync "${directories_to_be_removed[@]}"

echo "Prune docker images"
sudo docker system prune --all -f
}

function delete_directories_with_rsync()
{
sudo mkdir /tmp/empty
for dir in "$@"; do
echo "Deleting contents of $dir using rsync"
sudo rsync --delete -a /tmp/empty/ "$dir"
Member

Why would we want to use rsync when all we want is rm?

Member Author

If a directory contains a large number of files, rm -rf can take a long time, which pushes the execution time past the limit, i.e. 15 minutes.

Member Author

This rsync --delete -a /tmp/empty/ "$dir" essentially mimics rm -rf on the directory's contents.
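
To make the technique concrete, here is a minimal sketch of emptying a directory by syncing an empty source over it. The helper name `empty_dir_with_rsync` and the scratch paths are hypothetical and not part of this PR; the sketch also uses `mktemp -d` for the empty source instead of the fixed /tmp/empty, which is an assumption made only to keep the demo self-contained.

```shell
# Hypothetical helper (not the PR's function): empty one directory by syncing
# an empty source over it, then remove the directory itself.
empty_dir_with_rsync() {
    target="$1"
    if ! command -v rsync >/dev/null 2>&1; then
        echo "rsync is not installed" >&2
        return 1
    fi
    empty_src="$(mktemp -d)" || return 1
    # The trailing slash on the source means "its contents"; --delete removes
    # everything in the target that the (empty) source does not have.
    rsync -a --delete "${empty_src}/" "${target}/"
    status=$?
    rmdir "${empty_src}"
    [ "$status" -eq 0 ] || return "$status"
    # rmdir only succeeds if rsync really left the target empty.
    rmdir "${target}"
}

# Demo on a throwaway tree; nothing outside the scratch directory is touched.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/victim/sub/deeper"
touch "${scratch}/victim/a" "${scratch}/victim/.hidden" "${scratch}/victim/sub/deeper/c"
if empty_dir_with_rsync "${scratch}/victim"; then
    echo "victim removed"
fi
rm -rf "${scratch}"
```

Note that rsync only empties the target; the final `rmdir` is what removes the directory itself.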

Member

Why would rsync be faster than rm? Is it because it makes different system calls, because it runs things in threads (good for networks), or something else?

Contributor

I asked an AI, giving it the context "work with data-heavy systems", and it answered:

Because you work with data-heavy systems (databases, lakes, large file trees) you’ll likely encounter directories with very many files and subdirectories. In such cases:
	•	Using rm -rf triggers lots of individual filesystem operations (unlink, rmdir) in what may be a sub-optimal order; the overhead (and metadata cost) becomes dominant.
	•	Using rsync -a --delete (with an empty source) shifts the pattern: rsync walks the destination tree, determines all items present, then issues deletions in a more efficient pattern (for many files) which tends to reduce metadata churn and avoid worst-case filesystem behaviour.
	•	Thus for very large file trees, rsync may “appear” much faster in wall-time, even though conceptually you’re still deleting everything.
	•	Given your context (large datasets, perhaps S3 or other object stores, etc) you may adapt: for local filesystem large-scale deletion this trick is a practical tool.

Member

My AI insists there is no system call to delete files in bulk... either one can be wrong (or both).
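
Since the two explanations disagree, one way to settle it is to measure both approaches on the same kind of tree. The following is a rough sketch only: the function names, the tree layout (empty files spread over 50 subdirectories), and the `FILES` variable are all assumptions, and `date +%s` gives whole-second resolution, so the file count has to be raised considerably before the numbers mean anything.

```shell
# Build a synthetic tree: $2 empty files spread over 50 subdirectories of $1.
make_tree() {
    root="$1"; count="$2"; i=0
    while [ "$i" -lt "$count" ]; do
        sub="${root}/d$((i % 50))"
        mkdir -p "$sub"
        : > "${sub}/f${i}"
        i=$((i + 1))
    done
}

# Run a deletion command and print how many whole seconds it took.
time_it() {
    label="$1"; shift
    start="$(date +%s)"
    "$@"
    end="$(date +%s)"
    echo "${label}: $((end - start)) s"
}

delete_with_rm() { rm -rf "$1"; }

# $1 = directory to delete, $2 = empty source directory
delete_with_rsync() { rsync -a --delete "$2/" "$1/" && rmdir "$1"; }

bench="$(mktemp -d)"
files="${FILES:-1000}"   # assumption: raise this for a meaningful comparison
mkdir "${bench}/empty"

make_tree "${bench}/rm_case" "$files"
time_it "rm -rf" delete_with_rm "${bench}/rm_case"

if command -v rsync >/dev/null 2>&1; then
    make_tree "${bench}/rsync_case" "$files"
    time_it "rsync --delete" delete_with_rsync "${bench}/rsync_case" "${bench}/empty"
else
    echo "rsync not installed; skipping the rsync case"
fi

# Count anything left behind (besides the empty source), then clean up.
remaining="$(find "$bench" -mindepth 1 ! -name empty | wc -l)"
rm -rf "$bench"
```

Results will depend on the filesystem, the cache state, and the shape of the tree, so a run on the actual CI runner against a copy of the real directories would be more telling than this synthetic case.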

Member Author

We don't delete them in bulk. In short, we found that rm -rf on the Android libs was taking more than 15 minutes, which caused the failure, so we explored rsync, which is normally used to sync directories, and tweaked its arguments a bit to support deletion. Since the rmdir executed successfully, we can assert that the directory is empty, which means the files were deleted. Disk usage dropped from 75% to 40%, as it always does.
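
The claim that a successful rmdir proves the directory was empty can be checked in isolation. This is a small sketch using throwaway directories; the variable names are illustrative only.

```shell
scratch="$(mktemp -d)"
mkdir "${scratch}/full" "${scratch}/empty"
touch "${scratch}/full/leftover"

# rmdir refuses to remove a directory that still has entries ...
if rmdir "${scratch}/full" 2>/dev/null; then
    full_result="removed"
else
    full_result="refused"
fi

# ... and succeeds on a directory with no entries.
if rmdir "${scratch}/empty" 2>/dev/null; then
    empty_result="removed"
else
    empty_result="refused"
fi

echo "non-empty: ${full_result}, empty: ${empty_result}"
# prints "non-empty: refused, empty: removed"
rm -rf "${scratch}"
```

So if rsync had left anything behind, the `sudo rmdir "$dir"` step would fail rather than silently pass.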

sudo rmdir "$dir"
done
sudo rmdir /tmp/empty
}

echo "Disk space usage before cleaning:"