Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for large files in CircleCI #1142

Merged
merged 3 commits into from
May 22, 2024
Merged

Check for large files in CircleCI #1142

merged 3 commits into from
May 22, 2024

Conversation

matthew-white
Copy link
Member

@matthew-white matthew-white commented May 8, 2024

This PR makes progress on getodk/central#672. It updates CircleCI to check for large files. Any large files will be listed.

What has been done to verify that this works as intended?

CircleCI passes currently, but if I reduce the limit from 1 MB to 100 kB, then it fails. See https://app.circleci.com/pipelines/github/getodk/central-backend/2186/workflows/e2e9689b-4e3a-4c28-80bd-392837ccb9d3/jobs/3046 for an example of that.

Why is this the best possible solution? Were any other approaches considered?

I'll add a few line comments.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

Comment on lines 17 to 19
set +e
find . -size +1000000c -not -path './.git/*' | grep .
(( $? == 1 ))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find will output matching files. grep . will match any file, so if find outputs any files, grep will return with an exit status of 0. We want the opposite though: we want there to be no matching files. In that case, grep returns with an exit status of 1. In other words, we want the command as a whole to succeed only if grep returns 1.

It looks like CircleCI runs the command with -e, which would cause the command to fail if grep returned with an exit status of 1. That's why I run set +e.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(( $? == 1)) seems unidiomatic compared to e.g. [[ $? -eq 1]].

An alternative to using grep might be:

find . -size +1000000c -not -path './.git/*' \
  -execdir bash -c '(echo "Big file(s) found: $@" && exit 1)' -- {} + ;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced [[ ... ]] with (( ... )). 👍

I feel less sure about -execdir bash .... At least to me, it doesn't feel much clearer than grep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1M looks (roughly) equivalent to +1000000c, and may be easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first choice as well and worked for me locally. You can see +1Mc in the first commit above. However, I saw this message in CircleCI:

find: Invalid argument `+1Mc' to -size

@matthew-white matthew-white requested a review from ktuite May 8, 2024 21:16
@alxndrsn
Copy link
Contributor

alxndrsn commented May 14, 2024

Would it be helpful to add this as a make target or separate script file? It could then optionally be included in other targets, e.g. make test.

@matthew-white
Copy link
Member Author

Would it be helpful to add this as a make target or separate script file? It could then optionally be included in other targets, e.g. make test.

That sounds reasonable to me. 👍 Would that work OK with your local testing of S3 things?

@alxndrsn
Copy link
Contributor

alxndrsn commented May 15, 2024

Would it be helpful to add this as a make target or separate script file? It could then optionally be included in other targets, e.g. make test.

That sounds reasonable to me. 👍 Would that work OK with your local testing of S3 things?

Good point. Here's a possible alternative that only checks files tracked by git:

! (git ls-files | xargs du -b | awk '$1 > 1000000 { printf("%s %.2f MB\n", $2, $1/1048576) }' | grep .)

Example output when threshold set to 100,000 bytes, and showing final status code:

$ ! (git ls-files | xargs du -b | awk '$1 > 100000 { printf("%s %.2f MB\n", $2, $1/1048576) }' | grep .); echo $?
docs/api.yaml 0.52 MB
package-lock.json 0.76 MB
test/integration/api/datasets.js 0.21 MB
test/integration/api/entities.js 0.15 MB
test/integration/api/odata.js 0.10 MB
test/integration/api/submissions.js 0.23 MB
1

@alxndrsn
Copy link
Contributor

grep and negation can also be avoided:

git ls-files |
  xargs du -b |
  awk '
    $1 > 1000000 {
      found=1;
      printf("%s %.2f MB\n", $2, $1/1048576)
    }
    END { exit found }
  '

@matthew-white
Copy link
Member Author

Interesting! I think it'd be good to get something merged, so I'm going to go ahead and merge what's been approved here. Feel free to follow up with another PR though.

@matthew-white matthew-white merged commit fecb067 into master May 22, 2024
5 checks passed
@matthew-white matthew-white deleted the large-file-check branch May 22, 2024 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants