Skip to content

Conversation

Subham-KRLX
Copy link
Contributor

Yes I have read the Contributor Rules and Contributor Development Guide

Yes and the PR name follows the format [GH-1995] Add pre-commit hook to auto-insert Apache license headers in shell scripts. Closes #1995

This PR introduces a pre-commit hook configuration to automatically insert the Apache License header in all shell script files (*.sh). It includes:

Addition of the shell license insertion hook in .pre-commit-config.yaml utilizing the existing Lucas-C/pre-commit-hooks insert-license hook.

Removal of inconsistent license headers and TODO license comments from current shell scripts.
Adjustment to insert the license header immediately after the shebang line to maintain script correctness.
Verification that the license header does not duplicate on multiple runs.
Updates to various shell scripts to comply with the licensing standard.
Locally ran pre-commit run insert-license --all-files ensuring no duplicate headers and correct placement.
Manual verification of header insertion after shebang in affected shell script files.
Verified all other pre-commit hooks pass successfully.
Committed and pushed the changes to the feature branch allowing GitHub Actions to validate build and tests.

No this PR does not affect any public API so no need to change the documentation.

@Subham-KRLX
Copy link
Contributor Author

I am working on adding a pre-commit hook to insert Apache license headers in shell scripts but the insert-license hook keeps changing docker/install-sedona.sh docker/install-spark.sh and some Scala files with TODO license comments causing a commit loop. I removed TODO lines and old license parts ran the hook and fixed EOF but the problem continues with the same files failing. Please advise if there is a process I missed or if maintainer help is needed.

@jbampton
Copy link
Member

You can exclude files from being checked.

See the exclude key for pre-commit. Example:

exclude: ^docs/index\.md$|^\.github/pull_request_template\.md$|\.github/issue_template\.md$|^docs/blog/.*\.md$

@Subham-KRLX
Copy link
Contributor Author

Thanks for your suggestion about using the exclude key in .pre-commit-config.yaml I tried adding exclude patterns but insert-license hook still modifies the same files causing commit loops with TODO license comments .Is there anything else I should do to exclude these files properly and stop the repeated changes.

@jbampton
Copy link
Member

Normally when you get a license TODO that means the licenses were not inserted properly for those files.

@Subham-KRLX Subham-KRLX force-pushed the add-shell-license-header branch from 719d4c8 to 3863962 Compare July 27, 2025 04:01
@Subham-KRLX
Copy link
Contributor Author

Reopened the PR with all required updates. License headers are now correctly inserted in shell scripts without duplication. .gitignore and affected script files have been cleaned up. Thanks for the guidance John.

Co-authored-by: John Bampton <[email protected]>
@jbampton
Copy link
Member

The pre-commit workflow is failing for the insert-license hook for the shell scripts. I can see the problem and it is very subtle. Have a look at the next two example files:

This first link is the correct license template and you will see in this that there are 7 lines at the top and 6 lines at the bottom and in between is the "apache url":

https://github.com/apache/sedona/blob/master/.github/workflows/license-templates/LICENSE.txt

So then look at:

https://github.com/apache/sedona/blob/master/docker/install-sedona.sh

this is one of the shell scripts that is failing with a "TODO" license. So you can see that in this file the license header has 6 lines at the top and 5 lines at the bottom between the "apache url".

So the license headers are not the same.

Which means to fix this issue you need to go to all the shell scripts that have failed with "TODO" licenses and remove all the TODO licenses and then try running pre-commit again and it should insert the correct license into the shell scripts.

Otherwise you can manually insert the correct license template in to the shell files and then run pre-commit to make sure the new shell license hooks passes.

@Subham-KRLX
Copy link
Contributor Author

Thank you for pointing out the subtle difference in the license headers. I will go through all the shell scripts showing TODO licenses and remove those headers so pre-commit can add the correct template as per LICENSE.txt. If any issues persist I will update the headers manually to match exactly. Appreciate your help!

@Subham-KRLX Subham-KRLX requested a review from jbampton July 29, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pre-commit: auto insert license headers for Shell files
2 participants