Copyright Header not applied to files with the word copyright in the first ten lines of the file #47

Open
menge101 opened this issue Aug 7, 2018 · 6 comments

Comments

@menge101

menge101 commented Aug 7, 2018

Hello,
I created a rake task for adding the copyright header in my project, and I was surprised to find that it was ignoring the file that contained my rake task. After some digging, I found this in the copyright-header code:

def has_copyright?(lines = 10)
   @contents.split(/\n/)[0..lines].select { |line| line =~ /(?!class\s+)([Cc]opyright|[Ll]icense)\s/ }.length > 0
end

So, this method returns true whenever the word 'copyright' or 'license' appears in the first 10 lines of code, which is a fairly weak way of verifying that the license header is present.

Also, if the wrong license is present, but contains the words copyright or license, it will still return true.
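To make both failure modes concrete, here is a minimal sketch that runs the quoted check outside the class. The standalone method signature (taking `contents` instead of reading `@contents`) and the sample file contents are assumptions made up for illustration; only the regex and the method body come from the snippet above.

```ruby
# The pattern quoted above, copied verbatim from the issue.
COPYRIGHT_PATTERN = /(?!class\s+)([Cc]opyright|[Ll]icense)\s/

# Standalone version of the quoted check; `contents` stands in for @contents.
def has_copyright?(contents, lines = 10)
  contents.split(/\n/)[0..lines].select { |line| line =~ COPYRIGHT_PATTERN }.length > 0
end

# Hypothetical Rakefile with no license header at all.
rakefile = <<~RUBY
  require 'rake'

  desc 'Add the copyright header to every source file'
  task :headers do
    # hypothetical task body
  end
RUBY

# Hypothetical file carrying somebody else's header.
other_header = <<~RUBY
  # Copyright (c) Someone Else
  puts 'hi'
RUBY

# Hypothetical file with neither word in its opening lines.
plain = <<~RUBY
  def add(a, b)
    a + b
  end
RUBY

puts has_copyright?(rakefile)     # true, although no header is present
puts has_copyright?(other_header) # true, although it is not our header
puts has_copyright?(plain)        # false
```

The Rakefile is treated as already licensed only because its task description mentions the word "copyright".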

I think my team is willing to do the work to make this more robust, but I also think some discussion is needed as to what the proper solution should be.

@osterman
Collaborator

osterman commented Aug 7, 2018

Hey @menge101! Agreed that it's a very weak criterion. Not sure how to solve it in a more robust way. Also, I must confess it's been many years since I've taken a serious look at this code, so I would be pretty amenable to recommendations, should your team be able to help out with their implementation.

@menge101
Author

menge101 commented Aug 7, 2018

I think the implementation we were thinking of would be to look for a match on the first line of the header file in the first n lines of every file.

Then if the first line is found, grab the following m lines, where m is the number of lines in the header, and do a full comparison against the header.

This would potentially add a new failure state, "Wrong Header". I'm not sure whether we'd then want to drop the correct header on top of it and let the owner clean the bad header off, or just mark the file as "does not have the proper header" in some way.
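A rough sketch of that two-step check might look like the following. The method name `header_status`, its parameters, and the three return symbols are hypothetical and not part of copyright-header; the sketch also assumes the header text is compared exactly as it would appear in the file, comment markers included.

```ruby
# Hypothetical helper, not part of copyright-header.
# Returns :present, :wrong_header, or :missing.
def header_status(contents, header, scan_lines = 10)
  file_lines   = contents.lines.map(&:chomp)
  header_lines = header.lines.map(&:chomp)
  return :missing if header_lines.empty?

  # Step 1: look for the header's first line within the first n lines.
  start = file_lines.first(scan_lines).index(header_lines.first)
  return :missing if start.nil?

  # Step 2: take m lines from that point (m = header length)
  # and compare them against the full header.
  candidate = file_lines[start, header_lines.length]
  candidate == header_lines ? :present : :wrong_header
end

header = "# Copyright (c) 2018 Example Corp\n# All rights reserved.\n"

good  = "#!/usr/bin/env ruby\n#{header}puts 'hi'\n"
wrong = "# Copyright (c) 2018 Example Corp\n# Some other terms.\nputs 'hi'\n"
none  = "puts 'hi'\n"

puts header_status(good, header)  # present
puts header_status(wrong, header) # wrong_header
puts header_status(none, header)  # missing
```

What to do with `:wrong_header` (overwrite, stack the correct header on top, or only report) is left to the caller, since that is the open question here.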

@osterman
Collaborator

osterman commented Aug 7, 2018

> I think the implementation we were thinking of would be to look for a match on the first line of the header file in the first n lines of every file.

I think the reason I didn't do that was that certain files in a repo may be covered by another license. For example, sometimes repos have pulled in projects under a vendor/ folder that are under a different license. The risk is attempting to relicense something that one is not authorized to modify.

What about this simple/stupid approach: just make /(?!class\s+)([Cc]opyright|[Ll]icense)\s/ an optional argument. This way, the caller can use a more precise regex to avoid the issue you're having?
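As a minimal sketch of that idea, the pattern could become a keyword argument that defaults to the current regex. The standalone signature, the `pattern:` keyword name, and the stricter example regex are assumptions for illustration, not the library's actual API.

```ruby
# Current pattern, kept as the default so existing callers see no change.
DEFAULT_PATTERN = /(?!class\s+)([Cc]opyright|[Ll]icense)\s/

# Hypothetical signature: `contents` stands in for @contents,
# and `pattern:` is the proposed optional argument.
def has_copyright?(contents, lines = 10, pattern: DEFAULT_PATTERN)
  contents.split(/\n/)[0..lines].any? { |line| line =~ pattern }
end

# A stricter, project-specific pattern a caller might supply.
strict = /Copyright \(c\) \d{4} Example Corp/

rakefile = "desc 'Add the copyright header to every source file'\n"
licensed = "# Copyright (c) 2018 Example Corp\nputs 'hi'\n"

puts has_copyright?(rakefile)                  # true  (default pattern, false positive)
puts has_copyright?(rakefile, pattern: strict) # false
puts has_copyright?(licensed, pattern: strict) # true
```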

@menge101
Author

menge101 commented Aug 8, 2018

> I think the reason I didn't do that was certain files in a repo may be licensed by another license. For example, sometimes repos have pulled in projects under a vendor/ folder that are under a different license. The risk is attempting to relicense something that one is not authorized to modify.

Right, so in this case, rather than add the copyright header I would puts something like "SKIP project/vendor/file.rb WRONG LICENSE" and make no change to the file. I guess this would require keeping something like the existing implementation to determine that a license of some sort exists.

As is, I wouldn't describe this tool as being designed with multi-license scenarios in mind, so it feels off to me to make a decision on that basis. Also, keep in mind that the caller specifies the paths to use, so they can simply leave their vendored libraries out of that set of paths.

I do like the idea of making the regex an argument. It makes me wonder about passing a proc or lambda, or maybe even just taking a block, in order to specify logic rather than just a regex.
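Something along these lines, as a minimal sketch: the method takes an optional block and falls back to the existing regex when none is given. The name `licensed?` and its signature are hypothetical, chosen only for this example.

```ruby
# Existing pattern, used when the caller supplies no logic of their own.
DEFAULT_LICENSE_PATTERN = /(?!class\s+)([Cc]opyright|[Ll]icense)\s/

# Hypothetical method: the optional block receives each of the leading
# lines and returns true when that line proves a header is present.
def licensed?(contents, lines = 10, &check)
  check ||= ->(line) { line.match?(DEFAULT_LICENSE_PATTERN) }
  contents.split(/\n/)[0..lines].any? { |line| check.call(line) }
end

rakefile = "desc 'Add the copyright header to every source file'\n"
licensed = "# Copyright (c) 2018 Example Corp\nputs 'hi'\n"

# With a block: only a comment line that starts the header counts.
puts licensed?(rakefile) { |line| line.start_with?('# Copyright (c)') } # false
puts licensed?(licensed) { |line| line.start_with?('# Copyright (c)') } # true

# A lambda can be passed the same way with `&`.
strict = ->(line) { line.match?(/\A# Copyright \(c\) \d{4}/) }
puts licensed?(licensed, &strict) # true

# Without a block the old regex behaviour is kept.
puts licensed?(rakefile) # true
```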

I think aiming toward flexibility without being overly complicated is the right design goal. I'll take a look at how just making the regex an argument affects our problem, and if that is "good enough" then maybe we'll just leave it at that.

@osterman
Collaborator

> I do like the idea of making the regex an argument, it makes me wonder about the idea of passing a proc or lambda or maybe even just taking a block in order to specify logic rather than just a regex.

So long as there's an easy way to also pass a simple regex via command-line, I don't mind if you generalize it.
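For the command-line side, a minimal sketch using Ruby's standard `optparse` could look like this. The flag name `--copyright-regex` and the helper `parse_pattern` are hypothetical; no such option is known to exist in the tool today.

```ruby
require 'optparse'

# Existing pattern, used when the flag is not given.
FALLBACK_PATTERN = /(?!class\s+)([Cc]opyright|[Ll]icense)\s/

# Hypothetical helper: reads the (hypothetical) --copyright-regex flag
# from an argument list and returns the Regexp to use.
def parse_pattern(argv)
  pattern = FALLBACK_PATTERN
  parser = OptionParser.new do |opts|
    opts.on('--copyright-regex PATTERN',
            'Regex that marks a file as already licensed') do |source|
      pattern = Regexp.new(source)
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  pattern
end

custom = parse_pattern(['--copyright-regex', 'Copyright \(c\) \d{4}'])
puts custom.match?('# Copyright (c) 2018 Example Corp')  # true
puts custom.match?("desc 'Add the copyright header'")    # false
puts parse_pattern([]) == FALLBACK_PATTERN               # true
```

The resulting Regexp could then be wrapped in a lambda or passed straight through, so the generalized form and the simple command-line form share one code path.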

@nathan-menge-arcadia

nathan-menge-arcadia commented Oct 19, 2018

FYI, I am still planning to work on this. It's on the backlog, but as soon as the words "we have time to do this" were uttered, several things jumped up to consume our time.

(Just noticed I posted using my employer GitHub account rather than my personal one, but I am the same person as the original issue writer.)
