Copyright notices are helpful for automatically getting a first idea of the people who were involved in developing a given portion of code and who can be considered copyright holders. Detecting them is part of the work on the SPDX report generation tool that you can find at http://triplecheck.de/download
Detecting copyright notices is not an easy task. There is a myriad of combinations and variations to consider. Nevertheless, I had to start somewhere and decided to attempt detecting the common cases, such as "Copyright (c) 1981-2014 Nuno Brito".
After some testing, this is the regular expression that was used:
    String patternString = ""
        + "(\\((C|c)\\) |)"      // detect a (c) before the copyright text
        + "(C|c)opyright"        // detect the copyright text
        + "( \\((C|c)\\)|) "     // sometimes with a (c)
        + "([0-9]|)"             // optionally with the year
        + "+"
        + "[^\\n\\t\\*]+\\.?";

It can detect the following cases:
Copyright (C) 2006-2014 Josefina Jota
Copyright (c) 2012 Manel Magalhães
Copyright (C) 2003 by Tiago Tavares <tiago@tavares.pt>
Copyright (C) 1993, 1994 Ricardo Romão <ricardo@romão.pt>
(C) Copyright 2000-2013, by Oscar Alho and contributors.
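To try the expression outside the tool, a minimal sketch along these lines could be used. The class name CopyrightPatternDemo and the method findNotices are made up for this illustration and are not taken from the triplecheck code; only the pattern itself comes from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the triplecheck reporter.
public class CopyrightPatternDemo {

    // Same expression as in the post, assembled piece by piece.
    static final Pattern COPYRIGHT = Pattern.compile(""
            + "(\\((C|c)\\) |)"      // detect a (c) before the copyright text
            + "(C|c)opyright"        // detect the copyright text
            + "( \\((C|c)\\)|) "     // sometimes with a (c)
            + "([0-9]|)"             // optionally with the year
            + "+"
            + "[^\\n\\t\\*]+\\.?");  // rest of the line, up to a newline, tab or '*'

    // Returns every substring of the text that the pattern matches.
    static List<String> findNotices(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = COPYRIGHT.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        // A small comment header, as it might appear at the top of a source file.
        String source = "/*\n"
                + " * Copyright (C) 2006-2014 Josefina Jota\n"
                + " * (C) Copyright 2000-2013, by Oscar Alho and contributors.\n"
                + " */\n";
        for (String notice : findNotices(source)) {
            System.out.println(notice);
        }
    }
}
```

Since the last part of the expression stops at a newline, a tab or an asterisk, each match stays within a single line of a comment header.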
It is not perfect. There is no support for copyright credits that extend over more than a single line, nor for cases where "copyright" is not even used as an identifiable keyword. Last but not least, there are false positives that I have already noted, such as:
copyright ownership.
copyright notice
Currently I don't have a better solution than specifically filtering out these false positives.
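One way such a filter might look is sketched below. This is only an illustration under my own assumptions, not the code used in the tool: the class FalsePositiveFilter, the prefix list and the choice to compare lower-cased prefixes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the triplecheck reporter.
public class FalsePositiveFilter {

    // Assumed block list, seeded with the two false positives noted above.
    static final List<String> KNOWN_FALSE_POSITIVES = Arrays.asList(
            "copyright ownership",
            "copyright notice");

    // True when the candidate starts with one of the blocked phrases (case-insensitive).
    static boolean isFalsePositive(String candidate) {
        String text = candidate.trim().toLowerCase(Locale.ROOT);
        for (String blocked : KNOWN_FALSE_POSITIVES) {
            if (text.startsWith(blocked)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the candidates that are not on the block list.
    static List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (!isFalsePositive(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "copyright ownership.",
                "copyright notice",
                "Copyright (c) 2012 Manel Magalhães");
        for (String notice : filter(candidates)) {
            System.out.println(notice);
        }
    }
}
```

Matching on the prefix rather than the whole string is a deliberate but debatable choice: it also catches lines where the phrase is followed by more text, at the risk of dropping a genuine notice that happens to begin with the same words.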
You can find the working code in Java at https://github.com/triplecheck/reporter/blob/master/tool.iml/run/triggers/CopyrightDetector.java
You can also find a simple test case for the regular expression at https://github.com/triplecheck/reporter/blob/master/tool.iml/test/trigger/TestTriggerCopyright.java
This detection could certainly be improved and the code is open source. Suggestions are welcome. :-)
 