Java: pluralizer

Every now and then one needs to output quantities in singular or plural form. In English this is mostly straightforward: add an "s" to the end of the word and you get the plural.

However, doing it programmatically adds a few lines of code that tend to make things less elegant (and less simple) than they ought to be. For example, when listing the number of files inside a folder it is annoying to see text saying "1 files", which is not grammatically correct.

To solve these cases, I wrote a simple method.

/**
 * This method simplifies showing values with associated terms when they
 * occur either in plural or singular form. For example, it turns the
 * incorrect output "1 files" into the correct "1 file".
 * @param value The value to output
 * @param text The text that will be "pluralized"
 * @return The value followed by the pluralized text
 */
public static String pluralize(int value, String text){
    if(value == 1){
        return value + " " + text;
    }else{
        return value + " " + text + "s";
    }
}

As you can see, the code is very simple. From here on I can rest assured that the correct form is used for whatever value is passed in.
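A quick usage sketch follows. The two-argument method is the one shown above. The three-argument overload is a hypothetical addition (it is not part of the original code) for words whose plural is not formed by appending "s".

```java
public class Pluralizer {

    // Same behaviour as the method above: append "s" unless the value is 1.
    public static String pluralize(int value, String text) {
        return pluralize(value, text, text + "s");
    }

    // Hypothetical overload: the caller supplies the plural form explicitly.
    public static String pluralize(int value, String singular, String plural) {
        return value + " " + (value == 1 ? singular : plural);
    }

    public static void main(String[] args) {
        System.out.println(pluralize(1, "file"));                     // 1 file
        System.out.println(pluralize(3, "file"));                     // 3 files
        System.out.println(pluralize(0, "file"));                     // 0 files
        System.out.println(pluralize(2, "directory", "directories")); // 2 directories
    }
}
```

Note that zero takes the plural form, which matches English usage ("0 files").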

Java RegEx: detecting copyright string inside source code files

Recently, one of my goals was to detect and index the copyright notices that can be found inside source code files.

This copyright notice is helpful to automatically get a first idea about the people that were involved in developing a given portion of code and can be considered as copyright holders. It is part of the work on the SPDX report generation tool that you can find at http://triplecheck.de/download

Detecting copyright notices is not an easy task. There is a myriad of combinations and variations to consider. Nevertheless, I needed to start somewhere and decided to attempt detecting the common cases, such as "Copyright (c) 1981-2014 Nuno Brito".

After some testing, this is the regular expression that was used:

String patternString = ""
             + "(\\((C|c)\\) |)"    // detect a (c) before the copyright text
             + "(C|c)opyright"      // detect the copyright text
             + "( \\((C|c)\\)|) "   // sometimes with a (c)
             + "([0-9]|)"           // optionally with the year
             + "+"                  // repeat the digit group
             + "[^\\n\\t\\*]+\\.?"; // the rest, up to a newline, tab or asterisk

It can detect the following cases:
Copyright (C) 2006-2014 Josefina Jota
Copyright (c) 2012 Manel Magalhães
Copyright (C) 2003 by Tiago Tavares <tiago@tavares.pt>
Copyright (C) 1993, 1994 Ricardo Romão <ricardo@romão.pt>
(C) Copyright 2000-2013, by Oscar Alho and contributors.

It is not perfect. There is no support for copyright credits that extend over more than a single line, nor for cases where "copyright" is not used as an identifiable keyword at all. Last but not least, there are false positives that I have already noticed, such as:
copyright ownership.
copyright notice


Currently I don't have a better solution other than specifically filtering out these false positives.
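A minimal sketch of such a filter is shown here. It is not the code from the repository: the class name, the method name and the blocklist are assumptions made for illustration, and only the regular expression is taken from the text above (joined into a single string).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyrightFilterSketch {

    // The expression from the text above, joined into one string.
    static final Pattern COPYRIGHT = Pattern.compile(
            "(\\((C|c)\\) |)(C|c)opyright( \\((C|c)\\)|) ([0-9]|)+[^\\n\\t\\*]+\\.?");

    // Hypothetical blocklist, seeded with the two false positives noted above.
    static final Set<String> FALSE_POSITIVES = new HashSet<String>(
            Arrays.asList("copyright ownership", "copyright notice"));

    /** Returns the matches found in the text, minus the blocklisted phrases. */
    static List<String> findNotices(String text) {
        List<String> notices = new ArrayList<String>();
        Matcher matcher = COPYRIGHT.matcher(text);
        while (matcher.find()) {
            String hit = matcher.group().trim();
            // Normalise before the lookup: lower case, no trailing period.
            String key = hit.toLowerCase(Locale.ROOT);
            if (key.endsWith(".")) {
                key = key.substring(0, key.length() - 1);
            }
            if (!FALSE_POSITIVES.contains(key)) {
                notices.add(hit);
            }
        }
        return notices;
    }

    public static void main(String[] args) {
        String source = "/*\n"
                + " * Copyright (C) 2006-2014 Josefina Jota\n"
                + " * See the copyright notice\n"
                + " */\n";
        // prints [Copyright (C) 2006-2014 Josefina Jota]
        System.out.println(findNotices(source));
    }
}
```

The lookup is an exact match on the whole hit, so a longer phrase such as "copyright notice and this permission" would still get through. That keeps the filter from discarding real notices, at the cost of having to grow the list by hand.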

You can find the working code in Java at https://github.com/triplecheck/reporter/blob/master/tool.iml/run/triggers/CopyrightDetector.java

And you can find a simple test case for the regular expression at https://github.com/triplecheck/reporter/blob/master/tool.iml/test/trigger/TestTriggerCopyright.java

This detection could certainly be improved and the code is open source. Suggestions are welcome. :-)

Windows: single-line command to download and install software

I noticed that users of Linux and OS X are sometimes greeted with a very nice feature. Sites like Bowery present on their front page a one-line command that downloads their software and gets it running immediately.

This is great. However, this was the command provided for Windows:

curl -O download.bowery.io/downloads/bowery_2.1.0_windows_amd64.zip && sudo unzip bowery_2.1.0_windows_amd64.zip -d /usr/local/bin

If you're a Windows developer, you will likely notice that the above command gets stuck right at its first part, simply because "curl" is a command that is not available by default on Windows.

How can this work under Windows?

I was curious and decided to find a way of doing the same thing using only built-in Windows commands. It took some digging, but I discovered bitsadmin to be a somewhat equivalent tool for this task.

And the one-line command that can be run from a Windows command prompt is:
bitsadmin /transfer t http://triplecheck.de/launch %temp%\x.bat&%temp%\x.bat

What does this code do?

bitsadmin is great because it ships with every Windows machine from Windows 2000 onwards. There is a drawback: it is considerably slow. The one-line command downloads a batch script to the temporary folder and runs it from there.

The first action of the batch script is to create the needed folders (at c:\triplecheck) and then download wget.exe as the default downloader. This is a single, small executable that speeds up the download process.

Then we get the software. I didn't have much time to implement a way of extracting zip files from the command line under Windows, so I decided to use the default cabinet archive format (.cab) for all packaging. My software runs on Java, so I added some checks to verify whether Java is available on the machine and otherwise download the Java runtime from my own server.

At this point I must say that the independence of Java tastes really great: just download, unpack and Java is available. After all these steps are done, the script downloads a shortcut that I created earlier and places it on the user's desktop for convenience when launching the tool.

Everything is finished by opening an Explorer window on the newly created folder and starting up the tool.

The full script code can be found at http://triplecheck.de/launch


What are the advantages?

In my case this provides a one-line command to automatically download and deploy my software on Windows. It is lightning fast: if you already have Java installed, the whole process completes in some 10 seconds on my machine, which is impressive.


Disadvantages?

I'm using wget.exe and this will be a problem for certain antivirus products, which might not enjoy the fact that a downloader executable gets inside the system. A possible improvement is checking whether wget.exe was in fact permitted to stay in the end user's folder. If it was removed, then revert to bitsadmin as the default downloader.

The bitsadmin tool is marked as deprecated, which is something I don't find that often in the Windows world. I was very surprised (and disappointed) to read the message. It seems that it will be possible to run this tool on machines from Windows 2000 to Windows 8.

Does not run on Windows RT. The installation script does not take into consideration the newish tablets with Windows RT, so the Java runtimes will not work on devices with an ARM processor. The script could be improved, but not many folks use Windows RT for this kind of work.

Cabinet files are used instead of standard zip files. It is possible to improve this script later to use the built-in Windows zip extraction, but this wasn't something readily available. The drawback is having to maintain two different sets of archives when distributing my software.



Hope you find this useful.

Java: Sorting a HashMap according to its value

I had a HashMap that associated an object with an Integer value. By design, hash maps are not ordered. Eventually I found a nice method around the web and modified the code to make it generic and ready to use with any kind of object.

Below you find the method, ready to use. Note the attribution in the comment (thanks, WikiJava), where I refer to the source from which the code derives.

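// imports needed by the method below
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sort a map according to its values, from the highest to the lowest.
 * @origin http://wikijava.org/wiki/Sort_a_HashMap
 * @date 2011-05-28
 * @modified http://nunobrito.eu
 * @date 2014-04-04
 */
private Map<Object,Integer> sortHashMap(Map<?,Integer> input){
    // sort the entries (not just the values) so that keys sharing the same value are all kept
    List<Map.Entry<?,Integer>> entries = new ArrayList<Map.Entry<?,Integer>>(input.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<?,Integer>>() {
        public int compare(Map.Entry<?,Integer> a, Map.Entry<?,Integer> b) {
            return b.getValue().compareTo(a.getValue()); // descending order
        }
    });
    // a LinkedHashMap remembers the order in which the entries are inserted
    Map<Object,Integer> map = new LinkedHashMap<Object,Integer>();
    for (Map.Entry<?,Integer> entry : entries) {
        map.put(entry.getKey(), entry.getValue());
    }
    return map;
}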

Inside your code, you can use the snippet below. Note that "FileLanguage" is the name of my object; you can replace it with a String or any other object you wish.

// sort the result
Map<Object,Integer> map = sortHashMap(statsLanguagesFound);
// show the ordered results
for(Object langObj : map.keySet()){
    FileLanguage lang = (FileLanguage) langObj;
    int count = map.get(lang);
    System.out.println(lang.toString() + " -> " + count);
}


The end result is the following:

File: busybox-1.21.1.spdx
C -> 796
UNSORTED -> 674
SCRIPT_LINUX -> 19
HTML -> 9
PERL -> 3

File: flyingsaucer-R8.spdx
JAVA -> 4
UNSORTED -> 1

File: jfreechart-1.0.16.spdx
JAVA -> 1035
UNSORTED -> 57
HTML -> 48
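
For anyone who wants to try the idea without the FileLanguage class, here is a self-contained sketch that uses plain String keys. The class and method names are made up for this example, and the counts are copied from the busybox listing above. It sorts the map entries rather than the values alone, so two keys that happen to share the same count are both kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortByValueDemo {

    /** Returns a copy of the input ordered by value, highest first. */
    static Map<String, Integer> sortByValue(Map<String, Integer> input) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(input.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue().compareTo(a.getValue());
            }
        });
        // LinkedHashMap keeps the insertion order, which is now the sorted order.
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            sorted.put(entry.getKey(), entry.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> stats = new HashMap<>();
        stats.put("HTML", 9);
        stats.put("C", 796);
        stats.put("PERL", 3);
        stats.put("UNSORTED", 674);
        stats.put("SCRIPT_LINUX", 19);
        // Prints the five entries from "C -> 796" down to "PERL -> 3".
        for (Map.Entry<String, Integer> entry : sortByValue(stats).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Entries with equal counts come out in whatever order the input map iterates them, so the relative order of ties is not guaranteed when the input is a HashMap.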


Hope you find it useful.  :-)

Java: counting how many times a string is repeated

Recently I needed to find a simple way to count how many times a specific keyword is repeated inside a large text. A regular expression would be possible but, besides the complication, it was very slow on large text files (>60 000 lines).

The solution is a very simple piece of code that is crude but performs well enough:
// note: keyword must not be empty, otherwise this divides by zero
int counter = (text.length() - text.replace(keyword, "").length()) / keyword.length();

Not intensively tested but functional.
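
If the one-liner ever needs a guard for empty keywords, a loop over indexOf is one possible alternative. This is only a sketch with made-up class and method names, and I have not compared its speed against the replace() version. Like the one-liner, it counts non-overlapping occurrences.

```java
public class KeywordCounter {

    /** Counts non-overlapping occurrences of keyword inside text. */
    static int countOccurrences(String text, String keyword) {
        // Guard against null input and the empty keyword.
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // Jump from one occurrence to the next until none is left.
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "GPL and LGPL and GPL again";
        System.out.println(countOccurrences(text, "GPL")); // 3
    }
}
```

The count is 3 because the "GPL" inside "LGPL" is also a match, which is the same result the one-liner gives for this text.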

Hope it helps you.