List of >230 file extensions in plain JSON format

Over the last year I've collected some 230 file extensions and manually curated their descriptions, so that whenever I come across a file extension it becomes possible to give the end-user a rough idea of what it is for.


Most of my code nowadays is written in Java but there is interest in porting some of this information to web apps. So I have exported a JSON list that you are welcome to download and use in your projects.

The list is available on GitHub at this link.

One thing to keep in mind is that I'm looking at extensions from a software developer's perspective. This means that when the same extension is used by different programs, I usually favor the programs related to programming.

The second thing is that I collect more information about file extensions than what you find in this JSON list. For example, for each extension I record the applicable programming languages. Here is an example for .h source code files. Other values indicate whether the data is binary or readable text, the category to which the extension belongs (archive, font, image, sourcecode, ...) and other metadata that is useful for file filtering and processing.
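To make the extra metadata a bit more concrete, here is a minimal Java sketch of how one entry could be held in memory and filtered by category. The class name, the field names and the sample values are my own assumptions for illustration; they are not taken from the actual list or from the code that produces it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory form of one extension entry; all names are assumptions. */
final class ExtensionInfoSketch {
    final String extension;       // without the leading dot, e.g. "h"
    final String description;     // short text shown to the end-user
    final String category;        // e.g. "archive", "font", "image", "sourcecode"
    final boolean binary;         // true for binary data, false for readable text
    final List<String> languages; // applicable programming languages, may be empty

    ExtensionInfoSketch(String extension, String description, String category,
                        boolean binary, List<String> languages) {
        this.extension = extension;
        this.description = description;
        this.category = category;
        this.binary = binary;
        this.languages = languages;
    }

    /** Returns the entries whose category matches exactly. */
    static List<ExtensionInfoSketch> byCategory(List<ExtensionInfoSketch> all, String category) {
        List<ExtensionInfoSketch> result = new ArrayList<>();
        for (ExtensionInfoSketch info : all) {
            if (info.category.equals(category)) {
                result.add(info);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample values only, not copied from the published list.
        List<ExtensionInfoSketch> all = Arrays.asList(
                new ExtensionInfoSketch("h", "C/C++ header file", "sourcecode",
                        false, Arrays.asList("C", "C++")),
                new ExtensionInfoSketch("zip", "ZIP archive", "archive",
                        true, Collections.<String>emptyList()));
        for (ExtensionInfoSketch info : byCategory(all, "sourcecode")) {
            System.out.println("." + info.extension + ": " + info.description
                    + " " + info.languages);
        }
    }
}
```

Keeping the category and the binary flag as plain fields makes filtering a simple loop, which is all that file filtering usually needs.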


If you need help or would like to suggest something to improve the list, just let me know.

Updating the header and footer on static web sites using Java

This year was the first time that I moved away from websites based on WordPress, PHP and MySQL to embrace the simplicity of static HTML sites.

Simplicity is indeed a good reason. It means virtually no exploits, as there is neither a database nor script interpretation happening. It means speed, since there are no PHP, Java or Ruby scripts running on the server and files are delivered directly. The last feature that I was curious to try is the site hosting provided by GitHub, which only supports static web sites.

The first site to convert was the TripleCheck company site. It had been developed over a year ago and was overdue for a serious update. It was based on WordPress and it wasn't easy to make changes to the theme or content. The site was quickly converted and placed online using GitHub.

However, it's not all roses with static websites. As you can imagine, one of the troubles is updating the text and links that you want to see on every page of the site. There are tools such as Jekyll that help to maintain blogs, but all that was needed here was a simple tool that would pick the header and footer tags and update them with whatever content was intended.

Easy enough: I wrote a simple app for this purpose. You can download the binaries from this link and the source code is available at https://github.com/triplecheck/site_update/


How to get started?

Place the site_update.jar file inside the folder where your web pages are located. Then copy the html-header.txt and html-footer.txt files there as well and write inside them the content you want to use as header and footer.

Inside the HTML pages that you want to change, you need to include the following tags:
<header></header>
<footer></footer>

Once you have this ready, from the command line run the jar file using:
java -jar site_update.jar

Check your HTML pages to see if the changes were applied.


What happens when it is running?

It looks for all files with the .html extension in the same folder where the .jar file is located. In each of these files it looks for the HTML tags mentioned above and replaces whatever is placed between them, effectively updating your pages as needed.
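The replacement step described above can be sketched in a few lines of Java. This is not the actual site_update source code, just a minimal illustration under a few assumptions of mine: the class and method names are made up, the files are read as UTF-8, the tags are written exactly as <header> and <footer> without attributes, and the current working directory stands in for the folder of the .jar file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch, not the real site_update code. */
final class TagContentSketch {

    /** Replaces whatever sits between <tag> and </tag> with newContent. */
    static String replaceBetween(String html, String tag, String newContent) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = html.indexOf(open);
        if (start < 0) {
            return html; // opening tag not present, leave the page untouched
        }
        int end = html.indexOf(close, start + open.length());
        if (end < 0) {
            return html; // closing tag missing, leave the page untouched
        }
        return html.substring(0, start + open.length()) + newContent + html.substring(end);
    }

    /** Applies header and footer text to every .html file directly inside folder. */
    static void updateFolder(Path folder, String header, String footer) throws IOException {
        try (DirectoryStream<Path> pages = Files.newDirectoryStream(folder, "*.html")) {
            for (Path page : pages) {
                String html = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
                String updated = replaceBetween(html, "header", header);
                updated = replaceBetween(updated, "footer", footer);
                Files.write(page, updated.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the working directory holds the pages and the two text files.
        Path folder = Paths.get(".");
        String header = new String(
                Files.readAllBytes(folder.resolve("html-header.txt")), StandardCharsets.UTF_8);
        String footer = new String(
                Files.readAllBytes(folder.resolve("html-footer.txt")), StandardCharsets.UTF_8);
        updateFolder(folder, header, footer);
    }
}
```

Only the first pair of each tag is handled here, which matches the usual case of one header and one footer per page.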

There is an added feature. If you have pages in a sub-folder, this software will automatically convert the links inside the tags so that they keep working. For example, a link pointing to index.html will be modified to ../index.html, which preserves the link structure. The same is done for images.
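The link conversion for sub-folders could look roughly like the following Java sketch. It is an illustration of the idea rather than the code used by site_update: the class name, the method names and the regular expression are assumptions of mine, and it only handles href and src attributes written with double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of rewriting relative links for pages in sub-folders. */
final class RelativeLinkSketch {

    // Captures the attribute name (href or src) and its double-quoted target.
    private static final Pattern LINK = Pattern.compile("(href|src)=\"([^\"]*)\"");

    /** Prefixes every relative link with "../" once per folder level. */
    static String adjustLinks(String snippet, int depth) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            prefix.append("../");
        }
        Matcher m = LINK.matcher(snippet);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(2);
            String fixed = isRelative(target) ? prefix + target : target;
            String attribute = m.group(1) + "=\"" + fixed + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(attribute));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Treats absolute paths, anchors and anything with a scheme as not relative. */
    static boolean isRelative(String target) {
        return !target.isEmpty()
                && !target.startsWith("/")
                && !target.startsWith("#")
                && !target.contains(":"); // http:, https:, mailto:, ...
    }

    public static void main(String[] args) {
        String header = "<a href=\"index.html\">Home</a> <img src=\"img/logo.png\">";
        // A page one level below the site root gets depth 1.
        System.out.println(adjustLinks(header, 1));
    }
}
```

External links such as https://github.com/ are left alone, since prefixing them with ../ would break them.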

An example where this program is used can be found at the TripleCheck website, whose code is available on GitHub at https://github.com/triplecheck/triplecheck.github.io


Feedback, new features?

I'd be happy to help. Just let me know in the comment box here or write a post on GitHub.





List of 310 software licenses in JSON format

I recently needed a list of licenses to use inside a web page. The goal was to present the end-user with a set of software licenses to choose from. However, I couldn't find one readily available as JSON or in some other format that could be embedded as part of JavaScript code.

So I've created such a list, based on the nice SPDX documentation. It contains 310 license variations and types. I explicitly mention "types" because you will find an entry called "Proprietary", which stands for some sort of customized terms, and a "Public domain" type, which is not a license per se but in practice denotes the lack of an applicable license, since copyright (in theory) does not apply to such works.

If you are OK with these nuances, you can download this JSON list from https://github.com/triplecheck/engine/blob/master/run/licenseList.js

The list was not crafted manually: I wrote a few lines of Java code to output the file. You can find this code at https://github.com/triplecheck/engine/blob/master/src/provenance/javascript/OutputLicenseList.java
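For readers who want to generate a similar list themselves, here is a minimal Java sketch of writing license entries as a JSON array. It is not a copy of OutputLicenseList.java: the class name, the "id" and "title" field names and the output layout are assumptions of mine, and the real file may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of turning license entries into a JSON array. */
final class LicenseListSketch {

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a JSON array with one object per license, in map iteration order. */
    static String toJson(Map<String, String> licenses) {
        StringBuilder json = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, String> entry : licenses.entrySet()) {
            if (!first) {
                json.append(",");
            }
            first = false;
            json.append("\n  {\"id\": \"").append(escape(entry.getKey()))
                .append("\", \"title\": \"").append(escape(entry.getValue()))
                .append("\"}");
        }
        return json.append("\n]").toString();
    }

    public static void main(String[] args) {
        // Two well-known SPDX identifiers as sample input.
        Map<String, String> licenses = new LinkedHashMap<>();
        licenses.put("MIT", "MIT License");
        licenses.put("Apache-2.0", "Apache License 2.0");
        System.out.println(toJson(licenses));
    }
}
```

A LinkedHashMap keeps the entries in insertion order, so the output stays stable between runs.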

If you find the list useful and have feedback or need an updated version, just let me know.





SSDEEP in Java

If you are familiar with similarity hashing algorithms (a.k.a. fuzzy hash matching) and need an SSDEEP implementation in Java code, it is available directly from my Github account at this location: https://github.com/nunobrito/utils/tree/master/Utils/src/utils/hashing/ssdeep

The original page for SSDEEP can be found at http://ssdeep.sourceforge.net/

On that page you will also find the binaries for Windows.

Have fun.