brito: Java: Reading millions of text lines at top speed

There is one thing to say about Java (as a platform): its performance when reading from disk using the default classes is impressive.

I'm sharing code that reads the lines of a large text file from disk as fast as possible and processes each line through a custom method.

The implementation is very simple and has no external dependencies. You can download the code at this link on GitHub.

By the way, the above link will get you the most up-to-date version of the source code file.

Performance on my laptop (i7 CPU, 8 GB RAM, 500 GB HDD, Linux) was measured with a text file containing ~30 million lines of text (around 300 characters each), which is read in under 2 minutes.

A practical real-world example of the code is at this link. Basically, just copy the class into your project and declare your own class with "extends FileReadLines". Let the IDE generate the required methods, and off you go: process each line and adapt the progress messages as needed.
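To give a rough idea of the "extend and implement" pattern, here is a minimal sketch. It is not the actual FileReadLines source: the names LineReaderSketch, processLine, read and NonBlankCounter are made up for illustration, and the real class on GitHub may have different methods (for example, for the progress messages).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the post's FileReadLines class; the real API may differ.
abstract class LineReaderSketch {

    // Called once per line, with the line terminator already removed.
    protected abstract void processLine(String line);

    // Reads the file line by line and returns how many lines were processed.
    long read(Path file) throws IOException {
        long count = 0;
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                processLine(line);
                count++;
            }
        }
        return count;
    }
}

// Example subclass: counts the lines that are not blank.
class NonBlankCounter extends LineReaderSketch {
    long nonBlank = 0;

    @Override
    protected void processLine(String line) {
        if (!line.trim().isEmpty()) {
            nonBlank++;
        }
    }
}
```

With this sketch, usage would be something like creating a NonBlankCounter, calling read(Paths.get("big-file.txt")) on it, and then checking its nonBlank field. The file name is just a placeholder.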

That's it.

It is true that anyone can use BufferedReader on its own. The fact is that I found myself repeating this kind of code, so I created a library to keep it in a single location. I hope you find it useful if you're struggling to tackle large flat files.
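For reference, the "BufferedReader on its own" approach looks roughly like the sketch below. The class name PlainBufferedRead, the method forEachLine and the 1 MiB buffer size are my own illustrative choices, not values taken from the library. The UTF-8 encoding is an assumption too.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class PlainBufferedRead {
    // 1 MiB buffer: an arbitrary illustrative value, not a measured optimum.
    static final int BUFFER_SIZE = 1 << 20;

    // Passes every line of the file to the handler and returns the line count.
    static long forEachLine(String path, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8),
                BUFFER_SIZE)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

Only one line is held in memory at a time, so the file size is not limited by the available RAM. Whether a larger buffer actually helps depends on the disk and the OS, so measure before settling on a value.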

To the best of my knowledge, this is the fastest possible way of reading a massive number of lines that uses nothing more than the Java platform.

If you have suggestions on how to reach faster results, please leave them in the comments box and I'll update this post accordingly. My thanks in advance.

Also, feel free to change the code on GitHub as you see fit.

brito
