Java: RandomAccessFile + BufferedReader = FileRandomReadLines

In the Java world when reading large text files you are usually left with two options:
  1. RandomAccessFile
  2. BufferedReader

Option 1) allows to read text from any given part of the file but is not buffered, meaning that it will be slow to read lines.

Option 2) is buffered, therefore fast but you need to read each line from the beginning of the text file until you reach where you want to really read data.

There are strategies to cope with these mutually exclusive options, one is to read data sequentially, another option is to partition data into different files. However, sometimes you just have that case where you need to resume some time consuming operation (think on a scale of days) where billions of default sized lines are involved. Neither option 1) nor option 2) will suffice.

Up to this point I was trying to improve performance, remove any IF's and any code that could squeeze a few more ounces of speed but the problem remained the same: we need an option 3) that mixes the best of both options. There wasn't one readily available that I could find around the Internet.

In the meanwhile I have found a hint that might be possible to feed a BufferedReader directly from a RandomAccessFile. Tested this idea and was indeed possible, albeit still with some rough edges.

For example, if we are already reading data from the BufferedReader and decide to change the file position on the RandomAccessFile object, the BufferedReader will get erroneous data on the buffer. The solution that I've applied is to simply re-create a new BufferedReader, forcing the buffer to be reset.


Now, I'm making available the code that combines the these two approaches. You find the RandomAccessFile class at https://github.com/nunobrito/utils/blob/master/Utils/src/utils/ReadWrite/FileRandomReadLines.java

Has no third-party dependencies, you are likely fine by just downloading and including it on your code. Maybe there is already similar implementation elsewhere published before, I didn't found one and tried as much as possible to find some ready-made code.

If you see any improvements possible, do let me know and I'll include your name on the credits.

No comments:

Post a Comment