29 Mar

Harnessing the power of Java in CFML
 

A few days ago, Zac asked in the Lucee forum about an efficient way to read the end of a large file [1]. This is especially useful when you want to see the most recent entries in a large log file. On *nix systems, you can use the tail command [2] to do this quite easily. In CFML, it's easy to read a file from the top down, but reading just the end of it efficiently is less obvious. That piqued my interest, so I decided to look at different implementations.

Since it seems like a common problem that many developers have faced before, I was pretty sure that I could find an existing solution for it. When looking for such a solution, I often start in the Java realm rather than the CFML world. After all, Java has tons of libraries that are free and open source, and since Lucee is written in Java and integrates so well with it, we can easily take those Java code libraries and call their methods from CFML.

As it turned out, the Apache Commons IO library has a class called ReversedLinesFileReader [3]. The name already sounded promising, so I decided to check it out. An added benefit of using that library is that it already ships with Lucee, so using it will work "out of the box", without requiring the download of any additional JAR files.
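To illustrate why reading from the end can be cheap, here is a minimal, JDK-only sketch of the general idea: seek to the end of the file, scan backwards in blocks while counting newlines, and decode only the bytes after the n-th newline from the end. This is not how ReversedLinesFileReader is implemented, just an illustration of the technique. The class name BackwardTailSketch, the method name lastLines, and the 4096-byte block size are my own assumptions, and the sketch assumes UTF-8 content with "\n" line endings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only (hypothetical names); assumes UTF-8 and "\n" line endings. */
class BackwardTailSketch {

    // Assumed block size; any positive value works.
    private static final int BLOCK_SIZE = 4096;

    /** Returns up to maxLines trailing lines of the file, joined with "\n". */
    static String lastLines(String filename, int maxLines) throws IOException {
        if (maxLines <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            long end = raf.length();
            if (end == 0) {
                return "";
            }
            // A trailing newline terminates the last line; it does not start a new one.
            raf.seek(end - 1);
            if (raf.read() == '\n') {
                end--;
            }

            long start = 0;      // stays 0 if the file has fewer than maxLines lines
            int found = 0;       // newlines seen so far, counted from the end
            byte[] block = new byte[BLOCK_SIZE];
            long blockEnd = end;

            scan:
            while (blockEnd > 0) {
                int size = (int) Math.min(BLOCK_SIZE, blockEnd);
                long blockStart = blockEnd - size;
                raf.seek(blockStart);
                raf.readFully(block, 0, size);
                // Walk the block backwards; the byte 0x0A never occurs inside
                // a multi-byte UTF-8 sequence, so a byte-level scan is safe.
                for (int i = size - 1; i >= 0; i--) {
                    if (block[i] == '\n' && ++found == maxLines) {
                        start = blockStart + i + 1;
                        break scan;
                    }
                }
                blockEnd = blockStart;
            }

            // Only the tail of the file is read and decoded.
            byte[] tail = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.UTF_8);
        }
    }
}
```

Calling BackwardTailSketch.lastLines("bots.log", 10) would touch only the last block or two of the file, however large the file is. A library class is still the better choice in practice, since it also deals with other encodings and line endings.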

I wrote a few Java methods so that I could test them out. The one using ReversedLinesFileReader (see below) turned out to be much faster than reading the file the regular way, from the top, and keeping the last n lines. If you choose to compile the code yourself, be sure to use Java 1.8 or later so that you can use String.join().

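import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.commons.io.input.ReversedLinesFileReader;

/** Java implementation of tail() */
public static String tail(String filename, int pagesize) throws IOException {
	List<String> lines = new ArrayList<>();
	File file = new File(filename);

	// try-with-resources closes the reader even if readLine() throws
	try (ReversedLinesFileReader reader = new ReversedLinesFileReader(file)) {
		// lines arrive last-to-first; stop after pagesize lines or at the start of the file
		while (lines.size() < pagesize) {
			String line = reader.readLine();
			if (line == null)
				break;
			lines.add(line);
		}
	}

	// restore the original top-to-bottom order
	Collections.reverse(lines);
	return String.join("\n", lines);
}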

Testing on my laptop with a 25MB file of 250,000 lines (I concatenated a bunch of log files together), the method above took 0.1ms, compared with 40ms when looping over the file from beginning to end and keeping only the last 10 lines. That's a whopping 1:400 ratio!
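For reference, the slower baseline in that comparison can be sketched roughly as follows: read every line from the top and keep a sliding window of the most recent ones. This is a reconstruction of the approach described above, not the exact code I timed. The names ForwardTailSketch and tailByFullScan are made up for this example, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative baseline (hypothetical names): scan the whole file, keep the last lines. */
class ForwardTailSketch {

    /** Returns up to pagesize trailing lines of the file, joined with "\n". */
    static String tailByFullScan(String filename, int pagesize) throws IOException {
        if (pagesize <= 0) {
            return "";
        }
        // Sliding window holding at most pagesize lines, oldest first.
        Deque<String> window = new ArrayDeque<>();
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            String line;
            // Every line of the file is read and decoded, even though
            // only the last few survive in the window.
            while ((line = reader.readLine()) != null) {
                if (window.size() == pagesize) {
                    window.removeFirst();
                }
                window.addLast(line);
            }
        }
        return String.join("\n", window);
    }
}
```

The work here grows with the size of the whole file, while the reversed reader only does work proportional to the number of lines requested. That is why the gap widens as log files get larger.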

Now that I had a proof of concept, it was time to rewrite the Java code above in CFML, so that I could use it easily in Lucee without needing to deploy my Java class file. Rewriting it in cfscript was simple, and you can see the similarities between that version and the original Java form:

/** cfscript implementation of tail() */
function tail(filename, pagesize=10) localMode=true {
	lines  = [];
	file   = createObject("java", "java.io.File").init(arguments.filename);
	reader = createObject("java", "org.apache.commons.io.input.ReversedLinesFileReader").init(file);

	try {
		// lines arrive last-to-first; stop after pagesize lines or at the start of the file
		while (lines.len() < arguments.pagesize) {
			line = reader.readLine();
			if (isNull(line))
				break;
			lines.append(line);
		}
	}
	finally {
		// release the file handle even if reading fails
		reader.close();
	}

	// restore the original top-to-bottom order
	lines = lines.reverse();
	return lines.toList(chr(10));
}

Now it was time to put it to a real test. I created a simple script that ran the function on a 10MB log file with over 60,000 lines, and compared it with fileRead(), which reads the whole file without even splitting it into lines:

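filename = "#logsDirectory#/bots.log";

// the results get their own variable names so that they don't overwrite the tail() function
timer type="debug" label="tail()" {
	lastLines = tail(filename);
}

timer type="debug" label="fileRead()" {
	contents = fileRead(filename);
}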

The results show that the tail() function is at least 60 times faster (the time unit of cftimer is currently hardcoded to milliseconds, something that we should fix in the future [4]):

[Screenshot: Timer output]

And by the way, if you want to run tail continuously, like the tail -f command, the Commons IO library provides that functionality as well. See the class org.apache.commons.io.input.Tailer [5].

So the next time you are facing a problem that seems common enough that someone has likely solved it already, don't be afraid to look for Java solutions and integrate them with your CFML code. Lucee makes it easy to harness the power of Java in your CFML applications.

 

[1] https://dev.lucee.org/t/reverse-cfloop-over-a-file/3670

[2] https://www.gnu.org/software/coreutils/manual/html_node/tail-invocation.html

[3] https://commons.apache.org/proper/commons-io/javadocs/api-2.6/org/apache/commons/io/input/ReversedLinesFileReader.html

[4] https://luceeserver.atlassian.net/browse/LDEV-1772

[5] https://commons.apache.org/proper/commons-io/javadocs/api-2.6/org/apache/commons/io/input/Tailer.html

