8 Mar

Easy Parallelism in Lucee
 

One approach to increasing your application's performance in Lucee is to take advantage of multi-threading, which improves throughput and makes full use of multi-core processors. In this first part, we will look at the concept of "parallelism" and how it works in Lucee. In part 2, we will cover a couple of potential pitfalls and how correct concurrent code can solve them.

While the terms concurrent and parallel are often used interchangeably, they do not mean exactly the same thing. According to Brian Goetz [1], Java Language Architect at Oracle and author of the book “Java Concurrency in Practice”, concurrency describes the ability of a program to access shared resources in a correct and efficient manner, while parallelism describes its ability to use more resources in order to solve a problem faster. Writing correct concurrent code is difficult and error-prone; making parallel code correct is comparatively simpler and safer.

Lucee makes parallelism simple with the built-in function Each() [2], its "cousins" ArrayEach(), StructEach(), and QueryEach(), and their respective member methods, all of which accept the optional arguments parallel and maxThreads. The concept is simple: pass a collection and a closure (or any function) to Each(), and the closure will be called on each element of the collection. Pass true for parallel, and the closure will be called in parallel by multiple threads, which are joined at the end, so Each() returns only after every call has finished.
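For reference, here is a minimal sketch of the different call forms side by side. The names numbers, config, and showValue are illustrative only and are not part of the example that follows; parallel and maxThreads are the optional third and fourth arguments described above.

```cfml
// Illustrative data; these names are not part of the article's example.
numbers = [1, 2, 3];
config  = { host: "localhost", port: 25 };

function showValue(value) {
    echo("<br>value: #arguments.value#");
}

// Each() with only a collection and a function runs sequentially
each(numbers, showValue);

// type-specific function: parallel, with at most 2 threads (output order may vary)
arrayEach(numbers, showValue, true, 2);

// member method form of the same parallel call
numbers.each(showValue, true, 2);

// for structs, the closure receives the key and the value
structEach(config, function(key, value) {
    echo("<br>#arguments.key#: #arguments.value#");
}, true);
```

Because the calls run on separate threads when parallel is true, the lines they print are not guaranteed to appear in collection order.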

Let's take a simple example of a function that takes an element and processes it. For simplicity, we'll sleep for a random interval to simulate a slow process such as a file, network, or database operation:

tc = getTickCount();

function process(element) {

    var calledAt = getTickCount() - tc;
    sleep(randRange(80, 120));                    // simulate a slow process
    var completedAt = getTickCount() - tc;

    echo("<br>element: #arguments.element#; called-at: #calledAt#; completed-at: #completedAt#");
}

Now let's say that we have a collection with 20 elements that we want to process. Let's first build an array of elements:

elements = [];

for (i=1; i<=20; i++){

    elements.append(i);
}

And now that we have that collection, let's call process() on each element using the built-in function Each():

each(elements, process);

That produces an output like so:

element:    1;        called-at:    0;        completed-at:  105
element:    2;        called-at:  105;        completed-at:  204
element:    3;        called-at:  204;        completed-at:  323
element:    4;        called-at:  323;        completed-at:  427
element:    5;        called-at:  427;        completed-at:  540
element:    6;        called-at:  540;        completed-at:  637
element:    7;        called-at:  637;        completed-at:  720
element:    8;        called-at:  720;        completed-at:  813
element:    9;        called-at:  813;        completed-at:  911
element:   10;        called-at:  911;        completed-at: 1017
element:   11;        called-at: 1017;        completed-at: 1102
element:   12;        called-at: 1102;        completed-at: 1184
element:   13;        called-at: 1184;        completed-at: 1295
element:   14;        called-at: 1295;        completed-at: 1377
element:   15;        called-at: 1377;        completed-at: 1490
element:   16;        called-at: 1490;        completed-at: 1598
element:   17;        called-at: 1598;        completed-at: 1711
element:   18;        called-at: 1711;        completed-at: 1795
element:   19;        called-at: 1795;        completed-at: 1895
element:   20;        called-at: 1895;        completed-at: 1992

        Set completed in 1,992ms;

As you can see, each element's called-at time matches the previous element's completed-at time: process() was called on element 2 at 105ms, exactly when the call on element 1 completed. The total time was just shy of 2 seconds, which makes sense since we called sleep() 20 times with an average of about 100ms each.
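The snippets above don't show where the "Set completed in" line comes from, and tc has to be reset before each run for the called-at and completed-at times to start near 0. One way to do both is a small wrapper like the following sketch. timeRun() is a hypothetical helper, not a Lucee function, and formatting the total with numberFormat() is an assumption that happens to match the "1,992ms" shown.

```cfml
// Hypothetical helper (not part of Lucee): times one pass over a collection.
// It resets the page-level "tc" variable that process() reads, so each run
// reports called-at/completed-at relative to its own start.
function timeRun(required any items, required function fn,
                 boolean parallel = false, numeric maxThreads = 20) {
    variables.tc = getTickCount();
    each(arguments.items, arguments.fn, arguments.parallel, arguments.maxThreads);
    // each() returns only after every call has finished, so this is the total time
    var elapsed = getTickCount() - variables.tc;
    echo("<br><br>Set completed in #numberFormat(elapsed)#ms;");
    return elapsed;
}
```

With the elements array and process() function defined earlier, the three runs in this post would be timeRun(elements, process), timeRun(elements, process, true), and timeRun(elements, process, true, 10).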

So now let's use multiple threads to process these elements. After all, if the call processing element 1 is sleeping or waiting for some external resource, there's no reason to wait for it to complete before starting on element 2. We will now call Each() and pass true for the parallel argument; since we don't pass maxThreads, its default value of 20 is used:

each(elements, process, true);

The output produced now is:

element:    1;        called-at:    1;        completed-at:  118
element:    2;        called-at:    1;        completed-at:  101
element:    3;        called-at:    1;        completed-at:  113
element:    4;        called-at:    1;        completed-at:  114
element:    5;        called-at:    2;        completed-at:  119
element:    6;        called-at:    2;        completed-at:   98
element:    7;        called-at:    2;        completed-at:   97
element:    8;        called-at:    2;        completed-at:   98
element:    9;        called-at:    2;        completed-at:   94
element:   10;        called-at:    2;        completed-at:   87
element:   11;        called-at:    2;        completed-at:  105
element:   12;        called-at:    2;        completed-at:  117
element:   13;        called-at:    2;        completed-at:  108
element:   14;        called-at:    2;        completed-at:  115
element:   15;        called-at:    2;        completed-at:   93
element:   16;        called-at:    3;        completed-at:  111
element:   17;        called-at:    3;        completed-at:  108
element:   18;        called-at:    3;        completed-at:  115
element:   19;        called-at:    3;        completed-at:  107
element:   20;        called-at:    3;        completed-at:  105

        Set completed in 119ms;

This time, processing of element 2 did not wait for element 1 to complete; they were called at about the same time. In fact, since we had 20 threads for 20 elements, no element waited for another. Therefore, the whole operation completed in 119ms, which was the longest single processing time (element 5 in this case). Our code ran almost 20 times faster! (It's actually about 16.7 times, but the sleep times are random, so the two executions cannot be compared exactly.)
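The speed-up figure is simply the ratio of the two measured totals. With one thread per element, the parallel total can never be shorter than the slowest single call $t_i$:

$$
S = \frac{T_{\text{sequential}}}{T_{\text{parallel}}} = \frac{1992\ \text{ms}}{119\ \text{ms}} \approx 16.7,
\qquad
T_{\text{parallel}} \ge \max_i t_i
$$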

But sometimes you can't just use 20 threads. For example, if you try to send 20 emails at the same time through a server that is configured to accept only 10 concurrent connections from a single user, the first 10 emails will be processed successfully, but many of the others may be rejected until emails in the first batch complete and their connections become available.

In those cases, you have to limit the number of threads by passing the maxThreads argument. To avoid opening too many connections at the same time to our hypothetical email server, let's call Each() with a value of 10 for maxThreads:
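Applied to the email scenario, the thread limit can be tied directly to the server's connection cap. In this sketch, sendEmail() is a hypothetical stand-in that only sleeps to simulate talking to the server (it does not send any mail), and mailServerMaxConnections and the example.com addresses are illustrative.

```cfml
// Hypothetical stand-in for a real mail call: it only simulates the latency
// of a mail server that accepts 10 concurrent connections per user.
function sendEmail(required string recipient) {
    sleep(randRange(80, 120));
    echo("<br>sent to #arguments.recipient#");
}

mailServerMaxConnections = 10;  // illustrative: the server's per-user limit

recipients = [];
for (i = 1; i <= 20; i++) {
    recipients.append("user#i#@example.com");
}

// never more than mailServerMaxConnections calls to sendEmail() run at once
each(recipients, sendEmail, true, mailServerMaxConnections);
```

Keeping the cap in a named variable documents why maxThreads has that value, and it only needs to change in one place if the server's limit changes.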

each(elements, process, true, 10);

And see what happens now:

element:    1;        called-at:    1;        completed-at:   89
element:    2;        called-at:    1;        completed-at:  109
element:    3;        called-at:    1;        completed-at:   91
element:    4;        called-at:    1;        completed-at:  108
element:    5;        called-at:    1;        completed-at:   91
element:    6;        called-at:    1;        completed-at:  106
element:    7;        called-at:    2;        completed-at:  112
element:    8;        called-at:    2;        completed-at:   90
element:    9;        called-at:    2;        completed-at:   93
element:   10;        called-at:    2;        completed-at:   97
element:   11;        called-at:   89;        completed-at:  193
element:   12;        called-at:   90;        completed-at:  170
element:   13;        called-at:   91;        completed-at:  200
element:   14;        called-at:   91;        completed-at:  197
element:   15;        called-at:   93;        completed-at:  183
element:   16;        called-at:   97;        completed-at:  214
element:   17;        called-at:  106;        completed-at:  194
element:   18;        called-at:  108;        completed-at:  217
element:   19;        called-at:  109;        completed-at:  225
element:   20;        called-at:  112;        completed-at:  200

        Set completed in 225ms;

Elements 1 through 10 were called at about the same time (1 or 2 milliseconds after execution started), while elements 11 through 20 were each called only after an earlier call to process() completed and freed up a thread. So element 11 started processing at 89ms, just after element 1 completed; element 12 started at 90ms, just after element 8 completed; and so on.

The whole operation completed in 225ms, roughly 9 times faster than the sequential run (1,992ms / 225ms ≈ 8.9) and close to the 10x ceiling that 10 threads allow, which is exactly what we were hoping for.
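As a rough check, assume every call takes about 100ms, the midpoint of randRange(80, 120). The 20 elements then run in waves of at most 10:

$$
T \approx \left\lceil \frac{N}{\text{maxThreads}} \right\rceil \cdot \bar{t}
  = \left\lceil \frac{20}{10} \right\rceil \cdot 100\ \text{ms} = 200\ \text{ms}
$$

The measured 225ms is a little higher because a slow call in the first wave delays the call that replaces it: element 19, for example, could not start until element 2 finished at 109ms.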

This example did not share any resources between the threads, so it is inherently thread-safe and falls under the parallelism paradigm. In the companion blog post (part 2), we look at concurrency and see why it is important to coordinate access to shared objects when multiple threads are involved.

[1] https://www.ibm.com/developerworks/library/j-java-streams-4-brian-goetz/index.html
[2] http://docs.lucee.org/reference/functions/each.html

