Saturday, January 21, 2012

JEE6: Stress Testing Java EE 6 Applications, how to?

Use stress testing to identify application server configuration problems, potential bottlenecks, synchronization bugs, and memory leaks in Java EE 6 code.

Unit and integration tests are helpful for identifying business logic bugs, but in the context of Java Platform, Enterprise Edition 6 (Java EE 6), they say nothing about concurrent behavior: both integration and unit tests access your application in a single-threaded way.

After the deployment, however, your code will always be executed concurrently.

Stop Talking, Start Stressing
It is impossible to predict all nontrivial bottlenecks, deadlocks, and potential memory leaks by having theoretical discussions. It is also impossible to find memory leaks with unit and integration tests. Bottlenecks are caused by locks and I/O problems that are hard to identify in a single-threaded scenario. With lots of luck and patience, memory leaks can be identified with an integration test, but they can be far more easily spotted under massive load.

The heavier the load, the greater the probability of spotting potential concurrency and robustness problems. Cache behavior, the frequency of Java Persistence API (JPA) OptimisticLockException, and the amount of memory needed in production can also be evaluated easily with stress tests.
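The OptimisticLockException case is easy to picture: as soon as an entity carries a @Version field, two transactions that update the same row concurrently force one of them to fail. The following is a minimal sketch of such an entity (the Counter class is made up for illustration and is not part of the sample application):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Counter {

    @Id
    private long id;

    // JPA increments this column on every update; if two concurrent transactions
    // read the same version and both commit a change, the second commit fails
    // with an OptimisticLockException -- a stress test shows how often that happens
    @Version
    private long version;

    private long value;

    public void increment() {
        this.value++;
    }
}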


Even in the unlikely case of a perfect application without defects, your application server will typically be unable to handle the load with default factory settings. Stress tests are a perfect tool to learn the behavior of your application under load in the first iterations without any stress. Stress test-driven development is the right choice for Java EE.

Instead of applying optimizations prematurely, you should concentrate on implementing pure business logic and verifying the expected behavior with automated tests and hard numbers.

Don’t Be Realistic
Load tests are configured by taking into account the expected number of concurrent users and realistic user behavior. To meet the requirements, “think times” need to be kept realistic, which in turn reduces the amount of concurrency in the system. The heavier the load, the easier it is to find problems.

Realistic load tests are usually performed too late in the development cycle and are useful only for ensuring that nonfunctional requirements are met. They are less valuable in verifying system correctness.

Instead of relying on realistic but lax load tests defined by domain experts, we should generate as much load as possible using developer-driven stress tests. The goal is not to verify expected scalability or performance. Instead, it is to find bugs and learn about the behavior of the system under load.

Stressing the Oracle
My “oracle” application records predictions and returns them as a JavaScript Object Notation (JSON) string. (For more information on the “oracle” application, see “Unit Testing for Java EE.”) Sending a GET request to the URI http://localhost:8080/oracle/resources/predictions returns a Prediction entity serialized as {"prediction":{"result":"JAVA_IS_DEAD","predictionDate":"1970-01-01T19:57:39+01:00","success":"false"}}. Services provided by the Java API for RESTful Web Services (JAX-RS) are easily stress-testable; you need only execute several HTTP requests concurrently.
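The JSON above already hints at what the Prediction entity looks like. The following sketch is reconstructed from that output and from the named query used later in this article; the annotations and field types are assumptions, not the original source:

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.NamedQuery;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
import javax.xml.bind.annotation.XmlRootElement;

@Entity
@XmlRootElement
@NamedQuery(name = Prediction.findAll, query = "SELECT p FROM Prediction p")
public class Prediction {

    public static final String findAll = "Prediction.findAll";

    @Id
    @GeneratedValue
    private long id;

    private String result;      // e.g. "JAVA_IS_DEAD"

    @Temporal(TemporalType.TIMESTAMP)
    private Date predictionDate;

    private boolean success;
}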

The open source load testing tool Apache JMeter comes with built-in HTTP support. After creating the ThreadGroup and setting the number of threads (and, thus, concurrent users), an HTTP Request sampler has to be configured to execute the GET requests (right-click the ThreadGroup, select Add, select Sampler, and then select HTTP Request). See Figure 1.
Figure 1
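By the way, you do not even need JMeter for a very first smoke test: a few lines of plain Java are enough to hammer the endpoint concurrently. The following throwaway client is only a sketch (thread count, request count, and the URI are hard-coded assumptions); for repeatable measurements, the JMeter setup above is the better tool.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class QuickStress {

    public static void main(String[] args) throws Exception {
        final URL url = new URL("http://localhost:8080/oracle/resources/predictions");
        final AtomicLong requests = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(5); // five concurrent "users"

        long start = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int r = 0; r < 10000; r++) {
                        try {
                            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                            InputStream in = connection.getInputStream();
                            while (in.read() != -1) { /* drain the response */ }
                            in.close();
                            requests.incrementAndGet();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        long duration = System.currentTimeMillis() - start;
        System.out.println(requests.get() + " requests in " + duration + " ms");
    }
}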
While the results can be visualized in various ways, the JMeter Summary Report is a good start (see Figure 2).
Figure 2
It turns out that the sample application is able to handle 1,700 transactions per second for five concurrent users out of the box.
Every request is a true transaction and is processed by PredictionArchiveResource, a JAX-RS resource implemented as an Enterprise JavaBeans 3.1 (EJB 3.1) bean, which delegates to the PredictionAudit EJB 3.1 bean, which in turn accesses the database (containing exactly one record) through the EntityManager.
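The request chain can be pictured with the following sketch. It is reconstructed from the class names mentioned above; the bean types, annotations, and resource path are assumptions rather than the original source of the sample application:

import java.util.List;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// JAX-RS boundary: an EJB 3.1 session bean exposed as a REST resource
@Stateless
@Path("predictions")
@Produces(MediaType.APPLICATION_JSON)
public class PredictionArchiveResource {

    @EJB
    PredictionAudit audit;

    @GET
    public List<Prediction> allPredictions() {
        // each request runs in a container-managed transaction
        return audit.allPredictions();
    }
}

// In a separate source file -- the service behind the boundary:
// a plain EJB 3.1 bean with an injected EntityManager
@Stateless
public class PredictionAudit {

    @PersistenceContext
    EntityManager em;

    public List<Prediction> allPredictions() {
        return this.em.createNamedQuery(Prediction.findAll).getResultList();
    }
}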

At this point, we have learned only that with EJB 3.1, JPA 2, and JAX-RS, we can achieve 1,700 transactions per second without any optimization. But we still have no idea what is happening under the hood.

VisualVM Turns Night to Day
GlassFish Server Open Source Edition 3.1.x and Java DB (the open source version of Apache Derby) are Java processes that can be easily monitored with VisualVM. Although VisualVM is shipped with the current JDK, you should check the VisualVM Website for updates.

VisualVM is able to connect locally or remotely to a Java process and monitor it. VisualVM provides an overview showing CPU load, memory consumption, number of loaded classes, and number of threads, as shown in Figure 3.

Figure 3
The overview is great for estimating resource consumption and monitoring the overall stability of the system. We learn from Figure 3 that for 1,700 transactions per second, GlassFish Server Open Source Edition 3.1 needs 58 MB for the heap, 67 threads, and about 50 percent of the CPU. The other 50 percent was consumed by the load generator (JMeter).

Although this colocation is adequate for the purposes of this article, it blurs the results. The load generator should run on a dedicated machine or at least in an isolated (virtual) environment. Sometimes, you even have to run distributed JMeter load tests to generate enough load to stress the server. For internet applications, it might be necessary to deploy the load generators into the cloud.

In a stress test scenario, the plain numbers are interesting but unimportant. Stress tests do not generate a realistic load, but rather they try to break the system. To ensure stability, you should monitor the VisualVM Overview average values. All the lines should be, on average, flat.

An increasing number of loaded classes might indicate problems with class loading and can lead to an OutOfMemoryError due to a shortage of PermGen space. An increasing number of threads indicates slow, asynchronous methods. A ThreadPool configured with an unbounded number of threads will also lead to an OutOfMemoryError. And a steady increase in memory consumption can eventually lead to an OutOfMemoryError caused by memory leaks.
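The "slow, asynchronous methods" case is easy to reproduce. In the following hypothetical bean (not part of the sample application), every invocation is handed off to the container's asynchronous thread pool; if the method body is slower than the request arrival rate, pending invocations and threads keep piling up, which is exactly what a climbing thread curve in VisualVM looks like:

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class ReportGenerator {

    @Asynchronous
    public Future<String> generateReport() {
        // the caller returns immediately; this slow body keeps a pooled thread busy
        // for five seconds, so under load invocations accumulate in the async pool
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new AsyncResult<String>("done");
    }
}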

VisualVM comes with an interesting profiling tool called Sampler. You can attach it to and detach it from a running Java process with little overhead and measure the most expensive invocations or the size of objects (see Figure 4).
Figure 4
The sampling overhead is about 20 percent, so with an active sampler, you can still achieve 1,400 transactions per second. As expected, the application spends the largest amount of time communicating with the database.

How Expensive Is System.out.println?
A single System.out.println can lead to significant performance degradation. To measure the overhead, every invocation of the method allPredictions is logged with a System.out.println invocation, as shown in Listing 1.

public List<Prediction> allPredictions() {
    // log every invocation -- under load, this single line dominates the response time
    System.out.println("-- returning predictions");
    return this.em.createNamedQuery(Prediction.findAll).getResultList();
}
Listing 1

Instead of 1,700 transactions per second, we are able to perform only about 800 transactions per second, as shown in Figure 5.

Figure 5
Let’s take a look at the VisualVM Sampler output shown in Figure 6. More time is spent in ThreadPrintStream.println() than in the most expensive database operation.

Figure 6
The actual EJB 3.1 overhead is negligible. The $Proxy285.allPredictions() invocation is in the very last position and orders of magnitude faster than a single System.out.println.
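If the diagnostic output itself is worth keeping, a common remedy (not measured in this article) is to route it through java.util.logging and guard the call by level. Assuming the bean declares a static java.util.logging.Logger field named LOG, the method from Listing 1 could look like this:

public List<Prediction> allPredictions() {
    // the guard keeps the logging cost negligible when the FINE level is disabled
    if (LOG.isLoggable(Level.FINE)) {
        LOG.fine("-- returning predictions");
    }
    return this.em.createNamedQuery(Prediction.findAll).getResultList();
}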

Having a reference measurement makes identification of potential bottlenecks easy. You should perform stress tests as often as possible and compare the results. Performing nightly stress tests from the very first iteration is desirable. You will get fresh results each morning so you can start fixing potential bottlenecks.

Causing More Trouble
Misconfigured application servers are a common cause of bottlenecks. GlassFish Server Open Source Edition 3.1 comes with reasonable default settings, so to simulate a bottleneck we reduce the maximum number of connections in the Derby pool to two. With five concurrent threads (users) and only two database connections, there should be some contention (see Figure 7).

Figure 7
The performance is still surprisingly good: we get 1,400 transactions per second with two connections. The maximum response time, however, went up to 60 seconds, which correlates exactly with the “Max Wait Time: 60000 ms” connection pool setting in GlassFish Server Open Source Edition 3.1. A hint in the log files also points to the problem, as shown in Listing 2.

WARNING: RAR5117 : Failed to obtain/create connection from connection pool 
[ SamplePool ]. Reason : com.sun.appserv.connectors.internal.api.PoolingException: In-use 
connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.
WARNING: RAR5114 : Error allocating connection : [Error in allocating a connection. Cause: 
In-use connections equal max-pool-size and expired max-wait-time. 
Cannot allocate more connections.]
Listing 2

Also interesting is the Sampler view in VisualVM (see Figure 8).

Figure 8
The method JNDIConnector.connect() became the most expensive method; it even displaced the Reply.fill() method from first place.
The package org.eclipse.persistence belongs to EclipseLink, the JPA provider for GlassFish Server Open Source Edition 3.1, so it gives us a hint about the bottleneck's location. There is nothing wrong with the persistence layer; it only has to wait for a free connection. This contention is caused by the artificial limitation of having only two connections available for five virtual users.

A look at the JNDIConnector.connect method confirms our suspicion (see Listing 3).

public Connection connect(Properties properties) throws DatabaseException, ValidationException {
    String user = properties.getProperty("user");
    DataSource dataSource = getDataSource();
    try {
        ...
        // blocks until a pooled connection becomes free or max-wait-time expires
        return dataSource.getConnection(user, password);
    } catch (SQLException exception) {
        throw DatabaseException.sqlException(exception, true);
    }
}
Listing 3

In the JNDIConnector.connect method, a connection is acquired from a DataSource. If the pool is empty, the method blocks until either an in-use connection becomes free or the Max Wait Time is reached; with the GlassFish Server Open Source Edition default settings, it can block for up to 60 seconds. This rarely happens with the default settings, because the server ships with a Maximum Pool Size of 32 database connections.

How to Get the Interesting Stuff
The combination of JMeter and VisualVM is useful for ad hoc measurements. In real-world projects, stress tests should be not only repeatable but also comparable. A history of results, with visualization, makes comparing runs and identifying hotspots easier.

VisualVM provides a good overview, but the really interesting monitoring information can be obtained only from an application server in a proprietary way. All major application servers provide extensive monitoring information via Java Management Extensions (JMX).
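For the JVM-level numbers, you do not even need a vendor API: the standard platform MXBeans can be read remotely from any server via JMX. The sketch below assumes a default local GlassFish installation with its JMX connector on port 8686 and no authentication; adjust both for your environment. The server-specific statistics, however, are easier to reach through the REST interface described next.

import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import static java.lang.management.ManagementFactory.*;

public class JmxProbe {

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8686/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();

            // standard platform MXBeans -- available on every JVM, no vendor API needed
            MemoryMXBean memory = newPlatformMXBeanProxy(
                    connection, MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            ThreadMXBean threads = newPlatformMXBeanProxy(
                    connection, THREAD_MXBEAN_NAME, ThreadMXBean.class);

            System.out.println("used heap: " + memory.getHeapMemoryUsage().getUsed());
            System.out.println("threads  : " + threads.getThreadCount());
        } finally {
            connector.close();
        }
    }
}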

GlassFish Server Open Source Edition 3.1 exposes its monitoring and management data through an easily accessible Representational State Transfer (REST) interface.

To activate the monitoring, open the GlassFish Admin Console by specifying the Admin Console URI (http://localhost:4848), select Server, select Monitor, select Configure Monitoring, and then set the level to HIGH for all components. Alternatively, you can activate monitoring by using the asadmin command from the command line or by using the REST management interface.

Now, all the monitoring information is accessible from the following root URI: http://localhost:4848/monitoring/domain/server. The interface is self-explanatory.

You can navigate through the components from a browser or from the command line.
The command curl -H "Accept: application/json" http://localhost:4848/monitoring/domain/server/jvm/memory/usedheapsize-count returns the current heap size formatted as a JSON object (see Listing 4).

{"message":"","command":"Monitoring Data","exit_code":"SUCCESS",
 "extraProperties":{"entity":{"usedheapsize-count":{"count":217666480,
 "lastsampletime":1308569037982,"description":"Amount of used memory in bytes",
 "unit":"bytes","name":"UsedHeapSize","starttime":1308504654922}},
 "childResources":{}}}
Listing 4

The most interesting key is usedheapsize-count. It contains the amount of used memory in bytes, as described by the description tag. The good news is that the entire monitoring API relies on the same structure and can be accessed in a generic way.

Monitoring Java EE 6 with Java EE 6
Executing HTTP GET requests from the command line still does not solve the challenge. To be comparable, the data has to be stored persistently. A snapshot taken every 1 to 30 seconds is good enough for smoke tests and stress tests. It turns out that you can easily persist monitoring data with a simple Java EE 6 application: JPA 2, EJB 3.1, Contexts and Dependency Injection (CDI), and JAX-RS reduce the task to only three classes.

The JPA 2 entity Snapshot holds the relevant monitoring data (see Listing 5).

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement
@Entity
public class Snapshot {

    @Id
    @GeneratedValue
    private long id;

    @Temporal(TemporalType.TIME)
    private Date monitoringTime;
    //...field declarations omitted

    public Snapshot(long usedHeapSize, int threadCount, int totalErrors,
                    int currentThreadBusy, int committedTX, int rolledBackTX,
                    int queuedConnections) {
        this();
        this.usedHeapSize = usedHeapSize;
        //...
    }

    public Snapshot() {
        this.monitoringTime = new Date();
    }
}
Listing 5

The entity Snapshot represents the interesting data, such as the number of busy threads, queuedConnections, errors, committed and rolled-back transactions, heap size, and the time stamp. You can extract such information from any other application server through different channels and APIs.

Listing 6 shows how to access REST services with Jersey. The managed bean DataProvider uses the Jersey client to access the GlassFish Server Open Source Edition’s REST interface and convert the JSON result into Java primitives. The fetchData method is the core functionality of DataProvider, and it returns the populated Snapshot entity.

public class DataProvider {

    public static final String BASE_URL = "http://localhost:4848/monitoring/domain/server/";
    public static final String HEAP_SIZE = "jvm/memory/usedheapsize-count";

    private Client client;

    public Snapshot fetchData() {
        try {
            long usedHeapSize = usedHeapSize();
            //... other assignments omitted
            return new Snapshot(usedHeapSize, threadCount, totalErrors,
                    currentThreadBusy, committedTX, rolledBackTX, queuedConnections);
        } catch (JSONException e) {
            throw new IllegalStateException("Cannot fetch monitoring data because of: " + e);
        }
    }

    long usedHeapSize() throws JSONException {
        final String uri = BASE_URL + HEAP_SIZE;
        return getLong(uri, "usedheapsize-count");
    }
    //other accessors omitted

    long getLong(String uri, String name) throws JSONException {
        ClientResponse result = client.resource(uri).
                accept(MediaType.APPLICATION_JSON).
                get(ClientResponse.class);
        return getJSONObject(result, name).getLong("count");
    }

    JSONObject getJSONObject(ClientResponse result, String name) throws JSONException {
        JSONObject response = result.getEntity(JSONObject.class);
        return response.getJSONObject("extraProperties").
                getJSONObject("entity").
                getJSONObject(name);
    }
}
Listing 6

Every five seconds, the MonitoringController @Singleton EJB 3.1 bean shown in Listing 7 asks the DataProvider for a Snapshot and persists it. In addition, the persisted data is exposed through REST.

@Singleton
@Path("snapshots")
@Produces(MediaType.APPLICATION_JSON)
public class MonitoringController {
    @Inject
    DataProvider dataProvider;
    @PersistenceContext
    EntityManager em;
    @Schedule(minute="*",second="*/5",hour="*")
    public void gatherAndPersist(){
        Snapshot current = dataProvider.fetchData();
        em.persist(current);
    }
    @GET
    public List<Snapshot> all(){
        CriteriaBuilder cb = this.em.getCriteriaBuilder();
        CriteriaQuery q = cb.createQuery();
        CriteriaQuery<Snapshot> select = q.select(q.from(Snapshot.class));
        return this.em.createQuery(select).getResultList();
    }}
Listing 7

You can access all snapshots using http://localhost:8080/stm/resources/snapshots, and you will get Snapshot instances as a JSON object (see Listing 8).

{"snapshot":
[{"id":"1","monitoringTime":"1970-01-01T10:50:30+01:00","usedHeapSize":"158408536",
"threadCount":"131","totalErrors":"4","currentThreadBusy":"-1","committedTX":"23",
"rolledBackTX":"2","queuedConnections":"0"},
{"id":"2","monitoringTime":"1970-01-01T10:50:35+01:00","usedHeapSize":"160421672",
"threadCount":"131","totalErrors":"4","currentThreadBusy":"-1","committedTX":"23",
"rolledBackTX":"2","queuedConnections":"0"}]}

Listing 8

Interestingly, a Java EE 6 solution is significantly leaner than a comparable Plain Old Java Object (POJO) implementation. Periodic timer execution, transactions, and EntityManager bookkeeping are provided out of the box in Java EE 6 but must be implemented by hand in Java Platform, Standard Edition (Java SE).
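To make the comparison concrete, here is roughly what the same gather-and-persist step looks like without the container. The sketch assumes a RESOURCE_LOCAL persistence unit named "monitoring" and a DataProvider that can be instantiated and wired by hand; error handling and shutdown are omitted:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class PojoMonitor {

    public static void main(String[] args) {
        final EntityManagerFactory emf = Persistence.createEntityManagerFactory("monitoring");
        final DataProvider dataProvider = new DataProvider();

        // hand-rolled replacements for @Schedule, container-managed transactions,
        // and the injected EntityManager
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                EntityManager em = emf.createEntityManager();
                try {
                    em.getTransaction().begin();
                    em.persist(dataProvider.fetchData());
                    em.getTransaction().commit();
                } finally {
                    em.close();
                }
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}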

Automating the Stress Test
Using the StressTestMonitor (STM) application, we can collect application server monitoring data systematically and persistently, and we can analyze and compare the stress test results after each run. This approach is not perfect, though, because both the stress test and the load generator must be started and stopped manually.

A continuous integration (CI) tool, such as Hudson or Jenkins, is capable of automating the whole lifecycle and can easily deploy STM and start the load generator automatically.

Hudson and Jenkins support a periodic build execution (for example, with a single configuration tag: @midnight). With Hudson or Jenkins, setting up a nightly executed stress test only takes minutes. You have to deploy the STM application first and launch the stress test generator afterward. For a GlassFish Server Open Source Edition deployment, this requires just a single line: asadmin deploy --force […]/StressTestMonitor.war.

JMeter can also be started without the GUI in headless mode. It requires another line: jmeter -n -t predictions.jmx -l predictions.jtl. The parameter -n prevents the GUI from appearing, the parameter -t specifies the configuration file (created with JMeter in GUI mode), and -l specifies the log file, which can be analyzed with JMeter after the test.

A Hudson or Jenkins “Free Style Software Project” build with two build steps (Execute Windows Shell Command or Execute Shell) does the job: Deploy STM and Execute Stress Tests. You don’t have to create a shell script or a batch file. The commands can be executed directly by Hudson or Jenkins.


VisualVM can still be used to monitor the application in real time or identify the hot spots. All the relevant monitoring data is persisted in a database table and can be analyzed easily after the test.


Nice-to-Haves
The solution described so far is good enough for getting started. The interesting parameters, such as JVM and application server monitoring data, are gathered and persisted automatically during a nightly job. Because of JSR-77 (the management and monitoring API), all application servers also provide access to the statistics of all deployed Java EE components.

Application-specific EJB beans (usually the façade to your business logic) can be monitored using exactly the same mechanism. You will get the number of concurrent requests, the current number of active instances (and, thus, the number of concurrent transactions accessing the bean), and business method runtime statistics, such as slowest and average execution times. For example, PredictionAudit EJB bean pool data is accessible under http://localhost:4848/monitoring/domain/server/applications/com.tm_TestingEJBAndCDI.war_1.0-SNAPSHOT/PredictionAudit/bean-pool.

All the monitoring data is stored in a single table, which makes the data available to tools such as JasperReports or Eclipse BIRT. Charts and reports are only a few clicks away. The Snapshot entity is exposed through JSON, which makes it “consumable” by all JavaScript and JavaFX applications as well.

Furthermore, the Snapshot entity is a Java class and can contain additional validation or processing logic. It is trivial to escalate “suspicious” Snapshot instances with CDI events, as shown in Listing 9.

    @Inject
    Event<Snapshot> escalationSink;
    @Schedule(minute="*",second="*/5",hour="*")
    public void gatherAndPersist(){
        Snapshot current = dataProvider.fetchData();
        em.persist(current);
        if(current.isSuspicious())
            escalationSink.fire(current);
    }
Listing 9

A listener for suspicious Snapshot instances only has to implement a method with an annotated parameter to receive the event: public void onSuspiciousEvent(@Observes Snapshot snapshot){}.

All suspicious events can be analyzed in real time, sent using e-mail, or just aggregated and exposed with JMX.
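A minimal observer could look like the following sketch; the class is hypothetical, and the "alerting" is just a log statement standing in for an e-mail or a JMX counter update:

import java.util.logging.Logger;
import javax.ejb.Stateless;
import javax.enterprise.event.Observes;

@Stateless
public class EscalationListener {

    private static final Logger LOG = Logger.getLogger(EscalationListener.class.getName());

    public void onSuspiciousEvent(@Observes Snapshot snapshot) {
        // fired synchronously by MonitoringController.gatherAndPersist for suspicious snapshots
        LOG.warning("Suspicious snapshot detected: " + snapshot);
    }
}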

Conclusion: No Excuses
Java EE 6 applications are always executed concurrently. Even with unrealistic stress tests, you will learn a lot about system behavior and identify some bottlenecks or concurrency bugs.

The earlier and more frequently stress tests are executed, the more you will learn about the application’s runtime behavior. Application server configuration problems, potential bottlenecks, synchronization bugs, and memory leaks can also be identified during stress tests.

Reference:
Adam Bien, Java Magazine, November/December 2011.


2 comments :

  1. Good article, thanks for writing it, and good luck with your talk at JavaOne.

  2. Hi, good article!

    I have one question! My application runs in GlassFish 3.1. It's a heavy-load application that gets some data from I/O (sockets), and a lot of concurrent threads store data to a MySQL database. All runs OK, but somewhere around 00:30 every day I get this exception: com.sun.appserv.connectors.internal.api.PoolingException: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.

    What is really interesting is that in this period the load is the smallest, and the server recovers by itself; I'm getting 10-15 exceptions and then all runs normally...
    Any idea?!
