Thursday, June 19, 2014

REST API documentation - HTMLWadlGenerator for CXF

In Apache CXF, one can generate a WADL for any registered resource by appending ?_wadl to the resource URL. This WADL provides an excellent source of real-time REST API documentation, but the output format is not reader-friendly.

I extended the default CXF WadlGenerator to support the text/html media type using a WADL XSL stylesheet.

Here is the code for HTMLWadlGenerator, which can be registered as a jaxrs:provider with the jaxrs:server.
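For reference, a minimal sketch of the Spring configuration for registering such a provider is shown below; the bean id, resource bean, and the generator's package are placeholders:

<jaxrs:server id="myRestServer" address="/">
  <jaxrs:serviceBeans>
    <ref bean="myResourceBean"/>
  </jaxrs:serviceBeans>
  <jaxrs:providers>
    <!-- com.example.HTMLWadlGenerator stands in for the extended generator class -->
    <bean class="com.example.HTMLWadlGenerator"/>
  </jaxrs:providers>
</jaxrs:server>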

To see the output, append ?_wadl&_type=text/html to the resource URL. This produces a nicely formatted HTML page of REST API documentation for the registered resources.

Sunday, June 8, 2014

Configure Solr Suggester

Solr includes an autosuggest component, the Suggester. From Solr 4.7 onwards, the implementation of this Suggester has changed: the old SpellChecker-based search component is replaced with a new suggester that utilizes the Lucene suggester module. The latest Solr download is preconfigured with this new suggester, but the documentation on the Solr wiki still describes the previous SpellChecker-based version.

It took me some time to understand the new suggester and get it working.

There are two parts to the suggester configuration, a search component and a request handler:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>      <!-- org.apache.solr.spelling.suggest.fst -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>     <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

To check the suggester, index a few documents with good test values for the cat field, which is configured as the suggestion field.

The URL for getting suggestions looks like the following; host, port, and core name will vary with your setup, and suggest.build=true is needed the first time to build the suggester's dictionary:

http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&suggest.q=A

In my case this returns:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">13</int>
</lst>
<str name="command">build</str>
<lst name="suggest">
<lst name="mySuggester">
<lst name="A">
<int name="numFound">2</int>
<arr name="suggestions">
<lst>
<str name="term">A Clash of Kings</str>
<long name="weight">0</long>
<str name="payload"/>
</lst>
<lst>
<str name="term">A Game of Thrones</str>
<long name="weight">0</long>
<str name="payload"/>
</lst>
</arr>
</lst>
</lst>
</lst>
</response>

Since a default suggester is not configured, the suggest.dictionary parameter is required; without it, you will get an exception: "No suggester named default was configured".

You can configure a default suggester in solrconfig.xml:
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Now you should be able to get suggestions without having to specify the dictionary in the URL.
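For example, a request like the following (host, port, and core name will vary with your setup) now returns suggestions without the dictionary parameter:

http://localhost:8983/solr/collection1/suggest?suggest.q=A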



Sunday, April 13, 2014

Akka Java for large-scale event processing

We are designing a large-scale distributed event-driven system for real-time data replication across transactional databases. The data (messages) from the source system undergoes a series of transformations and routing logic before reaching its destination. These transformations are multi-process and multi-threaded operations, comprising smaller stateless steps and tasks that can be performed concurrently. There is no shared state across processes; instead, the state transformations are persisted in the database, and each process pulls its work queue directly from the database.

Based on this, we needed a technology that supported distributed event processing, routing, and concurrency on the Java + Spring platform. The three options considered were a message broker (RabbitMQ), Spring Integration, and Akka.

RabbitMQ: A message queue was the first choice because it is the traditional, proven solution for messaging and event processing; RabbitMQ specifically, because it is a popular, lightweight open source option with commercial support from a vendor we already use. I was pretty impressed with RabbitMQ: it was easy to use and lean, yet supported advanced distribution and messaging features. The only thing it lacked for us was the ability to persist messages in Oracle.

Even though RabbitMQ is open source (free), for enterprise use there is a substantial cost factor to it. As an MQ is an additional component in the middleware stack, it requires dedicated staff for administration and maintenance, plus commercial support for the product. Also, setup and configuration of a message broker has its own complexity and involves cross-team coordination.

MQs are primarily EAI products and provide cross-platform (multi-language, multi-protocol) support. They might be too bulky and expensive when used merely as an asynchronous concurrency and parallelism solution.

Spring Integration: Spring has a few modules that provide scalable asynchronous execution.
Spring TaskExecutor provides asynchronous processing with lightweight thread pool options.
Spring Batch allows distributed asynchronous processing via the Job Launcher and Job Repository.
Spring Integration extends these further by providing EAI features: messaging, routing, and mediation capabilities.

While all three Spring modules have some of the required features, it was difficult to get everything working together. Like this user, I was expecting Spring Integration to have RMI-like remoting capability.

Akka Java: Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. It has a Java API, and I decided to give it a try.

Akka was easy to get started with; I found Activator quite helpful. Akka is based on the Actor Model, a message-passing paradigm that achieves concurrency without shared objects and blocking. In Akka, rather than invoking an object directly, you construct a message and send it to the object (called an actor) by way of an actor reference. This design greatly simplifies concurrency management.
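For example, a minimal actor in Akka's Java API might look like the sketch below; the actor, message, and system names are illustrative, not from our actual system:

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// an actor that performs one small, stateless transformation step
public class TransformActor extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            // transform the message and reply to the sender
            getSender().tell(((String) message).toUpperCase(), getSelf());
        } else {
            unhandled(message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("replication-system");
        // messages go to the actor through its ActorRef, never by direct method call
        ActorRef transformer = system.actorOf(Props.create(TransformActor.class), "transformer");
        transformer.tell("order-created", ActorRef.noSender());
    }
}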

However, the simplicity does not mean that a traditional lock-based concurrent program (threads/synchronization) can be converted to Akka with just a few code changes. One needs to design the actor system by defining smaller tasks, messages, and the communication between them. There is a learning curve for Akka's concepts and the Actor Model paradigm, but it is comparatively small given the complexity of concurrency and parallelism that it abstracts away.

Akka offers the right level of abstraction: you do not have to worry about threads and synchronization of shared state, yet you get full flexibility and control to write your custom concurrency solution.

Besides simplicity, I think the real power of Akka is remoting: its ability to distribute actors across multiple nodes for high scalability. Akka's location transparency and fault tolerance make it easy to scale and distribute an application without code changes.
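To illustrate, enabling remoting is mostly configuration; here is a sketch of an application.conf for Akka 2.3, with placeholder hostname and port:

akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"  # placeholder host
      port = 2552             # placeholder port
    }
  }
}

With this in place, actors on other nodes can be reached by address, without changes to the actor code itself.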

I was able to build a PoC for my multi-process and multi-threaded use case fairly easily. I still need to work out Spring injection in actors.

A few words of caution: Akka's Java code has a lot of typecasting due to Scala's type system, and keeping message objects immutable can be tricky; I am tempted to reuse my existing JPA entities (mutable) as messages to reduce database calls.
Also, the Akka community is geared towards Scala, and there is less material on Akka with Java.

In spite of all this, Akka with Java seems the cheapest, fastest, and most efficient option of the three.

Thursday, February 13, 2014

Set up RabbitMQ and the Pika Python client on MacOS

There is an excellent tutorial on RabbitMQ; however, I thought it lacked detailed steps for installing and setting up the RabbitMQ server and the Pika Python client. I would like to share the steps for MacOS.


Installing and Running RabbitMQ


RabbitMQ is available in Homebrew:

brew install rabbitmq

Once installed, run the server:

sudo /usr/local/sbin/rabbitmq-server

Verify RabbitMQ is running:

http://localhost:15672/ should present a login prompt; log in using guest/guest.


Installing Pika

Install Python if you don't already have it:

brew install python

The Python formula comes with pip and setuptools; check

/Library/Python/<yourversion>/site-packages

Install Pika:

sudo pip install pika==0.9.8

I also ran

sudo easy_install pika

which pulls in additional packages for Pika.

That is it; you should be able to use Pika in your Python programs.
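As a quick sanity check, a minimal publisher like the following should connect to the local broker and send a message (the queue name is arbitrary):

import pika

# connect to the RabbitMQ server running on localhost
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# declare a queue and publish a test message to it
channel.queue_declare(queue='hello')
channel.basic_publish(exchange='', routing_key='hello', body='Hello World!')
print("Sent 'Hello World!'")
connection.close()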

You should be all set now and can start with the RabbitMQ tutorial.


Sunday, June 30, 2013

Mocking REST services using Apache HTTPD

We needed the ability to run our application using mock data, but due to the complexity of our data sources, we could not mock the data at the data-source level. We did, however, have RESTful web services that exposed the domain data as web resources; these services are invoked by the application tier to present the UI. We found it easier to mock our REST services than to create a new data source with mock data.

We configured the Apache web server (HTTPD) to simulate the REST resources, with resource URLs that match the actual service URLs. For a service resource http://servicehost:8080/services/mydomain/users/test01/profile, which returns a JSON object for the user's profile, we created a "profile" file containing the desired JSON under /htdocs/services/mydomain/users/test01. The file can then be accessed via the URL http://httpdhost:80/services/mydomain/users/test01/profile, which matches the actual service URL except for hostname and port.
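For example, assuming HTTPD's document root is /usr/local/apache2/htdocs and using a made-up JSON body for illustration:

mkdir -p /usr/local/apache2/htdocs/services/mydomain/users/test01
echo '{"userId": "test01", "name": "Test User"}' > /usr/local/apache2/htdocs/services/mydomain/users/test01/profile
curl http://localhost/services/mydomain/users/test01/profile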

Apache HTTPD was configured to return the response content type application/json instead of the default text/plain for the required directories. This was done with the ForceType directive inside a Location block in httpd.conf:
<Location /services>
  ForceType application/json
</Location>

So, by changing the externalized parameter for the services URL (hostname and port) to point at the mock service provider (HTTPD), one can run the application using mock data.

We found this simple approach particularly useful in our development environment, where there is an independent UI development team. UI developers can now develop and test the UI using the desired mock data without depending on the services being available or having to create mock data at each service-call level.
It's easier for them to manage and change JSON (JavaScript Object Notation) as opposed to a mock-data approach where they would need to understand the underlying data source and data representation. Also, the mock configuration is managed as a single switch, i.e. an externalized application parameter (the service URL), as opposed to changing all the data stores (service URLs) in the UI code.

I also got some good insight into, and validation of, the REST service design when I started creating the resources in the htdocs directory structure.

Wednesday, May 15, 2013

Remote Client Web Performance Monitor

Selenium is an excellent open source tool for automated functional testing of web applications. We use it along with JBehave for automation and Behavior-Driven Development of our large-scale browser-based application. The tests are run from Atlassian Bamboo, and we use Sauce Labs for cross-browser testing. This setup allows for full regression testing of the application on all supported browsers with very little manual testing. This automation, along with a high-coverage unit test suite (JUnit/Mockito) and Puppet-based one-click deployment, is core to our continuous improvement and delivery platform.

We are also using Selenium to measure page load time of our single-page web application (ExtJS 4.0). Selenium WebDriver provides more reliable page load times than Selenium RC, as it uses the browser's native support for automation as opposed to injecting JavaScript.
This extends Selenium's ability to act as an automated performance testing tool: it can validate (or just log) the performance of the application along with performing functional validation.
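As a rough sketch of the measurement (the application URL and the choice of FirefoxDriver are placeholders), load time can be taken around driver.get() or read from the browser's Navigation Timing API via JavascriptExecutor:

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class PageLoadTimer {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();

        // wall-clock time around the page load
        long start = System.currentTimeMillis();
        driver.get("http://myapp.example.com/"); // placeholder URL
        long wallClock = System.currentTimeMillis() - start;

        // browser-side measurement via the Navigation Timing API
        JavascriptExecutor js = (JavascriptExecutor) driver;
        long navStart = (Long) js.executeScript("return window.performance.timing.navigationStart;");
        long loadEnd = (Long) js.executeScript("return window.performance.timing.loadEventEnd;");

        System.out.println("Wall-clock load time: " + wallClock + " ms");
        System.out.println("Navigation Timing load time: " + (loadEnd - navStart) + " ms");
        driver.quit();
    }
}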

We went a step further with our Selenium implementation, where we used WebDriver to run tests on any remote client browser to understand the real-world performance of the application. The user is given a test URL that runs the application via Selenium WebDriver on the client browser and collects the required diagnostic data.

Selenium has a remote server, which allows Selenium tests to run on a remote browser. However, the remote server needs to be installed and running on the client machine, and we did not want that. We needed the reverse: Selenium tests running on a remote server while the application runs in the local browser.

This was solved by creating a simple web app (servlet) that identifies the browser (using the User-Agent request header) and, based on the browser type, invokes the appropriate WebDriver. It then runs the application and collects load-time metrics, which are sent via HTTP/JSON to CouchDB. I am a big fan of CouchDB; it enables rapid application development and offers schema flexibility for extending the tool's features.
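A sketch of the servlet's driver-selection logic is shown below; the class name, application URL, and metric plumbing are placeholders, and the User-Agent checks are simplified:

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.ie.InternetExplorerDriver;

public class PerfMonitorServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        // pick a WebDriver matching the caller's browser
        String userAgent = req.getHeader("User-Agent");
        WebDriver driver;
        if (userAgent.contains("Chrome")) {          // check Chrome first: its UA also contains "Safari"
            driver = new ChromeDriver();
        } else if (userAgent.contains("Firefox")) {
            driver = new FirefoxDriver();
        } else {
            driver = new InternetExplorerDriver();
        }

        long start = System.currentTimeMillis();
        driver.get("http://myapp.example.com/");     // placeholder application URL
        long loadTime = System.currentTimeMillis() - start;
        driver.quit();

        // loadTime would then be posted as a JSON document to CouchDB (omitted here)
    }
}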

I am searching for a way to reuse my existing Selenium scripts for load testing, maybe by running them through JMeter and capturing response times in CouchDB. For now, we are using HP LoadRunner with the AJAX extension, and it is working well.