Nov 1, 2017

Fullstack development environment with Docker

I have been using Docker for building (compile/packaging) and running web applications for some time. In this post, I would like to share how I used Docker to build and run a complete (Angular/SpringBoot) web app in both local and production environments.


Docker for development environment

In large enterprise projects with distributed development teams, it becomes very difficult to manage and maintain a consistent development environment across developers. The option of VirtualBox/Vagrant/Chef/Puppet is too complex, bulky and resource intensive to be viable for this use-case.

Docker, with its simplicity and lightweight containers, is an excellent choice for a development environment. As Docker is lightweight, one can run multiple containers simultaneously for fine-grained components and tasks. Not to mention the benefit of having a production-like development environment.

Docker for building artifacts 

In typical production usage, Docker is plugged into the CI/CD pipeline, where the artifacts are already generated and Docker containers are spun up to run them. In a development environment, Docker can itself act as a build machine with the right build configuration to generate the artifacts. Those artifacts can then be deployed to other Docker containers to run them.

The process of building and running the application in Docker containers is a big time saver for daily local builds. Frontend developers can use a Docker container to build and run backend services at the right code version without having to install the backend build and runtime tools locally on their machines.

Below you will find a sample Docker configuration for a complete stack (frontend/backend build and runtime) using existing published Docker images. It took me some time to set up the Angular CLI build container.

Project Structure

The sample project is an Angular frontend running on Nginx, with SpringBoot (Tomcat) backend services. The frontend is built with Yarn/NPM/Angular CLI and the backend with Gradle/Java 8.

As shown in the structure below, both the backend and the frontend have two Dockerfiles: one to build the component (Dockerfile_build) and one to run it (Dockerfile). Docker Compose ties the application together with the proper ordering of components and dependencies.

The local docker-compose includes the build steps to produce the artifacts and then runs them, whereas the production docker-compose gets QA-certified artifacts from the CI/CD process and just runs them in the containers.


sampleProject
      backend
            Dockerfile_build
            Dockerfile
            src
            build
                 libs
      frontend
            Dockerfile_build
            Dockerfile
            src
            nginx-config
            dist
      ops
            local
                 docker-compose.yml
            production
                 docker-compose.yml

The frontend Angular code (frontend/src) is built into the frontend/dist folder using Angular CLI (ng build) by the build container (../frontend/Dockerfile_build). The dist folder and the Nginx configuration are used by the Nginx container to run the Angular app.

The backend code (backend/src) is built using the Gradle container (../backend/Dockerfile_build), with the output jar in the backend/build/libs folder. The jar is then used by the Java container (../backend/Dockerfile) to run the SpringBoot app.

Backend Gradle Build Container

The Java code is compiled and packaged using Gradle/Java 8. I took the Dockerfile from the official Docker image library: https://hub.docker.com/_/gradle/


FROM openjdk:8-jdk

CMD ["gradle"]

ENV GRADLE_HOME /opt/gradle
ENV GRADLE_VERSION 4.2.1

ARG GRADLE_DOWNLOAD_SHA256=b551cc04f2ca51c78dd14edb060621f0e5439bdfafa6fd167032a09ac708fbc0
RUN set -o errexit -o nounset \
 && echo "Downloading Gradle" \
 && wget --no-verbose --output-document=gradle.zip "https://services.gradle.org/distributions/gradle-${GRADLE_VERSION}-bin.zip" \
 \
 && echo "Checking download hash" \
 && echo "${GRADLE_DOWNLOAD_SHA256} *gradle.zip" | sha256sum --check - \
 \
 && echo "Installing Gradle" \
 && unzip gradle.zip \
 && rm gradle.zip \
 && mv "gradle-${GRADLE_VERSION}" "${GRADLE_HOME}/" \
 && ln --symbolic "${GRADLE_HOME}/bin/gradle" /usr/bin/gradle \
 \
 && echo "Adding gradle user and group" \
 && groupadd --system --gid 1000 gradle \
 && useradd --system --gid gradle --uid 1000 --shell /bin/bash --create-home gradle \
 && mkdir /home/gradle/.gradle \
 && chown --recursive gradle:gradle /home/gradle \
 \
 && echo "Symlinking root Gradle cache to gradle Gradle cache" \
 && ln -s /home/gradle/.gradle /root/.gradle

# Create Gradle volume
USER gradle
VOLUME "/home/gradle/.gradle"
WORKDIR /home/gradle

RUN set -o errexit -o nounset \
 && echo "Testing Gradle installation" \
 && gradle --version

Build this image from the backend folder using Dockerfile_build, then run the Gradle build with:

docker run --rm -v "$PWD":/home/gradle/project -w /home/gradle/project im-build-backend gradle build

It generates the SpringBoot app as an executable jar in the build/libs folder.
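For completeness, the im-build-backend image referenced in the run command can be created with something along these lines (a sketch, assuming it is run from the backend folder; the -f flag points at the non-default Dockerfile name):

docker build -t im-build-backend -f Dockerfile_build .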



Backend SpringBoot Container

This is a regular JDK container with SpringBoot app jar.

FROM frolvlad/alpine-oraclejdk8:slim

VOLUME /tmp

COPY build/libs/mySample.jar app.jar
RUN sh -c 'touch app.jar'

ENV JAVA_OPTS=""

ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -jar /app.jar" ]


docker run -d -p 8080:8080 im-backend

Frontend Yarn/AngularCLI Build Container

This container has Yarn, Node, Angular CLI and NPM to build the Angular app.

FROM alexsuch/angular-cli:1.4.8

Build/run this container from the frontend project folder, which has the package.json and src folders.

docker run -it --rm -w /app -v $(pwd):/app im-frontend-build ng build
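The im-frontend-build image itself can be built the same way, and if node_modules is not already present on the host, dependencies can be installed through the same container before ng build (a sketch; the image ships with Yarn, as noted above):

docker build -t im-frontend-build -f Dockerfile_build .
docker run -it --rm -w /app -v $(pwd):/app im-frontend-build yarn install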


Frontend Nginx Container

Copy the Nginx config and the dist folder into the Nginx container to run the Angular app.

FROM nginx
COPY nginx-config/nginx.conf /etc/nginx
COPY dist /var/www/im


docker run -it -p 80:80 im-frontend
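The nginx-config/nginx.conf itself is not shown above; a minimal sketch could look like the following, assuming the app is served from /var/www/im (the COPY target) and that API calls under a hypothetical /api/ path are proxied to the backend compose service:

events {}

http {
    include /etc/nginx/mime.types;

    server {
        listen 80;
        root /var/www/im;
        index index.html;

        # Angular client-side routes fall back to index.html
        location / {
            try_files $uri $uri/ /index.html;
        }

        # Hypothetical path: proxy API calls to the SpringBoot backend container
        location /api/ {
            proxy_pass http://backend:8080/;
        }
    }
}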


DockerCompose local

Pull it all together in docker compose.
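The local docker-compose.yml looks roughly like the sketch below (based on the folder structure and the service names in the output further down; the exact volume mounts and dependency ordering are assumptions):

version: '3'
services:
  build_BackEnd:
    build:
      context: ../../backend
      dockerfile: Dockerfile_build
    image: im-build-backend
    volumes:
      - ../../backend:/home/gradle/project
    working_dir: /home/gradle/project
    command: gradle build
  backend:
    build:
      context: ../../backend
    image: im-backend
    ports:
      - "8080:8080"
    depends_on:
      - build_BackEnd
  build_FrontEnd:
    build:
      context: ../../frontend
      dockerfile: Dockerfile_build
    image: im-frontend-build
    volumes:
      - ../../frontend:/app
    working_dir: /app
    command: ng build
  frontend:
    build:
      context: ../../frontend
    image: im-frontend
    ports:
      - "80:80"
    depends_on:
      - backend
      - build_FrontEnd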


Run docker-compose up -d to build and run the frontend and backend services.

Output:


local romiawasthy$ docker-compose up -d
Creating network "local_default" with the default driver
Creating local_build_BackEnd_1 ... 
Creating local_build_BackEnd_1 ... done
Creating local_backend_1 ... 
Creating local_backend_1 ... done
Creating local_build_FrontEnd_1 ... 
Creating local_build_FrontEnd_1 ... done
Creating local_frontend_1 ... 
Creating local_frontend_1 ... done


DockerCompose production

In the production configuration, build images are excluded.
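A sketch of the production docker-compose.yml, assuming the QA-certified im-backend and im-frontend images are already available (for example, pulled from a registry):

version: '3'
services:
  backend:
    image: im-backend
    ports:
      - "8080:8080"
  frontend:
    image: im-frontend
    ports:
      - "80:80"
    depends_on:
      - backend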


Run docker-compose up -d to run the frontend and backend services.

Output:

production romiawasthy$ docker-compose up -d
Starting production_backend_1 ... 
Starting production_backend_1 ... done
Creating production_frontend_1 ... 
Creating production_frontend_1 ... done


Conclusion

Docker can be a great option for a consistent and shareable development environment across large teams. The pattern allows the same Docker containers to run the services in both local and production environments, and it can be extended to different build and deployment tools.

Jan 7, 2016

Cookie blacklisting/whitelisting on Apache HTTPD

Apache HTTPD is often deployed as a reverse proxy, as part of which it has to support various security features, one of which is cookie filtering. I implemented cookie blacklisting and whitelisting on Apache 2.4 using mod_headers directives. It took me some time to figure out the right regex and syntax, so I would like to share the approach.

Blacklisting is relatively simple: you just need to identify the specific cookie (with a regex) and use RequestHeader edit to replace it with a blank string.
Whitelisting is a little trickier: you need to extract and store the required cookies from the header, reset the Cookie header, and then add the whitelisted cookies back.

Before you start, make sure that you print cookies in the logs by adding \"%{Cookie}i\" to the LogFormat definition.
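For example, a LogFormat along these lines (the format name is arbitrary) writes the Cookie header into the access log:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" cookie_combined
CustomLog "logs/access_log" cookie_combined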

Also, since we will be using mod_headers directives, make sure headers_module is enabled in the HTTPD conf.
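That usually means the following line in httpd.conf is uncommented:

LoadModule headers_module modules/mod_headers.so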

The request header edits can be added directly in httpd.conf, in any section of the conf or a virtual host.
For clarity, I created a separate conf file (cookie_blacklisting.conf) and included it in httpd.conf.

##Include custom conf for blacklisting
<IfModule headers_module>
                Include conf/custom/cookie_blacklisting.conf
</IfModule>

This is my cookie_blacklisting.conf file, which removes the blacklisted cookies (BLACKLISTED_COOKIE_1 and BLACKLISTED_COOKIE_2) from requests to a specific URI (/protecteduri).

SetEnvIf Request_URI "^/protecteduri" IsProtected

RequestHeader edit Cookie "(^BLACKLISTED_COOKIE_1=[^;]*;|; BLACKLISTED_COOKIE_1=[^;]*)" "" env=IsProtected
RequestHeader edit Cookie "(^BLACKLISTED_COOKIE_2=[^;]*;|; BLACKLISTED_COOKIE_2=[^;]*)" "" env=IsProtected

To test this, I created two pages, /normal/index.html and /protecteduri/index.html, and checked the cookie values in the access logs. The requests for /normal passed all the cookies, whereas the requests for /protecteduri no longer carried the blacklisted cookies.
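A quick way to exercise both pages is curl (the hostname and cookie values are placeholders), then compare the Cookie values logged for each request:

curl -b "BLACKLISTED_COOKIE_1=abc; OTHER_COOKIE=xyz" http://localhost/normal/index.html
curl -b "BLACKLISTED_COOKIE_1=abc; OTHER_COOKIE=xyz" http://localhost/protecteduri/index.html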

This is my cookie_whitelisting.conf file, which whitelists two cookies (WHITELISTED_COOKIE_1 and WHITELISTED_COOKIE_2):

##Get the values of the whitelisted cookies
SetEnvIf Cookie "(^WHITELISTED_COOKIE_1=[^;]*| WHITELISTED_COOKIE_1=[^;]*)" ENV_WHITELISTED_COOKIE_1=$1
SetEnvIf Cookie "(^WHITELISTED_COOKIE_2=[^;]*| WHITELISTED_COOKIE_2=[^;]*)" ENV_WHITELISTED_COOKIE_2=$1

SetEnvIf Request_URI "^/protecteduri" IsProtected

###For IsProtected, unset cookies
RequestHeader unset Cookie env=IsProtected

###For IsProtected, append the whitelisted cookies

RequestHeader append Cookie "%{ENV_WHITELISTED_COOKIE_1}e; path=/;" env=IsProtected
RequestHeader append Cookie "%{ENV_WHITELISTED_COOKIE_2}e; path=/;" env=IsProtected

I used the same /normal and /protecteduri pages to verify the whitelisting.

Hope this helps.

Apr 18, 2015

TDD, Code review and Economics of Software Quality

To understand the value of JUnit tests (developer tests), try maintaining, or worse, refactoring a code base that has none. The cost of maintaining such code is so high that, in most cases, it gets replaced instead of being improved or enhanced. Developer tests lead to ease of maintenance and thus enable change. They are now a critical part of software development; most enterprises have adopted them and moved from a "no tests" to a "some tests" organization, but the road beyond that is unclear. The industry-prescribed techniques (Uncle Bob's TDD rules and 100% code coverage) are difficult to adopt for large enterprises with massive code bases and globally distributed teams. Enterprises need a way to standardize testing practices that can be easily implemented and enforced across internal development teams and external outsourced development partners.

The code coverage metric provides the ability to define a specific coverage target and measure it in an automated way, but it has its own limitations.
Developer tests are not cheap: being "developer" tests, they take the developer's time and effort, which would otherwise be spent on adding features and functions. A large test suite also increases development cost through longer test execution times and its own maintenance overhead.

Whenever tests are written merely to attain high coverage, they lead to excessive tests for trivial and obvious functionality and insufficient tests for critical or change-prone code. Also, not all code needs the same coverage: there might be framework and boilerplate code that does not require extensive coverage, whereas some code may need more than what 100% coverage implies, such as tests with a wide range of datasets. There might be other project-specific attributes that influence test coverage too. Therefore, a flat coverage target may not work in all situations.

The other issue with writing tests for coverage is that the tests are retrofitted, as opposed to the test-first approach of TDD. Not only is it challenging to write tests for code that was not designed for testability, but the benefits of TDD (test first) are also not realized.

TDD is a code design process that produces testable and high-quality code. In TDD, the developer is not just implementing the feature; by writing tests, he is also designing modular, decoupled and testable code. A developer would find it hard to test a unit that is doing too much or is tightly coupled with other units, and would be forced to refine the code. The multiple iterations of writing code, writing tests and refactoring also lead to better self-review. The developer invests a lot more thought in the code design and finds issues early that would otherwise go undetected.

When tests are retrofitted, these benefits are not realized. Retrofitting tests is about documenting what the code does, rather than using tests as a code design tool. But doing TDD for all the code, all the time, can slow down development. Not all code is critical enough to need TDD; some tests can be retrofitted, like integration tests. To expedite development, the application can be released for integration (UI) and QA, and integration tests can be added later to document the system behavior.

So how does one verify that TDD is practiced, and that coverage is sufficient, when and where required? I think that instead of relying on automated tools, the code review process can be expanded to review tests for quality, coverage and TDD practices.

Coverage tools and TDD cannot check the quality of tests, i.e., whether tests properly assert and verify the code. Only a manual review can catch such test quality issues.

The review process would also promote TDD. If the code submitted for review has no tests, it suggests that the developer did not consider testing during development, that the code might not be testable (maintainable), and that no self-review occurred. The reviewer can reject such code: if the code is important enough to be peer-reviewed, it is important enough to be self-reviewed. The reviewer can also check whether the coverage for the code is sufficient or unnecessary.

Reviewing tests would also increase the efficiency of the code review itself. The reviewer would read the code in the context of the tests, gain a better understanding of it, and thus provide better feedback.

The cost-effective way to achieve software testability is to promote TDD and, instead of relying on automated tools, piggyback on the existing code review process to promote and ensure it.

Sep 15, 2014

Netflix Public API - Rest in peace

Netflix's decision to retire its public API may be based on its own business and IT strategy; however, since Netflix is a front-runner in the Web API trend, the decision needs to be assessed in a broader sense, specifically what it means for enterprise API programs and their vision of increasing API exposure.

Web APIs are a rapidly growing trend, where enterprises offer programmatic access to their data, services and resources to developers: internal teams, external partners, or public third-party developers. Web APIs expose data and functionality, typically available via the enterprise's web apps (websites), to other consumers such as internal web apps, portals, mobile, and B2B partners.

Having realized the value of APIs through reuse, where a single resource endpoint can service various consumers, enterprises are now looking at ways to expand their API programs to a wider consumer base. I think this expansion needs careful thought and can learn a great deal from the Netflix API program, which had to shut off one of its API consumers (the third-party developers). Netflix may get away with its decision by upsetting only a handful of developers; mainstream enterprises, however, cannot do that without negatively affecting their business relationships and bottom lines.

Every API consumer brings in some cost and complexity that impacts API design and manageability. This sounds counterintuitive: API design is supposed to be consumer-agnostic, and a well-designed API should serve any consumer. But looking at Netflix, and based on my experience with API design over the last few years, this expectation cannot be met easily.

In my experience, API design inadvertently gets influenced by the consumers it is initially developed for: a browser-based web app, mobile, external partners, etc. The optimizations needed for different consumers have to be handled at the API design level. On one hand, the API needs strict security and policy controls for external users; on the other, it needs coarse-grained access and auto-discovery for an internal web app. To serve all types of consumers, the API needs to stay fine-grained, but that may make the consuming web app too chatty and cause performance issues. In reality, designing a single API that handles all optimizations for all its consumers gets cumbersome.


Netflix's VP of Edge Engineering, Daniel Jacobson, in his blog post "Why REST Keeps Me Up At Night", points out similar complexity when trying to design one-size-fits-all APIs. Here are a few extracts from his post:

"Our REST API, while very capable of handling the requests from our devices in a generic way, is optimized for none of them."

"That means that each device potentially has to work a little harder (or sometimes a lot harder) to get the data needed to create great user experiences because devices are different from each other."

"Because of the differences in these devices, Netflix UI teams would often have to do a range of things to get around our REST API to better serve the users of the device. Sometimes, the API team would be required to extend the base service to handle special cases, often resulting in spaghetti code or undocumented features."

The design solutions for handling consumer-specific complexities are expensive: either the consumer has to do extra work, or the services carry consumer-specific rules, or an extra proxy (intermediary) is required to handle consumer-specific features.


In the end, adding an API consumer has cost implications that need to be assessed against the business value of API expansion.

While Daniel Jacobson may have solved his problem by shutting down the public API and is having a good night's sleep, some of us still need to find a better way to rest.

Jun 19, 2014

REST API documentation- HTMLWadlGenerator for CXF

In Apache CXF, one can generate a WADL for any registered resource by appending ?_wadl to the resource URL. This WADL provides an excellent source of real-time REST API documentation, but the output format is not reader-friendly.

I extended the default CXF WadlGenerator to support the text/html media type using a WADL XSL stylesheet.

Here is the code for HTMLWadlGenerator, which can be registered as a jaxrs:provider with the jaxrs:server. To see the output, append ?_wadl&_type=text/html to the resource URL.
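The registration itself is a one-liner in the Spring config; a minimal sketch, with a hypothetical resource bean, address and package name:

<jaxrs:server id="restServer" address="/api">
    <jaxrs:serviceBeans>
        <ref bean="myRestResource"/>
    </jaxrs:serviceBeans>
    <jaxrs:providers>
        <bean class="com.example.rest.HTMLWadlGenerator"/>
    </jaxrs:providers>
</jaxrs:server>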

This would give a nice looking HTML page of REST API documentation for the registered resources.

Jun 8, 2014

Configure Solr Suggester

Solr includes an autosuggest component, the Suggester. From Solr 4.7 onwards, the implementation of this Suggester has changed: the old SpellChecker-based search component is replaced with a new suggester that utilizes the Lucene suggester module. The latest Solr download is preconfigured with this new suggester, but the documentation on the Solr wiki still describes the previous SpellChecker version.

It took me some time to understand the new suggester and get it working.

There are two configuration pieces for the suggester, a search component and a request handler:
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>      <!-- org.apache.solr.spelling.suggest.fst -->
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>     <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
      <str name="field">cat</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

To check the suggester, index a few documents with good test values for the cat field, which is set as the suggestion field.

The URL for getting suggestions:
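A request along these lines should work (assuming Solr is running locally on the default port 8983; replace <core_name> with your core):

http://localhost:8983/solr/<core_name>/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=A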


(use suggest.build=true for the first time)

In my case, this returns:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">13</int>
  </lst>
  <str name="command">build</str>
  <lst name="suggest">
    <lst name="mySuggester">
      <lst name="A">
        <int name="numFound">2</int>
        <arr name="suggestions">
          <lst>
            <str name="term">A Clash of Kings</str>
            <long name="weight">0</long>
            <str name="payload"/>
          </lst>
          <lst>
            <str name="term">A Game of Thrones</str>
            <long name="weight">0</long>
            <str name="payload"/>
          </lst>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

Since a default suggester is not configured, suggest.dictionary is required; without it, you will get an exception: "No suggester named default was configured".

You can configure a default suggester in solrconfig.xml:
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Now you should be able to get suggestions without having to specify the dictionary in the URL.
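For example, with the same assumptions as above:

http://localhost:8983/solr/<core_name>/suggest?suggest=true&suggest.q=A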