Wednesday, 20 March 2013

Continuous Integration

These are excerpts from Martin Flower's Continuous Integration . Please visit Continous Integration to view the complete article.Some of the lines are added by me to make the article sound better.

What is CI?


Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.

Integration is a long and unpredictable process.

But the above is not true.

Any integration errors are found rapidly and can be fixed rapidly.This contrast isn't the result of an expensive and complex tool. The essence of it lies in the simple practice of everyone on the team integrating frequently, usually daily, against a controlled source code repository.

The term 'Continuous Integration' originated with the Extreme Programming development process, as one of its original twelve practices.

Although Continuous Integration is a practice that requires no particular tooling to deploy, but it is useful to use a Continuous Integration server.

Before understanding CI we need to understand source control.

What is Souce Control?
A source code control system keeps all of a project's source code in a repository. The current state of the system is usually referred to as the 'mainline'. At any time a developer can make a controlled copy of the mainline onto their own machine, this is called 'checking out'. The copy on the developer's machine is called a 'working copy'.

In source control you can both alter the production code, and also add or change automated tests.

Continuous Integration assumes a high degree of tests which are automated into the software.



Steps that are followed for CI

  1. I begin by taking a copy of the current integrated source onto my local development machine. I do this by using a source code management system by checking out a working copy from the mainline.
  2. I do changes to the local development copy.
  3. Once I'm done (and usually at various points when I'm working) I carry out an automated build on my development machine. This takes the source code in my working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.
  4. With a good build, I can then think about committing my changes into the repository. The twist, of course, is that other people may, and usually have, made changes to the mainline before I get chance to commit. So first I update my working copy with their changes and rebuild. If their changes clash with my changes, it will manifest as a failure either in the compilation or in the tests. In this case it's my responsibility to fix this and repeat until I can build a working copy that is properly synchronized with the mainline.
  5. Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository.
  6. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn't properly updated. Only when my committed changes build successfully on the integration is my job done. This integration build can be executed manually by me, or done automatically.
Everything should be in the repository.

 Everything you need to do a build should be in there including: test scripts, properties files, database schema, install scripts, and third party libraries.

One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development.

Keep your use of branches to a minimum.

 In general you should store in source control everything you need to build anything, but nothing that you actually build. Some people do keep the build products in source control, but I consider that to be a smell - an indication of a deeper problem, usually an inability to reliably recreate builds.[I liked this line. :)]

Automated environments for builds are a common feature of systems.
 The Unix world has had make for decades,
 the Java community developed Ant,
the .NET community has had Nant and now has MSBuild.

Make sure you can build and launch your system using these scripts using a single command.

The build should include getting the database schema out of the repository and firing it up in the execution environment.

Rule of Thumb:
Anyone should be able to bring in a virgin machine, check the sources out of the repository, issue a single command, and have a running system on their machine.

Depending on what you need, you may need different kinds of things to be built. You can build a system with or without test code, or with different sets of tests. Some components can be built stand-alone. A build script should allow you to build alternative targets for different cases.

Build Softwares vs IDES
 Many of us use IDEs, and most IDEs have some kind of build management process within them. However these files are always proprietary to the IDE and often fragile. Furthermore they need the IDE to work. It's okay for IDE users set up their own project files and use them for individual development. However it's essential to have a master build that is usable on a server and runnable from other scripts. So on a Java project we're okay with having developers build in their IDE, but the master build uses Ant to ensure it can be run on the development server.

XP and TDD
In particular the rise of Extreme Programming (XP) and Test Driven Development (TDD) have done a great deal to popularize self-testing code and as a result many people have seen the value of the technique.

Both of these approaches make a point of writing tests before you write the code that makes them pass - in this mode the tests are as much about exploring the design of the system as they are about bug catching. This is a Good Thing, but it's not necessary for the purposes of Continuous Integration, where we have the weaker requirement of self-testing code. (Although TDD is my preferred way of producing self-testing code.)

For self-testing code you need a suite of automated tests that can check a large part of the code base for bugs. The tests need to be able to be kicked off from a simple command and to be self-checking. The result of running the test suite should indicate if any tests failed. For a build to be self-testing the failure of a test should cause the build to fail.

Tests don't prove the absence of bugs. However perfection isn't the only point at which you get payback for a self-testing build. Imperfect tests, run frequently, are much better than perfect tests that are never written at all.

The one prerequisite for a developer committing to the mainline is that they can correctly build their code. This, of course, includes passing the build tests. As with any commit cycle the developer first updates their working copy to match the mainline, resolves any conflicts with the mainline, then builds on their local machine. If the build passes, then they are free to commit to the mainline

The key to fixing problems quickly is finding them quickly.Conflicts that stay undetected for weeks can be very hard to resolve.

The more frequently you commit, the less places you have to look for conflict errors, and the more rapidly you fix conflicts.

Frequent commits encourage developers to break down their work into small chunks of a few hours each. This helps track progress and provides a sense of progress.

Regular builds happen on an integration machine and only if this integration build succeeds should the commit be considered to be done. Since the developer who commits is responsible for this, that developer needs to monitor the mainline build so they can fix it if it breaks. A corollary of this is that you shouldn't go home until the mainline build has passed with any commits you've added late in the day.

There are two main ways  to ensure this: using a manual build or a continuous integration server.

The manual build approach is the simplest one to describe. Essentially it's a similar thing to the local build that a developer does before the commit into the repository. The developer goes to the integration machine, checks out the head of the mainline (which now houses his last commit) and kicks off the integration build. He keeps an eye on its progress, and if the build succeeds he's done with his commit.

A continuous integration server acts as a monitor to the repository. Every time a commit against the repository finishes the server automatically checks out the sources onto the integration machine, initiates a build, and notifies the committer of the result of the build. The committer isn't done until she gets the notification - usually an email.

  1. The whole point of continuous integration is to find problems as soon as you can.
  2. The whole point of working with CI is that you're always developing on a known stable base.
The whole point of Continuous Integration is to provide rapid feedback. Nothing sucks the blood of a CI activity more than a build that takes a long time.

  1. A build that takes an hour to be totally unreasonable.Because every minute you reduce off the build time is a minute saved for each developer every time they commit.
Introduce some automated testing into your build. Try to identify the major areas where things go wrong and get automated tests to expose those failures. Particularly on an existing project it's hard to get a really good suite of tests going rapidly - it takes time to build tests up. You have to start somewhere though - all those cliches about Rome's build schedule apply.

If you are starting a new project, begin with Continuous Integration from the beginning. Keep an eye on build times and take action as soon as you start going slower than the ten minute rule. By acting quickly you'll make the necessary restructurings before the code base gets so big that it becomes a major pain.

On the whole I think the greatest and most wide ranging benefit of Continuous Integration is reduced risk.

The trouble with deferred integration is that it's very hard to predict how long it will take to do, and worse it's very hard to see how far you are through the process. The result is that you are putting yourself into a complete blind spot right at one of tensest parts of a project - even if you're one of the rare cases where you aren't already late.

Bugs are cumulative. The more bugs you have, the harder it is to remove each one.

As a result projects with Continuous Integration tend to have dramatically less bugs, both in production and in process.

If you have continuous integration, it removes one of the biggest barriers to frequent deployment. Frequent deployment is valuable because it allows your users to get new features more rapidly, to give more rapid feedback on those features, and generally become more collaborative in the development cycle. This helps break down the barriers between customers and development - barriers which I believe are the biggest barriers to successful software development.

So, these are my notes on CI from Continuous Integration .

2 comments: