Showing posts with label CI. Show all posts

19 January 2014

Jenkins performance hints

Jenkins CI server scalability has its limits, but for the vast majority of applications its performance is good enough. There are installations with hundreds of slaves running about 10k builds daily. While Jenkins configuration is relatively simple, some art is required to set up and maintain a busy server. Below are some suggestions on how to keep it fast, divided into Master configuration, Slave configuration and Job design, plus a few notes on multi-master Jenkins.

Jenkins Master configuration

Number of plugins. Plugins cause performance issues for builds (because of hooks) and for the UI (because they add stuff to it). Do not add too many plugins, and evaluate each of them thoroughly [ref].

Number of jobs. Jenkins gets slow (at least in the UI) with 1000+ jobs [ref]. Moving jobs to several masters (manual static sharding) helps, e.g. one master for builds, another for tests. Functional segregation makes it possible to simplify the Jenkins configuration and to decrease the number of plugins on each master, whereas splitting a big master into two similar ones just leaves a complex configuration on both.
Keep the number of active jobs reasonable and remove unused ones.
Use the Git and Gerrit Trigger plugins to serve multiple branches with one set of jobs.

Jobs on Master. There should be none, only light internal tasks crucial for Jenkins housekeeping. Definitely no application jobs.

SCM polling on Master. SCM polling for Git or Perforce requires executing the CLI program for each check of each job. For reliable polling it should be configured to run on the master. Polling on slaves (the default for both VCS) is bad because slaves are disposable.
Use push hooks instead of polling. For Git use Gerrit Trigger: the “Ref update” event can replace SCM polling in most cases. For Perforce … set the polling period to something large; use “H” or “@hourly” for the Cron expression in the polling configuration.
Subversion uses SVNKit instead of the CLI, so it is not affected.

Builds lazy-loading. When the JVM minimum and maximum heap sizes differ, WeakReferences (lazy loading uses them) are garbage-collected before the JVM tries to expand the heap [ref]. This causes extra load from re-loading builds and may sometimes lead to disappearing build records.
The JVM configuration for servers should have the minimum and maximum heap sizes set to the same value.
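
The heap rule can be checked mechanically. The following is a minimal Python sketch, not part of Jenkins: the function names are hypothetical, and it assumes the JVM options are available as one string with the standard -Xms / -Xmx flags and that the last occurrence of a flag is the effective one.

```python
import re

# Multipliers for the size suffixes accepted by -Xms / -Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_setting(java_args, flag):
    """Return the size in bytes of the last -Xms/-Xmx occurrence, or None."""
    pattern = r"(?:^|\s)-" + flag + r"(\d+)([kKmMgG]?)(?=\s|$)"
    matches = re.findall(pattern, java_args)
    if not matches:
        return None
    # Assumption: when a flag is repeated, the last value is the one in effect.
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def heap_is_fixed(java_args):
    """True only if both limits are set explicitly and are equal."""
    xms = heap_setting(java_args, "Xms")
    xmx = heap_setting(java_args, "Xmx")
    return xms is not None and xms == xmx
```

For example, heap_is_fixed("-Xms4g -Xmx4g") holds, while "-Xms512m -Xmx4g" or a bare "-Xmx4g" does not.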

Access control. Authenticated users should be allowed to do anything except system administration [ref]. “Trust users not to be malicious. Don’t trust users not to do daft things - or read documentation, or to have well behaved unit tests.” [ref] Trust encourages people, and it also saves on authorization cost: complex authorization (e.g. the Role Strategy plugin) kills UI performance, and API performance suffers too.

Disk IO performance. Use fast disks for configuration (startup time) and build records (build lazy loading) [ref]. An SSD on the master helps a lot [ref]. Separate configuration, build records and artifact storage. It is worth looking at pluggable artifact transfer and storage (JENKINS-17236).

Use an external API/UI frontend for Jenkins. Jenkins is not very good at UI performance, and UI plugins make it even worse. The workarounds are external UI dashboards or frontend systems [ref]. Examples of problematic plugins:

  • Dashboard View plugin has a real problem with lazy loading (it is thought to be fixed, though) [ref].
  • Nested Views plugin causes permissions to be re-evaluated several times for each job on the server. Using regexps to filter jobs makes it worse. Worth trying: use explicit lists of jobs instead of regexps. Replacing it with the CloudBees Folders Plugin might help, but needs evaluation.

HTTP cache. A fast HTTP proxy in front of Jenkins to cache static data [ref] might help, but this requires further evaluation.

Servlet container. Embedded Winstone (before 1.535) or Jetty 8 (1.535+, but not in 1.532.1 LTS) vs Tomcat. Jetty used to be better than Tomcat on consistent throughput and resource consumption, but for recent Jetty 8-9 and Tomcat 7 there is no clear evidence of that.

Jenkins Slave configuration

Number of slaves. There is the “X1K initiative”, a goal for Jenkins developers to ensure smooth operation of a master with 1000 executors across all slaves [ref]. It is still a challenge. Somewhere around 250 slaves with lots of builds, slave connections start breaking in the middle of a build [ref]; there is also evidence of Jenkins tending to lose connections to slaves at about a hundred slaves [ref]. Since the thread usage improvements in Jenkins remoting in Jenkins core 1.521 and SSH Slaves plugin 0.27 this should not be an issue [ref, ref], but that is not proven yet.

Number of executors per slave. Increasing the number of executors beyond the slave's capabilities decreases overall throughput due to clashes, IO congestion or RAM swapping. Take RAM, CPU cores and build type into account. RAM should be enough for the maximum number of builds at their maximum memory setting plus the file cache. CPU should be enough to stay below 100% utilization, taking IO into account: waiting for IO releases some CPU time. Have fewer than 1 executor per CPU core for single-threaded builds. Consider the IOPS limit to avoid disk IO becoming a bottleneck. Generally, if the 15-minute load average is higher than the number of cores, the number of executors should be decreased. There is also a suggestion to have 1 executor per slave for isolation [ref]. That is reasonable in a cloud, but on dedicated hardware the same isolation can be achieved with lightweight containers.
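
These sizing rules can be turned into a rough calculation. The Python sketch below is only an illustration: the function names, the parameters and the 1 GB default for the file cache are assumptions, not Jenkins settings, and it ignores the IOPS limit entirely.

```python
def max_executors(cpu_cores, ram_gb, build_ram_gb, file_cache_gb=1.0):
    """Upper bound on executors for single-threaded builds.

    cpu_cores     - logical cores on the slave
    ram_gb        - total RAM on the slave
    build_ram_gb  - maximum memory setting of one build (assumed known)
    file_cache_gb - RAM left for the OS file cache (assumed value)
    """
    # "Fewer than 1 executor per CPU core": leave at least one core free.
    by_cpu = max(1, cpu_cores - 1)
    # RAM must fit every build at its maximum memory setting plus the cache.
    by_ram = int((ram_gb - file_cache_gb) // build_ram_gb)
    return max(1, min(by_cpu, by_ram))


def should_reduce_executors(load_avg_15min, cpu_cores):
    """True if the 15-minute load average is above the number of cores."""
    return load_avg_15min > cpu_cores
```

For an 8-core slave with 16 GB RAM, 2 GB per build and 2 GB reserved for the file cache, max_executors(8, 16, 2, 2) gives 7. On Unix the 15-minute load average can be read as os.getloadavg()[2].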

Job design

Workspace cleanup. Removing the job workspace before the build starts gives a clean build; removing it afterwards saves disk space. Either way it adds time for a fresh checkout and even more for Maven to download dependencies. As a result the build may run several times longer.
Address it in the build system instead: have a reliable “clean” target in the build script, do not create files outside of temporary build directories, and never touch files under version control. Clean up workspaces periodically to be sure. Always do it for “release” builds, where build speed is not as important as build sanity.

Artifact fingerprinting. A large fingerprint database may kill Jenkins master performance. The Copy Artifact plugin always checks fingerprints. Maven builds record artifact fingerprints unconditionally.
So prevent code review (Gerrit) builds from recording fingerprints for Maven2/3 builds [ref], maybe by disabling Maven artifact archiving. This applies to freestyle builds too, but it is controllable there.

Post-build actions. Limit post-build steps, since they serialise parallel builds (JENKINS-9913). Move the work to build steps, e.g. use a custom artifact archiver as a build step, such as “mvn deploy”.

Maven jobs vs Freestyle jobs. Use Freestyle: Maven jobs are notably slower and have their own set of bugs. The Maven job type itself is considered bad by a core Jenkins contributor [ref].

Large build log. The build log is loaded into master memory, causing an out-of-memory (OOM) error if the log is too big. Use the Log File Size Checker plugin to fail the job if the console log reaches a limit.

Sonar analysis. Running Sonar analysis in each build makes the build 2-3 times longer while adding little value: Sonar is a monitoring and code inspection tool, not a gatekeeper. Run it nightly, not in each build.

Reference repository for Git SCM. A Git repository on the local file system can be used as a reference: only the missing objects are downloaded, the rest are borrowed from the local repository.
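
Outside of the Jenkins Git plugin, the same idea can be tried with the --reference option of git clone. The Python sketch below only builds the command line; the function name, the URL and the paths are hypothetical placeholders.

```python
def clone_with_reference(remote_url, reference_repo, workspace):
    """Build a 'git clone' command that reuses objects from a local repository.

    reference_repo is assumed to be an existing local clone or mirror of the
    same project; only objects missing from it are fetched from remote_url.
    """
    return ["git", "clone", "--reference", reference_repo, remote_url, workspace]


# Hypothetical example values, for illustration only.
command = clone_with_reference(
    "ssh://gerrit.example.com/project.git",
    "/var/cache/git/project.git",
    "/home/jenkins/workspace/project",
)
```

The resulting list can be passed to subprocess.run(command, check=True). The reference repository has to be kept up to date separately, for example by a periodic git fetch on each slave.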

Multi master?

There are no multi-master Jenkins clusters, and none are expected in the foreseeable future [ref, ref]. The only way to share load between masters without custom software is to set up two masters, each with its own set of jobs.

  • Jenkins Enterprise by Cloudbees - just for fault tolerance [ref]. It is “active - spare” cluster. No load balancing.
  • Jenkins Operations Center by Cloudbees - simplifies management of multiple masters and slaves. Does not provide multi-master instance with single point of entry. [ref]
  • Openstack/HP multi-master uses custom software (Zuul + Gearman) and a specific standardized workflow on top of it [ref]. It does not use the Jenkins UI, it only provides direct links to builds in Zuul or Gerrit. Build history, analytics and trends are collected via an external search engine [ref].

General & Cultural tips

Follow the “Keep it simple, stupid” and “You aren't gonna need it” principles.

References

  1. “Keynotes”. Kohsuke Kawaguchi, Cloudbees. Jenkins User Conference 2013 - Palo Alto.
    Slides: http://www.cloudbees.com/sites/default/files/juc/juc2013/2013-1023-JUC-PaloAlto-Kohsuke-Keynote.pptx
    Video: http://www.youtube.com/watch?v=FaMoiVpKUvQ
  2. “Multiple Jenkins Master Support” Khai Do, Hewlett Packard. Jenkins User Conference 2013 - Palo Alto.
    Slides: http://docs.openstack.org/infra/publications/gearman-plugin/
    Video: http://www.youtube.com/watch?v=pLQddm85fPQ
  3. “Maintaining Huge Jenkins Clusters - Have We Reached the Limit of Jenkins?” Robert Sandell, Sony Mobile Communications. Jenkins User Conference 2013 - Palo Alto.
    Slides: http://www.cloudbees.com/sites/default/files/juc/juc2013/2013-1023-Palo-Alto-Robert_Sandell-Maintaining-Huge-Jenkins-Clusters.pdf
    Video: http://www.youtube.com/watch?v=LRonDiXUx1U
  4. "To Infinity & Beyond the Small Team" James Nord, Cisco
    Slides: http://www.cloudbees.com/sites/default/files/JUC_Palo_Alto_2013_TIaBTST.pdf
    Video: http://www.youtube.com/watch?v=CGjgS16dVUc
  5. “Scaling Jenkins Horizontally with Jenkins Operations Center by Cloudbees”. Cloudbees blog: http://blog.cloudbees.com/2013/12/scaling-jenkins-horizontally-with.html
  6. “Jenkins at Three Years: Becomes Literate, Does Mobile in the Cloud and Handles Multi-Branch”. Harpreet Singh & Kohsuke Kawaguchi, CloudBees. Jenkins User Conference 2013 - Palo Alto.
    Slides: http://www.slideshare.net/kohsuke/jenkins-user-conference-2013-literate-multibranch-mobile-and-more
    Video: http://www.youtube.com/watch?v=AKcQuOROFlI
  7. “Jenkins Scalability Summit notes”. Jenkins Scalability Summit, Oct 2013 - Los Altos. https://docs.google.com/document/d/1GqkWPnp-bvuObGlSe7t3k76ZOD2a8Z2M1avggWoYKEs/edit#
  8. “Kohsuke with OSS hat / Core improvements”. Jenkins Scalability Summit, Oct 2013 - Los Altos.
    Slides: https://wiki.jenkins-ci.org/download/attachments/68747344/Kohsuke.pptx
  9. “Sony Mobile list to Santa Claus”. Robert Sandell, Sony Mobile. Jenkins Scalability Summit, Oct 2013 - Los Altos.
    Slides: https://wiki.jenkins-ci.org/download/attachments/68747344/Sony+Mobile.pptx
  10. “Reducing the # of threads in Jenkins: SSH slaves”. Kohsuke Kawaguchi, Cloudbees. Jenkins CI blog: http://jenkins-ci.org/content/reducing-threads-jenkins-ssh-slaves
  11. “High availability”. Jenkins Enterprise: http://www.cloudbees.com/jenkins-enterprise-cloudbees-features-high-availability-plugin.cb
  12. “Jenkins' Maven job type considered evil”. Stephen Connolly. Stephen's Java Adventures. http://javaadventure.blogspot.ru/2013/11/jenkins-maven-job-type-considered-evil.html

14 January 2014

Big Jenkins servers of 2013

The Jenkins CI server scaled reasonably well for the 2000s. That does not hold anymore, because Jenkins was not initially designed for a decade of development or for huge installations with hundreds of servers. So the experience of large installations is helpful for understanding what Jenkins is capable of.

Parameters of a few large Jenkins installations have been published:

Data

Details:

  • Openstack / HP uses a Gearman server + 2 Jenkins masters + 300 dynamic slaves to handle 10k builds daily. A single master could not keep up with that load.
  • Sony Mobile has 7 independent Jenkins servers, the largest one being "Jenkins Regular @ SELD" in Lund. One master (24 cores, 64GB RAM, 6TB disk) with 300 slaves (2..4 HT cores, about 8GB RAM each) handles 6k builds daily. Sony Mobile configures 1 executor per slave.
  • Yahoo Advertising Platform team has a primary Jenkins master (12 HT cores, 96GB RAM, 1.2TB disk + 20TB networked storage for jobs and builds data) and 3 backup masters in 2 data centers, with 50 slaves in 3 data centers. It performs 8k builds per day, producing 6 TB of data.
  • Netflix has its builders in the Amazon cloud. 6 independent masters (AWS m2.2xlarge - 4 cores, 32GB RAM each) and 100 slaves (AWS m1.xlarge) run 4000 builds daily, generating 3TB of build data.
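
To compare these installations, the daily build count can be divided by the number of slaves. The Python sketch below uses only the figures quoted in the list; the metric itself is a rough assumption, since it ignores slave size, executors per slave and build duration.

```python
# (builds per day, number of slaves) as quoted in the list above.
INSTALLATIONS = {
    "Openstack / HP": (10000, 300),
    "Sony Mobile": (6000, 300),
    "Yahoo": (8000, 50),
    "Netflix": (4000, 100),
}


def builds_per_slave(builds_daily, slaves):
    """Average number of builds one slave runs per day."""
    return builds_daily / slaves


for name, (builds, slaves) in INSTALLATIONS.items():
    rate = builds_per_slave(builds, slaves)
    print(f"{name}: {rate:.0f} builds per slave per day")
```

This gives roughly 33, 20, 160 and 40 builds per slave per day, so the load per slave differs by almost an order of magnitude between installations.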

Links

  1. "Maintaining Huge Jenkins Clusters - Have We Reached the Limit of Jenkins?" Robert Sandell, Sony Mobile Communications. Jenkins User Conference 2013 - Palo Alto.
    Slides: http://www.cloudbees.com/sites/default/files/juc/juc2013/2013-1023-Palo-Alto-Robert_Sandell-Maintaining-Huge-Jenkins-Clusters.pdf
    Video: http://www.youtube.com/watch?v=LRonDiXUx1U
  2. "Multiple Jenkins Master Support" Khai Do, Hewlett Packard. Jenkins User Conference 2013 - Palo Alto.
    Video: http://www.youtube.com/watch?v=pLQddm85fPQ
    Slides: http://docs.openstack.org/infra/publications/gearman-plugin/
  3. "To Infinity & Beyond the Small Team" James Nord, Cisco
    Slides: http://www.cloudbees.com/sites/default/files/JUC_Palo_Alto_2013_TIaBTST.pdf
    Video: http://www.youtube.com/watch?v=CGjgS16dVUc
  4. Jenkins Scalability Summit notes. Jenkins Scalability Summit, Oct 2013 - Los Altos. https://docs.google.com/document/d/1GqkWPnp-bvuObGlSe7t3k76ZOD2a8Z2M1avggWoYKEs/edit#
  5. "13,000 jobs and counting". Mujibur Wahab, Yahoo!. Jenkins Scalability Summit, Oct 2013 - Los Altos.
    Slides: https://wiki.jenkins-ci.org/download/attachments/68747344/Yahoo.pptx
  6. "How Jenkins Builds the Netflix Global Streaming Service". Gareth Bowles, Brian Moyles, Netflix. Jenkins User Conference 2012 - San Francisco.
    Slides: http://www.cloudbees.com/sites/default/files/juc2011/JUCSF_2012_Building-Netflix-Streaming-with-Jenkins_JUC.pdf
    Video: http://www.youtube.com/watch?v=GF0p7jTf6tk

19 July 2013

Virtualization cost for build automation

The chart shows the build time trend of a Jenkins job building a Maven project. Around build #480 the job was switched from a bare-metal builder to a virtual machine of the same size. The average build time increased by ~25%.

The job was running on a dedicated Jenkins slave machine running CentOS 6 Linux. Originally it had a 4-core i7 CPU with HT, 8 GB RAM and software RAID0 over 2 rotating disks. Then it was moved to a Xen 4 virtual machine (paravirtualized) with the same characteristics, only the number of logical CPU cores was set to 7 instead of 8. No other load was put on this physical box, so the performance drop is due to virtualization only.

The build time change was similar for the other jobs. Moreover, C# builds running on Windows Server 2008 boxes showed the same figures. It means that migration from a physical machine to a Xen VM costs about a 25% increase in build time.
Given various comparative benchmarks of different hypervisors, I don't expect better results from other virtualization technologies. Of course, things like Linux cgroups don't count. Another important note: we use local disks. For networked storage things are completely different.
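
To put the 25% figure in terms of capacity, here is a small Python sketch. The 12 and 15 minute build times are hypothetical numbers chosen only to reproduce a 25% increase; they are not measurements from the chart.

```python
def relative_increase(before, after):
    """Relative build time increase, e.g. 0.25 for +25%."""
    return (after - before) / before


def throughput_ratio(time_increase):
    """Builds per hour on the slower machine relative to the faster one."""
    return 1 / (1 + time_increase)


# Hypothetical build times in minutes, for illustration only.
increase = relative_increase(12.0, 15.0)
print(f"build time increase: {increase:.0%}")
print(f"relative throughput: {throughput_ratio(increase):.0%}")
```

A 25% longer build means one executor completes about 80% of the builds it used to, so roughly a quarter more executors are needed to keep the same throughput.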

27 March 2012

Reasons to split large Jenkins

From time to time discussions start in the Jenkins Users group on managing several Jenkins instances simultaneously. And each time there is a suggestion to merge these instances into a single one and to use views and access control to separate domains.

Well, the single build server instance is easier to manage - to some extent.

It can provide more resources and computation power to its jobs.

But there are several strong reasons not to merge build servers.

  • Administrative domain separation. Co-location of stuff belonging to different teams may be just prohibited.
  • Performance. Smaller Jenkins servers boot faster. Also, Jenkins has quite a large memory footprint: a 0.5 GB heap is the minimum requirement for a modest installation. More jobs mean more builds, and more builds mean a larger heap. For multi-gigabyte heaps GC delays may become an issue.
  • There are also plugin-related issues, such as the one with Subversion credentials: the SVN plugin cannot log in with different credentials to a single authentication domain.

But if someone decides to run multiple build servers, they should be installed on different (virtual) hosts. It is difficult to imagine a case where co-location of several Jenkins masters can be justified.

22 March 2012

2 ways to distributed Jenkins

One of the key Jenkins features is the ability to build a cluster at little to no cost: to run jobs on Linux, Windows and Mac at the same time, and to scale far beyond a single computing box. Moreover, this ability is implemented in a way that is very easy to maintain and use.

Default method to build a distributed Jenkins is master-centered. I.e.

  • slave node is configured at the master;
  • master controls the slave lifecycle - starts it (via SSH or DCOM) and stops it.

There are a number of plugins supporting this scenario: Libvirt Slaves Plugin, VirtualBox Plugin and vSphere Cloud Plugin to control VM provisioning, Slave Setup Plugin to configure the slave node. In an extreme case slaves are deployed to a public cloud (Amazon EC2 Plugin, Delta Cloud API plugin, JClouds Plugin).

Another approach is to assemble the cluster from the slave side. Jenkins agents are started by someone on the slave hosts, then they connect to a master and form a distributed server. For now only the Swarm Plugin does that.

Sure, it is possible to start a slave agent simply via JNLP. But the corresponding slave node still has to be configured at the master, so this case does not fit.

None of these methods is the best. They have different use cases.

  • The master-controlled approach relies on centralized resource provisioning, with Jenkins as the center. It assumes that either the slave hosts are dedicated to the Jenkins server or there are plenty of resources.
  • The slave-initiated way works when resource control is outside Jenkins, when several Jenkins servers share a computer pool, or when some spare resources are to be utilized. It is especially useful for heavy testing tasks such as Selenium tests.

In my opinion the centralized configuration is for build engineers serving a single project team. For a multi-project build farm the distributed control suits better.

15 March 2012

Missed features in Jenkins?

Sure, Jenkins is the leading continuous integration server software. Competing with it is as difficult as competing with Linux in the general purpose server platform area. Jenkins is general purpose too, and that is the cause of some functionality omissions:

  • Weak support for pipelines. I.e. build-test-QA-release, staged deployment.
  • No support for parallel coding - feature branches, server side integration, pre-commit builds.
  • Difficult mass configuration - shared SCM configuration, credentials, job configuration inheritance and parametrization.
  • Weak traceability of user operations, no support for audit.

Well, we all implement that ourselves. There are a lot of helper plugins, but such solutions are patchy and copy-paste based.

I would argue that these functionality omissions are not really disadvantages of Jenkins. It is too general a tool for such options. It would be right to treat Jenkins as a foundation for higher level solutions, a platform; see the note by Kohsuke himself (Kohsuke Kawaguchi: Writing programs that drive Jenkins). Because Jenkins is

  • quite simple;
  • really flexible;
  • easily extensible;
  • having a set of versatile APIs.

It is difficult to guess how many such uber-applications are being developed and used internally. We have one too. Publicly available solutions should emerge next, such as CloudBees Jenkins as a service or GitHub Janky. I hope these solutions will fuel Jenkins developers financially, provided that a good balance between the open source platform and custom applications is maintained.