Tuesday, December 06, 2011

LAVA Deployment Changes

Up to this point, we've supported LAVA releases in a number of ways:

  • Tarballs - not really recommended, but source in its raw form
  • BZR branches - if you are doing development, or just *have* to be on the bleeding edge
  • pypi - convenient, easy to install, updated with monthly release cycles
  • .deb packages in the PPA - convenient, easy to install for Ubuntu, updated with monthly release cycles

Here are the problems:

Packages are fairly convenient to install, but take quite a bit of time at release time to update, rebuild, copy to all supported versions, and test.  Because of this, if we have a new feature or important bug fix that we want to roll out before the next release, we have only two choices: 1. hot-fix it on the server, make very sure that we apply the same fix to trunk, or 2. fix it in trunk, test, make a lava-foo-20YY.MM-1 release, repackage, install, etc.  Option 1 is a bit ugly, but fast.  Option 2 is really the right thing to do, but very time consuming.

Another thing we would really like to do is have the ability to host multiple "instances", such as a production and a staging instance.  Using packages, this isn't really possible.  Using VMs is an option of course, but there are downsides and it would consume a lot of extra resources.  Being able to deploy multiple instances is not only useful for production systems, but for development as well.  If you are working on multiple branches and want to test them separately, it's nice to have an easy way to do that.

Finally, as we look for ways to make LAVA more scalable, one of the things we are looking at is celery.  There are other libraries we need as well, so celery is just one of many, but one of the issues we have here is that there are no packages in the archive.  Sure, we could build a package of it and keep it in our PPA, but then we are maintaining that package in addition to all the other LAVA components.  And there will surely be others besides celery too.

As of yesterday, we are now deploying LAVA in the Linaro Validation Lab using a more flexible approach.  Basically, it involves Python virtual environments, with separate tables for each instance, and each instance running under its own userid.  Zygmunt and Michael in particular did a lot of hacking on most of the components to make them aware of the instances, and to create upstart jobs that can start/stop/restart components based on the instance ID.  Instances can be assembled from a list of requirements that can pull from pypi, or even bzr branches.  There are even scripts (lp:lava-deploy-tool) to help with creation and setup of the instances.  The scripts also support backing up and restoring the data.
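To give a feel for what "a list of requirements" for an instance might look like, here is a minimal sketch. The helper name and the idea of generating the file from Python are purely illustrative and are not taken from lava-deploy-tool; the only thing borrowed from real tooling is pip's `bzr+<url>#egg=<name>` syntax for branches.

```python
# Hypothetical helper: names and layout are illustrative only,
# not how lava-deploy-tool actually builds an instance.
def render_requirements(released, branches):
    """Return pip requirements text for one instance.

    released: names of packages to install from pypi
    branches: mapping of package name -> bzr branch URL
    """
    lines = sorted(released)
    for name, url in sorted(branches.items()):
        # pip's VCS syntax for a bzr branch: bzr+<url>#egg=<name>
        lines.append("bzr+%s#egg=%s" % (url, name))
    return "\n".join(lines) + "\n"


staging = render_requirements(
    released=["lava-server", "lava-dashboard"],
    branches={
        "lava-test": "http://bazaar.launchpad.net/~linaro-validation/lava-test/trunk/",
    },
)
print(staging)
```

A production instance could list only released packages, while a staging or development instance swaps one or two of them for branches.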

So what will become of the packages?  It was recently announced on the linaro-dev mailing list that we are phasing out packages, for at least the server components.  We feel like the new methods of deployment offer greater flexibility, stable deployment support as well as easy ways to update to the latest code, or even your own branches, and many other benefits.  Try it out and let us know what you think.

Monday, September 26, 2011

Stacks and stacks of PandaBoards

If you watch our scheduler at http://validation.linaro.org/lava-server/scheduler/ you may have noticed that even though we are increasing the number of continuous integration tests for Android and Linux kernel, the jobs are clearing out much more quickly in the past few days.  We've added infrastructure and boards and now have 24 PandaBoards in the Linaro Validation Farm!  We've also updated our rack design to more efficiently pack a lot more boards into less space, while keeping them accessible and serviceable.  Here's a picture Dave sent me, showing a bit of what he's put in place there.

We did hit a bit of a snag with one thing, and I anticipated this would be an issue quite a ways back.  We use linaro-media-create to construct the images in the same way anyone else using Linaro images would construct them, but running 30 of these in parallel will pretty much drag the server down to a crawl.  I did some quick tests of multiple processes running linaro-media-create locally, and the completion time for every l-m-c process running in parallel increases significantly with each new process you add.  Combine this with lots of boards, lots of jobs, and other IO such as database hits, and it can take hours to complete just the image creation, which should only take minutes.  The long term solution is that we are looking at things like celery to distribute big tasks out to other systems.  In the short term, simply serializing the l-m-c processes results in a significant performance increase for all the jobs.
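The idea of serializing just the expensive step can be sketched as follows. This is not the dispatcher's code: `build_image` is a stand-in for running linaro-media-create, and threads plus a single lock are used only to keep the example self-contained.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One lock shared by every job: only one image build runs at a time.
image_build_lock = threading.Lock()

# Bookkeeping so we can observe how many builds overlap.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def build_image(job_id):
    """Stand-in for running linaro-media-create (hypothetical)."""
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to do heavy disk IO
    with stats_lock:
        stats["active"] -= 1
    return job_id


def run_job(job_id):
    # The rest of a job may run in parallel; only the IO-heavy
    # image creation step is forced to take turns.
    with image_build_lock:
        return build_image(job_id)


with ThreadPoolExecutor(max_workers=8) as pool:
    finished = list(pool.map(run_job, range(8)))

print("jobs finished:", finished)
print("most builds running at once:", stats["peak"])
```

When jobs run as separate processes rather than threads, a lock file on disk would play the same role as the in-memory lock here.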

Making LAVA more People-Friendly

One of the other new features of LAVA that's worth pointing out is a subtle, but significant step toward making it a little friendlier for those trying to find the results they are looking for.  Internally, LAVA uses things like SHA1 on the bundles, and UUIDs on the test runs to have a unique identifier that can be transferred between systems.  Previously, we displayed this as the name of the link.  If you're looking through a results stream and trying to find the test you just ran on the ubuntu-desktop image with the lt-mx5 hardware pack though, it's not very helpful.  You could, of course, go through the scheduler and link to the results there, but if you just wanted to browse the results in a bundle stream and look at ones that interest you, there was no easy way to do that.

Now, we use the job_name specified in the job you submit to the scheduler to give it a name.  What you set the job_name field to is entirely up to you.  It's all about helping it to mean something to the end user.  In the example above, the stream of results is for daily testing of hardware packs and images.  So the hwpack name, datestamp, image name, and image datestamp are simply used for the job_name.  Kernel CI results, Android CI results, and others will certainly have different names that mean more to them in their context.
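As an illustration of that kind of naming, here is a small sketch. The function, the separator, and the datestamps are made up for the example; LAVA only requires that job_name be a string you find useful.

```python
def make_job_name(hwpack, hwpack_date, image, image_date):
    """Join the pieces into a readable job_name (format is hypothetical)."""
    return "-".join([hwpack, hwpack_date, image, image_date])


# Datestamps below are invented purely for illustration.
job_name = make_job_name("lt-mx5", "20110926", "ubuntu-desktop", "20110925")
print(job_name)
```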

Tuesday, September 20, 2011

Configuring LAVA Dispatcher

An important new change will be landing in the release of LAVA Dispatcher this week, and it should be good news to anyone currently deploying the dispatcher. Configuration for your board types and test devices will no longer be in python modules, but in configuration files that you can keep across upgrades.

First off, if you don't have a config, a default will be provided for you. You'll probably want to tell it more about your environment though. If you are configuring it for the whole system, you will probably want to put your configs under /etc/xdg/lava-dispatcher/. If you are developing locally on your machine, you may want to use ~/.config/lava-dispatcher/ instead. 
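The lookup described above can be pictured with a short sketch. The two directories come from this post, but the function name and the order in which they are checked (user directory first) are assumptions of the example, not documented dispatcher behaviour.

```python
import os

SYSTEM_DIR = "/etc/xdg/lava-dispatcher"
USER_DIR = os.path.expanduser("~/.config/lava-dispatcher")


def find_config(name, search_dirs=(USER_DIR, SYSTEM_DIR)):
    """Return the first existing path for `name`, or None.

    Checking the user directory before the system one is an
    assumption of this sketch.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # the caller would fall back to the built-in defaults


print(find_config("lava-dispatcher.conf"))
```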

The main config file is lava-dispatcher.conf.  Here's an example:
#Main LAVA server IP in the boards farm

#Location for hosting rootfs/boot tarballs extracted from images
LAVA_IMAGE_TMPDIR = /var/www/images/tmp

#URL where LAVA_IMAGE_TMPDIR can be accessed remotely
#PWL - might not be needed
#LAVA_IMAGE_URL_DIR = /images/tmp
LAVA_IMAGE_URL = http://%(LAVA_SERVER_IP)s/images/tmp

#Default test result storage path
LAVA_RESULT_DIR = /lava/results

#Location for caching downloaded artifacts such as hwpacks and images
LAVA_CACHEDIR = /linaro/images/cache

# The URL pointing to the version of lava-test to be installed with pip
LAVA_TEST_URL = bzr+http://bazaar.launchpad.net/~linaro-validation/lava-test/trunk/#egg=lava-test

The big things to change here will be the LAVA_SERVER_IP, which should be set to the address where you are running the dispatcher, and the directories.  LAVA_TEST_URL, by default, will point at the lava-test in the trunk of our bzr branch.  This means you'll always get the latest, bleeding edge version.  If you don't like that, you can point it at a stable tarball, or even your own branch with custom modifications.
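The `%(LAVA_SERVER_IP)s` reference in LAVA_IMAGE_URL looks like Python ConfigParser-style interpolation. Assuming that is how the file is read (an assumption, as is the placeholder IP address below), a minimal sketch of loading such a file and expanding the reference looks like this:

```python
import configparser

# Trimmed copy of the example above; the IP address is a placeholder.
EXAMPLE = """\
LAVA_SERVER_IP = 192.168.1.10
LAVA_IMAGE_TMPDIR = /var/www/images/tmp
LAVA_IMAGE_URL = http://%(LAVA_SERVER_IP)s/images/tmp
"""


def load_settings(text):
    """Parse section-less 'key = value' text into a plain dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the upper-case key names as written
    # configparser insists on a section header, so supply one.
    parser.read_string("[settings]\n" + text)
    return dict(parser["settings"])


settings = load_settings(EXAMPLE)
print(settings["LAVA_IMAGE_URL"])
```

The point of the interpolation is that changing LAVA_SERVER_IP in one place also updates every value that refers to it.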

Next up is device-defaults.conf.  Look at the example under the lava_dispatcher/default-config branch, because it's a bit longer.  Fortunately, most of this can probably go unchanged. You'll want to specify things like the default network interface, command prompts, and client types here.  For most people using Linaro images, this can just remain as-is.

The part you will almost certainly want to customize is in the devices and device-types directories.  First, a device-type:

boot_cmds = mmc init,
    mmc part 0,
    setenv bootcmd "'fatload mmc 0:3 0x80200000 uImage; fatload mmc
    0:3 0x81600000 uInitrd; bootm 0x80200000 0x81600000'",
    setenv bootargs "' console=tty0 console=ttyO2,115200n8
    root=LABEL=testrootfs rootwait ro earlyprintk fixrtc nocompcache
    vram=48M omapfb.vram=0:24M mem=456M@0x80000000 mem=512M@0xA0000000'",
type = panda

boot_cmds_android = mmc init,
    mmc part 0,
    setenv bootcmd "'fatload mmc 0:3 0x80200000 uImage;
    fatload mmc 0:3 0x81600000 uInitrd;
    bootm 0x80200000 0x81600000'",
    setenv bootargs "'console=tty0 console=ttyO2,115200n8
    rootwait rw earlyprintk fixrtc nocompcache vram=48M
    omapfb.vram=0:24M,1:24M mem=456M@0x80000000 mem=512M@0xA0000000
    init=/init androidboot.console=ttyO2'",

If you are using a PandaBoard with Linaro images, you can probably just use this as it is.

Now to specify a device we want to test on:

device_type = panda
hostname = panda01

And that's it. You'll want one of those for each board you have, and a device-type config file for each type of device you have. Many thanks to David Schwarz and Michael Hudson-Doyle for pulling this important change together and getting it merged. Oh, and what else for this release? LOTS! But more than I want to include in a single post. I'll try to hit some of the highlights in other postings around the release though. Enjoy :)
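One way to picture how the three kinds of file fit together is as layers, with more specific files overriding more general ones. The sketch below assumes that precedence (device over device-type over device-defaults); the function, the dictionary representation, and the `default_network_interface` key are illustrative and not the dispatcher's actual implementation.

```python
def resolve_device(device, device_types, defaults):
    """Merge settings, assuming: defaults < device-type < device."""
    merged = dict(defaults)
    merged.update(device_types[device["device_type"]])
    merged.update(device)
    return merged


# Stand-ins for device-defaults.conf, device-types/ and devices/.
defaults = {"default_network_interface": "eth0"}  # hypothetical key and value
device_types = {"panda": {"boot_cmds": "<boot commands from the device-type file>"}}
panda01 = {"device_type": "panda", "hostname": "panda01"}

config = resolve_device(panda01, device_types, defaults)
for key in sorted(config):
    print(key, "=", config[key])
```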

Friday, August 12, 2011

Arm stacks

Here's a quick snapshot of a stack of 4 PandaBoards that will be going into the new rack at the Linaro validation farm when it arrives. The current plan is to add 30 new PandaBoards. We are not using a custom case at the moment, but just stacking them on a shelf so they are easy to access. To get better density, we will put 4 of these stacks on a single shelf, which will occupy about 4U of rack space. I spent some time thinking about the rack layout last week, and came up with a design that should give us about 96 development boards (assuming similar size) to a rack, including infrastructure such as switches, console servers, and control nodes.
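A quick check of the arithmetic behind those figures, as a sketch. The stack, shelf, and board counts come from the paragraph above; the 42U rack height is an assumption added for the example, since the actual rack size isn't stated.

```python
BOARDS_PER_STACK = 4
STACKS_PER_SHELF = 4
SHELF_HEIGHT_U = 4      # "about 4U" per shelf
TARGET_BOARDS = 96
RACK_HEIGHT_U = 42      # assumption: a common full-height rack

boards_per_shelf = BOARDS_PER_STACK * STACKS_PER_SHELF
shelves_needed = -(-TARGET_BOARDS // boards_per_shelf)  # ceiling division
units_for_boards = shelves_needed * SHELF_HEIGHT_U
units_left = RACK_HEIGHT_U - units_for_boards  # room for infrastructure

print(boards_per_shelf, shelves_needed, units_for_boards, units_left)
```

Under those assumptions, 96 boards need 6 shelves, or roughly 24U, with the remainder of the rack available for switches, console servers, and control nodes.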

Can we do better? Absolutely, but for now this is a good balance between density and accessibility. More pictures coming when the rack arrives and we get it installed.

Wednesday, July 27, 2011

New features in LAVA

I'd like to take a moment to highlight some of the new features that have been added to LAVA.

First off, the LAVA scheduler now has a basic UI. You can now see the status of boards, the current status of running jobs, and even click a link to see all jobs. You can't see this from the UI, but we also added support for submitting jobs for a device_type. So test jobs can now specify a target system to run on, or just a type.
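The difference between asking for a specific board and asking for any board of a type can be sketched like this. The `job_name` and `device_type` fields are mentioned in these posts; the `target` and `actions` key names and the helper itself are assumptions made for the example, so check the job format of your LAVA version before relying on them.

```python
import json


def make_job(job_name, device_type=None, target=None):
    """Build a job description for one board or one board type.

    Exactly one of device_type / target must be given. The "target"
    and "actions" keys are assumptions of this sketch.
    """
    if (device_type is None) == (target is None):
        raise ValueError("specify exactly one of device_type or target")
    job = {"job_name": job_name, "actions": []}
    if target is not None:
        job["target"] = target            # run on this specific board
    else:
        job["device_type"] = device_type  # run on any board of this type
    return job


by_type = make_job("panda-daily", device_type="panda")
by_board = make_job("panda01-debug", target="panda01")
print(json.dumps(by_type, indent=2, sort_keys=True))
```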

Clicking on a job will show you more details about it. From here you can see things like start/finish time, the JSON text of the job that was submitted, and even see a stream of the live output from the job!

Bundles can now be viewed together. In LAVA, a bundle is the test results (this can come from more than one test suite), metadata, and attachments submitted to the dashboard. Bundles can be organized into streams - think of these as sort of like directories or containers that logically keep your bundles organized. Previously, you could click on a bundle stream and all you would see is a lot of individual test runs. Now what you see is a list of bundles. When you click on a bundle, you can see a summary of all test results in the bundle, with the number of passes and failures for each. The view of a bundle looks like this:

Clicking on the uuid for a specific test_run, such as posixtestsuite in this example, lets you drill down to even more detail. This view now lets you sort by any column, filter results, or change the number of rows to display per page. In practice, this means that it's really simple to alter the view so that you see just the results you care about.
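To make the two views concrete, here is a small sketch of a per-bundle summary and a filtered drill-down. The data structure, field names, and results are invented for illustration and do not reflect the real dashboard bundle format.

```python
from collections import Counter

# Made-up bundle: two test runs with invented results.
bundle = {
    "test_runs": [
        {"test_id": "posixtestsuite",
         "results": [("case-a", "pass"), ("case-b", "fail"), ("case-c", "pass")]},
        {"test_id": "example-suite",
         "results": [("case-x", "pass")]},
    ]
}


def summarize(bundle):
    """Map each test run to its (passes, failures) counts."""
    summary = {}
    for run in bundle["test_runs"]:
        counts = Counter(outcome for _, outcome in run["results"])
        summary[run["test_id"]] = (counts["pass"], counts["fail"])
    return summary


def failures(run):
    """Drill down into one run: failing test cases, sorted by name."""
    return sorted(name for name, outcome in run["results"] if outcome == "fail")


print(summarize(bundle))
print(failures(bundle["test_runs"][0]))
```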

Many many more good things are coming soon. The website will be getting an overhaul to highlight the big picture on the front page, and allow for categories of more interesting results (of course you'll still be able to get at the details like this by going to the dashboard). The Android team is starting to kick off tests of new images as soon as they are built, helping them with their continuous integration. It will soon be possible to inject new kernels and other components into an image before testing it. Want to know how much power is consumed while running a test? That's being worked on also. If you are interested in deploying LAVA yourself, let us know on IRC, or on the linaro-dev mailing list.

Thursday, July 21, 2011

LAVA 2011.07 is out!

The Linaro Validation team is pleased to announce the latest release of LAVA, for the 2011.07 milestone.

LAVA is the Linaro Automated Validation Architecture that Linaro is deploying to automate the testing of Linaro images and components on supported development boards.

One of the biggest changes you'll see this month is that the UI for the dashboard got an overhaul. You can now view entire bundles that were submitted, with the test runs organized underneath. You can also sort columns to easily see failures, filter large result tables, and change the number of items displayed per page. On the scheduler, we added a basic UI to let you see the status of boards and jobs, and also the ability to schedule jobs by device type. The dispatcher has better error handling and preliminary support for Snowball boards, and lava-test now streams results while the test is running. The list of bugs and blueprints that were completed for this release can be found here:

The release pages with release notes, highlights, changelogs, and downloads can be found at:

  • lava-dashboard - https://launchpad.net/lava-dashboard/linaro-11.11/2011.07
  • lava-dashboard-tool - https://launchpad.net/lava-dashboard-tool/linaro-11.11/2011.07
  • lava-dispatcher - https://launchpad.net/lava-dispatcher/linaro-11.11/2011.07
  • lava-scheduler - https://launchpad.net/lava-scheduler/linaro-11.11/2011.07
  • lava-server - https://launchpad.net/lava-server/linaro-11.11/2011.07
  • lava-test - https://launchpad.net/lava-test/linaro-11.11/2011.07
  • lava-tool - https://launchpad.net/lava-tool/linaro-11.11/2011.07
  • linaro-python-dashboard-bundle - https://launchpad.net/linaro-python-dashboard-bundle/linaro-11.11/2011.07
  • linaro-django-xmlrpc - https://launchpad.net/linaro-django-xmlrpc/+milestone/2011.07

For more information about installing, running, and developing on LAVA, see: https://wiki.linaro.org/Platform/Validation/LAVA/Documentation

To get a preview of what's coming next month take a look at: https://launchpad.net/lava/+milestone/2011.08
We have some good things coming soon, such as out-of-tree test support in lava-test, subscription to be notified of test results, improvements in the scheduler UI, and a facelift for the website to make current testing and results more visible.

Updated packages will be available from the linaro-validation ppa soon.

Friday, July 01, 2011

LAVA 2011.06 Released!

The Linaro Validation team is pleased to announce the first full release of LAVA, for the 2011.06 milestone.

LAVA is the Linaro Automated Validation Architecture that Linaro is deploying to automate the testing of Linaro images and components on supported development boards.

The release pages with release notes, highlights, changelogs, and downloads can be found at:

  • lava-dashboard - https://launchpad.net/lava-dashboard/linaro-11.11/2011.06
  • lava-dashboard-tool - https://launchpad.net/lava-dashboard-tool/linaro-11.11/2011.06
  • lava-dispatcher - https://launchpad.net/lava-dispatcher/linaro-11.11/2011.06
  • lava-scheduler - https://launchpad.net/lava-scheduler/linaro-11.11/2011.06
  • lava-scheduler-tool - https://launchpad.net/lava-scheduler-tool/linaro-11.11/2011.06
  • lava-server - https://launchpad.net/lava-server/linaro-11.11/2011.06
  • lava-test - https://launchpad.net/lava-test/linaro-11.11/2011.06
  • lava-tool - https://launchpad.net/lava-tool/linaro-11.11/2011.06
  • linaro-python-dashboard-bundle - https://launchpad.net/linaro-python-dashboard-bundle/linaro-11.11/2011.06
  • linaro-django-xmlrpc - https://launchpad.net/linaro-django-xmlrpc/+milestone/2011.06

For more information about installing, running, and developing on LAVA, see: https://wiki.linaro.org/Platform/Validation/LAVA/Documentation

Wednesday, June 01, 2011

LAVA Project Changes

If you take a look at http://launchpad.net/lava you'll see some structural changes are afoot. LAVA is now a project group:

  • lava-server - the core server components
  • lava-dashboard - the results dashboard (was launch-control)
  • lava-scheduler - the lava scheduler
  • lava-dispatcher - the dispatcher
  • lava-tool - the core pieces of the command line interface
  • lava-test - (coming soon) the test execution framework

All the Linaro validation tools are now going to be consolidated under the LAVA project group in Launchpad. If you are already deploying and experimenting with LAVA, don't worry, there will be some instructions coming soon (and packages too) for installing the latest versions. This is laying the groundwork for the development that will take place over the next few months. More on that later. :)

Monday, April 25, 2011

ELC 2011

I attended the Embedded Linux Conference in San Francisco for the first time this year, and have to say it's one of the better conferences I've been to. There were good sessions, interesting demos, and best of all, lots of people working in embedded Linux to talk to.

For my part, I gave an overview of the Linaro Automated Validation Architecture that we are working on. If you're interested, you can find the slides to that, and many other talks here.

Monday, March 14, 2011

Max partitions on mmc

One thing my team has been working on recently is a test automation framework for ARM systems running Linaro images. Since Linaro is also now looking at providing Android based images, the scope increased slightly, and we need a way to cover those as well. My original thought for doing this was to have what we call a "master image", basically the first 2 partitions on the SD card which would house a boot and rootfs partition for a stable image, followed by a testboot and testroot partition that could be reformatted and reinstalled at will. This works pretty nicely for Linaro images, but our Android images require several more partitions. So... rethinking this, we should be able to accommodate both:

mmcblk0p1 - boot
mmcblk0p2 - root
mmcblk0p3 - androidsystem
mmcblk0p4 [extended]
mmcblk0p5 - androidcache
mmcblk0p6 - androiddata
mmcblk0p7 - androidsdcard / testrootfs (this could be the bulk of remaining space used by both, depending on which image type we are booting)
mmcblk0p8 - testboot

When I actually tried to implement this, I hit a wall at the 8th partition. Only the first 7 show up under /dev. A little digging in the mmc driver reveals that MMC_BLOCK_MINORS is set to a default of 8 (1 for the whole device + 7 partitions). So without changing this, by default we can only have 7 partitions on an SD card.
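The arithmetic is simple enough to sketch. The function below is only an illustration of the limit described above (one minor number for the whole device, the rest for partitions); it is not kernel code.

```python
MMC_BLOCK_MINORS = 8  # default value mentioned above


def visible_partitions(planned, minors=MMC_BLOCK_MINORS):
    """Split planned partition numbers into (visible, hidden) under /dev."""
    limit = minors - 1  # one minor is taken by the whole device
    visible = [p for p in planned if p <= limit]
    hidden = [p for p in planned if p > limit]
    return visible, hidden


# The layout above uses partition numbers 1 through 8.
visible, hidden = visible_partitions(range(1, 9))
print("visible:", visible)
print("hidden:", hidden)
```

With the default of 8 minors, partition 8 (testboot in the layout above) is the one that never appears.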

We could probably push this change in the Linaro kernels, but why is it set so low to begin with? Furthermore, will changing it help at all? If U-Boot doesn't like it (haven't checked yet) then it doesn't really matter.

Long term, I think the approach will be different. However, this requires special hardware. We've been talking about a new approach that would allow us to have a dual mmc interface. The board would boot from the first mmc, load an initramfs, then switch over to the other mmc card. This would be the master image boot, and since we would be running from a ramfs, we could have full access to the second mmc to repartition and destroy it however we like. Then we would boot into a test image on the second mmc. I think this is where we want to be eventually, but it is still a ways off. For now, we need to find a way around this silly partition limitation if we want to support Android in the short term.