September 14, 2015


OSL GSOC 2015-Oregon's Catch

by Evan Tschuy

This summer the Open Source Lab had three students from around the world working on open source software through Google Summer of Code. The OSL has participated in GSoC for nine years, and each year has had its own unique challenges and successes.

I had the opportunity to work with a student, Chaitanya, on What's Fresh, a project I originally developed last summer at the OSL for Oregon Sea Grant. With What's Fresh (which Sea Grant plans to brand as Oregon's Catch), Sea Grant wanted visitors to the Oregon coast to be able to find fresh fish available directly from fishermen, and had CASS, the new organization the OSL is part of, develop the app and backend. Chaitanya worked on the backend, making data entry easier. It now has several important features, such as easier location entry, search, and inline forms so users don't need to leave the page to add related items. It is also themeable, so other organizations can easily set up a customized version for their area.
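For readers curious what that inline-forms feature looks like in Django terms, here is a minimal sketch of the inline-formset pattern; the app and model names are hypothetical stand-ins, not the actual What's Fresh code:

    from django.forms.models import inlineformset_factory

    # Hypothetical models -- assumes Product has a ForeignKey to Vendor.
    from myapp.models import Vendor, Product

    # An inline formset renders the related Product rows on the Vendor form itself,
    # so related items can be added without leaving the page.
    ProductFormSet = inlineformset_factory(Vendor, Product,
                                           fields=('name', 'description'), extra=1)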

It was initially slow-going as we got more familiar with working with each other and as he got comfortable working on the project. Since Chaitanya was more familiar with Python and Django than JavaScript, it took a while for things to start coalescing. However, at the end of the summer, we're both proud of what's been accomplished and the features added to the project. It was exciting to see Chaitanya's skills grow, and to feel more comfortable myself in a mentorship role. We're going to deploy the improved version of the backend after one more round of code review.

This year, the Open Source Lab will have the opportunity to send one person to Google's annual Mentorship Summit. We look forward to seeing other mentors there!




by Elsie Phillips at September 14, 2015 06:14 PM

September 12, 2015


OSL GSOC 2015-Protein Geometry Database

by Elijah Voigt

What is the Protein Geometry Database?

The Protein Geometry Database project (PGD) is many things to many people.

The project synopsis says:

"Protein Geometry Database is a specialized search engine for protein geometry. It allows you to explore either protein conformation or protein covalent geometry or the correlations between protein conformation and bond angles and lengths."

There's a lot of science in that paragraph; I speak code much better than I speak science, so let's look at the GitHub repository instead. That page describes the code as being:

  • 59.2% Python,
  • 27.2% HTML,
  • 12.4% JavaScript, and
  • 1.2% Other

Depending on what you use PGD for (if you use it at all) you have a different relationship with the project. What matters here is that PGD is a project that the OSL develops and maintains. This year a lot of great work was done on it for the 2015 Google Summer of Code.

What PGD Accomplished During GSOC 2015

This year's PGD GSOC project had five core goals, all of which were accomplished.

  1. Revamping the current account system.
  2. Building occupancy awareness into PGD.
  3. Testing the current development branch of PGD.
  4. Implementing a search by deposition date filter.
  5. Upgrading PGD to Django 1.8 (from Django 1.6!)


The student for this project was S. Ramana Subramanyam. He is in his second year at the Birla Institute of Technology and Science in Goa, India, and was wonderful to work with. Despite a 12-hour time difference, he was able to be productive the majority of the time.

Although none of the code developed for this year's GSOC has been merged into PGD, it has all been reviewed and will be merged over the next few months as the project lead (Jack Twilley) and I are able to work together on migrating the changes.

Overcoming Challenges

The largest challenge we faced in this project was scheduling.

The PGD Project Lead (Jack) got an amazing internship at a California vineyard for his Food Science degree; as a result, he was unable to work on PGD and his GSOC mentorship as much as initially expected. While I was able to answer (or at least help with) many of the questions S. Ramana had, sometimes we were forced to throw up our hands, send an email to Jack, and wait.

This didn't stop S. Ramana from completing all of his goals for the GSOC project; there was always plenty to do, so he could put one thing on the back-burner and focus on a new task. At most it was a mild inconvenience and didn't get in the way too often.

Where PGD Stands

Once the code is merged and the inevitable version control conflicts are resolved, PGD will have some pretty neat new features:

  1. Search results can be saved.
  2. Search results can be saved as a PNG image.
  3. Occupancy awareness has been built in.
  4. Deposition date is now a search filter.
  5. PGD has been upgraded to Django 1.8 (a sketch of the settings change involved follows below).
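For anyone curious what the jump from Django 1.6 to 1.8 involves, one of the most visible settings changes is the consolidated TEMPLATES setting introduced in 1.8; a minimal sketch, not PGD's actual settings file:

    # settings.py -- Django 1.8 replaces the old TEMPLATE_* settings with one list
    import os

    BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    TEMPLATES = [
        {
            'BACKEND': 'django.template.backends.django.DjangoTemplates',
            'DIRS': [os.path.join(BASE_DIR, 'templates')],
            'APP_DIRS': True,
            'OPTIONS': {
                'context_processors': [
                    'django.contrib.auth.context_processors.auth',
                    'django.template.context_processors.request',
                ],
            },
        },
    ]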

It took a lot of energy not to add ! to the end of each of those items.

Despite scheduling conflicts and the usual technical snafus that come with major engineering changes, I would say that this GSOC was a success for PGD and the OSL.

Personal Takeaways

This was my first time mentoring a student for GSOC, and although I have had limited experience mentoring students through DevOps Bootcamp, mentoring a student remotely across a 12-hour time difference is an entirely different can of worms.

My mentorship abilities were challenged, but I learned a lot of great skills and added many tools to my belt for dealing with problems and knowing when, and whom, to ask for help. If I am given the opportunity to be a GSOC mentor next year, I will definitely jump at it.



by Elsie Phillips at September 12, 2015 02:18 AM

September 04, 2015

Piotr Banaszkiewicz

AMY update #7

With a week of delay (3 weeks after v0.8 release), I finally released AMY v0.9 today.

The list of changes for this release doesn’t contain one big thing I was hoping for, but it’s still decent.

This is also the last release that (partly) took place during Google Summer of Code 2015.

Google Summer of Code 2015

I’d really like to thank everyone who helped me during this Summer:

  • Raniere Silva, for keeping an eye on the participants' reminders and for so many friendly chats over these months
  • Greg Wilson, for mentoring me and for lots of enjoyable meetings
  • Trevor King, for introducing me to some very advanced git methods
  • my application's users: Amy, Giacomo, Maneesha, Tracy and others, for providing me with excellent feedback.

This was an incredible summer; I learned a lot and had so much fun. It's great to see that we'll release AMY v1.0 soon as a wrap-up of 4 or 5 months of work.

Bug fixes

Okay, back to the change log.

  • Some workshop URLs weren’t translated between repository and website versions. This is now fixed and every workshop with one of these links will automatically display the other link as well.
  • It happened that users put whole links into Host.domain (in the form https://domain.tld). This caused errors in many places where we wanted to display a link to that Host. The issue is now fixed: users aren't allowed to put protocols or trailing slashes into Host.domain (a sketch of this kind of validation follows the list).
  • Some management commands were broken due to the fact that we switched underlying methods to use API endpoints, but we didn’t change the commands themselves.
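To illustrate the Host.domain fix, a validator along these lines rejects protocols and trailing slashes before the value is saved; this is a sketch only, and AMY's actual validation may differ:

    import re

    from django.core.exceptions import ValidationError
    from django.db import models

    def validate_bare_domain(value):
        # Reject values such as "https://example.org/" -- only "example.org" is allowed.
        if re.match(r'^[a-z]+://', value) or value.endswith('/'):
            raise ValidationError('Enter a bare domain, e.g. "example.org".')

    class Host(models.Model):
        domain = models.CharField(max_length=100, validators=[validate_bare_domain])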


This release was focused on getting permissions sorted out and adding read-only access for people from the Software-Carpentry Foundation Steering Committee.

We had to cut another exciting feature: workshop requests and profile update requests. The pull request for this feature is very big, and we'll release it in v0.9.5 sometime in the next week.

by Piotr Banaszkiewicz ( at September 04, 2015 10:00 PM

August 14, 2015

Piotr Banaszkiewicz

AMY update #6

After another two weeks, AMY v0.8 was released today.

The list of changes for this release is really small.

Bug fixes

  • Some workshops that didn’t provide a list of instructors or helpers were erroring out during Import from URL. That was fixed in a v0.7.1 in between release.
  • Some of location fields (address or venue) and contact field were too short for some events. Their length was bumped to 255 chars.


This release was focused on integrating the main Software-Carpentry website with AMY.

On the AMY side, I implemented a basic REST API (read-only).
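"Read-only" here means endpoints that only answer GET requests. Below is a minimal sketch of that pattern using Django REST Framework; the app and field names are assumptions for illustration, and the post doesn't say whether AMY uses DRF or hand-rolled views:

    from rest_framework import routers, serializers, viewsets

    from workshops.models import Event  # assumed app and model name

    class EventSerializer(serializers.ModelSerializer):
        class Meta:
            model = Event
            fields = ('slug', 'start', 'end', 'url')  # illustrative field list

    class EventViewSet(viewsets.ReadOnlyModelViewSet):
        # ReadOnlyModelViewSet provides only the "list" and "retrieve" actions,
        # so the API cannot be used to modify data.
        queryset = Event.objects.all()
        serializer_class = EventSerializer

    router = routers.DefaultRouter()
    router.register(r'events', EventViewSet)
    # then include router.urls under e.g. /api/v1/ in the URL configuration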

Changes to the website are still a work in progress: this is a big project and I still don't see the whole picture clearly, so development is slower.

Plans for v0.9

Nothing for now, except triaged issues.

by Piotr Banaszkiewicz ( at August 14, 2015 10:00 PM

August 07, 2015


Mysql1-vip Outage Post-Mortem



On July 15th we ran into a number of issues with replication on mysql2 involving a couple of session tables. This caused replication to be paused, and a large number of statements had to be skipped. Replication was restarted successfully. On July 16th more issues with the same tables were encountered, but in far greater number. A ticket was created to track the issue. Replication was restarted several times, but during the week of the 20th a decision was made to entirely reload mysql2 and examine some alternative replication methods (primarily row-based replication).

Our servers, mysql1 and mysql2, are running mysql 5.5. While documentation and tribal knowledge claimed a master-slave replication setup, they were in fact configured for master-master replication.
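For reference, one quick way to see whether a pair like this is replicating in both directions is to check the replication status on each server; if both machines report a running slave thread pointed at the other, the topology is master-master rather than master-slave:

    -- Run on mysql1 and again on mysql2:
    SHOW SLAVE STATUS\G     -- Slave_IO_Running / Slave_SQL_Running = Yes means this box replays the other's binlog
    SHOW MASTER STATUS\G    -- shows the binlog position this box serves to its peers
    SHOW VARIABLES LIKE 'log_slave_updates';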

What Happened

On July 30th a decision was made to reload mysql2 at 4:00PM PDT to fix replication errors. Slave replication was intentionally stopped. Databases were dropped one at a time on mysql2 with a small delay between each drop.

As mentioned previously, mysql1 and mysql2 were unexpectedly set up in master-master replication configuration. Therefore, though slave replication on mysql2 was stopped,  mysql2 was still sending commands to mysql1. This caused databases to be dropped on both machines. Thanks to the script delays we realized after a few minutes that mysql1 was dropping databases and the script was stopped. We then immediately started working to restore the databases.

Why restores took so long

As demand for the mysql cluster has grown, our backup strategy has shifted to be optimized to save disk space, our greatest resource bottleneck. This has been a worthwhile tradeoff in the past, as we have rarely had to do full restores. We use mysql-zrm to back up mysql with heavy compression. Because of this strategy, restores were largely CPU-bound instead of IO-bound.

We also discovered that a couple of databases had issues restoring due to indexing and foreign keys. Each time one of these failed, we had to parse the entire backup file (around 200GB), pull out the bad database to restore separately, and then pull out the rest of the unrestored databases.

A further complication was that our backups were pointed at mysql2, which was out-of-date with mysql1, due to the initial synchronization failures. Fortunately, we had the binary logs from the 17th through the 30th. This means that though most data could be restored, some data from between the 15th and the 17th was lost.

These three factors combined meant a much slower, and much more labor-intensive restore process than we had anticipated.

Looking Forward

We learned a lot of important lessons from this outage, both about how we run our mysql cluster and about how we plan and manage resources at the OSL in general.

Most immediately, some of the most important changes we will implement for the mysql service over the next month or two include:

  1. Evaluating better replication strategies to mitigate the initial cause, including row-based replication (see the sketch after this list)

  2. Storing binlogs as a backup on a separate server

  3. Doing backups using Percona XtraBackup, allowing for much faster full restores

  4. Using mydumper rather than mysql-zrm to improve the speed of our logical backups

  5. Working on our documentation and training for our complex systems, including

    1. Regularly testing full restores on a spare server as part of our backup process

    2. Gathering more accurate ETAs for the restoration process

    3. Regularly auditing the databases we host -- multiple test and ballooning databases (100GB+) seriously delayed the restore process

  6. Migrating to a bigger, more powerful mysql cluster (already planned before this outage)
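To make items 1, 3 and 4 above concrete, the fragments below show the general shape of those changes; they are illustrative sketches, not our production configuration:

    # my.cnf -- item 1: row-based replication replays row images instead of SQL
    # statements, which can avoid statement-level drift like we saw on the session tables
    [mysqld]
    binlog_format = ROW

    # item 3: a physical backup with Percona XtraBackup (restores copy files back
    # instead of replaying SQL, so they are much faster)
    innobackupex --user=backup --password=SECRET /srv/backups/

    # item 4: a parallel, compressed logical dump with mydumper
    mydumper --outputdir=/srv/backups/logical --compress --threads=4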

In terms of the bigger picture, we recognize that we need to change how the lab plans, monitors, and manages resources and projects. Despite our best efforts, the backlog of hosting requests to the OSL continues to grow. We have, over the years, worked hard to stretch our resources to provide services to as many projects as we can. This has always come with tradeoffs, such as the compression of backups to maximize disk use, and less redundancy than we would have wished.

We have for a while been concerned about how thinly resources have been stretched, and have been working on a set of policy changes, as well as raising funds to reinvest in the lab. Some of you may have heard our staff talk about this plan -- we hope to talk to a lot more of you about this in the near future. Our new FTP cluster, perhaps one of our most neglected pieces of infrastructure, was an important first step in this renewal.

Over the next few months, the OSL will be looking at a number of different services and policies, including:

  1. Instituting a policy and mechanisms for better keeping the community informed

    1. Of outages, maintenance, etc.

    2. Of resource use & warning signs (dashboards)

  2. Identifying and redesigning “core” services, including

    1. Defining and monitoring capacity limits

    2. Implementing redundancy and restore practices, including staff drills

    3. Migrating more of these services to Chef

    4. Instituting periodic review of documentation, policies and performance metrics

    5. Finding better ways of leveraging community expertise to supplement our own

  3. Raising funds to refresh our most aging infrastructure, and catch up on the worst of our technical debt.

We want to thank you for your patience and support during this outage and over the years we have served you. We realize that the length of this outage, and the lack of progress reports was unacceptable, and we want you to know that we are taking steps to reduce both the likelihood and the impact of future outages.

by jordane at August 07, 2015 09:13 PM

July 29, 2015

Piotr Banaszkiewicz

AMY update #5

AMY v0.7 was released today. You'll find the list of changes below.

This release contains surprisingly few changes, and development seemed to slow a little, but that's not a bad thing: Greg's moving to Europe with his family, and I'm going on 2-day trips from time to time (Wrocław tomorrow).

Bug fixes

This time I start with bug fixes.

  • Check if event’s starting date is earlier than it’s ending date.
  • Ensure event’s administrative fee and event’s attendance are both non-negative numbers.

Theme problem

This release was themed "fixing host/site/organizer representation in the database", and I claim success!

The change was rather significant and required me to confirm ideas with Software-Carpentry administrators and the Steering Committee.

Rename "Site" to "Host"
Previously event’s host was incorrectly named "site" (ie. location).
Rename "organizer" to "administrator"
Event.organizer field was mostly unused and no-one knew it’s real purpose. By changing it to "administrator" we now have a place for our administrators to mark events they’re working on.
Filter by administrators on the "All Events" page
We can filter by Host, we can now also by administrator.
More location fields for Event
Workshops that have a public website (most of our workshops does) contain standardized location data we didn’t previously collect. This was required by some other features.
contact field for Event
Additional field we didn’t collect in the past.

New features

Faster testing
I switched the Travis-CI builds to use fast Linux containers (see the .travis.yml sketch after this list). Testing time dropped from 1.5min to 45s.
Enhanced filtering
It's possible to select multiple countries and a preferred gender on the "Find Instructors" page.
Lookup instructors closest to the event's location
From an event's details page, administrators can quickly go to "Find Instructors" and search by latitude/longitude.
"Update from URL" functionality
This works in a similar way to "Import from URL" from v0.6.
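The container switch mentioned under "Faster testing" is a one-line opt-in in the Travis CI configuration; a sketch of the relevant part, not AMY's actual .travis.yml:

    # .travis.yml -- "sudo: false" opts into Travis CI's container-based infrastructure,
    # which boots much faster than the full virtual machines
    language: python
    sudo: false
    python:
      - "3.4"
    install:
      - pip install -r requirements.txt
    script:
      - python manage.py test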

Plans for v0.8

Greg wants v0.8 to store demands for Instructor Training in AMY.

by Piotr Banaszkiewicz ( at July 29, 2015 10:00 PM

July 16, 2015

Ben Kero

Goodbye Mozilla

It is with a heavy heart that I'm announcing my resignation from Mozilla. Last month marked my 5th year here, and over that time I've met some of the most intelligent and driven people in the world. I'm proud to have known you and worked alongside you these years.

I am leaving my responsibilities in the capable hands of my teammates. Although I will no longer be here, the work will still get done.

I’d like to thank all of you who helped me along the way. In particular, the release engineering team for introducing me to the reality of operations at an impressive scale. I’d also like to thank IT for teaching me how large of a scope an org can have, and for civilizing this operations cowboy. I also owe a great appreciation and shout-out to my teammates in Developer Services (especially fubar and hwine) who have had my back through some rough outages.

Lastly, I’d like to thank my managers for giving me direction and always keeping me on course:

Justin Fitzhugh
Matthew Zeier
Phong Tran
Corey Shields
Shyam Mani
Jake Maul
Laura Thomson
Lawrence Mandel

Post-Mozilla I’ll be moving on to other software development and operations work. Since free software is one of my passions you’ll certainly see me around. If you’re curious as to what I’m up to next feel free to send me a private message.

Feel free to reach out to me on IRC, Facebook, Twitter, or in meatspace. If you see me at a conference, don’t hesitate to come say hello. My personal email address is

My last day will be Friday (2015-07-17).

Thank you,

Ben Kero
Senior Systems Administrator, Developer Services

by bkero at July 16, 2015 10:23 PM

Piotr Banaszkiewicz

AMY update #4

Today is the deadline and release day for AMY v0.6.

Here’s what’s new and what’s changed in this release.

In-between releases

I was forced to add some hotfixes in v0.5.1 and v0.5.2.

v0.5.1 included a fix for the name of one of our lessons (was ‘dc/spreadsheet’, but it should be ‘dc/spreadsheets’).

v0.5.2 included a fix for the round-off error that was crashing AMY. If you're interested in the story, read this commit message.

New features

Deletion of tasks
Tasks can now be deleted from their details page.
Eventbrite link
The event details page now shows a link to the Eventbrite event page instead of just the key ID.
Quick link for new airport page
Accessible from the "+" dropdown menu.
Auto-fill end-date on event form
Setting a start date for an event automatically sets an end date (+1 day), since most of our workshops last 2 days. The auto-fill won't happen if the user has already put something into the end-date field.
2-column layout on person-details page
The first column contains awards and tasks, while the second contains knowledge domains and the lessons a person can teach.
Facelifted Find Instructors page
Full-width layout, filters in the sidebar, and no more marking of "wanted" instructors. Filters are now applied with GET instead of POST.
Ability to add tasks for person from their edit page
Admins can now assign not only awards but also tasks to a person from their edit page.
Tabs on person edit page
Because the person edit page uses 3 forms and 2 listings, a split was necessary. The first tab contains the original person edit form, the second contains awards and the award form, and the third contains tasks and the task form.
Facelifted bulk-upload
Another switch to a full-width layout. The bulk-upload confirmation page now shows what will happen to specific entries ("will be created", "already in the database", and so on).
Lessons sorted by default
Lessons now appear in alphabetical order in the Find Instructors filter and in the "lessons" column on that same page.
Importing event from URL
Admins can pre-fill the event creation form with data from a workshop page.
Improved "failed to delete" page
Thanks to one small discovery, a big and ugly chunk of code was replaced with almost a one-liner.


Display of tasks
Tasks now show links to events and persons.
Eventbrite event ID name change
Previously it was ‘reg key’, now it’s clearer.
Better fields in airport form
"Airport name" instead of "Fullname", "IATA code" with link in the "help_text" instead of "Iata".
Better list of tasks on person-details page
These tasks now contain links to related events and clearly indicate roles a person has.
Searching for "firstname lastname"
Searching for "Piotr Banaszkiewicz" now works!
Enforced uniqueness for event slugs
Having two events with the same slug resulted in a crash, so we decided to enforce uniqueness for event slugs. This not only prevents the bug from hunting us, but also prevents admins from adding events with slugs that are already in the database.
Awards will prevent events from deletion
If there are some awards that point to specific event, that event will not be deleted.
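At the model level, enforcing that uniqueness is a one-argument change; roughly like the following, though the real field definition in AMY may differ:

    from django.db import models

    class Event(models.Model):
        # unique=True makes the database itself reject a second event with the same
        # slug, so duplicates can no longer crash pages that look events up by slug.
        slug = models.CharField(max_length=100, unique=True, null=True, blank=True)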

Plans for v0.7

There are two things I want to work on for the v0.7 release.

  1. I want to add a develop branch and set it as the main branch for the repo.
  2. I want to fix host/site/organizer representation in the database.

The rationale for item 1 is that I had to make two minor releases of v0.5 when we already had features for v0.6 merged into master. If we had kept only stable releases in master (git-flow), I wouldn't have had that problem.

by Piotr Banaszkiewicz ( at July 16, 2015 10:00 PM

June 17, 2015


Write the Docs '15

by Elijah Voigt

The day is May 18. The location is Portland's Crystal Ballroom. The conference is Write the Docs (WtD). Excitement and anticipation fill the air as we collectively munch on breakfast foods and find a seat. The keynote begins and immediately sets the mood: docs are fun, docs are interesting, and here's how you can make your docs awesome.


WtD was quite the experience, and it got me excited about documentation, something I admit I never expected to be all that excited about. At times it felt like a support group for non-technical individuals who work with engineers, at other times it felt like a storyteller sharing their adventure in documenting some massive project, and most importantly it was always engaging and interesting. Some of my most memorable talks were about Twilio's efforts to make their documentation better, GitHub's workflow of writing docs for GitHub with GitHub, and Google's new documentation tool and how it was developed and adopted in a grassroots effort as opposed to a top-down corporate approach. I even gave a Lightning Talk on "How to Write the Best Email You've Never Written... Until Now", which went over very well and seemed to speak to a lot of people.

Inspired by this awesome conference, we have started a massive overhaul of our documentation, including writing official style guides, overhauling the new hire onboarding docs, and updating our wiki. With the new hire documentation we have taken into account lessons learned from the conference, like making docs fun to read in addition to informational; this shift has resulted in our 'Gamified New Hire Docs' rewrite, which gamifies the onboarding process. When a new student employee passes a milestone, like submitting their first GitHub Pull Request, they get a reward badge (e.g., a gold star sticker). It might not seem like much, but it is way better than slogging through a daunting pile of docs as one starts a new job.

by Anonymous at June 17, 2015 09:03 PM

May 12, 2015

Russell Haering

Next Adventure: ScaleFT

In 2008 I stumbled across the opportunity to work as a sysadmin at the OSU Open Source Lab. When I started there I didn't have much experience with internet infrastructure, but it quickly became a passion of mine and inspired a mission that has had a profound influence on my life. My Twitter profile has a (necessarily) succinct summary of that mission:

Building infrastructure that makes the internet more usable to more people.

I've had a great time pursuing this mission at Cloudkick, and at Rackspace after we were acquired in December of 2010. I've met countless great people and learned a ton from them. I've worked with (and on) a bunch of great teams that are doing great work and furthering this mission more than I ever could alone.

But it's time for the next step in my mission. Yesterday some good friends and I announced our new company, ScaleFT.

At ScaleFT we're focusing on improving how teams use infrastructure and working to make those interactions more collaborative and ultimately easier, safer and more fun. Tools like GitHub have proven the power of collaboration when applied to writing code. We're going to bring that same power to interactions with infrastructure.

Time to get hacking.

by Russell Haering at May 12, 2015 06:54 PM

May 06, 2015

Lars Lohn

History as a Birthday Present

On my 55th birthday, I received an unusual gift: an armchair history adventure similar to a PBS episode of the History Detectives. It started with the gift of three old photographs acquired from a local thrift store.

Written on the back of one of the images was a cryptic “T+T RR ACME MINE 1915”. I wondered if I could find some context for this crash that would explain the strange ordering of the train: two box cars, the engine and then two more box cars. 

I started my research by first identifying the name of the railroad: "T+T RR" stands for the Tonopah and Tidewater Railroad, a short line railroad built in 1905 by the Pacific Coast Borax Company to transport borax from the local mines. It was roughly two hundred miles long, stretching from Ludlow, CA to Beatty, NV. The rail line was abandoned in 1940.

Wanting to see the location of the crash on Google Earth, I then searched for "ACME Mine" in Google, only to find that ACME is a rather common name. However, since I knew that the Tonopah and Tidewater Railroad ran through the Mojave desert, I examined links in the Google search results that also mentioned California. I focused closely on an entry for the "Amargosa Mine (ACME Mine)". Looking at some of the icons of nearby locations, I found a reference to a place called China Ranch. In Google Maps, I was able to find "China Ranch Date Farm and Bakery".

That was the key to finding the location in Google Earth. I entered a search for the bakery, then scrolled down the canyon and found the remains of an old railroad grade:

 Reorienting the perspective so as to look at the hillside: 

I see direct and obvious connections with the landscape in the original photographs. I had found the location of the crash.

The question that I wanted answered was how the crash happened. Looking over the terrain, I can see nothing that helps explain the ordering of the cars in the train.

Armed with some new search terms, I entered “China Ranch” “T&T” and found the Google scan of a book called “Railroads of Nevada and Eastern California: The southern roads”. I found this paragraph:
... some new traffic was generated through construction of the so-called ACME spur from Morrison (later ACME) to the main line in Amargosa Canyon ... The tracks ran past China Ranch and through a picturesque canyon to a gypsum deposit at Acme. No particular importance attaches to the line, unless it be recalled that it was on this branch that two cars got away from an engineer and coasted all the way to the junction, resulting in a bad wreck
Perusing the scanned book further, I found unattributed reproductions of two of my three photos. Captioning linked the photos to the derailment in the quoted paragraph above. So I had found a reference to the derailment, but no details.

Browsing led me to a cache of images of the Tonopah and Tidewater Railroad called the "Henrick Collection", and I noticed a flaw in my search strategy. I kept using the term "derailment", and it seemed the term "wreck" was more common in that era. The simple search "T&T 1915 Wreck" yielded the gold that I needed: a newspaper clipping from the "Tonopah Daily Bonanza" with an account from a participant in the rescue and clean-up.

It turned out that this was a pretty dramatic crash: two runaway box cars were being chased by an engine towing two other box cars. On finally catching and coupling, they were going too fast for the curve and rolled. It was, perhaps, a good thing they rolled: as the story implies, a passenger rail car with a complement of passengers waited on the track nearby or below. One person, the fireman, died from injuries suffered from bailing out at the wrong place. The engineer was "frightfully maimed and burned", but recovered within a year and returned to work. The locomotive, #9, survived; it was righted and labored in the heat on the Mojave rails for another twenty-five years.

That explains the odd ordering of the cars, with the engine in the middle. My original question had been answered.

Of course, this spawns more questions.  Are these photos of the era or later generation reproductions? How did they end up in a Humane Society Thrift Store in Corvallis, OR?  These questions are for a future history expedition.

I've got to say, Paul, this was a brilliant birthday present.  It wasn't just a thing, it was a wonderful mystery to solve.  It demonstrates the power of the open Web. I'm grateful to live in a time where such a research project is achievable from my home desk in mere hours.  

by K Lars Lohn ( at May 06, 2015 01:26 PM

March 16, 2015

Beaver BarCamp

Beaver Barcamp: Now with More Lightning Talks!

This year we will be introducing lightning talks to Beaver Barcamp! A lightning talk is a five-minute presentation on any given topic; it's basically just a shorter version of the usual barcamp talk. Instead of a keynote, our first session will be all lightning talks. You can come early and propose a topic to give a lightning talk on, or vote on other topics that you want to hear about. The most popular proposals will be chosen for presentation. If you have any questions about this format, please email us at info<at> We look forward to seeing you at Beaver Barcamp 15!

by OSU Open Source Lab at March 16, 2015 07:00 AM

February 10, 2015

Ben Kero

Size of mozilla-central compared

As part of my ongoing work I’ve been measuring the size and depth of mozilla-central to extrapolate future repository size for scaling purposes. Part of this was figuring out some details such as average file size, distribution of types of files, and on-disk working copy size versus repository size.

When I posted a graph comparing the size of the mozilla-central repository by Firefox version, my colleague gszorc was quick to point out that the 4k block size of the filesystem meant that the on-disk size of a working copy might not accurately reflect the true size of the repository. I considered this and compared the working copy size (with a block size of 1) to the typical 4k block size. This is the result.

Mozilla-central blocksize comparison


As you can see, the block-size-1 measurement is much smaller, about 72% of the on-disk size. As of Firefox 5 the ratio was about 73%, and it followed a general downward trend to about 71% as of Firefox 38.

What this could mean is that 27-29% of files in the mozilla-central repository are below 4 kilobytes in size. More likely, it means that 27-29% of the space used in a working copy of mozilla-central is padding smaller files out to the 4k block size, which roughly matches what I've found by calculating average file size in the repository.
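One way to make that comparison is with GNU du, which can report either the allocated size (rounded up to 4k blocks) or the apparent size (the sum of the files' byte counts):

    # allocated (block-padded) size of the working copy, in bytes
    du -s --block-size=1 mozilla-central

    # apparent size: what the files would occupy with no block padding
    du -s --block-size=1 --apparent-size mozilla-central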

Excluding some large binary files that are in the repository, the mean file size is 6306 bytes. This is offset by some very large source code files:

  • 4.7M ./security/nss/lib/sqlite/sqlite3.c
  • 4.8M ./js/src/octane/mandreel.js
  • 5.3M ./db/sqlite3/src/sqlite3.c
  • 8.6M ./js/src/jit-test/lib/mandelbrot-results.js

However, if we look at the median file size we come up with something much more plausible: 1173 bytes.
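Those statistics are straightforward to reproduce over the tracked files of a working copy; a rough sketch, run from the root of the checkout:

    # sizes of all tracked files, one per line, sorted
    hg files -0 | xargs -0 stat -c %s | sort -n > sizes.txt

    # mean file size
    awk '{ total += $1 } END { print total / NR }' sizes.txt

    # median file size
    awk '{ a[NR] = $1 } END { print a[int((NR + 1) / 2)] }' sizes.txt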

Here is the new working copy size in comparison with the source lines of code count from the original chart:

Working copy size (bs=1) vs SLOC

From this we can see a general upward trend in the amount of space used versus the source line count. This can mean one of two things: either more binary assets are being added relative to the amount of code, or more files below 4k in size are being added to the repository.

by bkero at February 10, 2015 04:50 PM

February 06, 2015

Ben Kero

Trends in Mozilla’s central codebase

UPDATE: By popular demand I’ve added numbers for beta, aurora, and m-c tip

As part of my recent duties I’ve been looking at trends in Mozilla’s monolithic source code repository mozilla-central. As we’re investigating growth patterns and scalability I thought it would be useful to get metrics about the size of the repositories over time, and in what ways it changes.

It should be noted that the sizes are for all of mozilla-central, which is Firefox and several other Mozilla products. I chose Firefox versions as they are useful historical points. As of this posting (2015-02-06) version 36 is Beta, 37 is Aurora, and 38 is tip of mozilla-central.

Source lines of code and repo size

UPDATE: This newly generated graph shows that there was a sharp increase in the amount of code around v15 without a similarly sharp rise of working copy size. As this size was calculated with ‘du’, it will not count hardlinked files twice. Perhaps the size of the source code files is insignificant compared to other binaries in the repository. The recent (v34 to v35) increase in working copy size could be due to added assets for the developer edition (thanks hwine!)

My teammate Gregory Szorc has reminded me that since this size is based off a working copy, it is not necessarily accurate as stored in Mercurial. Since most of our files are under 4k bytes they will use up more space (4k) when in a working copy.

From this we can see a few things. The line count scales linearly with the size of a working copy, except at the beginning, where the ratio was about half until around Firefox version 18. I haven't investigated why this is, although my initial suspicion is that it might be caused by there being more image glyphs or other binary data compared to the amount of source code.

Also interesting is that Firefox 5 is about 3.4 million lines of code while Firefox 35 is almost exactly 6.6 million lines. That's almost a doubling in the amount of source code comprising mozilla-central. For reference, Firefox 5 was released around 2011/06/21 and Firefox 35 was released on 1/13/2015. That's about three and a half years of development to double the codebase.

If I had graphed back to Firefox 1.5 I am confident that we would see an increasing rate at which code is being committed. You can almost begin to see it by comparing the difference between v5 and v15 to v20 and v30.

I’d like to continue my research into how the code is evolving, where exactly the large size growth came from between v34 and v35, and some other interesting statistics about how individual files evolve in terms of size, additions/removals per version, and which areas show the greatest change between versions.

If you’re interested in the raw data collected to make this graph, feel free to take a look at this spreadsheet.

The source lines of code count was generated using David A. Wheeler’s SLOCCount.
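For anyone who wants to reproduce the counts, SLOCCount is run directly against a checkout (the path below is a placeholder):

    sloccount /path/to/mozilla-central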

by bkero at February 06, 2015 04:56 PM

February 04, 2015

Ben Kero

My Gear Post


Whenever I encounter people as I travel, they are often curious about my luggage. It seems to be invisible. They’ll often ask where my bag is, assuming that it must have gotten lost in transit. Their eyes go wide and confusion sets in when I tell them that the bag on my back is the only one.

It is my estimation that at least some people would be curious about what gear I travel with. They ask how I'm able to pack all the necessities into such a small space. There is no great secret to traveling light. All it takes is a little research and some compromise in creature comforts. If you have browsed the postings of other nomadic hackers, there might be little to be gleaned from this post. Here's a basic rundown, with almost every item deserving its own article.

It should go without saying that nobody paid me to write this post, and likewise nobody has sent me any products to test.

Tom Bihn Synapse 19L backpack. This backpack is the key to fitting everything. Unlike conventional backpacks, the pockets in this one intrude less on the space of other pockets. The pack is lightweight, durable, and has the incognito look of your grade school Jansport backpack. Although it lacks a laptop pocket, it has a system of hooks and rails to keep a separate neoprene case in place. I found Bihn's neoprene cases to be far too bulky for serious use; instead, there is a small elastic sub-pocket in the main pack cavity that can be used. The pack has a central water bottle pocket, which is ideal for a collapsible water bottle. Their small clear document organizer pouch is the perfect size for passports, travel docs, and accumulating receipts. The under-bag storage is ideal for keeping all my clothes and a pair of sandals.

Lenovo Thinkpad X240. Lenovo must have made a deal with the devil to get a device with this much battery life. It has an internal 24Wh battery and an external 3-cell or 6-cell 72Wh battery at the back. I often get a marathon day of heavy usage, or two days of lighter usage without charging. This model is a new re-design from Lenovo. Unfortunately for this model, Lenovo decided to remove the Trackpoint mouse buttons. This made a lot of Thinkpad users grumpy, so Lenovo added the buttons back to the X250. Linux runs swimmingly on this laptop with almost everything working out-of-the-box.

Etymotic hf-2 Canalphones. These canalphones offer excellent sound reproduction and have an integrated microphone. The microphone also has a button for controlling playback on an Android device. The noise insulation of these canalphones is close to that of dedicated earplugs. They come with several different tips to find the best fit. Although they can be uncomfortable at first, they are unnoticeable after finding the ideal tips. Custom ear-molded ones are also available.

Nook Simple Touch E-Book Reader. This is what I bring instead of a tablet. I like the read-anywhere e-ink screen, open source hackability, and long battery life. Left suspended, this device can last for months on a single battery charge. It can be hacked to run other Android apps, including a VNC server to tether to the tablet as a second screen. Unlike a Kindle, I don't need an account anywhere, and it can read EPUB files.

Dr Bronner’s Magic Hippie Peppermint SoapMy name for these products aside, they do an excellent job at cleaning anything. This soap is more concentrated than regular liquid soap. The result is that a single squirt is enough for a complete shower. Likewise, a small dab will also do an entire load of sink laundry. I’ve also been meaning to try out their soap bar.

GoToob Travel Bottles. These little bottles are for storing various liquids, usually toiletries. They're made of a pleasant soft silicone material, and come in 2oz (nice) and 3oz sizes. My initial experiments show the lid has problems staying on under stress. If I were to pack again I'd rather keep the Dr Bronner's in its original container, and get a small shampoo container.

Vapur Element 0.7L Water Bottle. I find this bottle to be just the right capacity of water. It is collapsible, which means when not full, the extra air can be squeezed out to make it smaller. Likewise when it is empty, you can fold it up and secure it with the attached carabiner. After about a year of use, the only wear is that the blue sticker is starting to separate.

Uniqlo Airism Boxer Briefs. Unlike all the other wool you see on this list, these things are 100% synthetic. Their fitment is top-notch, and they don't ride up your legs, unlike a lot of other undergarments. They're also inexpensive considering their quality. Unfortunately they're not as anti-odor as Icebreaker boxer-briefs. They also tend to restrict certain delicate anatomy.

Wool & Price Better Button Down shirts. I carry one or two of these with me everywhere. These are my favorite shirts. Most of their patterns can serve formal and informal duties. They can be donned to formal events and at a neighborhood pub. They are 100% wool, so they are anti-wrinkle and anti-odor. Their one down size is that they tend to smell a bit sheepish when wet.

Icebreaker Wool Pants. These are pretty conservative brown pants. They don’t have obnoxious pockets, zippers, and logos all over the place. Alas, they do have a bit of the irritating external stitching, but it does serve a worthwhile purpose on these. The stitching allows the front pockets to be fixed, which means no more backwards-facing front pockets when putting on pants. Although they are a wool blend, they keep most of the excellent anti-odor and anti-wrinkling properties.

Icebreaker Multisport Ultralight Mini Socks. These socks are nice and soft, and come with a lifetime guarantee against holes. I've had these long enough to figure out why: the webbing is made of a much stronger material than the wool, so although the wool on the bottom of the socks will wear off, the elastic webbing will not. Still, they're good in hot and cold weather, can be washed in the sink, dry quickly, and keep you warm even when wet.

PackTowl Nano Lite Towel & Large Towel. These towels are excellent travel compansions. The pocket towel can double as a washrag and resists stains despite its bright color. The “large” towel is still pretty small, but is lightweight and folds up to about the size of two decks of cards. Both towels are quick-drying and resist odors.

Clark’s Desert BootsIn the beeswax color these are formal and and informal enough to wear anywhere. Like most shoes, they are not without their problems. For example, they pinch the upper foot due to the position of the lacing. There is little to be done about it. They also do a poor job insulating against rain. Additionally the interesting crepe bottom wears out faster than usual shoe rubber. All that said, they do look and feel excellent. Nonetheless, I’m looking for better alternatives.

Leatherman Style PS. This is a small travel-friendly multitool. I take it through airports all the time, although I save time by taking it out of my bag before the X-ray. I've been flying with it for about a year now with no overeager TSA agents confiscating it. It is the only tool that I need to disassemble my laptop.

Let me know if this post was useful for you. Likewise, let me know if you have some tips on something that might work better for me.

by bkero at February 04, 2015 02:45 AM

January 18, 2015

Lars Lohn

The Smoothest Migration

I must say that it was the smoothest migration that I have ever witnessed. The Socorro system data has left our data center and taken up residence at Amazon.

Since 2010, HBase has been our primary storage for Firefox crash data. Spread across something like 70 machines, it maintained a constant cache of at least six months of crash data. It was never a pain-free system. Thrift, the system through which Socorro communicated with HBase, seemed to develop a dislike for us from the beginning. We fought it and it fought back.

Through the adversity that embodied our relationship with Thrift/HBase, Socorro evolved fault tolerance and self-healing. All connections to external resources in Socorro are wrapped with our TransactionExecutor. It's a class that recognizes certain types of failures and executes a backing-off retry when a connection fails. It's quite generic, as it wraps our connections to HBase, PostgreSQL, RabbitMQ, ElasticSearch and now Amazon EC2. It ensures that if an external resource fails with a temporary problem, Socorro doesn't fail, too.
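The idea behind the TransactionExecutor is a plain backing-off retry loop; the following is a generic sketch of that pattern, not Socorro's actual implementation:

    import time

    class TransientError(Exception):
        """Stand-in for the 'recognized as recoverable' failures described above."""

    def run_with_retries(connection_factory, operation, retries=5, base_delay=2):
        for attempt in range(retries):
            try:
                connection = connection_factory()
                return operation(connection)
            except TransientError:
                if attempt == retries - 1:
                    raise  # the resource never recovered; give up
                # back off exponentially before probing the resource again
                time.sleep(base_delay * 2 ** attempt)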

Periodically, HBase would become unavailable. The Socorro system, detecting the problem, would back down, biding its time while waiting for the failed resource to recover.  Eventually, after probing the failed resource, Socorro detects recovery and picks up where it left off.

Over the years, we realized that one of the major features that originally attracted us to HBase was not giving us the payoff that we had hoped.  We just weren't using the MapReduce capabilities and found the HBase maintenance costs were not worth the expense.

Thus came the decision that we were to migrate away.  Initially, we considered moving to Ceph and began a Ceph implementation of what we call our CrashStorage API.

Every external resource in Socorro lives encapsulated in a class that implements the Crash Storage API. Using the Python package Configman, crash storage classes can be loaded at run time, giving us a plugin interface. Ceph turned out to be a bust when the winds of change directed us to move to Amazon S3. Because we implemented the CrashStorage API using the Boto library, we were able to reuse the code.
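The S3 side of such a crash storage class reduces to a few Boto calls; an illustrative fragment only, with a made-up key layout rather than Socorro's:

    import boto  # the original boto library mentioned above

    def save_raw_crash(bucket_name, crash_id, payload):
        connection = boto.connect_s3()             # credentials come from the environment
        bucket = connection.get_bucket(bucket_name)
        key = bucket.new_key('raw_crash/%s' % crash_id)
        key.set_contents_from_string(payload)      # one crash document per S3 key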

Then began the migration.  Rather than just flipping a switch, our migration was gradual.  We started 2014 with HBase as primary storage:

Then, in December, we started running HBase and AmazonS3 together.   We added the new AmazonS3 CrashStorage classes to the Configman managed Socorro INI files.  While we likely restarted the Socorro services, we could have just sent SIGHUP, prompting them to reread their config files, load the new Crash Storage modules and continue running as if nothing had happened.

After most of a month, and completing a migration of old data from HBase to  Amazon, we were ready to cut HBase loose.

I was amused by the non-event of the severing of Thrift from Socorro. Again, it was a matter of editing HBase out of the configuration and sending a SIGHUP, causing HBase to fall silent. Socorro didn't care. Announced several hours later on the Socorro mailing list, it seemed more like a footnote than an announcement: "oh, by the way, HBase is gone".

Oh, the migration wasn't completely perfect; there were some glitches. Most of those were from minor cron jobs that were used for special purposes and inadvertently neglected.

The primary datastore migration is not the end of the road. We still have to move the server processes themselves to Amazon. Because everything is captured in the Socorro configuration, however, we do not anticipate that this will be an onerous process.

I am quite proud of the success of Socorro's modular design.  I think we programmers only ever really just shuffle complexity around from one place to another.  In my design of Socorro's crash storage system, I have swung a pendulum far to one side, moving the complexity into the configuration.  That has disadvantages.  However, in a system that has to rapidly evolve to changing demands and changing environments, we've just demonstrated a spectacular success.

Credit where credit is due: Rob Helmer spearheaded this migration as the DevOps lead. He pressed the buttons and reworked the configuration files. Credit also goes to Selena Deckelmann, who led the way to Boto for Ceph, which gave us Boto for Amazon. Her contribution in writing the Boto CrashStorage class was invaluable. Me? While I wrote most of the Boto CrashStorage class and I'm responsible for the overall design, I was able to mainly just be a witness to this migration. Kind of like watching my children earn great success, I'm proud of the Socorro team and look forward to the next evolutionary steps for Socorro.

by K Lars Lohn ( at January 18, 2015 01:09 PM

January 17, 2015

Pranjal Mittal

My new blog for programming related posts

After a lot of thought I have decided to divide my blogging activities into non-technical and technical blogs. I have created a separate blog for technical posts. I realized that I was facing a lot of difficulties trying to make syntax highlighting for code work in the current blog (which uses Blogger).

Hopefully the new blog, which uses Octopress, gives me an incentive to complete my blog posts. Until now I have left some of my posts unfinished because I got frustrated trying to paste code with syntax highlighting, eventually giving up and then forgetting to complete the post. Even though there are some syntax-highlighting JS libraries out there, they do not work so well with Blogger, and the highlighted code takes a small but noticeable time to render. I did not like it much, or maybe I didn't try hard enough to make it work smoothly. But believe me, it was much easier to set up a blog on Octopress in the meantime.

If you would like to see a sample Octopress blog post, there you go: my post on finding the total number of users on GitHub using the GitHub API.

by Pranjal Mittal ( at January 17, 2015 08:05 AM


December 10, 2014

Pranjal Mittal

Javascript vs Python: Comparing ways of doing stuff

In this post I am going to compare ways of doing useful stuff in Javascript and Python.

1. Unpacking an array and passing it as arguments to a function. Background: Math.min in JS vs min in Python.


// JavaScript: Function.prototype.apply unpacks an array into individual arguments.
var array = [1, 2, 3, 4];
Math.min.apply(null, array);  // 1 -- the first argument is the `this` value, which Math.min ignores

// Math.min(array) does not work as intended: it receives the array itself and returns NaN.


# Python: use * when calling a function to unpack a list into separate arguments.
array = [1, 2, 3, 4]
min(*array)   # 1

# min() also accepts an iterable directly. Python is beautiful.
min(array)    # 1

P.S.: If you know how to do something in one of the languages and cannot figure out how to do it in the other, just leave a comment below and I will work it out for you.

by Pranjal Mittal ( at December 10, 2014 11:41 AM

November 13, 2014

Lars Lohn

the World's One Door

Last evening, just before I retired for the night, a coworker, Edna Piranha (not her real name), tweeted something that intrigued me:

the WORLD… their WORLD… the WORLD.. world world, world? world! world. wow that word looks funny. world.
Suddenly my brain shifted into what I can only call poetry mode.  Words and phrases similar to the word, "world" began surfacing and swimming around in my mind. After about twenty minutes, I replied to her tweet with:
 @ednapiranha may your weird wonder ward our wired world for we were old and wandered and whirled from the word of the one door.
I immediately went to bed and began a night woven with those words.  They were in my dreams.  I'd wake up with chants in my head, "world - we're old" and "wonder - one door". It haunted me all night long and now continues into the next day.

Ten years ago, a dear friend, since deceased, came up with a new word, ospid, that perfectly describes what I was experiencing.
n, an object which, for a brief period after its creation, intensely fascinates its creator.  Once the fascination is over, the object is no longer an ospid.
I look forward to the moment, hopefully later today, when this is no longer an ospid. 

by K Lars Lohn ( at November 13, 2014 02:59 PM

October 29, 2014

Lars Lohn

Judge the Project, Not the Contributors

I recently read a blog posting titled The 8 Essential Traits of a Great Open Source Contributor, and I am disturbed by it. While clearly not the intended effect, I feel the posting just told a huge swath of people that they are neither qualified nor welcome to contribute to Open Source. The intent of the posting was to say that there is a wide range of skills needed in Open Source: even if a potential contributor feels they lack an essential technical skill, here's an enumeration of other skills that are helpful.
Over the years, I’ve talked to many people who have wanted to contribute to open source projects, but think that they don’t have what it takes to make a contribution. If you’re in that situation, I hope this post helps you get out of that mindset and start contributing to the projects that matter to you.
See? The author has completely good intentions. My fear is that the posting has the opposite effect. It raises the bar as if it were an ad for a paid technical position. He uses superlatives that say to me, "we are looking for the top people as contributors, not common people".

Unfortunately, to me this blog posting does not communicate the need for a wide range of skills; it communicates that if you contribute, you'd better be great at doing so. In fact, if you do not have all these skills, you cannot be considered great. So where is the incentive to participate? It makes Open Source sound as if it were an invitation to be judged as either great or inadequate.

Ok, I know this interpretation is through my own jaundiced eyes. So to see if it was just a reflection of my own bad day, I shared the blog posting with a couple of colleagues. Both colleagues are women who judge their own skills unnecessarily harshly but, in my judgement, are really quite good. I chose these two specifically because I knew both suffer from "imposter syndrome", a largely unshakable feeling of inadequacy that is quite common among technical people. Both reacted badly to the posting, one saying that it sounded like a job posting for a position for which there would be no hope of ever landing.

I want to turn this around. Let's not judge the contributors; let's judge the projects instead. In fact, we can take these eight traits and boil them down to just one.
Essential trait of a great open source project:
Leaders & processes that can advance the project while marshalling imperfect contributors gracefully.
That's a really tall order. By that standard, my own Open Source projects are not great. However, I feel much more comfortable saying that the project is not great, rather than sorting the contributors.

If I were paying people to work on my project, I'd have no qualms about judging their performance anywhere along a continuum from "great" to "inadequate". Contributors are NOT employees subject to performance review. In my projects, if someone contributes, I consider both the contribution and the contributor to be "great". The contribution may not make it into the project, but it was given to me for free, so it is naturally great by that aspect alone.

Contribution: Voluntary Gift

Perhaps if the original posting had said, "these are the eight gifts we need" rather than saying that the gifts are traits of people we consider "great", I would not have been so uncomfortable.

A great Open Source project is one that produces a successful product and is inclusive. An Open Source project that produces a successful product, but is not inclusive, is merely successful.

by K Lars Lohn ( at October 29, 2014 05:16 PM

August 21, 2014

Pranjal Mittal

Sending sms messages from code without purchasing an online sms gateway

Very recently, Makemymails introduced an alpha version of an SMS API that lets users send automated SMS messages from their own website code with a few lines of code; the messages are routed via their Android phone to the intended recipients.

This eliminates the need to buy expensive SMS gateways, because your Android phone itself becomes your SMS gateway, over which Makemymails provides a free web API that makes sending SMS messages from the phone dead simple. Sending an SMS from your code simply boils down to calling a function (supported languages: PHP, Python), and beyond that a REST API is provided that allows integration with any programming language. What excites me is that the web SMS API is completely free, and I only have to pay a small amount for the SMS plan/pack that I activate on my Android phone.


In an era of smartphones, do you need to look beyond your own device for sending messages?
Buying an SMS gateway is only useful for high volumes of SMS. If you are sending fewer than 100-200 text messages per day from your website or code, it is 5-10 times more economical to use this web-Android API from Makemymails than to buy SMS gateways and plans from internet SMS gateway providers.

For example, Clickatell is a very good service for sending SMS messages from code, and they provide nice APIs too. The only sad part is the pricing: for a small-volume SMS user who just wants to send transactional messages like order confirmations and password tokens from a website, it isn't a very good option, as it would drain a lot of your money.

How does it work?


You need:
- A mobile-data-enabled Android device
- An operational SIM in the Android phone that is capable of sending SMS messages
- (Optional) An SMS plan/pack on the Android phone, which is much more cost effective than SMS gateways for a few hundred messages per day

1. You register for a free web account on Makemymails and obtain a username.

2. You install the Makemymails android app and provide your username inside the app to associate your device with your web account. You can associate multiple android devices with the same web account.

3. You visit your web account, where you can see your associated devices and your API KEY. Each device is assigned a unique device ID by Makemymails, and you can use any of your devices to send messages from the API by providing the corresponding device ID during the API call.

Step-by-step instructions to get started

Step 1: Sign up for a free account on Makemymails [1] and note your username somewhere.

(After signing up, don't be confused by the other services Makemymails offers; it also provides an emailing service, which is a different use case altogether.)

Step 2: Install the Makemymails Android app from Google Play on the Android device from which your messages will actually be sent.

Tap the "Associate username" button and enter the username from Step 1.

Step 3: 

This page contains the API documentation; the API can be used with your website irrespective of the platform and programming language.


API calls you make will cause an SMS to be sent via your phone, so it is suggested that you activate an SMS plan on the default SIM of your Android device. Overall, these SMS plans are 5-10 times cheaper than buying an SMS gateway, and easier to activate.

The API call will cause a message to be sent from the default SIM on your phone. The recipient will see your number as the sender ID.

Step 4:

As soon as you make a POST request with content type application/json to the API URL, an SMS will be generated by Makemymails as per your POST data and routed via the selected Android phone.
Make sure your device is connected to the internet at the time of making the call if you want the message to be delivered immediately.
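To make this concrete, here is a rough sketch of what such a request could look like from the shell. The endpoint URL and the JSON field names below are placeholders of my own, not the documented API, so substitute the real values from the API documentation in your Makemymails account.

# Hypothetical request shape only; take the real endpoint and field names
# from the Makemymails API documentation in your web account.
API_URL="https://<endpoint-from-your-api-docs>"
curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -d '{"username": "your_username", "api_key": "YOUR_API_KEY", "device_id": "YOUR_DEVICE_ID", "to": "+15551234567", "message": "Your order has been confirmed."}'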

Useful API libraries in different languages


Typical coders/fun use cases

- A command-line tool could be built to send messages from your terminal, straight through your Android phone.

(I am going to build one for myself very soon and open source it if you would like to try... but of course I will have to remember to hide my API KEY from the code)

Typical commercial use cases

- Small scale e-commerce companies who wish to send order confirmation to users after successful purchase.
- Websites for hotels and resorts who have online portals for booking and want to send messages to their users after making a booking.
- Restaurants with online websites who deliver food at home and wish to send food order confirmations.
- Any website that wishes to send registration confirmation messages to users, send an SMS when someone contacts you via a contact form on the website, or send updates to users or administrators of a website when a transaction is made.

by Pranjal Mittal at August 21, 2014 06:07 AM

May 26, 2014

Pranjal Mittal

Setting up Rsync in daemon mode on an AWS EC2 instance

I was trying to explore and understand rsync in detail for a very cool project that I am planning to work on. The project is related to FTP mirror syncing, about which I will write in detail next time. Rsync is a great tool for efficient syncing of directories: it transfers only the differences in files, saving time and bandwidth. In this succinct post I will quickly walk through the steps I performed to set up rsync between two Amazon EC2 instances. I will particularly focus on using rsync in daemon mode, as opposed to using rsync over ssh, which you could explore easily without any problems.

Key to the steps described ahead:

(1) To edit default config file used by rsync daemon
(2) To start rsync daemon
(3) To kill rsync daemon
(4) Command to sync (push) contents of the current directory to the server which is running the rsync daemon.
(5) To create several demo files for testing rsync

Steps performed in detail:

(Refer to corresponding key number)

(1) sudo nano /etc/rsyncd.conf

rsyncd.conf (contents)

lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
pid file = /var/run/
port = 873

# Defines an rsync module. (A module name header is required here; "syncthis"
# is assumed from the comment below.)
[syncthis]
    path = <absolute_path_this_module_maps_to>
    comment = The syncthis directory shall be synced.
    uid = ubuntu
    gid = ubuntu
    read only = no
    list = yes
    hosts allow =
    auth users = *
    secrets file = /etc/rsyncd.secrets

# Can define more modules if you want that map to a different path.

rsyncd.secrets (contents)

rsync_client_user: keepanypassword

Note: Make sure you change access permissions of your rsyncd.secrets file to 600 if you want your rsync daemon to actually accept your secrets file.

    $ sudo chmod 600 /etc/rsyncd.secrets

(2) sudo rsync --daemon

Caveat: Make sure connections to port 873 are allowed on your instance. I spent about 5-6 days trying to figure out why my rsync daemon was not working correctly when I tried to rsync to it from some other instance, and later figured out that the AWS firewall had blocked all connections to port 873 since there was no rule allowing access to that port.

(3) sudo kill `cat /var/run/`

(4) rsync -rv . ubuntu@

Run this command on any other instance (without an rsync daemon) to push all contents in the current directory to the rsync module path on the instance running the rsync daemon.
 -r transfers the directory contents recursively; -v gives verbose output.

Note: Double colon (::) means that rsync protocol will be used rather than ssh. If only a single colon (:) is provided then rsync tries to sync over ssh.
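For reference, the general shape of the two invocations looks like this (the hostname, module name, and path below are placeholders for your own values):

# Daemon mode: double colon, followed by a module name defined in rsyncd.conf
rsync -rv . ubuntu@daemon-host.example::syncthis
# Over ssh: single colon, followed by a filesystem path on the remote host
rsync -rv . ubuntu@daemon-host.example:/home/ubuntu/syncthis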

(5) for i in `seq 1 100`; do touch testfile$i; done

This simple bash command will generate 100 test files (testfile1, testfile2, etc.), which is useful if you wish to see what a sync involving several files looks like.

Quick Tip:

Syncing using rsync in daemon mode is much faster than using rsync in ssh mode. The daemon mode turns out to be pretty useful for syncing public content where privacy is not much of a concern. The ssh mode takes more time because some time is spent encrypting and decrypting the rsync transfer data.

by Pranjal Mittal at May 26, 2014 04:51 PM

May 02, 2014

Beaver BarCamp

A Successful Beaver BarCamp 14

Justin Dugger giving a talk

Over 115 students, educators and community members joined together on Saturday, April 12, in the Kelley Engineering Center at Oregon State University in order to attend the Open Source Lab’s Beaver BarCamp 14.

Continuing the spirit of previous BarCamps, Beaver BarCamp 14 had a diverse group of tech-based and non-tech-based sessions; this year topics ranged from Heartbleed to Vagrant to How to Podcast to Magic: The Gathering (with free sample decks).

“The secret I’ve found to getting people to show up to your talks is having lots of props to get their attention,” advises Evan Marshall, who hosted a session on helicopters. Marshall followed his own advice and brought a helicopter flight simulator to accompany his session.

Everyone is welcome to present at barcamp, regardless of their experience level. This open format provides the opportunity to hear from a wide variety of speakers from many different backgrounds and interest areas.

First time presenter Daniel Reichert ran a session on Theoretical Cryptography. “I wanted to get more experience speaking in public,” Reichert states. This was a consensus he shared with many of the other presenters, including Gregory Brewster, who ran a session on Google Glass.

Students discuss which talks to attend

“I decided to come with one of my friends that came last year, and thought I could get experience presenting to people,” Brewster says. “I happen to know about Google Glass, and thought that it could be interesting.”

The interactive element is always strong at the unconference, and Beaver BarCamp 14 was no exception. Those who attended the Google Glass session were given the opportunity to try it out. Some played games, some took photos and some simply explored the different features.

“Google Glass Tennis was absolutely exhilarating,” reports barcamp newcomer Maren Vick. “Your head is the racket and it’s so lifelike. Be careful, though, you need to focus on where you’re walking as well.”

Rackspace sponsored Beaver BarCamp 14, and attendees were able to enjoy a full, free Beaver BarCamp experience along with food, refreshments and t-shirts.

The Board

In the past, Beaver BarCamp has been a biannual event; however, the Open Source Lab has decided to switch to an annual format going forward.

“This year we decided that it was best to switch to a once a year format and focus on making Beaver BarCamp better," says Lance Albertson, director of the Open Source Lab. “We also look forward to developing new education programs such as a Hackathon focused on DevOps and FOSS, getting students interested in it early on in their school year. This would also enable us to kickstart DevOps Bootcamp.”

This means that Beaver BarCamp 15 will take place in April 2015. Any updates or details will be posted to the Beaver BarCamp website, so stay tuned!

by OSU Open Source Lab at May 02, 2014 07:36 PM

March 26, 2014

Beaver BarCamp

Upcoming changes to Beaver Barcamp

The Open Source Lab loves Beaver Barcamp, and we know the community does too. The event plays an important role in fostering relationships between the tech and academic communities, something the OSL wants to continue doing. However, we see a need for more hands-on, workshop-style events. Given the tight academic calendar, as well as the amount of organizing this event entails, the OSL has chosen to replace the fall Beaver BarCamp with a DevOps or Free and Open Source Software (FOSS) Hackathon event.

“This year we decided that it was best to switch to a once a year format and focus on making Beaver BarCamp better," says Lance Albertson, director of the Open Source Lab. “We also look forward to developing new education programs such as a Hackathon focused on DevOps and FOSS, getting students interested in it early on in their school year. This would also enable us to kickstart DevOps Bootcamp.”

The April 2014 Beaver BarCamp will continue as planned. The lab has decided to call this Beaver BarCamp 14, aligning the number with the year. (And yes, skipping unlucky number 13!)

At Beaver Barcamp 14, the Open Source Lab will offer a feedback session. We hope you’ll join us in discussing how the lab can continue to support the open source and academic communities at Oregon State.

Stay tuned for exciting details about our future events!

by OSU Open Source Lab at March 26, 2014 10:15 PM

February 25, 2014

Brandon Philips

Slides: etcd at Go PDX

Last week I gave a talk at the PDX Go meetup (Go PDX). The presentation is a refinement on the talk I gave last month at GoSF but contains mostly the same content.

Several people in the audience had some experience with etcd already so it was great to hear their feedback on the project as a whole. The questions included partition tolerance and scaling properties, use cases and general design. It was a smart crowd and it was great to meet so many PDX Gophers.




by Brandon Philips at February 25, 2014 12:00 AM

February 16, 2014

Brandon Philips

Getting to Goven

This is the step by step story of how etcd, a project written in Go, arrived at using goven for library dependency management. It went through several evolutionary steps while trying to find a good solution to these basic goals:

  • Reproducible builds: given the same git hash and version of the Go compiler we wanted an identical binary every time.
  • Zero dependencies: developers should be able to fork on github, make a change, build, test and send a PR without having anything more than a working Go compiler installed.
  • Cross platform: compile and run on OSX, Linux and Windows. Bonus points for cross-compilation.

Checked in GOPATH

Initially, to get reproducible builds and zero dependencies we checked in a copy of the GOPATH to “third_party/src”. Over time we encountered several problems:

  1. “go get” was broken since downstream dependencies would change master and “go get” would set up a GOPATH that looked different than our checked-in version.
  2. Windows developers had to have a working bash. Soon we had to maintain a copy of our build script written in Powershell.

At the time I felt that “go get” was an invalid use case since etcd was just a project built in Go and “go get” is primarily useful for easily grabbing libraries when you are hacking on something. However, there were mounting user requests for a “go gettable” version of etcd.

To solve the Windows problem I wrote a script called “third_party.go” which ported the GOPATH management tools and the shell version of the “build” script to Go.


third_party.go worked well for a few weeks and we could remove the duplicate build logic in the Powershell scripts. The basic usage was simple:

# Bump the raft dependency in the custom GOPATH
go run third_party.go bump
# Use third_party.go to set GOPATH to third_party/src and build
go run third_party.go build

But, there was a fatal flaw with this setup: it broke cross compilation via GOOS and GOARCH.

GOOS=linux go run third_party.go build
fork/exec /var/folders/nq/jrsys0j926z9q3cjp1yfbhqr0000gn/T/go-build584136562/command-line-arguments/_obj/exe/third_party: exec format error

The reason is that GOOS and GOARCH get used internally by “go run”. Meaning it literally tries to build “third_party.go” as a Linux binary and run it. Running a Linux binary on an OSX machine doesn't work.

This soultion didn’t get us any closer to being “go gettable” either. There were several inquiries per week for this. So, I started looking around for better solutions and eventually settled on goven.

goven and goven-bump

goven achieves all of the desirable traits: reproducible builds, zero dependencies to start developing, cross compilation, and as a bonus “go install” works.

The basic theory of operation is that it checks all dependencies into subpackages of your project. Instead of importing a library by its upstream path, you import the copy vendored under your own project's import path. It makes the imports uglier, but goven automates the rewrite.

Along the way I wrote some helper tools to assist in bumping dependencies, which can be found on Github at philips/goven-bump. The scripts “goven-bump” and “goven-bump-commit” grab the hg revision or git hash of the dependency along with running goven. This makes bumping a dependency and getting a basic commit message as easy as:

cd ${GOPATH}/
git commit -m 'bump( 074202958b0a25b4d1e194fb8defe5d69c300774'

goven and goven-bump introduce some additional complexity for the maintainers of the project. But the simplicity they present to regular contributors and to users used to "go get" makes it worth the additional effort.

by Brandon Philips at February 16, 2014 12:00 AM

February 07, 2014

Russell Haering

Ridiculously Fast 'sprintf()' for Node.js

Today I was reminded of one of my neatest Node.js hacks. A few years ago, in the process of optimizing how Rackspace Cloud Monitoring compiles user-supplied alarms (a javascript-like DSL used to implement thresholds), we discovered that we were spending a significant amount of CPU time in a widely used Javascript implementation of sprintf. This was back in the dark ages of Node.js, before util.format landed.

The CPU time spent in sprintf wasn't enough to be a problem: even compiling a few hundred thousand alarms is pretty fast, as compared to reading them out of a database, serializing the compiled alarms to XML, and loading them into Esper. Nonetheless, in a bout of "not invented here" and with a spirit of adventure in my heart, I did the obvious thing, and took a weekend to write a faster sprintf.

"Standard" Sprintf

The standard implementation of sprintf takes a format string, followed by any number of positional arguments intended to be injected into the resulting string. It operates by parsing the format string using a series of regular expressions, to generate a parse tree consisting of alternating constant strings and formatting placeholders.

For example, consider:

sprintf('The %s ran around the tree', 'dog');  

The generated parse tree looks something like:

['The ', '%s', ' ran around the tree']

Then the tree is iterated, and positional (or named) arguments are injected to generate an array that can be joined into the appropriate result:

return ['The ', 'dog', ' ran around the tree'].join('');  

As an optimization, the parse tree is cached for each format string, so that repeated calls to sprintf for a given format string need only repeat the actual argument injection.

Getting Wild

TLDR; the code

So how can this be further optimized? We know a few things about V8:

  1. V8 is very good at concatenating strings.
  2. V8 is very good at just-in-time compiling "hot" functions.
  3. At least as of Crankshaft (the latest version of V8 I've used in any seriousness), V8 was unable to optimize code that treated arguments in unusual ways, such as iterating it or mixing its use with named arguments.

I was able to take advantage of these properties by generating a function which applies the format string through a single-line string concatenation, instead of generating a parse tree. Taking the example above, I generate a string such as:

var fnBody = "return 'The ' + arguments[1] + ' jumped over the tree';";  

Then compiling that string into a function on the fly:

return Function(fnBody);  

By caching the resulting Function object, I was able to cause V8's JIT to optimize calls to sprintf into little more than a dictionary lookup, a function call and a string concatenation.


An obvious risk of this strategy is that an attacker might find a way to cause us to generate arbitrary javascript.

This can be mitigated by never passing user-supplied input as a format string. In fact, because the cache doesn't implement any expiration, you should probably only ever pass literal format strings or you'll end up with a memory leak. This seems to be true of node-sprintf as well, so I don't consider it a serious limitation, just something to be aware of.


At the time, we saw marked (if not especially necessary) speedups in alarm compilation performance, but I don't have the benchmarks on hand. Instead, on a modern-ish version of Node.js (v0.10.17) running on my Macbook Pro I tested:

  1. My "fast" sprintf
  2. Node's util.format
  3. The widely used sprintf module

The test was:

for (var i = 0; i < 10000000; i++) {  
  sprintf_fn('The %s jumped over a tree', i);
}

The results:

Implementation       Time
fast sprintf         1504ms
util.format          14761ms
standard sprintf     22964ms

The improved sprintf lacks a lot of the functionality of the other implementations, so the comparison isn't entirely fair. Nonetheless, with a speedup of about 10x over util.format and 15x over sprintf (at least for this benchmark), I think it's safe to declare this hack a success.

by Russell Haering at February 07, 2014 12:00 AM

January 18, 2014

Brandon Philips

Video: etcd at GoSF

Last week I gave a talk at the San Francisco Go meetup (GoSF). The event was great and had about 200 Go Gophers in attendance.

Giving the talk was great because it made me realize how much we have accomplished on etcd since my last talk in October. The audience was mostly curious about how it differs from Zookeeper, how master elections work, and how we were testing various failure modes. A great suggestion from Brad Fitz was to use a mock of net.Conn to test various network problems. I hope to start executing on that soon.

by Brandon Philips at January 18, 2014 12:00 AM

January 12, 2014

Justin Dugger

LCA 2014 Videos of Note

Linuxconf 2014 wrapped up last week, and the videos are already online!

I didn't get a chance to review all the video, but here are some of the sessions I thought were interesting:

Rusty Russell discusses virtIO standardization. I thought I knew what virtIO was, but his initial explanation leaves me more confused than I started out. Nevertheless, Rusty gives an implementer's view of the standardization process, and shares how virtIO manages forward and backward compatibility between hypervisor, guest OSes, and even hardware.

Elizabeth Krumbach Joseph explains how the OpenStack Core Infra team does their work in the open. We've taken a similar approach, so it's nice to see other approaches and bits we might steal =). Storing Jenkins jobs in YAML in config management sounds very nice, and I will have to bring it up at my next meeting.

Bdale Garbee shares his experience losing his home to the Black Forest Fire. As a serial renter / mover, I'm already well prepared to answer the question "What would you take if you had five minutes to clean out your home?" So I would have liked a bit more in the way of disaster recovery / offsite backups / tech stuff, but I happen to know he rescued his servers from the fire and isn't storing them locally anymore. So perhaps there is no lesson to share yet =)

Michael Still presents a third-party CI approach for database migrations in OpenStack. It looks like a combo of Gerrit for code reviews, Zuul, and a custom Zuul gearman worker. Surprisingly little duplicate content from the other OpenStack infrastructure talk!

Jim Cheetham asks 'Is it safe to mosh?' The answer appears to be yes, but he takes a hands-off approach to the underlying crypto.

Lots of exciting talks, and maybe I need to sit down and think about writing my own proposal for LCA 2015.

by Justin Dugger at January 12, 2014 12:00 AM

October 01, 2013

Brandon Philips

Video: Modern Linux Server with Containers

At LinuxCon 2013 I gave a talk that dissects “Linux Containers” into their component parts in the kernel: cgroups and namespaces. The talk shows how cgroups act as the “accounting bean counter” and namespaces as the “castle walls” that isolate processes from each other.

If you are already familiar with the basics of namespaces and cgroups I show off some tools like nsenter, docker, and systemd-nspawn. Skip to the end to catch the demos.

The full slides are available on slide deck and mirrored as a pdf here.

by Brandon Philips at October 01, 2013 12:00 AM

May 01, 2013

Beaver BarCamp

Students and community members learn together at Beaver BarCamp 11

Quotes and select photos from Beaver BarCamp 11 are now available on the OSU Open Source Lab's website.

A gallery of user-contributed photos from the event is also available in a Google Plus photo gallery.

by OSU Open Source Lab at May 01, 2013 07:12 PM

April 27, 2013


It's All About Community: DC Metro Open Source Community Summit May 10, 2013

Oregon State University Open Source Lab is pleased to lend its support to the Open Source Initiative and the first Open Source Community Summit, being held in Washington D.C. on May 10, 2013.

It's a great way to stand up and be counted as part of the DC open source community; check it out!

by deborah at April 27, 2013 05:53 AM

September 30, 2012

Justin Dugger

PuppetConf 2012

Recovered from the post-con crash a while ago, so it's time to write up some thoughts. Last week I attended PuppetConf with my coworkers at the OSL. The OSL attended PuppetConf primarily as a pre-deployment information gathering exercise. We want to avoid common pitfalls, and be able to plan for things coming down the pipeline. Puppet 3.0 was targeted to be released on Friday, and clearly that slipped.

The venue itself was nice, but space partitioned poorly. The two main tracks had surplus space, but the three side tracks nearly always had people turned away for space concerns. Supposedly, the recordings will be available shortly, so it may not be the Worst Thing In The World, but only time will tell.

Content wise, one recurring theme is to start small and simple, and not worry about scale or sharing until they become an issue. Designing a deployment for thousands of nodes when you have perhaps a dozen gives new life to the term "architecture astronaut," and there's a certain amount of benefit to procrastinating on system design while the tools and ecosystem mature. Basically, build one to throw away.

Another problem we've been worrying about at the OSL is updating 3rd party config modules in their various forms. The hope is that by explicitly annotating in your system where things came from, you can automate pulling in updates from original sources. Pretty much the universal recommendation here is a condemnation: avoid git submodules. Submodules sound like the right strategy, but they're for a different use case. In our experience, they dramatically complicate the workflow. At least one person mentioned librarian-puppet, which as far as I can tell isn't much different than mr with some syntactic sugar for PuppetForge. This is great, because mr was basically the strategy I was recommending prior to PuppetConf.

The Better Living Through Statistics talk was less advanced than I'd hoped. Anyone who's spent maybe 5 minutes tuning nagios check_disks realizes how inadequate it is, and that the basic nagios framework is to blame. What you really want is an alert when the time to disk outage approaches time to free up more disk, and no static threshold can capture that. While Jamie did provide a vision for the future, I was really hoping for some new statistical insight on the problem. It appears it's up to me to create and provide said insight. Perhaps in another post.

R Tyler Croy gave a useful talk on behavior/test driven infrastructure. I'd looked into Cucumber before, but RSpec was only a word to me before this talk. It's certainly something I'll need to take some time to integrate into the workflow and introduce to students. One concern I had (that someone else aired) was that in the demo, the puppet code and the code to test it were basically identical, such that software could easily translate from code to test and back. Croy insisted this was not the case in more complicated Puppet modules, but I'm reserving judgement until I see said modules.

Overall, I'd definitely recommend the conference to people preparing to deploy Puppet. There are plenty more sessions I didn't cover here that are worth your time. You'd probably get the most out of it by starting a trial implementation first, instead of procrastinating until Wednesday night to read the basics like I did. Beyond simply watching lectures, it's useful to get away from the office and sit down to learn about this stuff. Plus, it's useful to build your professional network of people you can direct questions to later.

by Justin Dugger at September 30, 2012 12:00 AM

July 01, 2012

Justin Dugger

Open Source Bridge Wrapup

Friday marked the end of Open Source Bridge, just about the best introduction to Portland culture you can find. Vegan lunches, Voodoo Donut catering, a lunch-truck Friday, and rock and roll pipe organists in the Unitarian's sanctuary.

The keynotes were pretty cool. I'd seen Fenwick's presentation from LCA, and was surprised at how much had changed, hopefully because some of his keystone evidence turned out to be bogus; it turns out there's strong evidence that the only "priming" effect was in the grad students running the study. I'm still not quite clear on what JScott wants people to run vbox for, but he did have a really good idea about bringing your own recording equipment that I wish I had taken to heart.

Probably the most useful talk I attended was Laura Thompson's presentation on Mozilla's Crash Reporting service, powered by Socorro. A few of the projects the OSL hosts are desktop apps, and collecting crash data might be a good engineering win for them. There were a lot of embedded hardware talks that would have been interesting, but not directly relevant to the needs of the OSL. Hopefully they'll be up as recordings soon.

The OSL was also well represented in the speakers' ranks: we ran five sessions during the main conference, and two during the Friday unconference. I think next year it would be a good idea to encourage our students to participate as volunteers; getting them face time with speakers and the community at large can only do us a world of good. I gave a first run of a talk on using GNUCash for personal finance; the turnout was pretty good, given how many people were still at the food carts. I should have recorded it to self-critique and improve.

The "after party" on Thursday was nice. Lance won the 2012 Outsanding Open Source Citizen award, which is great, because he deserves recongition for handling the turmoil at the OSL over the past year. But now I've got to figure out my plan meet or beat that for next year. No small task.

Next up is catching up back at the Lab, and then OSCON!

by Justin Dugger at July 01, 2012 12:00 AM

June 13, 2012

Lance Albertson

Ganeti Tutorial PDF guide

As I mentioned in my previous blog post, trying out Ganeti can be cumbersome, so I went out and created a platform for testing it using Vagrant. Now I have a PDF guide that you can use to walk through some of the basic steps of using Ganeti, and even test a fail-over scenario. It's an updated version of a guide I wrote for OSCON last year. Give it a try and let me know what you think!

by lance at June 13, 2012 01:53 AM

June 11, 2012

Frédéric Wenzel

Fail Pets Research in UX Magazine

I totally forgot blogging about this!

Remember how I curate a collection of fail pets across the Interwebs? Sean Rintel is a researcher at the University of Queensland in Australia and has put some thought into the UX implications of whimsical error messages, published in his article: The Evolution of Fail Pets: Strategic Whimsy and Brand Awareness in Error Messages in UX Magazine.

In his article, Rintel attributes me with coining the term "fail pet".

Attentive readers may also notice that Mozilla's strategy of (rightly) attributing Adobe Flash's crashes to Flash itself by putting a "sad brick" in place worked formidably: Rintel (just like most users, I am sure) assumes this message comes from Adobe, not Mozilla:

Thanks, Sean, for the mention, and I hope you all enjoy his article.

June 11, 2012 07:00 AM

June 08, 2012

Frédéric Wenzel

Let's talk about password storage

Note: This is a cross-post of an article I published on the Mozilla Webdev blog this week.

During the course of this week, a number of high-profile websites (LinkedIn among them) have disclosed possible password leaks from their databases. The suspected leaks put huge amounts of important, private user data at risk.

What's common to both these cases is the weak security they employed to "safekeep" their users' login credentials. In the case of LinkedIn, it is alleged that an unsalted SHA-1 hash was used; in the other case, the technology used is, allegedly, even worse: an unsalted MD5 hash.

Neither of the two technologies follows any sort of modern industry standard and, if they were in fact used by these companies in this fashion, they exhibit a gross disregard for the protection of user data. Let's take a look at the most obvious mistakes our protagonists made here, and then we'll discuss the password hashing standards that Mozilla web projects routinely apply in order to mitigate these risks.

A trivial no-no: Plain-text passwords

This one's easy: Nobody should store plain-text passwords in a database. If you do, and someone steals the data through any sort of security hole, they've got all your user's plain text passwords. (That a bunch of companies still do that should make you scream and run the other way whenever you encounter it.) Our two protagonists above know that too, so they remembered that they read something about hashing somewhere at some point. "Hey, this makes our passwords look different! I am sure it's secure! Let's do it!"

Poor: Straight hashing

Smart mathematicians came up with something called a hashing function or "one-way function" H: password -> H(password). MD5 and SHA-1 mentioned above are examples of those. The idea is that you give this function an input (the password), and it gives you back a "hash value". It is easy to calculate this hash value when you have the original input, but prohibitively hard to do the opposite. So we create the hash value of all passwords, and only store that. If someone steals the database, they will only have the hashes, not the passwords. And because those are hard or impossible to calculate from the hashes, the stolen data is useless.

"Great!" But wait, there's a catch. For starters, people pick poor passwords. Write this one in stone, as it'll be true as long as passwords exist. So a smart attacker can start with a copy of Merriam-Webster, throw in a few numbers here and there, calculate the hashes for all those words (remember, it's easy and fast) and start comparing those hashes against the database they just stole. Because your password was "cheesecake1", they just guessed it. Whoops! To add insult to injury, they just guessed everyone's password who also used the same phrase, because the hashes for the same password are the same for every user.

Worse yet, you can actually buy(!) precomputed lists of straight hashes (called Rainbow Tables) for alphanumeric passwords up to about 10 characters in length. Thought "FhTsfdl31a" was a safe password? Think again.

This attack is called an offline dictionary attack and is well-known to the security community.

Even passwords taste better with salt

The standard way to deal with this is by adding a per-user salt. That's a long, random string added to the password at hashing time: H: password -> H(password + salt). You then store salt and hash in the database, making the hash different for every user, even if they happen to use the same password. In addition, the smart attacker cannot pre-compute the hashes anymore, because they don't know your salt. So after stealing the data, they'll have to try every possible password for every possible user, using each user's personal salt value.

Great! I mean it, if you use this method, you're already scores better than our protagonists.

The 21st century: Slow hashes

But alas, there's another catch: Generic hash functions like MD5 and SHA-1 are built to be fast. And because computers keep getting faster, millions of hashes can be calculated very very quickly, making a brute-force attack even of salted passwords more and more feasible.

So here's what we do at Mozilla: Our WebApp Security team performed some research and set forth a set of secure coding guidelines (they are public, go check them out, I'll wait). These guidelines suggest the use of HMAC + bcrypt as a reasonably secure password storage method.

The hashing function has two steps. First, the password is hashed with an algorithm called HMAC, together with a local salt: H: password -> HMAC(local_salt + password). The local salt is a random value that is stored only on the server, never in the database. Why is this good? If an attacker steals one of our password databases, they would need to also separately attack one of our web servers to get file access in order to discover this local salt value. If they don't manage to pull off two successful attacks, their stolen data is largely useless.

As a second step, this hashed value (or strengthened password, as some call it) is then hashed again with a slow hashing function called bcrypt. The key point here is slow. Unlike general-purpose hash functions, bcrypt intentionally takes a relatively long time to be calculated. Unless an attacker has millions of years to spend, they won't be able to try out a whole lot of passwords after they steal a password database. Plus, bcrypt hashes are also salted, so no two bcrypt hashes of the same password look the same.

So the whole function looks like: H: password -> bcrypt(HMAC(password, localsalt), bcryptsalt).

We wrote a reference implementation for this for Django: django-sha2. Like all Mozilla projects, it is open source, and you are more than welcome to study, use, and contribute to it!

What about Mozilla Persona?

Funny you should mention it. Mozilla Persona (née BrowserID) is a new way for people to log in. Persona is the password specialist, and takes the burden/risk away from sites for having to worry about passwords altogether. Read more about Mozilla Persona.

So you think you're cool and can't be cracked? Challenge accepted!

Make no mistake: just like everybody else, we're not invincible at Mozilla. But because we actually take our users' data seriously, we take precautions like this to mitigate the effects of an attack, even in the unfortunate event of a successful security breach in one of our systems.

If you're responsible for user data, so should you.

If you'd like to discuss this post, please leave a comment at the Mozilla Webdev blog. Thanks!

June 08, 2012 07:00 AM

May 31, 2012

Greg Lund-Chaix

Large Moodle downloads die prematurely when served through Varnish

Varnish and Moodle, to be blunt, hate each other. So much so that for my Moodle 1.9.x sites, I simply instruct Varnish to return(pass) without even trying to cache anything on a Moodle site. Today, however, I discovered even that is insufficient. Here’s what happened:

A user was reporting that when downloading large files from within Moodle (500 MB course zip backups in this case), they’d stop at approximately 200 MB. A look at varnishlog showed that Varnish was properly seeing that it’s a Moodle request with a “Cache-Control: no-cache” header and didn’t even try to cache it before sending the request off to the backend. The backend was behaving exactly as expected and serving up the file. At some point, however, the download simply terminates before completion. No indications in the Varnish or Apache logs, nothing. It just … stops.


So I put the following code in my VCL in vcl_recv:

if (req.url ~ "file.php") {
return (pipe);


Note: this must go into the VCL before the line in vcl_recv that checks the Cache-Control header, otherwise it’ll pass before it gets to the pipe:

if (req.url ~ "file.php") {
return (pipe);

# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ "no-cache") {
return (pass);


by Greg at May 31, 2012 02:42 AM

May 30, 2012

Frédéric Wenzel

Fun with ebtables: Routing IPTV packets on a home network

In my home network, I use IPv4 addresses out of the 10.x.y.z/8 private IP block. After AT&T U-Verse contacted me multiple times to make me reconfigure my network so they can establish a large-scale NAT and give me a private IP address rather than a public one (this might be material for a whole separate post), I reluctantly switched ISPs and now have Comcast. I did, however, keep AT&T for television. Now, U-Verse is an IPTV provider, so I had to put the two services (Internet and IPTV) onto the same wire, which as it turned out was not as easy as it sounds.

tl;dr: This is a "war story" more than a crisp tutorial. If you really just want to see the ebtables rules I ended up using, scroll all the way to the end.

IPTV uses IP Multicast, a technology that allows a single data stream to be sent to a number of devices at the same time. If your AT&T-provided router is the centerpiece of your network, this works well: The router is intelligent enough to determine which one or more receivers (and on what LAN port) want to receive the data stream, and it only sends data to that device (and on that wire).

Multicast, the way it is supposed to work: The source server (red) sending the same stream to multiple, but not all, receivers (green).

Turns out, my dd-wrt-powered Cisco E2000 router is--out of the box--not that intelligent and, like most consumer devices, will turn such multicast packets simply into broadcast packets. That means, it takes the incoming data stream and delivers it to all attached ports and devices. On a wired network, that's sad, but not too big a deal: Other computers and devices will see these packets, determine they are not addressed to them, and drop the packets automatically.

Once your wifi becomes involved, this is a much bigger problem: The IPTV stream's unwanted packets easily satisfy the wifi capacity and keep any wifi device from doing its job, while it is busy discarding packets. This goes so far as to making it entirely impossible to even connect to the wireless network anymore. Besides: Massive, bogus wireless traffic empties device batteries and fills up the (limited and shared) frequency spectrum for no useful reason.

Suddenly, everyone gets the (encrypted) data stream. Whoops.

One solution for this is to install only managed switches that support IGMP snooping, which limit multicast traffic to the relevant ports. I wasn't too keen on replacing a bunch of hardware with really expensive new gear, though.

In comes ebtables, part of netfilter (the Linux kernel-level firewall package). First I wrote a simple rule intended to keep all multicast packets (no matter their source) from exiting on the wireless device (eth1, in this case).

ebtables -A FORWARD -o eth1 -d Multicast -j DROP

This works in principle, but has some ugly drawbacks:

  1. -d Multicast translates into a destination address pattern that also covers (intentional) broadcast packets (i.e., every broadcast packet is a multicast packet, but not vice versa). These things are important and power DHCP, SMB networking, Bonjour, ... . With a rule like this, none of these services will work anymore on the wifi you were trying to protect.
  2. -o eth1 keeps us from flooding the wifi, but will do nothing to keep the needless packets sent to wired devices in check. While we're in the business of filtering packets, might as well do that too.

So let's create a new VLAN in the dd-wrt settings that only contains the incoming port (here: W) and the IPTV receiver's port (here: 1). We bridge it to the same network, because the incoming port is not only the source of IPTV, but also our connection to the Internet, so the remaining ports need to be able to connect to it still.

dd-wrt vlan settings

Then we tweak our filters:

ebtables -A FORWARD -d Broadcast -j ACCEPT
ebtables -A FORWARD -p ipv4 --ip-src ! -o ! vlan1 -d Multicast -j DROP

This first accepts all broadcast packets (which it would do by default anyway, if it wasn't for our multicast rule), then any other multicast packets are dropped if their output device is not vlan1, and their source IP address is not local.

With this modified rule, we make sure that any internal applications can still function properly, while we tightly restrict where external multicast packets flow.
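To double-check what the kernel actually loaded, you can list the FORWARD chain with its counters; this is plain ebtables usage, nothing specific to this setup:

# Show the current filter rules along with packet/byte counters
ebtables -L FORWARD --Lc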

That was easy, wasn't it!

Some illustrations courtesy of Wikipedia.

May 30, 2012 07:00 AM

May 21, 2012

Lance Albertson

Trying out Ganeti with Vagrant

Ganeti is a very powerful tool, but oftentimes people have to look for spare hardware to try it out easily. I also wanted to have a way to easily test new features of Ganeti Web Manager (GWM) and Ganeti Instance Image without requiring additional hardware. While I do have the convenience of having access to hardware at the OSU Open Source Lab to do my testing, I'd rather not depend on that always. Sometimes I like trying new and crazier things and I'd rather not break a test cluster all the time. So I decided to see if I could use Vagrant as a tool to create a Ganeti test environment on my own workstation and laptop.

This all started last year while I was preparing for my OSCON tutorial on Ganeti and was manually creating VirtualBox VMs to deploy Ganeti nodes for the tutorial. It worked well, but soon after I gave the tutorial I discovered Vagrant and decided to adapt my OSCON tutorial to it. It's a bit like the movie Inception, of course, but I was able to successfully get Ganeti working with Ubuntu and KVM (technically just qemu) and mostly functional VMs inside of the nodes. I was also able to quickly create a three-node cluster to test failover with GWM and many facets of the webapp.

The vagrant setup I have has two parts:

  1. Ganeti Tutorial Puppet Module
  2. Ganeti Vagrant configs

The puppet module I wrote is very basic and isn't really intended for production use. I plan to re-factor it in the coming months into a completely modular production ready set of modules. The node boxes are currently running Ubuntu 11.10 (I've been having some minor issues getting 12.04 to work), and the internal VMs you can deploy are based on the CirrOS Tiny OS. I also created several branches in the vagrant-ganeti repo for testing various versions of Ganeti which has helped the GWM team implement better support for 2.5 in the upcoming release.

To get started using Ganeti with Vagrant, you can do the following:

git clone git://
git submodule update --init
gem install vagrant
vagrant up node1
vagrant ssh node1
gnt-cluster verify
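If you want to go beyond a single node, the repo defines more node boxes; bringing up a second one and joining it to the cluster (if the provisioning doesn't already do that for you) looks roughly like this, where the node hostname is a placeholder that depends on the Vagrant configs:

vagrant up node2
vagrant ssh node1
# From the master node, add the new node and confirm it shows up
gnt-node add node2.example.org
gnt-node list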

Moving forward I plan to implement the following:

  • Update tutorial documentation
  • Support for Xen and LXC
  • Support for CentOS and Debian as the node OS

Please check out the README for more instructions on how to use the Vagrant+Ganeti setup. If you have any feature requests please don't hesitate to create an issue on the github repo.

by lance at May 21, 2012 06:09 AM

April 26, 2012

Jeff Sheltren

Memcached and PECL memcache on CentOS and Fedora

At Tag1 Consulting we do a lot of work on increasing web site performance, especially around Drupal sites. One of the common tools we use is memcached combined with the Drupal Memcache module. In Drupal, there are a number of different caches which are stored in the (typically MySQL) database by default. This is good for performance as it cuts down on potentially large/slow SQL queries and PHP execution needed to display content on a site. The Drupal Memcache module allows you to configure some or all of those caches to be stored in memcached instead of MySQL; typically these cache gets/puts are much faster in memcache than they would be in MySQL, and at the same time they decrease the workload on the database server. This is all great for performance, but it involves setting up an additional service (memcached) as well as adding a PHP extension in order to communicate with memcached. I've seen a number of guides on how to install these things on Fedora or CentOS, but so many of them are out-dated or give instructions I wouldn't suggest, such as building things from source, installing with the 'pecl' command (not great on a package-based system), or using various external yum repositories (some of which don't mix well with the standard repos). What follows is my suggested method for installing these needed dependencies in order to use memcached with Drupal, though the same process should be valid for any other PHP script using memcache.

PECL Packages

For the Drupal Memcache module, either the PECL memcache or PECL memcached (note the 'd'!) extensions can be used. While PECL memcached is newer and has some additional features, PECL memcache (no 'd'!) tends to be better tested and supported, at least for the Drupal Memcache module. Yes, the PECL extension names are HORRIBLE and very confusing to newcomers! I almost always use the PECL memcache extension because I've had some strange behavior in the past using the memcached extension; likely those problems are fixed now, but it's become a habit and personal preference to use the memcache extension.

Installing and Configuring memcached

The first step is to get memcached installed and configured. CentOS 5 and 6 both include memcached in the base package repo, as do all recent Fedora releases. To install memcached is simply a matter of:
# yum install memcached

Generally, unless you really know what you're doing, the only configuration option you'll need to change is the amount of memory to allocate to memcached. The default is 64MB. That may be enough for small sites, but for larger sites you will likely be using multiple gigabytes. It's hard to recommend a standard size to use as it will vary by a large amount based on the site. If you have a "big" site, I'd say start at 512MB or 1GB; if you have a smaller site you might leave the default, or just bump it to 512MB anyway if you have plenty of RAM on the server. Once it's running, you can watch the memory usage and look for evictions (removal of a cache item once the cache is full) to see if you might want to increase the memory allocation.
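Once it's up, one quick way to watch usage and evictions (assuming a local instance on the default port of 11211, and that nc is installed) is to query memcached's stats interface:

# Ask memcached for its runtime stats and pick out memory and eviction counters
printf 'stats\nquit\n' | nc 127.0.0.1 11211 | grep -E 'evictions|limit_maxbytes|^STAT bytes '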

On all Fedora / CentOS memcached packages, the configuration file is stored in /etc/sysconfig/memcached. By default, it looks something like this (exact values can differ slightly between releases):
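# Stock defaults as shipped with the CentOS 6 memcached package; verify against your own file
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="64"
OPTIONS=""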


To increase the memory allocation, adjust the CACHESIZE setting to the number of MB you want memcached to use.

If you are running memcached locally on your web server (and only have one web server), then I strongly recommend you also add an option for memcached to listen only on your loopback interface (localhost). Whether or not you make that change, please consider locking down the memcached port(s) with a firewall. In order to listen only on the loopback interface, you can change the OPTIONS line to something like the following:
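# Reconstructed example: -l is the standard memcached flag for binding to a specific address
OPTIONS="-l 127.0.0.1"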


See the memcached man page for more info on that or any other settings.

Once you have installed memcached and updated the configuration, you can start it up and configure it to start on boot:

# service memcached start
# chkconfig memcached on

Fedora / CentOS / RHEL PECL Module Install


If you are on Fedora and using PHP from the base repo in the distribution, then installation of the PECL extension is easy. Just use yum to install whichever PECL extension you choose:

# yum install php-pecl-memcache


# yum install php-pecl-memcached

CentOS 5 / RHEL 5

CentOS and RHEL can be a bit more complicated, especially on EL5 which ships with PHP 5.1.x, which is too old for most people. Here are the options I'd suggest for EL5:

  • If you are OK using the PHP provided with EL5, then you can get the PECL extensions from EPEL. Once you've enabled the EPEL repository (instructions), you can install either PECL extension by using the same yum commands outlined above in the Fedora section.
  • If you want to use PHP 5.2 or PHP 5.3 with EL5, I suggest using the IUS repositories (IUS repo instructions). Note that IUS provides the PECL memcache extension, but not the PECL memcached extension. Based on which PHP version you decide to use, you can install the PECL memcache extension with either:

    # yum install php52-pecl-memcache


    # yum install php53u-pecl-memcache

CentOS 6 / RHEL 6

EL6 ships with PHP 5.3, though it is an older version than is available for EL6 at IUS. If you are using the OS-provided PHP package, then you can install the PECL memcache extension from the base OS repo. If you want the PECL memcached extension, it is not in the base OS repo, but is available in EPEL. See the instructions linked from the CentOS 5 section above if you need to enable the EPEL repo.

# yum install php-pecl-memcache

Or, enable EPEL and then run:

# yum install php-pecl-memcached

As with EL5, some people running EL6 will also want the latest PHP packages and can get them from the IUS repositories. If you are running PHP from IUS under EL6, then you can install the PECL memcache extension with:

# yum install php53u-pecl-memcache

Similar to EL5, the IUS repo for EL6 does not include the PECL memcached module.

PECL Memcache Configuration

If you are using the PECL memcache extension and will be using the clustering option of the Drupal Memcache module, which utilizes multiple memcached instances, then it is important to set the hash strategy to "consistent" in the memcache extension configuration. Edit /etc/php.d/memcache.ini and set (or un-comment) the following line:
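; Reconstructed line: this is the standard PECL memcache setting for consistent hashing
memcache.hash_strategy="consistent"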


If you are using the PECL memcached module, this configuration is done at the application level (e.g. in your Drupal settings.php).

Once you've installed the PECL memcache (or memcached) extension, you will need to reload httpd in order for PHP to see the new extension. You'll also need to reload httpd whenever you change the memcache.ini configuration file.

# service httpd reload


If you have SELinux enabled (you should!), I have an older blog post with instructions on configuring SELinux for Drupal.

That's it, you're now good to go with PHP and memcache!

by jeff at April 26, 2012 06:02 PM

cfengine 3.3.0 packages for Fedora / CentOS / RHEL

As I've used cfengine less and less recently, the packages in Fedora and EPEL have become a bit neglected. At one point someone stepped up to update them, but nothing ever came of it. I've finally updated the packages to the latest upstream version as of this writing (3.3.0) in Fedora 16, Fedora 17, Fedora Devel, and EPEL 6. They should be pushed to the updates-testing repos for each of those releases soon if not already there. There are some package changes since the last 3.x release, so any testing people can do would be appreciated.

I've uploaded EL6 and F17 RPMs here for reference:

Note that these are quite different from the upstream-provided RPMs which simply dump everything in /var/cfengine. The good news here is I've actually provided a source RPM for those that need to tweak the build. Also, I hit some configure errors when attempting to build on EL5 which I haven't worked out yet -- looks like an upstream bug with the configure script to me, so there are no EL5 packages at the moment.

If anyone is willing to co-maintain these in Fedora and/or EPEL with me, please let me know.

by jeff at April 26, 2012 04:06 PM

December 21, 2011

Jeff Sheltren

Stop Disabling SELinux!

I see a lot of people coming by #centos and similar channels asking for help when they’re experiencing a problem with their Linux system. It amazes me how many people describe their problem, and then say something along the lines of, “and I disabled SELinux...”. Most of the time SELinux has nothing to do with the problem, and if SELinux is the cause of the problem, why would you throw out the extra security by disabling it completely rather than configuring it to work with your application? This may have made sense in the Fedora Core 3 days when SELinux settings and tools weren’t quite as fleshed out, but the tools and the default SELinux policy have come a long way since then, and it’s very worthwhile to spend a little time to understand how to configure SELinux instead of reflexively disabling it. In this post, I’m going to describe some useful tools for SELinux and walk through how to configure SELinux to work when setting up a Drupal web site using a local memcached server and a remote MySQL database server -- a pretty common setup for sites which receive a fair amount of traffic.

This is by no means a comprehensive guide to SELinux; there are many of those already!

Too Long; Didn’t Read Version

If you’re in a hurry to figure out how to configure SELinux for this particular type of setup, on CentOS 6, you should be able to use the following two commands to get things working with SELinux:
# setsebool -P httpd_can_network_connect_db 1
# setsebool -P httpd_can_network_memcache 1

Note that if you have files existing somewhere on your server and you move them to the webroot rather than untar them there directly, you may end up with SELinux file contexts set incorrectly on them, which will likely cause apache to be denied read access to those files. If you are having a related problem, you’ll see something like this in your /var/log/audit/audit.log:
type=AVC msg=audit(1324359816.779:66): avc: denied { getattr } for pid=3872 comm="httpd" path="/var/www/html/index.php" dev=dm-0 ino=549169 scontext=root:system_r:httpd_t:s0 tcontext=root:object_r:user_home_t:s0 tclass=file

You can solve this by resetting the webroot to its default file context using the restorecon command:
# restorecon -rv /var/www/html

Server Overview

I’m going to start with a CentOS 6 system configured with SELinux in targeted mode, which is the default configuration. I’m going to be using httpd, memcached, and PHP from the CentOS base repos, though the configuration wouldn’t change if you were to use the IUS PHP packages. MySQL will be running on a remote server which gives improved performance, but means a bit of additional SELinux configuration to allow httpd to talk to a remote MySQL server. I’ll be using Drupal 7 in this example, though this should apply to Drupal 6 as well without any changes.

Initial Setup

Here we will set up some prerequisites for the website. If you already have a website set up, you can skip this section.

We will be using tools such as audit2allow and audit2why, which are part of the policycoreutils-python package. I believe this is typically installed by default, but if you did a minimal install you may not have it.
# yum install policycoreutils-python

Install the needed apache httpd, php, and memcached packages:
# yum install php php-pecl-apc php-mbstring php-mysql php-pecl-memcache php-gd php-xml httpd memcached

Start up memcached. The CentOS 6 default configuration for memcached only listens on localhost, which is great for our testing purposes. The default of 64M of RAM may not be enough for a production server, but for this test it will be plenty. We’ll just start up the service without changing any configuration values:
# service memcached start
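
If you'd like to confirm memcached came up and see what it is bound to, something along these lines works (my addition; netstat is in the net-tools package):

# netstat -tlnp | grep 11211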

Start up httpd. You may have already configured apache for your needs; if not, the default config should be enough for the site we’ll be testing.
# service httpd start

If you are using a firewall, then you need to allow at least port 80 through so that you can access the website -- I won’t get into that configuration here.
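
For completeness, a minimal sketch of such a rule on CentOS 6 might look like the following (my addition; adjust it to fit your own firewall policy rather than copying it verbatim):

# iptables -I INPUT -p tcp --dport 80 -j ACCEPT   # temporary rule at the top of the INPUT chain
# service iptables save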

Install Drupal. I’ll be using the latest Drupal 7 version (7.9 as of this writing). Direct link:
Download the tarball, and expand it to the apache web root. I also use the --strip-components=1 argument to strip off the top level directory, otherwise it would expand into /var/www/html/drupal-7.9/
# tar zxf drupal-7.9.tar.gz -C /var/www/html --strip-components=1

Also, we need to get the Drupal site ready for install by creating a settings.php file writable by apache and creating a default files directory which apache can write to.
# cd /var/www/html/sites/default/
# cp default.settings.php settings.php
# chgrp apache settings.php && chmod 660 settings.php
# install -d -m 775 -g apache files

Set up a database and a database user for Drupal on your MySQL server. It would look something like this:
mysql> CREATE DATABASE drupal;
mysql> GRANT ALL ON drupal.* TO drupal_rw@web-server-ip-here IDENTIFIED BY 'somepassword';

Test this out by using the mysql command line tool on the web host.
# mysql -u drupal_rw -p -h your-mysql-server-hostname drupal

That should connect you to the remote MySQL server. Be sure that is working before you proceed.

Now for the Fun Stuff

If you visit your new Drupal site at http://your-hostname-here, you’ll be presented with the Drupal installation page. Click ahead a few times, and set up your DB info on the Database Configuration page -- you need to expand “Advanced Options” to get to the hostname field since it assumes localhost. When you click the button to proceed, you’ll probably get an unexpected error that it can’t connect to your database -- this is SELinux doing its best to protect you!

Allowing httpd to Connect to a Remote Database

So what just happened? We know the database was set up properly to allow access from the remote web host, but Drupal is complaining that it can’t connect. First, you can look in /var/log/audit/audit.log, which is where SELinux will log access denials. If you grep for ‘httpd’ in the log, you’ll see something like the following:
# grep httpd /var/log/audit/audit.log
type=AVC msg=audit(1322708342.967:16804): avc: denied { name_connect } for pid=2724 comm="httpd" dest=3306 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

That is telling you, in SELinux gibberish, that the httpd process was denied access to connect to a remote MySQL port. For a better explanation of the denial and some potential fixes, we can use the ‘audit2why’ utility:
# grep httpd /var/log/audit/audit.log | audit2why
type=AVC msg=audit(1322708342.967:16804): avc: denied { name_connect } for pid=2724 comm="httpd" dest=3306 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

Was caused by:
One of the following booleans was set incorrectly.
Allow HTTPD scripts and modules to connect to the network using TCP.

Allow access by executing:
# setsebool -P httpd_can_network_connect 1
Allow HTTPD scripts and modules to connect to databases over the network.

Allow access by executing:
# setsebool -P httpd_can_network_connect_db 1

audit2why will analyze the denial message you give it and potentially explain ways to correct it if it is something you would like to allow. In this case, there are two built-in SELinux boolean settings that could be enabled for this to work. One of them, httpd_can_network_connect, will allow httpd to connect to anything on the network. This might be useful in some cases, but is not very specific. The better option in this case is to enable httpd_can_network_connect_db, which limits httpd-generated network connections to only database traffic. Run the following command to enable that setting:
# setsebool -P httpd_can_network_connect_db 1

It will take a few seconds and not output anything. Once that completes, go back to the Drupal install page, verify the database connection info, and click on the button to continue. Now it should connect to the database successfully and proceed through the installation. Once it finishes, you can disable apache write access to the settings.php file:
# chmod 640 /var/www/html/sites/default/settings.php

Then fill out the rest of the information to complete the installation.

Allowing httpd to Connect to a Memcached Server

Now we want to set up Drupal to use memcached instead of storing cache information in MySQL. You’ll need to download and install the Drupal memcache module available here:
Install that into your Drupal installation, and add the appropriate entries into settings.php. For this site, I did that with the following:
# mkdir /var/www/html/sites/default/modules
# tar zxf memcache-7.x-1.0-rc2.tar.gz -C /var/www/html/sites/default/modules

Then edit settings.php and add the following two lines:
$conf['cache_backends'][] = 'sites/default/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';

Now if you reload your site in your web browser, you’ll likely see a bunch of memcache errors -- just what you wanted! I bet it’s SELinux at it again! Check out /var/log/audit/audit.log again and you’ll see something like:
type=AVC msg=audit(1322710172.987:16882): avc: denied { name_connect } for pid=2721 comm="httpd" dest=11211 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:memcache_port_t:s0 tclass=tcp_socket

That’s very similar to the last message, but this one is for a memcache port. What does audit2why have to say?
# grep -m 1 memcache /var/log/audit/audit.log | audit2why
type=AVC msg=audit(1322710172.796:16830): avc: denied { name_connect } for pid=2721 comm="httpd" dest=11211 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:memcache_port_t:s0 tclass=tcp_socket

Was caused by:
One of the following booleans was set incorrectly.
Allow httpd to act as a relay

Allow access by executing:
# setsebool -P httpd_can_network_relay 1
Allow httpd to connect to memcache server

Allow access by executing:
# setsebool -P httpd_can_network_memcache 1
Allow HTTPD scripts and modules to connect to the network using TCP.

Allow access by executing:
# setsebool -P httpd_can_network_connect 1

Again, audit2why gives us a number of options to fix this. The best bet is to go with the smallest and most precise change for our needs. In this case there’s another perfect fit: httpd_can_network_memcache. Enable that boolean with the following command:
# setsebool -P httpd_can_network_memcache 1

Success! Now httpd can talk to memcache. Reload your site a couple of times and you should no longer see any memcache errors. You can be sure that Drupal is caching in memcache by connecting to the memcache CLI (telnet localhost 11211) and typing ‘stats’. You should see some number greater than 0 for ‘get_hits’ and for ‘bytes’.
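
If you prefer a one-liner over an interactive telnet session, something like this works too (my addition; assumes the nc package is installed):

# printf "stats\nquit\n" | nc localhost 11211 | grep -E "get_hits|bytes"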

What are all these booleans anyway?

Now we’ve used a couple of SELinux booleans to allow httpd to connect to memcached and MySQL. You can see a full list of the booleans you can control by using the command ‘getsebool -a’. They are basically a preset way for you to allow or deny certain pre-defined access controls.
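
For example, to narrow that list down to the httpd-related booleans, and to see a short description of each (semanage comes from the policycoreutils-python package we installed earlier), you can run:

# getsebool -a | grep httpd
# semanage boolean -l | grep httpd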

Restoring default file contexts

As I mentioned briefly in the ‘TL;DR’ section, another common problem people experience is with file contexts. If you follow my instructions exactly, you won’t have this problem because we untar the Drupal files directly into the webroot, so they will inherit the default file context for /var/www/html. If, however, you were to untar the files in your home directory, and then use ‘mv’ or ‘cp’ to place them in /var/www/html, they will maintain the user_home_t context which apache won’t be able to read by default. If this is happening to you, you will see the file denials logged in /var/log/audit/audit.log -- something like this:
type=AVC msg=audit(1324359816.779:66): avc: denied { getattr } for pid=3872 comm="httpd" path="/var/www/html/index.php" dev=dm-0 ino=549169 scontext=root:system_r:httpd_t:s0 tcontext=root:object_r:user_home_t:s0 tclass=file

The solution in this case is to use restorecon to reset the file contexts back to normal:
# restorecon -rv /var/www/html
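
If you want to see the contexts for yourself before and after, ls supports a -Z flag; for example:

# ls -Z /var/www/html/index.php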

Update: It was noted that I should also mention another tool for debugging audit messages, 'sealert'. This is provided in the setroubleshoot-server package and will also read in the audit log, similar to what I described with audit2why.
# sealert -a /var/log/audit/audit.log


by jeff at December 21, 2011 11:36 PM

November 25, 2011

Frédéric Wenzel

Day 329 - Ready for the Sunset


A family of tourists, getting ready to watch the sun set on the Pacific coast. I love silhouette photos like this: It's fun to see the different characters with their body shapes and postures.

November 25, 2011 08:00 AM

November 08, 2011

Jeff Sheltren

CentOS Continuous Release

The CentOS Continuous Release repository (“CR”) was first introduced for CentOS 5.6, and currently exists for both CentOS 5 and CentOS 6. The CR repo provides package updates that have already been released upstream (in RHEL) for the next point release but have not yet been officially released by CentOS, due to the time it takes to build, test, and seed mirrors for a new point release. For example, this means that once Red Hat releases RHEL 5.8, CentOS will include package updates from the 5.8 base and updates repos in the CentOS 5.7 CR repo until CentOS is able to complete the release of CentOS 5.8. For admins, this means less time without important security updates and the ability to run the latest packages released in the latest RHEL point release.

Details on the CR Repo

What’s included in CR and how might it affect your current CentOS installs? At this point, the CR repo is used only for package updates which are part of the next upstream point release. For example, for CentOS 5.7, once Red Hat releases RHEL 5.8, the CR repo will contain updates from the upstream base and updates repos. When a new update for RHEL 5.8 is released, it will be built in the CentOS build system, go through a relatively minimal amount of QA by the CentOS QA team, and then be pushed to the CentOS 5.7 CR repo. This process will continue until CentOS releases its own 5.8 release. Once CentOS releases 5.8, the CR repo will be cleared out until Red Hat releases the next (5.9) point release.

The CR repo is not enabled by default, so it is up to a system administrator to enable it if desired. That means, by default, you won’t see packages added to the CR repo. Installing the repo is very easy as it’s now part of the CentOS extras repository which is enabled by default. To enable CR, you simply have to:

yum install centos-release-cr

If you don’t have CentOS Extras enabled, you can browse into the extras/ directory for the release of CentOS you’re currently running and download and install the centos-release-cr package by hand, or manually create a centos-cr.repo in /etc/yum.repos.d/.

In my opinion, unless you have an internal process for testing/pushing updates, you should absolutely be using the CR repo. Even if you do have your own local processes for updates, I would consider the CR repo to be part of CentOS updates for all intents and purposes, and pull your updates from there for testing/release. The packages in the CR repo can fix known security issues that, without CR, you won’t have access to until the next CentOS point release -- and that can sometimes take longer than we’d like!
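
If you want to preview what CR would pull in before updating, yum can show you; something like the following should work (the repo id is normally "cr", but check the .repo file on your system):

# yum --disablerepo="*" --enablerepo="cr" list updates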

A New Proposal: Include CR by Default

In a recent post to the CentOS Developers list, Karanbir Singh proposed moving the CR repo into the main release for 6.x. What this would mean is that for CentOS 6.x and onward, the base OS and ISO directories would be updated for each point release, but in general, updates would be pushed to a central 6/ directory, basically incorporating CR into what is currently considered updates/.

This proposal is different from the current CR setup in that it incorporates CR into the release by default, and puts less reliance on the old point release model. This will help ensure that people are always running the latest security updates as well as take a bit of pressure off of CentOS developers and QA team when trying to build, test, and release the next point release. If the package updates are already released and in use, point releases become less important (though still useful for new installs).

Incorporating CR more into the main release doesn’t mean that point releases will go away completely. They will still include updated base packages and ISO images, typically with installer bug fixes and/or new and updated drivers. In general, I see this as a good move: it means more people will be getting security updates by default instead of waiting during the time lapse between upstream RHEL releases and the time it takes for CentOS to rebuild, test, and release that point release. Having those packages available by default is great, especially for those admins who don’t pay close attention and wouldn’t otherwise enable the CR repo. It should be noted that at this point, the incorporation of CR into the main release is only being discussed for CentOS 6.x onward and won’t change anything in the 5.x releases where people will still need to manually opt-in to the CR packages.



by jeff at November 08, 2011 04:03 PM

August 14, 2011

Justin Dugger

Solving the Sunday Puzzle with coreutils & grep

Not too long ago, a puzzler admitted to solving the NPR Sunday Puzzle with a computer. Since I hate word puzzles quite a bit, I'd taken similar steps in the past. For example, this recent challenge:

Fill in the 4x4 square crossword with common uncapitalized words, using no letter more than once:


My secret weapon? GNU coreutils. Regexes are a great tool, but I rarely have to use some of the more obscure features, which hurts on the occasions where they're called for. So the NPR puzzle can be a good way to practice and learn!

Edit: Commenter hggdh points out that the heavy worker here is grep, which is not part of coreutils. If your OS vendor doesn't provide grep, GNU grep sounds like a suitable replacement.

  1. I'm using the American English dictionary provided by Ubuntu in /usr/share/dict/words. The format of this file is one word per line. Every form of a word, including contractions and possessives, gets its own line. We use | (pipe) to chain the output of one command as the input of the next. Cat simply outputs a file, and wc -l counts the lines in it.

    laptop:~$ cat /usr/share/dict/words | wc -l


  2. I assume no apostrophes are in the puzzle. Grep reads input and outputs only those lines that match a regular expression (regex). Using the -v option to grep changes it to output only lines that don't match our pattern.

    laptop:~$ cat /usr/share/dict/words | grep -v "'" | wc -l


  3. That's a lot of words to fuddle around with, so let's winnow this down. Firstly, we only care about 4-letter words. We can use grep to give us only these words, using the regular expression "^....$". Caret (^) represents the start of a line, and $ represents the end of one. Each period is a single free-choice character for grep, matching exactly one character in the input.

    laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^....$" | wc -l


  4. Having cut the search space by 96 percent, we now turn to the clues for... clues. Fortunately, nags and newts define which letters every word can start with. Grep treats symbols within [] as alternatives, meaning any one symbol within can match the input. The command below alters the regex from step 3 to only match words starting with a, g, s, e, w, or t.

    laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^[agsewt]...$" | wc -l


  5. Rules say no two letters repeat in the puzzle, so we'll exclude all words with the letters from nags and newts anywhere other than the first letter. As an alternative to -v, we can use carets inside brackets to indicate "not".

    laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^[agsewt][^nagsewt][^nagsewt][^nagsewt]$" | wc -l


  6. Next, we can rule out words with repeated letters, like solo and wool. To do this quickly, we'll need to use backreferences. Backreferences can be slow, but since our dataset is so tiny, it will be fine to add it to the end of the pipeline.

    cat /usr/share/dict/words | grep -v "'" | grep "^[agsewt][^nagsewt][^nagsewt][^nagsewt]$" | grep -vE "([a-z]).*(\1)" | wc -l


  7. Starting to get close! From here on out, this plays a lot like sudoku. Our goal is now to start constructing a regex for each word. We replace the leading alternatives with a specific letter. To start off, we've only got 7 options for 2 across:

    laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^e[^nagsewt][^nagsewt][^nagsewt]$" | grep -vE "([a-z]).*(\1)"








We now write a different regex without negations to get the same list.

`laptop:~$ cat /usr/share/dict/words | grep "^e[cmpuvx][hipr][cloru]$" | grep -vE "([a-z]).*(\1)" | wc -l`


Now we build a similar regex for 2 down. Adding in what we know about its intersection with 2 across (cmpuvx) is the sudoku-like step:

`laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^a[cmpuvx][^nagsewt][^nagsewt]$" | grep -vE "([a-z]).*(\1)"`






We rewrite this one as

laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^a[cmv][hio][dky]$" | grep -vE "([a-z]).*(\1)" | wc -l


Applying the same logic to 3 down yields "^g[ir][lriu][bdlmp]$", and 4 down yields "^s[lu][cilmoru][bdfhkmopr]$".
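
Spelled out, the 3 down pipeline looks like this (the same pattern as before, just with the new regex; 4 down works the same way):

laptop:~$ cat /usr/share/dict/words | grep -v "'" | grep "^g[ir][lriu][bdlmp]$" | grep -vE "([a-z]).*(\1)"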

  8. The last positions in each down regex construct a new regex for 4 across:

cat /usr/share/dict/words | grep -v "'" | grep "^t[dky][bdlmp][bdfhkmopr]$" | grep -vE "([a-z]).*(\1)"


A unique solution to 4 across!

  9. Revisiting 2 down with this new fact also yields a unique answer. I leave solving the rest of the puzzle as an exercise for the reader.

by Justin Dugger at August 14, 2011 12:00 AM

August 09, 2011


New Speaker Announced: Dr. David A. Wheeler

We've added our final speaker to the GOSCON Cost Take Out Panel: David A. Wheeler. Dr. Wheeler is a Research Staff Member at the Institute for Defense Analyses and is an expert on developing secure software and the use of open source software in the security space. He is the author of several well known works in this space, including Secure Programming for Linux and Unix HOWTO, Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, and How to Evaluate OSS/FS Programs.

by Leslie at August 09, 2011 08:54 PM

Wayne Moses Burke

Executive Director
Open Forum Foundation

Mr. Moses Burke will be moderating the Building Outside the Box Panel during GOSCON DC 2011 at the Innovation Nation Forum.

by Leslie at August 09, 2011 08:48 PM

Alexander B. Howard

Government 2.0 Correspondent

O’Reilly Media

Mr. Howard will be moderating the Cost Take Out Panel during GOSCON DC 2011 at the Innovation Nation Forum.

by Leslie at August 09, 2011 08:43 PM

June 19, 2011

Peter Krenesky

Ganeti Web Manager 0.7

We’ve just released version 0.7 of Ganeti Web Manager. Ganeti Web Manager is a Django-based web application that allows administrators and clients access to their Ganeti clusters. It includes a permissions and quota system that allows administrators to grant access to both clusters and virtual machines. It also includes user groups for structuring access to organizations.

This is the fourth release of Ganeti Web Manager and it contains numerous new features.  It also includes various bug fixes and speed optimizations.  Here is the full CHANGELOG, or read on for the highlights.

Xen Support

Ganeti Web Manager now has full Xen support.  Prior versions could display Xen instances, but now you can create and edit them too.  This is an important addition because Xen is a widely used and mature project.  Now with full hardware virtualization in Linux 3.0, Xen will continue to be an important technology for virtualization.  This was our most often requested feature and we’re glad to have fulfilled it.


Thanks to a large community contribution, internationalization support was added for nearly all aspects of the interface.  Users can switch between their default language and any other.  Currently only a Greek translation is available, but we’d like to see many more languages. If you can read and write another language this is a great opportunity for you to get involved. We’re using Transifex to coordinate people who want to help translate.

Search & Improved Navigation

Administrators of larger clusters can now find objects more easily with our search interface.  It includes an Ajax auto-complete feature, along with detailed results.

We’ve also added contextual links wherever we could.  This included ensuring breadcrumbs were properly formatted on each page.  Object Permissions and Object Log were updated to ensure navigating between those screens and Ganeti Web Manager is seamless.

Import Tools

There are now import tools for Nodes.  These work the same as for instances.  The cache updater has also been reworked to support both Nodes and Instances.  It’s now a Twisted plugin, with modest speed improvements due to Ganeti requests happening asynchronously.

Speed, Scalability, and Bugs

We’ve sought out places where we performed extra and/or inefficient database queries.  We identified numerous places where database interaction could be reduced and pages returned faster.  This is an ongoing process.  We’ll continue to optimize and improve the responsiveness as we find areas of the project we can improve.

Numerous bugs were fixed in both the user interface and the backend.  Notably, the instance creation interface has had several bugs corrected.

Module Releases

We’re building several modules along with Ganeti Web Manager.  The following projects have new releases coinciding with Ganeti Web Manager 0.7:

Django Object Permissions 1.4

  • improved user selection widget
  • speed improvements

Object Log 0.6

  • our first public release
  • speed, scalability, and flexibility improvements

Twisted VNC Auth Proxy

  • our first public release
  • added support for hixie 07 and latest noVNC version.

Want to learn more?

Lance Albertson and I will be speaking about Ganeti & Ganeti Web Manager at several conferences this summer.  Catch us at the following events:

by peter at June 19, 2011 03:49 AM

May 18, 2011

Peter Krenesky

Google I/O 2011

Five OSUOSL co-workers and I recently finished a road trip to Google I/O 2011.  We took two cars on an 11 hour drive through scenic southern Oregon and northern California.  We learned more about Android and other technologies shaping the web.  It was also a great opportunity to spend time with each other outside the office.

Monday night we joined about 30 Google Summer of Code mentors for dinner and drinks hosted by the Google Open Source Programs Office.  We’re always grateful for events that bring together friends old and new.  One developer nervously sat down at our table, professing that he didn’t know anyone.  We might not work on the same project, but we’re all part of the open source community.

The highlight of the conference was the double announcement of the Android Open Accessory program and Android @ Home.  Both open up Android to integration with third party devices.  These features, coupled with near field communication (NFC), stand to dramatically change how we use our mobile devices to interact with the world around us.  This is not a new idea.  X10 home automation has existed since 1975.  Zigbee and Z-Wave are more modern protocols, but have also been available for years.  The difference here is 100 million Android users and a half million Arduino hackers.

As Phillip Torrone wrote on the Makezine Blog, “There really isn’t an easier way to get analog sensor data or control a motor easier and faster than with an Arduino — and that’s a biggie, especially if you’re a phone and want to do this.”

It won’t be a short road.  We still have obstacles such as higher costs.  A representative from Lighting Science whom I spoke to at their I/O booth quoted Android@Home-enabled LED lights at $30 per bulb.  Android and Arduino might be the right combination of market penetration, eager hackers, and solid platforms for a more integrated environment.

NFC Sticker

My favorite session was How To NFC.  NFC (near field communication) is similar to RFID except it only works within a few centimeters.  Newer Android phones can send and receive NFC messages any time except when the phone is sleeping.  NFC chips can also be embedded in paper, like the stickers that came in our I/O badges.  An NFC-enabled app can share data such as a URL, or launch a multiplayer game with your friend.  It makes complex tasks as simple as “touch the phone here”.  Android is even smart enough to launch an app required for an NFC message, or send you to the market to install the app you need.  Only the Nexus S supports NFC now, but this feature is so compelling that others will support it soon too.

The other technical sessions were very useful too, whether you were interested in Android, Chrome, or other Google technologies.  The speakers were knowledgeable on the subject areas they spoke on.  I attended mostly Android talks, and it was great hearing from the people who wrote the APIs we’re trying to use.  The sessions were all filmed and are worth watching online.

by peter at May 18, 2011 10:46 PM

May 03, 2011

Lance Albertson

Rebalancing Ganeti Clusters

One of the best features of Ganeti is its ability to grow linearly by adding new servers easily. We recently purchased a new server to expand our ever-growing production cluster and needed to rebalance the cluster. Adding and expanding the cluster consisted of the following steps:

  1. Installing the base OS on the new node
  2. Adding the node to your configuration management of choice and/or installing ganeti
  3. Add the node to the cluster with gnt-node add
  4. Check Ganeti using the verification action
  5. Use htools to rebalance the cluster

For simplicity's sake I'll cover the last three steps.

Adding the node

Assuming you're using a secondary network, this is how you would add your node:

gnt-node add -s <secondary ip> newnode

Now let's check and make sure Ganeti is happy:

gnt-cluster verify

If all is well, continue on; otherwise, try to resolve any issues that Ganeti is complaining about.

Using htools

Make sure you install ganeti-htools on all your nodes before continuing. It requires Haskell, so just be aware of that requirement. Let's see what htools wants to do first:

$ hbal -m
Loaded 5 nodes, 73 instances
Group size 5 nodes, 73 instances
Selected node group: default
Initial check done: 0 bad nodes, 0 bad instances.
Initial score: 41.00076094
Trying to minimize the CV...
1. g1.osuosl.bak:g2.osuosl.bak g5.osuosl.bak:g1.osuosl.bak 38.85990831 a=r:g5.osuosl.bak f
2. g3.osuosl.bak:g1.osuosl.bak g5.osuosl.bak:g3.osuosl.bak 36.69303985 a=r:g5.osuosl.bak f
3. g2.osuosl.bak:g4.osuosl.bak g5.osuosl.bak:g2.osuosl.bak 34.61266967 a=r:g5.osuosl.bak f


28. g3.osuosl.bak:g1.osuosl.bak g3.osuosl.bak:g5.osuosl.bak 4.93089388 a=r:g5.osuosl.bak
29. g2.osuosl.bak:g1.osuosl.bak g1.osuosl.bak:g5.osuosl.bak 4.57788814 a=f r:g5.osuosl.bak
30. g1.osuosl.bak:g3.osuosl.bak g1.osuosl.bak:g5.osuosl.bak 4.57312216 a=r:g5.osuosl.bak
Cluster score improved from 41.00076094 to 4.57312216
Solution length=30

I've shortened the actual output for the sake of this blog post. Htools automatically calculates which virtual machines to move and how, using the least amount of operations. For most of these moves, a VM may simply be migrated; migrated and have its secondary storage replaced; or migrated, have its secondary storage replaced, and then be migrated again. In our environment we needed to move 30 VMs out of the 73 total VMs that are hosted on the cluster.

Now let's see what commands we would actually need to run:

$ hbal -C -m

Commands to run to reach the above solution:

echo jobset 1, 1 jobs
echo job 1/1
gnt-instance replace-disks -n g5.osuosl.bak
gnt-instance migrate -f
echo jobset 2, 1 jobs
echo job 2/1
gnt-instance replace-disks -n g5.osuosl.bak
gnt-instance migrate -f
echo jobset 3, 1 jobs
echo job 3/1
gnt-instance replace-disks -n g5.osuosl.bak
gnt-instance migrate -f


echo jobset 28, 1 jobs
echo job 28/1
gnt-instance replace-disks -n g5.osuosl.bak
echo jobset 29, 1 jobs
echo job 29/1
gnt-instance migrate -f
gnt-instance replace-disks -n g5.osuosl.bak
echo jobset 30, 1 jobs
echo job 30/1
gnt-instance replace-disks -n g5.osuosl.bak

Here you can see the commands it wants you to execute. Now you can either put these all in a script and run them, split them up, or just run them one by one. In our case I ran them one by one just to be sure we didn't run into any issues. I had a couple of VMs that didn't migrate properly, but those were easily fixed. I split this up into a three-day migration, running ten jobs a day.

The length of time that it takes to move each VM depends on the following factors:

  1. How fast your secondary network is
  2. How busy the nodes are
  3. How fast your disks are

Most of our VMs ranged in size from 10G to 40G and on average took around 10-15 minutes to complete each move. Additionally, make sure you read the man page for hbal to see all the various features and options you can tweak. For example, you could tell hbal to just run all the commands for you, which might be handy for automated rebalancing.
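
A minimal sketch of that fully automated approach, as I understand the hbal options (double-check the man page before relying on this, since I'm writing it from memory):

$ hbal -L -X    # -L uses the local Luxi backend, -X submits and executes the resulting jobs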


Overall, the rebalancing of our cluster went smoothly outside of a few minor issues. Ganeti made it really easy to expand our cluster with minimal to zero downtime for our hosted projects.

by lance at May 03, 2011 05:55 AM

April 25, 2011

Russell Haering

Cast Preview Release

For the last few months I've been working on and off for Cloudkick (now Rackspace) on a project that we are calling Cast. I'm happy to announce that this afternoon we're releasing Cast version 0.1. The source has been on Github all along, but with this release we feel that the project has finally progressed to a point where:

  1. We've implemented the functionality planned for the first iteration.
  2. The aforementioned functionality actually works against the current version of Node.js.
  3. We have a website and documented most of the important parts.

That's Great, So What Is It?

In short, Cast is an open-source deployment and service management system.

At Cloudkick we tend to see users deploying their code in one of three ways:

  1. Services are deployed via a configuration management system such as Puppet
    or Chef.
  2. Services are deployed by some sort of SSH wrapper such as Fabric or Capistrano.
  3. Services are deployed to a "Platform as a Service" such as Heroku.

But none of these are perfect. Respectively:

  1. The high overhead in interacting with configuration management systems is
    fine when they are managing 'infrastructure' (that is, the systems on which you run your services), but tends to impede a smooth "devops" style workflow with fast iterations and easy deployment and upgrades.
  2. SSH wrappers typically work well enough on small scales, but they feel
    like a hack, and don't trivially integrate with in-house systems.
  3. Of all the options, people seem to like these the best. The price speaks for
    itself - Platforms as a Service (PaaS) are hugely valuable to their users. The problem is that these platforms are closed systems, inflexible and not very "sysadmin friendly". When they go down, you're trapped. When the pricing or terms change, you're trapped. If they don't or can't do what you want, you're trapped.

With this situation in mind, what could we write for our users? An Open Platform (optionally, as a Service).

What Can it Do?

Using Cast you can:

  1. Upload your application to a server.
  2. Create 'instances' of your application. Think 'staging' and 'production'.
  3. Manage (start, stop, restart, etc) services provided by your application.
  4. Deploy new versions of your application.
  5. Do all of this from the command line or from a REST API.

We have a lot more interesting features planned. Hint: think "Cast cluster". But if this sounds like something you're interested in, stay tuned, share your thoughts, or consider looking into a job at the new San Francisco Rackspace Office.

by Russell Haering at April 25, 2011 12:00 AM

April 19, 2011

Greg Lund-Chaix

Facebook in Prineville, a slightly different view

On Friday, Facebook’s Senior Open Programs Manager, David Recordon, took a group of us from the OSL on a fantastic behind-the-scenes tour of the new Facebook data center in Prineville, Oregon. It was an amazing experience that prompted me to think about things I haven’t thought about in quite a few years. You see, long before I was ever a server geek I spent my summers and school holidays working as an apprentice in my family’s heating and air conditioning company. As we were walking through the data center looking at the ground-breaking server technology, I found myself thinking about terms and technologies I hadn’t considered much in years – evaporative cooling, plenums, airflow, blowers. The computing technology is fascinating and ground-breaking, but it’s been covered exhaustively elsewhere. I’d like to spend some time talking about something a bit less sexy but equally important: how Facebook keeps all those servers from melting down from all the heat they generate.

First, though, some scale. They’re still building the data center – only one of the three buildings has been built so far, and it has less than half of its server rooms completed – but even at a fraction of its proposed capacity the data center was reportedly able to handle 100% of Facebook’s US traffic for a while when they tested it last week. The students we brought with us did a bit of back-of-the-envelope calculation: when the facility is fully built out, we suspect it’ll be able to hold on the order of hundreds of thousands of servers. It’s mind-boggling to think how much heat that many servers must generate. It’s hard enough to keep the vastly smaller OSL data center cool; the idea of scaling it that large is daunting, to say the least. As the tour progressed, I found myself more and more fascinated by the airflow and cooling.

The bottom floor of the facility is all data center floor and offices, while the upper floors are essentially giant plenums (the return air directly above the main floor, and the supply above the return). There is no ductwork, just huge holes (10′x10′) in the ceiling of the data center floor that bring the cool air down from the “penthouse”, and open ceilings above the “hot” side of the racks that move the hot air out. A lot of the air movement is passive/convective – hot air rises from the hot side of the racks through the ceiling to the second floor, and the cooled air drops down from the third floor onto the “cool” side of the server racks, where it’s pulled back through the servers. The air flow is certainly helped along by the fans in the servers and blowers up in the “penthouse”, but it’s clearly designed to take advantage of the fact that hot air rises and cold air sinks. They pull off a bit of the hot air to heat the offices, and split the rest between exhausting it outside and mixing it with outside air and recirculating.


OK, enough with the talking, here are some pictures. Walking through the flow, we start at the “cool” side of the server racks:
Notice there are no faceplates to restrict the airflow. The motherboards, power supplies, processor heat sinks, and RAM are all completely exposed.

Then we move on to the “hot” side of the racks:
The plastic panels you can see on top of the racks and in the middle image guide the hot air coming out of the servers up through the open ceiling to the floor above. No ductwork needed. There are plastic doors at the ends of the rows to completely seal the hot side from the cold side. It was surprisingly quiet even here. The fans are larger than standard and low-speed. While uncomfortably warm, it was not very loud at all. We could speak normally and be heard easily. Very unlike the almost-deafening roar of a usual data center.

The second “floor” is basically just a big open plenum that connects the exhaust (“hot”) side of the server racks to the top floor in a couple of places (recirculating and/or exhaust, depending on the temperature). It’s a sort of half-floor between the ground floor and the “penthouse” that isn’t walk-able, so we climbed straight up to the top floor – a series of rooms (30′ high and very long) that do several things:

First, outside air is pulled in (the louvers to the right):

The white block/wall on the left is the return air plenum bringing the hot air from the floor below. The louvers above it bring the outside air into the next room.

Mix the outside air with the return air and filter it:

The upper louvers on the right are outside air, lower are return air bringing the hot air up from the servers. The filters (on the left) look like standard disposable air filters. Behind them are much more expensive high-tech filters.

Humidify and cool the air with rows and rows of tiny atomizers (surprisingly little water, and it was weird walking through a building-sized swamp cooler):
The left image shows the back of the air filters. The middle image shows the other side of the room with the water jets. The right image is a closer shot of the water jets/atomizers.

Blowers pull the now-cooled air through the sponges (for lack of a better word) in front of the atomizers and pass it on to be sent down to the servers:

They were remarkably quiet. We could easily speak and be heard over them and it was hard to tell how many (if any) were actually running.

Finally the air is dumped back into the data center through giant holes in the floor:
The first image shows the back of the blowers (the holes in the floor are to the right). The middle image shows the openings down to the server floor (the blowers are off to the left). The third image is looking down through the opening to the server room floor. The orange devices are smoke detectors.

The last room on the top floor is where the unused hot return air is exhausted outside:

None of the exhaust fans were actually running, the passive airflow was sufficient without any assistance. The grates in the floor open down to the intermediate floor connecting to the hot side of the racks.

No refrigerant is used at all, just evaporative cooling (and even that only when needed). The only electricity used in the cooling system is for the fans and the water pumps. All of it – the louvers, the water atomizers, and the fans – is automatically controlled to maintain a static temperature/humidity down on the data center floor. When we were there, none of the fans (neither intake nor exhaust) appeared to be running; it was cool enough outside that they were passively exhausting all of the air from the data center and pulling in 100% outside air on the supply. As best I could tell, the only fans that were actually running were the little 12V fans actually mounted on the servers.

This design makes great sense. It’s intuitive – hot air rises, cool air falls – and it obviously efficiently takes advantage of that fact. I kept thinking, “this is so simple! Why haven’t we been doing this all along?”


by Greg at April 19, 2011 09:04 PM

April 17, 2011

Lance Albertson

Facebook Prineville Datacenter

Along with the rest of the OSU Open Source Lab crew (including students), I was invited to the grand opening of Facebook's new datacenter yesterday in Prineville, Oregon. We were lucky enough to get a private tour by Facebook's Senior Open Source Manager, David Recordon. I was very impressed with the facility on many levels.

Triplet racks & UPS

I was glad I was able to get a close look at their Open Compute servers and racks in person. They were quite impressive. One triplet rack can hold ninety 1.5U servers, which adds up quickly. We're hoping to get one or two of these racks at the OSL. I hope they fit, as those triplet racks were rather tall!

Web & memcached servers

Here's a look at a bank of their web & memcached servers. You can spot the memcached servers by the large banks of RAM in the front of them (72GB in each server). The web servers were running the Intel Open Compute boards while the memcached servers were using AMD. The blue LEDs on the servers cost Facebook an extra $0.05 per unit compared to green LEDs.

Hot aisle

The hot aisle is shown here and was amazingly quiet. Actually, the whole room was fairly quiet, which is strange compared to our datacenter. That's because of the design of the Open Compute servers and the fact that they are using negative/positive airflow in the whole facility to push cold/hot air.



They had a lot of generators behind the building, each easily the size of a bus. You can see their substation in the background. Also note the camera in the foreground; they were everywhere, not to mention the security presence, because of Greenpeace.

The whole trip was amazing and I was just blown away by the sheer scale. Facebook is planning on building another facility next to this one within the next year. I was really happy that all of the OSL students were able to attend the trip as well, as they rarely get a chance to see something like this.

We missed seeing Mark Zuckerberg by minutes, unfortunately. We had a three hour drive back; it was around 8:10PM when we left, and he showed up at 8:15PM. Damnit!

If you would like to see more of the pictures I took, please check out my album below.

Facebook Prineville Datacenter

Thanks David for inviting us!

by lance at April 17, 2011 01:38 AM