29 January

The Key to Snapchat’s Profitability: It’s Dirt Cheap to Run

Note: this article was published in Wired. Check out my handy service storage and bandwidth calculator.

Ever since Snapchat turned down a $3 billion all-cash offer from Facebook this past November, there’s been no shortage of discussion about it and the rest of its photo-sharing-and-messaging service cohort, including WhatsApp, Kik, Japan-based LINE, China-based WeChat, and Korea-based Kakao Talk. Explanations for this phenomenon have ranged from the need to redefine identity for the social-mobile era to the rise of ephemeral, disposable media.

Regardless of why this trend is taking off, however, it’s clear that the so-called messaging “wars” are heating up. As always, the euphoria over “hockey-stick” user growth numbers is beginning to give way to the sobriety of analysis, yielding the inevitable question: can they monetize? Snapchat, with its massive (paper) valuation, is at the vanguard of such criticism, especially given the irony that the service is essentially deleting its biggest asset.

So, how can Snapchat effectively monetize without its user data? By operating its service an order of magnitude cheaper than its competitors.

Surprisingly little time has been spent examining how one can rethink a storage-centric infrastructure model for disappearing data. This isn’t just relevant to engineers; it has important implications for helping services like Snapchat save (and therefore make) money. (By the way, to justify its $3 billion valuation in November, Snapchat would need about $500 million in revenue and $200 million in profit.)

It’s very simple: if the appeal of services like Snapchat is in the photos (“the fuel that social networks run on”), then the costs are in operating that photo sharing-and-serving service, as well as in running any monetization, such as ads, that will be built on top of that infrastructure. But I’d even go so far as to argue that making use of advanced infrastructure protocols could let Snapchat get away with paying almost no bandwidth costs for a large subset of media.

How? Well, let’s begin by comparing Snapchat’s infrastructure to that of a more traditional social network: its erstwhile suitor, Facebook.

According to publicly available data, Facebook users upload 350 million images a day. Back when users were adding 220 million photos weekly in 2009, the company was serving upwards of 550,000 images per second at peak — and they did it by storing five copies of each image, downsampled to various levels, in a photo storing-and-serving infrastructure called Haystack. (For obvious reasons, the exact architecture of these systems is not known.)

That gives you a sense of the scope of the infrastructure. But the salient detail here is the total cost of this serving-and-storage — including all-in per-byte cost of bandwidth — which I estimate to be more than $400 million a year.

If you want the details or want to experiment on your own, here’s a handy service storage and bandwidth calculator. As a quick summary, here’s what went into my calculation, which also includes ancillary costs such as power, capital for servers, human maintenance, and redundancy. The most important variables in my cost calculation are:

  • the number of images/videos uploaded (estimated at ~400M photos per day)

  • the size of each image/video (estimated at 3MB)

  • the average number of images/videos served each month (estimated at 9.5% of all images)

  • all-in per-byte bandwidth/serving cost (estimated at \(\$5 \times 10^{-11}\))

  • all-in per-byte storage cost (estimated at \(\$5 \times 10^{-11}\))

  • exponential growth rate coefficient (r, estimated at ~0.076, using \(P_t = P_0 e^{rt}\)).
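To make the variables above concrete, here is a minimal sketch of how they might combine into a running cost. It is not the calculator’s actual code, and several choices are my assumptions: r is treated as a per-month rate, storage is charged per byte-month, uploads are assumed to have followed the same exponential curve indefinitely into the past (so the stored corpus at month t is P_t / r), and the 9.5% serving fraction is applied to stored bytes each month. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of a storage-and-serving cost model; not the calculator's real code.
 * Assumptions: r is per month, storage is billed per byte-month, and uploads have grown
 * exponentially forever, so the stored corpus at month t is P_t / r.
 */
final class ServiceCostModel {
    /** Items uploaded per month at month t: P_t = P_0 * e^(r t). */
    static double uploadsPerMonth(double p0, double r, double t) {
        return p0 * Math.exp(r * t);
    }

    /** Items stored at month t: the integral of P_0 e^(r s) ds from -infinity to t. */
    static double storedItems(double p0, double r, double t) {
        return uploadsPerMonth(p0, r, t) / r;
    }

    /** Sums storage and serving cost month by month over months [0, months). */
    static double totalCost(double p0, double r, int months, double bytesPerItem,
                            double servedFractionPerMonth, double costPerByteServed,
                            double costPerByteMonthStored) {
        double total = 0.0;
        for (int m = 0; m < months; m++) {
            double storedBytes = storedItems(p0, r, m) * bytesPerItem;
            double servedBytes = servedFractionPerMonth * storedBytes;
            total += storedBytes * costPerByteMonthStored + servedBytes * costPerByteServed;
        }
        return total;
    }

    public static void main(String[] args) {
        double p0 = 400e6 * 30;  // ~400M photos/day, expressed per month
        double yearly = totalCost(p0, 0.076, 12, 3e6, 0.095, 5e-11, 5e-11);
        System.out.printf("Estimated cost over the next 12 months: $%.0fM%n", yearly / 1e6);
    }
}
```

Because every term is proportional to P_t, monthly cost under these assumptions grows by a factor of e^r each month, which is why the growth coefficient matters as much as the per-byte prices.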

To compare Facebook’s costs to Snapchat’s, however, we also have to include these variables: the mean number of recipients of each Snapchat message (estimated very conservatively at 2.5); and the fraction of total messages that are undelivered (estimated at 10%).
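The ephemeral side can be sketched the same way. In this hypothetical sketch each message is uploaded once and downloaded once per recipient, and only undelivered messages are retained; for simplicity I assume a message reaches either all of its recipients or none of them, which the text does not specify.

```java
/** Hypothetical daily traffic model for a delete-after-delivery messaging service. */
final class EphemeralCostModel {
    /** Bytes moved per day: one upload per message, one download per recipient of each delivered message. */
    static double bytesTransferredPerDay(double messagesPerDay, double bytesPerMessage,
                                         double meanRecipients, double undeliveredFraction) {
        double uploaded = messagesPerDay * bytesPerMessage;
        double downloaded = messagesPerDay * (1.0 - undeliveredFraction) * meanRecipients * bytesPerMessage;
        return uploaded + downloaded;
    }

    /** Bytes added to long-term storage per day: only undelivered messages are kept. */
    static double bytesRetainedPerDay(double messagesPerDay, double bytesPerMessage,
                                      double undeliveredFraction) {
        return messagesPerDay * undeliveredFraction * bytesPerMessage;
    }

    public static void main(String[] args) {
        double messages = 400e6;  // hypothetical volume, chosen only for comparison
        System.out.printf("Transferred: %.2e bytes/day%n", bytesTransferredPerDay(messages, 3e6, 2.5, 0.10));
        System.out.printf("Retained:    %.2e bytes/day%n", bytesRetainedPerDay(messages, 3e6, 0.10));
    }
}
```

The contrast with a services-with-history model is that retained storage grows only with the undelivered fraction, not with every upload.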

Obviously, we are comparing a much larger service that has advertising — Facebook — to one that is much smaller in scope and doesn’t have any advertising (yet). But I’d argue that this doesn’t really matter, in principle. Because even though Facebook has to make sure its infrastructure can store and serve the data needed to sell ads, the reality is that much of the information that helps advertisers target users is the metadata of user interactions — with whom, where, how, and when (as well as what they ‘like’) — as opposed to the content of what those users are actually saying.

This means that despite their differences, storing and analyzing only the metadata would still allow Snapchat to build similar profiles of its users. This would allow Snapchat to sell ads that target users just as Facebook does (assuming of course that their product can attract a consistent customer base) — and with one huge advantage: lower costs, since Snapchat doesn’t need to store or serve any messages after they’ve been delivered.
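What “metadata only” means can be made concrete with a small sketch: a log entry that records who, with whom, when, and what kind of media, but never the media itself, plus two simple profile signals derived from it. The field names and aggregations are illustrative assumptions, not Snapchat’s actual schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical metadata-only log entry: who, with whom, when and how, but no content. */
final class MessageMetadata {
    final String sender;
    final String recipient;
    final long timestampMillis;  // UTC epoch milliseconds
    final String mediaType;      // e.g. "photo" or "video"

    MessageMetadata(String sender, String recipient, long timestampMillis, String mediaType) {
        this.sender = sender;
        this.recipient = recipient;
        this.timestampMillis = timestampMillis;
        this.mediaType = mediaType;
    }
}

/** Builds simple targeting signals from metadata alone. */
final class InteractionProfile {
    /** How often `user` exchanged messages with each contact, in either direction. */
    static Map<String, Integer> contactCounts(String user, List<MessageMetadata> log) {
        Map<String, Integer> counts = new HashMap<>();
        for (MessageMetadata m : log) {
            if (m.sender.equals(user)) {
                counts.merge(m.recipient, 1, Integer::sum);
            } else if (m.recipient.equals(user)) {
                counts.merge(m.sender, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Messages sent or received by `user` in each UTC hour of the day. */
    static int[] activityByHour(String user, List<MessageMetadata> log) {
        int[] hours = new int[24];
        for (MessageMetadata m : log) {
            if (m.sender.equals(user) || m.recipient.equals(user)) {
                long hoursSinceEpoch = Math.floorDiv(m.timestampMillis, 3_600_000L);
                hours[(int) Math.floorMod(hoursSinceEpoch, 24L)]++;
            }
        }
        return hours;
    }
}
```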

This kind of approach to user targeting, with its metadata-centric infrastructure and associated cost savings, is by no means unique to Snapchat. The public revelations about the NSA’s surveillance operations point to a similar architecture: storing the entire content of all intercepted communication would be prohibitive in terms of cost and space, but storing only the metadata would not. In fact, the way the metadata is (theoretically) used to target whatever individuals and groups NSA agents deem to be a threat is not dissimilar to how advertising targeting works. But that’s a separate concern.

What makes Facebook’s (and any other traditional social network’s) photo-serving costs so high is having to keep data in a high-availability, low-latency, redundant, multi-master data store that can withstand temporary spikes in traffic load. But much of this expense is unnecessary for storing and processing metadata. Based on some additional assumptions (such as the number of recipients of each message), we can estimate that, even if its per-byte storage costs were 5x higher, Snapchat would only need to pay $35 million a year (under 9% of Facebook’s total estimated infrastructure costs) to handle a similar load, all while accruing a trove of data with similar targeting value.

It’s like getting a mile when you’re only giving an inch.

How could Snapchat reduce its bandwidth and storage costs even further? The key, again, is in the seemingly mundane: infrastructure. There are a number of more complicated optimizations that could make the system even cheaper to operate. For example, Snapchats between parties that are concurrently online could be delivered via peer-to-peer messaging (think Skype). Because these messages would never even flow over Snapchat’s network, Snapchat’s delivery costs for them would drop to nearly nothing. Firewalls are an impediment, of course, but a number of solutions, including proxy servers at the edge of the network or ICE (RFC 5245), could make this doable relatively soon. Snapchat could even store encrypted, undelivered messages on other users’ phones, ensuring availability by using erasure coding with sufficient redundancy. (This means that they could split your media up into many redundant pieces, only a few of which are needed to reconstitute the entire picture/movie. Each piece would be given to a different user, encrypted so that no one other than the recipient would be able to glean any information about the data, and so that, with high probability, enough users would be online at any time to reconstruct the data.) While it’s hard to guess what fraction of messages are exchanged between parties that are online, the impact of such an infrastructure design would definitely be substantial.
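The erasure-coding idea can be illustrated with the simplest possible code: split the data into k pieces plus one XOR parity piece, so that any k of the k+1 pieces reconstruct the original. This toy single-parity scheme is chosen for brevity and tolerates only one missing piece; a real deployment spread across users’ phones would more likely use Reed-Solomon or a similar code that tolerates many losses. Encryption is omitted, and the class name is hypothetical.

```java
/** Toy single-parity erasure code: k data shards plus 1 XOR parity shard. */
final class XorParityCode {
    /** Splits data into k equal-size shards (zero-padded) and appends one parity shard. */
    static byte[][] encode(byte[] data, int k) {
        int shardSize = (data.length + k - 1) / k;
        byte[][] shards = new byte[k + 1][shardSize];
        for (int i = 0; i < data.length; i++) {
            shards[i / shardSize][i % shardSize] = data[i];
        }
        // Parity shard: byte-wise XOR of all data shards.
        for (int s = 0; s < k; s++) {
            for (int j = 0; j < shardSize; j++) {
                shards[k][j] ^= shards[s][j];
            }
        }
        return shards;
    }

    /**
     * Rebuilds the original bytes when at most one shard is missing (null).
     * The missing shard equals the XOR of all surviving shards.
     */
    static byte[] decode(byte[][] shards, int k, int originalLength) {
        int missing = -1;
        int shardSize = 0;
        for (int s = 0; s <= k; s++) {
            if (shards[s] == null) {
                if (missing != -1) throw new IllegalArgumentException("more than one shard lost");
                missing = s;
            } else {
                shardSize = shards[s].length;
            }
        }
        byte[][] work = shards.clone();  // shallow copy so the caller's array is untouched
        if (missing != -1) {
            byte[] rebuilt = new byte[shardSize];
            for (int s = 0; s <= k; s++) {
                if (s == missing) continue;
                for (int j = 0; j < shardSize; j++) rebuilt[j] ^= work[s][j];
            }
            work[missing] = rebuilt;
        }
        byte[] out = new byte[originalLength];
        for (int i = 0; i < originalLength; i++) {
            out[i] = work[i / shardSize][i % shardSize];
        }
        return out;
    }
}
```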

Because they don’t have to store and serve large amounts of content, a new generation of messaging services is emerging that can use cost-effective infrastructure to operate an order of magnitude more cheaply than the Facebooks of the world. By storing only the metadata of interactions, they can effectively target users and monetize these systems. The only question that remains is whether they can make a compelling enough product to keep users coming back for more.

29 January

Service storage and bandwidth cost calculator

This page allows you to simulate the costs of storage and bandwidth for a simple web service like Facebook or Snapchat. For more info on why this might be interesting, please read my post about infrastructure in ephemeral networks.

The defaults are based on a number of publicly available data sources about Facebook. The system is assumed to consist of a set of people who upload media (photos, for this analysis). In order to model growth, I assume that the number of items uploaded grows according to an exponential function. The default values have been fit using publicly available data about Facebook from between 2009 and 2012.

These media are then assumed to be consumed by other users of the system. In the case of services-with-history (e.g. FB), I model the peak and average QPS of data as a fraction of the total amount of data cumulatively stored. In the case of ephemeral networks (e.g. Kik, Snapchat), I model storage and bandwidth by estimating the fanout of each message and the number of messages that are never received (leftover messages are assumed to be stored forever).

Storage and bandwidth costs are estimated based on costs on EC2 and other similar systems.

Feel free to play around with the numbers.

The code is available on github: Web service storage cost simulator.

Questions or comments? Use twitter to reach me: @vijayp (Vijay Pandurangan)

29 October

Your password management strategy stinks: Here’s why


We’ve used passwords since the beginning of computing. But back then you only had to remember one password—a single account let you access all the programs on one computer. You were unlikely to need to get into more than one of those giant machines! As the number of devices and applications we use has exploded, so has the number of passwords with which we have to deal.

Unfortunately, people are quite bad at managing passwords. Recent studies are full of frightening statistics. Over 60% of people use the same password on multiple sites; all these sites are then only as secure as the least secure one, a scary thought indeed. People also tend to select passwords that are easy to remember. While this avoids forgotten passwords, it makes them far too easy to guess.

So what gives? Why is this still such a mess?

For consumers, there’s at least been some slow progress. You can log into some services (e.g. Instagram, Quora, Spotify) using accounts from widely-used platforms such as Google, Facebook, or Twitter instead of creating a new account. But not everyone feels comfortable using platform accounts, for fear of being locked out of their data if their platform account becomes inaccessible. Developers are sometimes reluctant to agree to providers’ current and future terms of service.

Consumers’ password problems pale in comparison to those in the corporate world. Since employees have different roles and different data and software access privileges, companies need a way to export business identity (i.e., metadata about an employee and his rights) to cloud providers like Salesforce, Box, or Gmail. But this usually requires Software as a Service (SaaS) developers and company administrators to configure and maintain a complex system that speaks a special protocol (such as SAML). Consequently, most SaaS sites don’t interface with most business identity systems.

Additionally, business users in particular often share accounts on SaaS services: most services don’t natively support several people working on the same project or data (e.g., tax authority login, Twitter, many analytics products), and this use case is hugely prevalent in the corporate world. In fact, some 40% of business users share accounts, most often via email, spreadsheets or other insecure means.

Companies that try to prevent users from using non-sanctioned services (such as Google Docs, Dropbox, online analytics tools, etc.) for fear of data leakage often end up worsening their situation: in a recent survey, 70% of employees admitted to subverting such policies. Now not only is the company’s data on third-party sites, but IT staff are totally unaware of it and have no way of accessing or controlling access to it.

For large organizations with high employee turnover, the process of creating and deleting accounts can be quite daunting. Forgetting to terminate users’ accounts means they can continue to access privileged information after they leave. There are some solutions for this on the market now: Okta is geared towards automating the account creation/deletion process for commonly-used SaaS sites, while providing better controls. But it isn’t optimized for the “long tail” of services that are so commonly used and shared in the workplace.

Possible Solutions

Can biometrics be a panacea? Unfortunately not. Passwords are supposed to be secrets, but biometrics are the exact opposite of a secret—you take them and leave copies everywhere you go. Using the same biometric data to authenticate to multiple applications is really no different from using the same password on multiple sites. (Bruce Schneier has a good summary of biometrics here.)

Developers should do a better job of storing passwords. Passwords should not be stored in plaintext (because anyone with access to the database would immediately know all the passwords). Proper practice requires hashing and salting passwords, yet even this seems to be a daunting challenge. Even well-established sites often don’t follow best practices: 6.5 million poorly-hashed LinkedIn passwords were recently leaked. Others store passwords in plain text; Troy Hunt has put together a list of the most egregious violators here.
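For developers wondering what hashing and salting looks like in practice, here is a minimal sketch using PBKDF2, which ships with the standard JDK as PBKDF2WithHmacSHA256. The default iteration count is an assumption (roughly OWASP’s recent guidance for this algorithm) and should be re-tuned as hardware improves. A per-user random salt means that identical passwords produce different hashes, which is exactly what unsalted schemes lack.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Sketch of salted, iterated password hashing with PBKDF2 from the standard JDK. */
final class PasswordHasher {
    static final int DEFAULT_ITERATIONS = 600_000;  // assumption: re-tune over time
    static final int SALT_BYTES = 16;
    static final int HASH_BITS = 256;

    /** A fresh random salt, to be stored next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, HASH_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();  // wipe the spec's internal copy of the password
        }
    }

    /** Recomputes the hash for a login attempt and compares in constant time. */
    static boolean verify(char[] attempt, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(attempt, salt, iterations), expected);
    }
}
```

The database row then holds the salt, the iteration count, and the hash; the password itself is never stored.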

Using the power of modern processors, passwords are surprisingly easy to crack. Based on recent research, a random lower-case 8-character MD5-hashed password would cost only about $190 to crack on AWS (assuming $2.1/hour/machine and ~2B MD5 hashes/sec). Cracking a similar password stored with a stronger hash (e.g., PBKDF2) would cost around $2M.

One of the simplest ways of securing systems is multi-factor authentication (MFA). In addition to a password, MFA requires a separate one-time code, generated by a program or device or sent to you via SMS, when logging in to services or performing certain actions. A good example of this is Google’s 2-Step Verification. This means that having a user’s password is insufficient to impersonate him.
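The one-time codes produced by authenticator apps are typically generated with TOTP (RFC 6238), which applies HOTP (RFC 4226) to a counter derived from the current time. Below is a minimal sketch of the HMAC-SHA1 variant with 30-second steps; the class name is hypothetical, and a production implementation would also tolerate small clock skew and rate-limit guesses.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of TOTP (RFC 6238) built on HOTP (RFC 4226), HMAC-SHA1 variant. */
final class Totp {
    /** HOTP: HMAC the 8-byte big-endian counter, then apply RFC 4226 dynamic truncation. */
    static int hotp(byte[] key, long counter, int digits) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            byte[] h = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
            int offset = h[h.length - 1] & 0x0f;
            int binary = ((h[offset] & 0x7f) << 24)
                    | ((h[offset + 1] & 0xff) << 16)
                    | ((h[offset + 2] & 0xff) << 8)
                    | (h[offset + 3] & 0xff);
            return binary % (int) Math.pow(10, digits);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** TOTP: the counter is the number of 30-second steps since the Unix epoch. */
    static String totp(byte[] key, long unixSeconds, int digits) {
        long step = unixSeconds / 30;
        return String.format("%0" + digits + "d", hotp(key, step, digits));
    }
}
```

The server and the phone share only the secret key; each side computes the code independently, so an attacker who steals the password alone still cannot produce it.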

Unfortunately, it’s unlikely that most services will adopt OAuth, OpenID, SAML, or similar interfaces, store passwords correctly, or support multi-user collaboration in the near future, due to the difficulty of implementation and limited payoff. Instead, we should try to build a better framework for managing business identity, starting by better managing passwords. We need a system that gets universal buy-in from the multitude of players in the space by providing a tangible benefit to each: developers, users, IT administrators, security officers, and regulators. It must be easier for customers to use than email, post-it notes, and spreadsheets.

It should enforce using unique passwords on different sites, give administrators more visibility and control over processes, provide strict security guarantees, and provide thorough auditing capabilities. Cryptography needs to be done on the client side so that the service provider does not need to be trusted. Widespread use of such a system will afford a number of other advances, including pro-active fraud and suspicious activity detection. Products that address some of these problems exist today (LastPass, 1Password, etc.), but nothing has come close to solving all of them.
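Client-side cryptography can be sketched as follows: a key is derived from the user’s master password on the device, and each stored credential is encrypted with AES-GCM before it leaves the device, so the server only ever holds salt, IV, and ciphertext. This is a hypothetical design for illustration, not a description of how Mitro, LastPass, or 1Password actually work; key management, sharing between users, and iteration-count tuning are all left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical client-side vault: the server stores only salt, IV and ciphertext. */
final class ClientSideVault {
    static final int GCM_TAG_BITS = 128;
    static final int IV_BYTES = 12;

    /** Derives a 256-bit AES key from the master password; runs only on the client. */
    static SecretKey deriveKey(char[] masterPassword, byte[] salt, int iterations)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(masterPassword, salt, iterations, 256);
        try {
            byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword();
        }
    }

    /** Returns IV || ciphertext+tag, using a fresh random IV for every encryption. */
    static byte[] encrypt(SecretKey key, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    /** Decrypts; GCM authentication fails (throws) if the blob was tampered with. */
    static String decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, blob, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```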

Wrapping up

As we’ve seen, the password problem is pervasive, serious, and unlikely to get better on its own. What’s needed is software that addresses the challenges listed above, and a shift in both user and developer mindset to take security more seriously. Ideally, the software is designed in a way to galvanize that shift. We’ve recently released a tool for organizations and individuals that starts down that path, but there’s a lot more work to be done. We’d welcome your input and support!

Vijay Pandurangan is the founder and CEO of password collaboration startup Mitro. Vijay learned how to build large software systems at Google from 2002-2008 and at CMU. He worked on cluster management, mobile apps before smartphones, and AdWords. He was also the first engineer at Jumo (started by Facebook co-founder Chris Hughes), and worked with Acumen Fund (a socially responsible investment firm). He grew up in Montreal and is still a die-hard Canadiens fan.

11 June

Colours in movie posters since 1914

Edit: Buy the movie poster hues (1914-2012) poster

A couple of weeks ago, I was having brunch with Kim-Mai Cutler, discussing the new startup I’m building in the enterprise space (if you’re a UI/UX person or awesome engineer looking for something fun to do, drop me a line!), and I mentioned how I felt that most movie posters these days were very blue and dark. She didn’t fully believe me and challenged me to prove it. I looked around and found that some people had done this with a few posters over the last few years, but I became curious about the longer-term trends and what they would show. So, as any engineer would do, I wrote some code! (The code is open source and lives on github: image analysis.)

Edit: this post is up on Flowing Data (an awesome data visualization blog), Hacker News, and Gizmodo. I will be doing a follow-on post with much better analysis and much more data. Follow @vijayp on twitter and stay tuned!


The number of posters I was able to get varied based on the year:

I first made a unified view of colour trends in movie posters since 1914. Ignoring black and white, I generated a horizontal strip of hues in HSL. The width of each hue represents the amount of that hue across all images for that year, and the saturation and lightness are the weighted average for all matching pixels. Since hues in HSL have a fixed order, comparisons can be made between years visually. (You can buy the movie poster hues poster here.)
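The strip described above can be computed roughly as follows. This is not the code from the github repo; the one-degree buckets and the cut-offs used to discard black, white, and grey pixels are my assumptions. Pixels are assumed to be already converted to HSL, with hue in degrees and saturation and lightness in [0, 1].

```java
/** Hypothetical sketch of one year's hue strip; thresholds are illustrative assumptions. */
final class HueStrip {
    static final int BUCKETS = 360;             // one bucket per degree of hue
    static final double MIN_SATURATION = 0.1;   // assumption: below this, treat as grey
    static final double MIN_LIGHTNESS = 0.1;    // assumption: below this, treat as black
    static final double MAX_LIGHTNESS = 0.9;    // assumption: above this, treat as white

    final double[] width = new double[BUCKETS];       // fraction of coloured pixels per hue
    final double[] saturation = new double[BUCKETS];  // mean saturation per hue
    final double[] lightness = new double[BUCKETS];   // mean lightness per hue

    /** Builds the strip from pixels given as {hueDegrees, saturation, lightness}. */
    static HueStrip fromHsl(double[][] pixels) {
        HueStrip strip = new HueStrip();
        long[] counts = new long[BUCKETS];
        long total = 0;
        for (double[] p : pixels) {
            double h = p[0], s = p[1], l = p[2];
            if (s < MIN_SATURATION || l < MIN_LIGHTNESS || l > MAX_LIGHTNESS) continue;
            int b = (int) Math.floor(((h % 360) + 360) % 360) % BUCKETS;
            counts[b]++;
            strip.saturation[b] += s;
            strip.lightness[b] += l;
            total++;
        }
        for (int b = 0; b < BUCKETS; b++) {
            if (counts[b] > 0) {
                strip.saturation[b] /= counts[b];  // sums become per-bucket means
                strip.lightness[b] /= counts[b];
            }
            strip.width[b] = total == 0 ? 0.0 : (double) counts[b] / total;
        }
        return strip;
    }
}
```

Drawing the strip is then a matter of laying the buckets out left to right in hue order, each as wide as its `width` fraction and painted with its mean saturation and lightness.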

Next, I made a similar unified view of generic colour trends in movie posters since 1914, but here lightness and saturation are both ignored. This makes the distribution of hues much clearer, but hides the average “darkness” of the posters.

Finally, I created a pie chart representing the colour distribution of a specific year’s movie posters. (This should probably be animated and turned into a line graph; more on that in the future work section.)


First off, it is true that movie posters are much more blue, and much less orange than they used to be. QED :) This page also talks about the blue/orange colours in movies.

This does appear to be a steady trend since 1915. Could this be related to evolution in the physical process of poster printing? What’s the effect of the economics and difficulty of producing posters over time? I also wonder whether moviemakers have become better at figuring out the “optimal” colour distribution of posters over time, and whether we’re asymptotically approaching some quiescent distribution.

I was a bit concerned that some of this might be due to bias in the data: some movies would be over-represented in the intra-year average (remember that some movies have multiple posters and I normalize over posters, not movies). I think this is not actually a huge issue because it’s reasonable to assume that a movie’s marketing budget is roughly proportional to the number of posters that it has produced for itself. This means that the skew, if any, would be similar to the perceived average.

I presented these preliminary data to some friends of mine who are more steeped in the world of graphics and arts. Cheryle Cranbourne (who used to be a graphic designer and has just finished a Masters in interior architecture at RISD) had a number of good thoughts:

[Edit: I had misquoted this earlier] The movies whose posters I analysed “cover a good range of genres. Perhaps the colors say less about how movie posters’ colors as a whole and color trends, than they do about how genres of movies have evolved. For example, there are more action/thriller/sci-fi [films] than there were 50-70 years ago, which might have something to do with the increase in darker, more ‘masculine’ shades.”

This is backed up a bit by data from under consideration’s look at movie posters. They didn’t go back very far, but there did seem to be a reasonable correlation between movie age rating and palette.

She also pointed out that earlier posters were all illustrated/hand-painted, with fewer colors and less variation in tone. Perhaps the fact that white and black have become more prevalent is due to the change from illustration to photography. Painted skin might also over-represent orange and under-represent other hues that occur in real life.


I downloaded ~35k thumbnail-sized images (yay wget: “The Social Network” inspired me not to use curl) from a site that has a lot of movie posters online. I then grouped the movie posters by the year in which the movie they promoted was released. For each year, I counted the total number of pixels of each colour. After normalizing and converting to HSL coordinates, I generated the above visualizations.
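Here is a rough sketch of the counting step: converting each pixel to HSL with the standard RGB-to-HSL formulas, tallying hues into per-year buckets, and normalizing each year so that years with more posters don’t dominate. The 36 ten-degree buckets and the packed 0xRRGGBB pixel format are illustrative assumptions, not details of the original code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the pixel-counting pipeline; bucket count and names are assumptions. */
final class PosterHues {
    /** Converts 8-bit RGB to {hueDegrees, saturation, lightness} with the standard formulas. */
    static double[] rgbToHsl(int red, int green, int blue) {
        double r = red / 255.0, g = green / 255.0, b = blue / 255.0;
        double max = Math.max(r, Math.max(g, b));
        double min = Math.min(r, Math.min(g, b));
        double l = (max + min) / 2.0;
        if (max == min) return new double[] {0.0, 0.0, l};  // achromatic: black, white or grey
        double d = max - min;
        double s = l > 0.5 ? d / (2.0 - max - min) : d / (max + min);
        double h;
        if (max == r) h = (g - b) / d + (g < b ? 6.0 : 0.0);
        else if (max == g) h = (b - r) / d + 2.0;
        else h = (r - g) / d + 4.0;
        return new double[] {h * 60.0, s, l};
    }

    /** Adds one poster's packed 0xRRGGBB pixels to its release year's 36-bucket hue histogram. */
    static void addPoster(Map<Integer, long[]> countsByYear, int year, int[] rgbPixels) {
        long[] counts = countsByYear.computeIfAbsent(year, y -> new long[36]);
        for (int px : rgbPixels) {
            double[] hsl = rgbToHsl((px >> 16) & 0xff, (px >> 8) & 0xff, px & 0xff);
            if (hsl[1] == 0.0) continue;  // skip achromatic pixels (pure black, white, greys)
            counts[(int) (hsl[0] / 10.0) % 36]++;
        }
    }

    /** Normalizes each year's counts so its buckets sum to 1, making years comparable. */
    static Map<Integer, double[]> normalize(Map<Integer, long[]> countsByYear) {
        Map<Integer, double[]> out = new TreeMap<>();
        for (Map.Entry<Integer, long[]> e : countsByYear.entrySet()) {
            long[] counts = e.getValue();
            long total = 0;
            for (long c : counts) total += c;
            double[] frac = new double[counts.length];
            for (int i = 0; i < frac.length; i++) {
                frac[i] = total == 0 ? 0.0 : (double) counts[i] / total;
            }
            out.put(e.getKey(), frac);
        }
        return out;
    }
}
```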


I was inspired by Tyler Neylon’s great work on colour visualizations. I ended up writing my own code to do these image analysis visualizations, but I will try to integrate it with his work.

Future work:

There’s a bunch of stuff I still have to / want to do, but since I’m working on my startup, I don’t really have much time to focus on it right now. Here’s a long list of stuff:

  1. Follow up on all the open questions about the reasons for this change.
  2. Use other metadata (not just year) for movies to search for patterns. A simple machine learning algorithm should suffice if I throw all the attributes in at once. This should be able to highlight whether genre is important, and what other factors are crucial.
  3. “Main colour” analysis. I should run some kind of clustering (as Tyler does in his code). His code uses a handwritten (?) k-means clustering algorithm, which is a bit slow when faced with thousands of pictures’ worth of data. There are some faster, albeit slightly less accurate, versions that I could use.
  4. I need to move the pie charts to the Google Charts JS API, so they’re interactive.
  5. I should probably make nicer/fancier JS on-hover effects.
  6. I should look at Bollywood and other sources to see whether this holds across countries.
  7. My visualizations and javascript aren’t so good. I have to learn how to do this stuff better!
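Item 3 above mentions k-means. For reference, here is a minimal sketch of standard k-means (Lloyd’s algorithm) over RGB points; it is neither Tyler’s implementation nor one of the faster approximate variants (such as mini-batch k-means) alluded to there. Seeding with the first k points is a simplification for determinism, k-means++ seeding is usually more robust, and the sketch assumes a non-empty input with at least k points.

```java
import java.util.Arrays;

/** Minimal k-means (Lloyd's algorithm) over colour points such as {r, g, b}. */
final class KMeans {
    static double[][] cluster(double[][] points, int k, int maxIterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();  // simplistic seeding
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;  // converged: no point moved
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][points[0].length];
            int[] sizes = new int[k];
            for (int i = 0; i < points.length; i++) {
                sizes[assignment[i]]++;
                for (int j = 0; j < points[i].length; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) continue;  // keep an empty cluster's previous centroid
                for (int j = 0; j < sums[c].length; j++) centroids[c][j] = sums[c][j] / sizes[c];
            }
        }
        return centroids;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }
}
```

Each iteration costs O(n·k) distance computations, which is why running it over every pixel of thousands of posters gets slow and why subsampling or mini-batch variants are attractive.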
16 January

Update / Solution for broken Android calendar syncing

In my last post, I described how Android’s calendar syncing was broken for me. I noticed that my calendar on my phone was out of date, and when I manually refreshed, I’d get a force-close error.

After downloading the Android source, figuring out how to build it, and playing with it on the emulator and my device for some time, I have figured out what the problem is, and have a work-around for it. Essentially, some repeated events can have a start date Android is unhappy with (I believe it’s due to a start time of UTC 0). This causes an Android core library to throw a TimeFormatException, which is never properly handled, preempting syncing. This is a pretty big bug: that exception should be caught by Google’s common calendar code, but it isn’t. (This is because of the misuse of unchecked exceptions: android.util.TimeFormatException inherits from RuntimeException for no good reason at all that I can see. Checked exceptions are one of the best features of Java, and inheriting from RuntimeException for things that should be handled is a really bad idea, IMO.)

Here is the text of the item that was breaking my calendar syncing:


This was in the private url for my feed. You can see yours here:
https://www.google.com/calendar/feeds/USER_NAME%40gmail.com/private/full. I think this event was added by Outlook somehow, but I’m not really sure. The web UI and other clients have no problem dealing with this event, but Android’s date parser is unhappy with it. If you’re seeing repeated calendar syncing crashes, go to the above url, replace USER_NAME with your user id, and see if you have something similar to this string. If so, deleting that event ought to fix syncing.

How Google should fix this

If someone on Android or Calendar is reading this, there are two ways this should be fixed. Please do both of them!

  1. Fix Android to handle these errors gracefully. I patched the provider code to fix this bug. Someone should fix this, and include it in the next ICS update. Here’s the diff:

    vijayp@thecoon:/mnt/largelinux/bigfiles/as2/frameworks/opt/calendar/src/com/android/calendarcommon$ git diff -w
    diff --git a/src/com/android/calendarcommon/RecurrenceSet.java b/src/com/android/calendarcommon/RecurrenceSet.java
    index 3b91a1d..8e1117e 100644
    --- a/src/com/android/calendarcommon/RecurrenceSet.java
    +++ b/src/com/android/calendarcommon/RecurrenceSet.java
    @@ -178,6 +178,7 @@ public class RecurrenceSet {
    public static boolean populateContentValues(ICalendar.Component component,
    ContentValues values) {
    + try {
    ICalendar.Property dtstartProperty =
    String dtstart = dtstartProperty.getValue();
    @@ -233,6 +234,11 @@ public class RecurrenceSet {
    values.put(CalendarContract.Events.DURATION, duration);
    values.put(CalendarContract.Events.ALL_DAY, allDay ? 1 : 0);
    return true;
    + } catch (TimeFormatException e) {
    + // This happens when the data is out of range.
    + Log.i(TAG, "BAD data: " + component.toString());
    + return false;
    + }

  2. Patch the Calendar FE server to remove things that break Android. Fixing Android is the correct solution, because it’s unclear that the data being passed are actually bad. But since the Calendar frontend can be fixed in a few days, and it might take months (or years!) to get carriers to agree to roll out an Android update, it’s best to also patch the Calendar FE to filter out data that might cause Android to crash. It can even be enabled based on the user agent.

Anyway, I really hope someone at Google reads and fixes this. I spent a lot of unnecessary time tracking this down!

14 January

Android calendar syncing is broken for me!


For the past couple of weeks (shortly after my Nexus S upgraded itself to ICS), the calendar on my phone has not been syncing with Google. This has forced me to use the calendar website on my phone, which is not a pleasant experience at all. So today, I hooked my phone up to my computer and decided to do some debugging. Using adb logcat, I found this stack trace:

E/AndroidRuntime(15353): FATAL EXCEPTION: SyncAdapterThread-2
E/AndroidRuntime(15353): android.util.TimeFormatException: Parse error at pos=2
E/AndroidRuntime(15353): at android.text.format.Time.nativeParse(Native Method)
E/AndroidRuntime(15353): at android.text.format.Time.parse(Time.java:440)
E/AndroidRuntime(15353): at com.android.calendarcommon.RecurrenceSet.populateContentValues(RecurrenceSet.java:189)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.EventHandler.entryToContentValues(EventHandler.java:1138)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.EventHandler.applyEntryToEntity(EventHandler.java:616)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsImpl(CalendarSyncAdapter.java:2223)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsForFeed(CalendarSyncAdapter.java:1954)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsOrig(CalendarSyncAdapter.java:945)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.innerPerformSync(CalendarSyncAdapter.java:417)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.onPerformLoggedSync(CalendarSyncAdapter.java:302)
E/AndroidRuntime(15353): at com.google.android.common.LoggingThreadedSyncAdapter.onPerformSync(LoggingThreadedSyncAdapter.java:33)
E/AndroidRuntime(15353): at android.content.AbstractThreadedSyncAdapter$SyncThread.run(AbstractThreadedSyncAdapter.java:247)
W/ActivityManager( 153): Force finishing activity com.google.android.calendar/com.android.calendar.AllInOneActivity
V/CalendarSyncAdapter(15353): GDataFeedFetcher thread ended: mForcedClosed is true

Thanks to Evan, I was able to clone the git repo for the Calendar app (https://android.googlesource.com/platform/packages/apps/Calendar.git), and spent some time today trying to track down this bug.

Unfortunately, the buggy code is in calendarcommon, which isn’t included in that git repository and is actually nearly impossible to find. At any rate, with some more digging, the closest I could get is the code here


I think there needs to be a try/catch block around that whole method (around line 189) that returns false if an exception is thrown. For some reason, TimeFormatException is derived from RuntimeException (!!). The common code doesn’t seem to be installed as part of the Calendar app. From quickly looking at the code, it appears as if it is installed as part of the OS and registers itself as the handler for calendar URIs.

So if I wanted to fix this myself, I wonder whether I would have to fork the code above, install it as a new handler, and then somehow hide the one that ships with the OS. I have to think about this a bit more. The other problem is that, since this is a common library, many other calendar apps might suffer from the same exception when they attempt to sync.
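The proposed fix can be illustrated off-device. Since android.util.TimeFormatException is only available on Android, the sketch below uses a hypothetical stand-in that shares the property that matters: it extends RuntimeException, so the compiler never forces callers to handle it. The parser and the "dtstart" key are also hypothetical; the point is the shape of the try/catch that turns bad data into a false return instead of a crashed sync thread.

```java
import java.util.Map;

/** Hypothetical stand-in for android.util.TimeFormatException: unchecked, like the original. */
class StandInTimeFormatException extends RuntimeException {
    StandInTimeFormatException(String message) {
        super(message);
    }
}

final class RecurrenceSketch {
    /** Hypothetical parser that accepts only an 8-digit YYYYMMDD date. */
    static String parseDate(String value) {
        if (value == null || !value.matches("\\d{8}")) {
            // Unchecked: nothing in the signature warns callers this can happen.
            throw new StandInTimeFormatException("Parse error: " + value);
        }
        return value.substring(0, 4) + "-" + value.substring(4, 6) + "-" + value.substring(6);
    }

    /** Mirrors the patched populateContentValues: report bad data by returning false. */
    static boolean populate(String dtstart, Map<String, String> values) {
        try {
            values.put("dtstart", parseDate(dtstart));
            return true;
        } catch (StandInTimeFormatException e) {
            // Skip the malformed event instead of letting the exception kill the sync thread.
            return false;
        }
    }
}
```

Had the exception been checked, `populate` would not compile without either this catch or a `throws` clause, which is the author’s argument for checked exceptions here.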

In the meantime, I’m going to try to figure out what event is causing this error (not easy since there are no logs that can help me) and/or think of buying an iPhone.

If you know anyone on Android who could help with this, please let me know.

I’m downloading the entire Android source code, and I think I’m going to try to re-build a patched version of the common code, uninstall the existing common code, and push the new one over it. I’ll update this post with progress …

17 December

Partychat — migrating from Google App Engine to EC2


I’m Vijay Pandurangan, and I’ve been working with some other super talented folks to help maintain the Partychat code, and help pay for its services. Because of the latter, I was especially motivated to keep Partychat’s costs under control!

Recently, Google App Engine made some substantial pricing changes. This affected a lot of people, but especially Partychat, a service with over 13k one-day active users.
In this blog post (and a couple more to follow), I’ll describe various aspects of the pricing change and our ensuing migration from App Engine to Amazon’s EC2, including the impact on users, cost structures, and calculations.

For those of you who don’t have time to read everything, here’s the tl;dr version:

  • Google’s new pricing was totally out of line; we were able to re-create a similar service at about 5% of the cost of running it on App Engine.
  • Google’s policy has resulted in higher costs for them (XMPP messages used to stay entirely within their network, but now have to be sent to and from EC2), and reduced their revenue (we and others will likely shun App Engine).
  • App Engine requires a very different design paradigm from “normal” system design.
  • Some App Engine modules lock you in to the platform. We had to make all our users transition from channel@partychapp.appspotchat.com to channel@im.partych.at because Google does not allow us to point domain names elsewhere.
  • You can operate things on EC2 substantially more cheaply than App Engine if you design correctly.
  • I strongly recommend against running anything more complicated than a toy app on App Engine; if Google decides to arbitrarily change its pricing or change the services they offer, you’ll be screwed. There is really no easy migration path. Random pricing changes coupled with lack of polish in appscale means that any solution differing even slightly from the ordinary is “stuck” on appengine.
  • Transitioning a running app is pretty hard, especially when it’s an open source project done on spare time.
  • By (partially) transitioning off of app engine, we’ve actually reduced our cost from the pre-increase regime and can deliver roughly equivalent capacity for the same cost:

Before: $2/day.
With price increase: ~$20/day.
On EC2 (no prepay)/App Engine hybrid: $1/day.
On EC2 (annual prepay)/App Engine hybrid: $0.80/day.
At this point, residual App Engine charges still amount to approximately $0.50/day.
On EC2 (annual prepay)/App Engine hybrid using more memcache: $0.60/day.
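The “about 5%” in the tl;dr follows directly from these daily figures. Here is a small Python sketch of the arithmetic; the annualized column uses a 365-day year, which is my assumption since the post only gives daily costs.

```python
# Daily operating costs in USD, copied from the list above.
costs = {
    "old App Engine pricing": 2.00,
    "App Engine after price increase": 20.00,
    "EC2 (no prepay) / App Engine hybrid": 1.00,
    "EC2 (annual prepay) / App Engine hybrid": 0.80,
    "EC2 (annual prepay) / App Engine hybrid, more memcache": 0.60,
}


def fraction_of(cost: float, baseline: float) -> float:
    """Return `cost` as a fraction of `baseline`."""
    return cost / baseline


baseline = costs["App Engine after price increase"]
for label, daily in costs.items():
    share = 100 * fraction_of(daily, baseline)
    # Annualized with a 365-day year (assumption; the post only gives daily figures).
    print(f"{label}: ${daily:.2f}/day, {share:.0f}% of post-increase cost, ${daily * 365:.0f}/year")
```

The $1/day hybrid is 5% of the projected $20/day; with annual prepay and more memcache it falls to 3%.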


Partychat is a group chat service. By going to a web site, or sending an IM, users create virtual chat rooms. All messages sent to that chat room alias are then broadcast to all other users in that channel. Here is an example of two channels with two sending users.

This is what happens from the perspective of our app:

Google’s service makes an HTTP POST to our app for every inbound message to a channel, and outbound messages are sent via API calls.
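To make the inbound half concrete: as I recall from App Engine’s (Python 2 era) XMPP documentation, an inbound chat message arrives as a form-encoded POST to /_ah/xmpp/message/chat/ with fields such as from, to and body. Treat those names as an assumption rather than a spec. The Python sketch below parses such a body with the standard library; the mapping from the to address to a channel name (taking its local part) is my own illustrative guess, and parse_inbound_chat is a hypothetical helper.

```python
from urllib.parse import parse_qs


def parse_inbound_chat(post_body: str) -> dict:
    """Pull sender, channel and text out of a form-encoded inbound-message POST body."""
    fields = parse_qs(post_body, keep_blank_values=True)

    def first(name: str) -> str:
        # parse_qs maps each field name to a list of values; we expect exactly one.
        values = fields.get(name)
        if not values:
            raise ValueError(f"missing form field: {name}")
        return values[0]

    sender = first("from").split("/", 1)[0]  # strip the XMPP resource, e.g. "/Home"
    channel = first("to").split("@", 1)[0]   # assumption: room name is the local part
    return {"sender": sender, "channel": channel, "body": first("body")}


example = "from=alice%40example.com%2FHome&to=mychannel%40partychapp.appspotchat.com&body=hello+world"
print(parse_inbound_chat(example))
```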

Cost estimates:

Before Google’s pricing changes, our cost to process messages was about $2/day. Before the new pricing went into effect, I used some anonymized logging data to forecast how much the service would cost to operate under the new pricing regime.

As you can see from the graph, even limiting the maximum room size to 200 people (which would be a major disruption to our service) would have cost us well over $10/day, which is unacceptable.


Since this was an open-source project, I decided to take the simplest possible approach to the migration: I’d create an XMPP server on EC2 that would do all the sending and receiving of XMPP messages in place of App Engine.

Note that as a result of their policy, Google makes less money AND has higher costs! (All XMPP traffic now gets routed through EC2, which takes up bandwidth on their peering links.)

Our App Engine app still does almost all of the processing of messages (including deciding who gets sent which messages), but does not do the actual fanout (i.e. creating n identical messages for a broadcast message). That is handled in the proxy.
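The fanout the proxy takes over is conceptually simple: for one broadcast it emits n otherwise identical message stanzas, one per recipient. Here is a minimal Python sketch using the standard library’s ElementTree. The function name and sample JIDs are hypothetical, and the production proxy ended up in C++ rather than Python.

```python
import xml.etree.ElementTree as ET
from typing import List


def fan_out(channel_jid: str, body: str, recipients: List[str]) -> List[bytes]:
    """Build one <message type="chat"> stanza per recipient for a single broadcast."""
    stanzas = []
    for to_jid in recipients:
        msg = ET.Element("message", {"from": channel_jid, "to": to_jid, "type": "chat"})
        ET.SubElement(msg, "body").text = body  # ElementTree escapes <, > and & for us
        stanzas.append(ET.tostring(msg))
    return stanzas


out = fan_out("mychannel@im.partych.at", "hi <all>", ["bob@example.com", "carol@example.org"])
for stanza in out:
    print(stanza.decode())
```

App Engine still decides who receives what; the proxy only multiplies one message into many, which is the part that is expensive when billed per outbound message.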

XMPP Server:

In order to run an XMPP proxy, we need to deploy an XMPP server with the ability to federate to other servers, and code that interfaces with that server and receives and sends messages.

There are a bunch of XMPP servers out there, but the overall consensus is that ejabberd, a server written in Erlang (!), is the best and most stable. It has proven to be extremely stable and efficient for us. The big issue is that configuration is really difficult and poorly documented. A couple of important points that took forever to debug:

  • ejabberd has a default traffic shaping policy. Traffic shaping restricts inbound and outbound network traffic according to a policy. Traffic that exceeds the limits is buffered until the buffer is full, and then dropped. Partychat’s message load can often be substantially higher than the default policy’s limit for sustained periods, resulting in randomly dropped messages.
  • If multiple servers associate with one component (more on this in the next section), ejabberd will round-robin messages between the connections. This means that your servers have to run on roughly equal machines.
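To see why sustained load produces seemingly random drops, here is a toy model of the behaviour described in the first bullet: a token bucket that sends while it has budget, buffers the overflow in a bounded queue, and drops once the queue is full. This is a generic Python illustration, not ejabberd’s implementation; the class name, parameters and numbers are made up.

```python
from collections import deque


class Shaper:
    """Toy traffic shaper: send within budget, queue the excess, drop when the queue is full.

    A generic model of the behaviour described above, NOT ejabberd's code.
    Messages larger than burst_bytes would block the queue forever; ignored here.
    """

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, max_queue: int):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.queue = deque()
        self.max_queue = max_queue
        self.sent = 0
        self.dropped = 0

    def tick(self, seconds: float) -> None:
        """Refill tokens for the elapsed time, then drain as much of the queue as they allow."""
        self.tokens = min(self.capacity, self.tokens + self.rate * seconds)
        while self.queue and self.queue[0] <= self.tokens:
            self.tokens -= self.queue.popleft()
            self.sent += 1

    def offer(self, size: int) -> None:
        """Submit a message of `size` bytes: send now, buffer it, or drop it."""
        if not self.queue and size <= self.tokens:
            self.tokens -= size
            self.sent += 1
        elif len(self.queue) < self.max_queue:
            self.queue.append(size)
        else:
            self.dropped += 1


shaper = Shaper(rate_bytes_per_s=1000, burst_bytes=1000, max_queue=5)
for _ in range(20):  # a burst of twenty 200-byte messages
    shaper.offer(200)
print(shaper.sent, len(shaper.queue), shaper.dropped)
shaper.tick(1.0)  # one second later the queue drains
print(shaper.sent, len(shaper.queue), shaper.dropped)
```

Any load that stays above the refill rate keeps the queue full, so everything beyond it is silently discarded, which looks exactly like random message loss from the outside.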

Proxy code:

XMPP supports two connection types, Client and Component. A client connection is what your chat client uses to connect to a server. It requires one connection per user, and the server remembers all the important state (who you are friends with, etc.). This is by far the simplest way to write something like Partychat, but there are a few problems. It requires the server to keep track of state that we don’t care about (Are the users online? What are their status messages? etc.), which adds load to the server’s database. This can be solved by increasing database capacity, but that is wasteful since the data are never used. More importantly, client connections require one TCP connection per user (see the image below). For our service, with ~50k users, the server would need to handle 50k TCP connections. That is already a very large number, and it will not scale well.

The alternative (which I selected) was to use a component interface (see above image), which essentially gives you the raw XML stream for any subdomain. Your code is then responsible for maintaining all state and responding to presence and subscription requests.
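The component interface is specified in XEP-0114 (Jabber Component Protocol). The component opens a stream in the jabber:component:accept namespace. The server answers with a stream header carrying an id, and the component authenticates by sending a handshake element containing the lowercase hex SHA-1 of that stream id concatenated with a secret shared with the server. Here is a minimal Python sketch of the two strings a component has to produce. The domain comes from this post, while the stream id and secret are placeholders.

```python
import hashlib


def stream_header(component_domain: str) -> str:
    """Opening stream element a component sends to the server (XEP-0114)."""
    return (
        "<stream:stream xmlns='jabber:component:accept' "
        "xmlns:stream='http://etherx.jabber.org/streams' "
        f"to='{component_domain}'>"
    )


def handshake(stream_id: str, shared_secret: str) -> str:
    """Handshake proof: hex SHA-1 of the server's stream id followed by the shared secret."""
    digest = hashlib.sha1((stream_id + shared_secret).encode("utf-8")).hexdigest()
    return f"<handshake>{digest}</handshake>"


print(stream_header("im.partych.at"))
print(handshake("3BF96D32", "placeholder-secret"))
```

On success the server replies with an empty handshake element. From then on the component sees the raw stanzas for its domain, and all of the bookkeeping a client connection would normally get for free falls to our code.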

Initially we used SleekXMPP, a Python library, to manage component connections. The state was stored in RAM and periodically serialized and written to disk. Since XMPP has provisions for servers to rebuild their state after data loss without human involvement, losing state is not catastrophic, though it results in substantially higher load on the system while redundant subscription messages are dispatched and processed. The state that we currently store contains:

user                  := string (email)
channel               := string (name of channel, utf-8)
inbound_state         := {UNKNOWN, PENDING, REJECTED, OK}
outbound_state        := {UNKNOWN, PENDING, REJECTED, OK}
last_outbound_request := timestamp

The last outbound request timestamp is required to prevent subscription loops when sequence IDs are not supported (the spec details why this is important).
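Here is a minimal Python sketch of one row of that state, plus the kind of rate limit that the last_outbound_request timestamp enables. The rule is that we don’t send another subscription request for the same (user, channel) pair until some interval has passed. The interval, class and method names are my assumptions; the post doesn’t say what Partychat actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class SubState(Enum):
    UNKNOWN = "unknown"
    PENDING = "pending"
    REJECTED = "rejected"
    OK = "ok"


# Assumption: minimum gap between subscription requests for the same pair.
RESUBSCRIBE_INTERVAL_S = 24 * 3600


@dataclass
class Subscription:
    user: str      # email / bare JID
    channel: str   # channel name (utf-8)
    inbound_state: SubState = SubState.UNKNOWN
    outbound_state: SubState = SubState.UNKNOWN
    last_outbound_request: float = 0.0  # unix timestamp

    def should_request_subscription(self, now: float) -> bool:
        """Only ask again if we're not subscribed and haven't asked recently (breaks loops)."""
        if self.outbound_state is SubState.OK:
            return False
        return now - self.last_outbound_request >= RESUBSCRIBE_INTERVAL_S

    def record_request(self, now: float) -> None:
        """Remember that a subscription request was just sent."""
        self.outbound_state = SubState.PENDING
        self.last_outbound_request = now
```

Without such a check, two peers that keep losing state can bounce subscribe/subscribed presences back and forth indefinitely.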

Each inbound message results in an HTTPS request to the App Engine server. The server responds with JSON containing the message to be sent, a list of all recipients, and the channel from which it is to be sent.
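The post doesn’t give the exact JSON schema, so the field names below (channel, message, recipients) are hypothetical. The Python sketch shows the proxy turning the App Engine reply into a typed batch and rejecting malformed recipient lists before any fanout happens.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class OutboundBatch:
    channel: str           # channel the message is sent from
    message: str           # text to deliver
    recipients: List[str]  # JIDs that should each get a copy


def parse_app_engine_reply(raw: bytes) -> OutboundBatch:
    """Decode the backend's JSON reply (hypothetical field names) into an OutboundBatch."""
    data = json.loads(raw)
    recipients = data["recipients"]
    if not isinstance(recipients, list) or not all(isinstance(r, str) for r in recipients):
        raise ValueError("recipients must be a list of JID strings")
    return OutboundBatch(channel=str(data["channel"]), message=str(data["message"]),
                         recipients=recipients)


reply = b'{"channel": "mychannel", "message": "[alice] hi", "recipients": ["bob@example.com"]}'
print(parse_app_engine_reply(reply))
```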

The Python library was OK, but did not perform well at our scale. Profiling showed that much of the time was spent copying data around. The server periodically crashed or hung, resulting in dropped messages and instability. The inefficiency forced us to use a medium instance ($0.17/hour) to serve traffic, which put us at about $4/day. Still substantially lower than App Engine, but too high!

The server was then rewritten in C++, using gloox, libcurl, openssl, and pthreads. Each message is dispatched to a threadpool. Blocking HTTPS calls are made to the App Engine server, and the results are then sent via ejabberd. This server is able to handle our max load (~12-16 inbound messages per second, resulting in around 400-600 outbound messages per second) at under 5% CPU load on a micro instance (at $0.02/hour).
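The structure of the C++ server can be sketched in Python with concurrent.futures: dispatch each inbound message to a thread pool, make a blocking HTTPS call, then send the results. This illustrates the pattern and is not a port of the gloox code. call_backend and send are hypothetical stand-ins for the App Engine request and the ejabberd write, injected so the sketch runs without a network. Because several workers call send at once, it has to be thread-safe; in the example, a lock guards a shared list.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# (sender, channel, body) -> (message_to_send, recipient_jids); stands in for the HTTPS call.
Backend = Callable[[str, str, str], Tuple[str, List[str]]]
# (channel, recipient_jid, message) -> None; stands in for writing a stanza to ejabberd.
Sender = Callable[[str, str, str], None]


class Dispatcher:
    """Run each inbound message's blocking backend call on a worker thread, then fan out."""

    def __init__(self, call_backend: Backend, send: Sender, workers: int = 8):
        self.call_backend = call_backend
        self.send = send
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _handle(self, sender: str, channel: str, body: str) -> int:
        message, recipients = self.call_backend(sender, channel, body)  # blocks on I/O
        for jid in recipients:
            self.send(channel, jid, message)  # called concurrently from several workers
        return len(recipients)

    def submit(self, sender: str, channel: str, body: str):
        return self.pool.submit(self._handle, sender, channel, body)

    def close(self) -> None:
        self.pool.shutdown(wait=True)


# Example with a fake backend and a thread-safe in-memory "send".
sent: List[Tuple[str, str, str]] = []
lock = threading.Lock()


def fake_backend(sender: str, channel: str, body: str) -> Tuple[str, List[str]]:
    return f"[{sender}] {body}", ["a@example.com", "b@example.com", "c@example.com"]


def record(channel: str, jid: str, message: str) -> None:
    with lock:
        sent.append((channel, jid, message))


dispatcher = Dispatcher(fake_backend, record, workers=4)
futures = [dispatcher.submit("alice", "mychannel", f"msg {i}") for i in range(10)]
dispatcher.close()
print(len(sent), "stanzas sent")
```

Threads fit here because the workers spend nearly all their time waiting on HTTPS round trips, so a handful of them can keep up with a dozen or so inbound messages per second.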

The system is mostly stable (a lingering concurrency bug causes it to crash and automatically restart about once every 12 hours) and should provide us with substantial room to scale in the future.


Google Apps for Your Domain has grown in popularity recently. This presents us with a problem: for XMPP messages to be delivered to users on a domain, a DNS SRV record needs to exist for that domain. Google does not automatically create this record, but messages between Google Apps domains are delivered correctly anyway (so we never saw this issue on App Engine).

Another issue that slowed development down a lot was that some clients (mostly iTeleport) send invalid XML over the wire, which caused Python’s XML code to throw namespace exceptions. This made SleekXMPP even less reliable and required changes to core Python libraries. The C++ libraries handle this gracefully.
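The “handle it gracefully” policy amounts to dropping a bad stanza instead of letting the exception take down the connection. Here is a minimal Python sketch with ElementTree, where an undeclared namespace prefix raises ParseError. One caveat: parsing a stanza in isolation loses any prefixes declared on the enclosing stream:stream element, so a real implementation would parse the stream incrementally. This only shows the error policy, and parse_stanza is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_stanza(raw: str) -> Optional[ET.Element]:
    """Parse one stanza, returning None for malformed XML instead of raising."""
    try:
        return ET.fromstring(raw)
    except ET.ParseError as exc:
        # Drop (and in practice, log) the bad stanza rather than killing the connection.
        print(f"dropping malformed stanza: {exc}")
        return None


good = parse_stanza("<message to='a@example.com'><body>hi</body></message>")
bad = parse_stanza("<message><x:status/></message>")  # 'x' prefix is never declared
```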

Future work:

In the near future, we will be disabling old @partychat.appspot.com aliases. Other future work includes billing for large channels, and reducing the number of write operations on App Engine.


Working on this project has been quite educational. First of all, migrating a running service with many users off Google App Engine is hard, especially if it uses esoteric services such as XMPP. AppScale is a reasonable platform that could help with the transition, but it is difficult to use and may not be fully ready for production. Google App Engine’s insistence on a different development paradigm makes migration extremely difficult, since moving to a new platform requires re-architecting code.

An even bigger problem is the fact that some aspects of your system (e.g. XMPP domains) are not under your control. We had to migrate our users to a new chat domain, because Google did not allow us to point our domain elsewhere. This was a huge inconvenience for our users. Since our service is free, it was less of a big deal, but for an actual paid service, this would be a serious problem.

Since pricing is subject to rapid, arbitrary changes, and transitioning is difficult, no system that is likely to be productionized at scale should be written on App Engine. EC2’s monitoring and auto-scaling systems (more on this in a subsequent post) are excellent and don’t require buying into a specific design paradigm. In fact, moving this proxy to a private server or Rackspace would be quite trivial.

Edit: I wanted to add this, just to clarify my point:
It’s more than just a price/stability tradeoff. The problem is that, as an App Engine user, one is totally at the mercy of any future price changes, because it is nearly impossible to seamlessly migrate away. The DNS aliases can only point to Google, and the API does not really work well on any other platform. So, at any time in the future, one’s service could suddenly cost 10x as much, and one won’t really have the option to move quickly. If one intends to scale, it’s better to never get into that state in the first place and develop on EC2 instead. If EC2 raises its prices (highly unlikely, since computing power is increasing and costs are decreasing), one can always move to Rackspace or just get a private server.

It’s of course true that writing stuff on App Engine can sometimes require a lot less engineering work. But the difference is not really that substantial when compared to the possibility of being stuck on a platform that all of a sudden makes your company unprofitable. Changing a running service is very hard. Avoiding the problem by not getting stuck on App Engine is not trivial, but in my opinion the right call.

14 November

Migrated Partychat rooms and Google Apps domains

Due to App Engine cost changes, I’ve been working with the partychat folks to migrate our services to a new domain (new rooms are channel@im.partych.at).

We’re seeing a lot of people who are using accounts on Google Apps domains having difficulty connecting to the new Partychat services.

Simple solutions

If you are using a Google Apps domain, these instructions (from Google) will help you get partychat working again. This will require help from someone with access to your domain settings (probably a system administrator).

If you don’t have access to DNS records, or can’t find someone who does, you will have to use a @gmail.com account instead.

Technical Details

Every domain needs an SRV DNS record to tell other XMPP servers where to connect (unless the XMPP server runs at the bare domain itself). The SRV record’s name should be “_xmpp-server._tcp.domain.com.” This doesn’t just affect Partychat; it prevents most people on non-Google third-party domains from being able to talk to you.

You can check whether your domain has one by running the following (change mydomain.com to the name of your domain):

vijayp@ike:~/src$ nslookup
> set q=SRV
> _xmpp-server._tcp.mydomain.com

** server can't find _xmpp-server._tcp.mydomain.com: NXDOMAIN

As you can see, mydomain.com doesn’t have a record, so our servers don’t know where to send your chat messages. Here is an example of a properly configured domain:

vijayp@ike:~/src$ nslookup
> set q=SRV
> _xmpp-server._tcp.q00p.net

Non-authoritative answer:
_xmpp-server._tcp.q00p.net service = 5 0 5269 xmpp-server.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server1.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server2.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server3.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server4.l.google.com.
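Each answer line lists priority, weight, port and target. Under RFC 2782, a connecting server tries the lowest priority value first and uses weight only to choose among targets of equal priority. For q00p.net, that means the first stop is xmpp-server.l.google.com on port 5269. The Python sketch below pulls those fields out of nslookup output like the above. It’s a convenience for eyeballing a record, not a resolver, and the function name is mine.

```python
import re
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


_SERVICE = re.compile(r"service = (\d+) (\d+) (\d+) (\S+)$")


def parse_nslookup_srv(output: str) -> List[SrvRecord]:
    """Extract SRV answers from nslookup output, most preferred (lowest priority) first."""
    records = []
    for line in output.splitlines():
        m = _SERVICE.search(line.strip())
        if m:
            prio, weight, port, target = m.groups()
            records.append(SrvRecord(int(prio), int(weight), int(port), target.rstrip(".")))
    # RFC 2782: lower priority values are tried first; weight only matters within a priority.
    return sorted(records, key=lambda r: r.priority)


sample = """_xmpp-server._tcp.q00p.net service = 5 0 5269 xmpp-server.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server1.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server2.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server3.l.google.com.
_xmpp-server._tcp.q00p.net service = 20 0 5269 xmpp-server4.l.google.com."""
for rec in parse_nslookup_srv(sample):
    print(rec)
```

An empty result, as for the NXDOMAIN case above, means other servers have nowhere to deliver your chat messages.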

29 September

Why Eclipse’s “Check for Updates” is horribly slow (and how to fix it)

I recently installed Eclipse Indigo. I wanted to add a few plugins to it, so I tried to use the UI to check for new updates and install some new packages. I let it run for a while, and after about 45 minutes, it looked to be about 20% done. Eventually, it displayed a few errors about timing out.

The issue is that Eclipse appears to be trying to contact mirrors that don’t have a proper copy of all the files it’s expecting. My solution was to invoke eclipse with the following flag. Add it after “eclipse”, or in eclipse.ini

29 September

Attaching a physical (raw) disk to VMWare Fusion 4 without BootCamp

I wanted to boot and run my Linux installation from a physical disk inside Mac OS X. There’s no easy guide for this on the web; most guides want you to use a VMWare tool that existed in previous versions in /Library/Application Support/VM*, but that file didn’t exist for me.
I think the new VMWare Fusion can read BootCamp config data automatically, but I didn’t want to use BootCamp (long story). Since I had VirtualBox installed, this wasn’t too difficult.

First off, figure out what the Mac thinks your disk(s) are called:

chef:ubuntu_test.vmwarevm vijayp$ diskutil list
0: *64.0 GB disk1
0: GUID_partition_scheme *2.0 TB disk3
1: Linux Swap 16.5 GB disk3s1
2: Microsoft Basic Data 983.5 GB disk3s2
3: Microsoft Basic Data Untitled 899.4 GB disk3s3

My main drive was /dev/disk1 (for some reason, I decided to use the entire disk for the Linux partition) and the data partition was /dev/disk3s2.

After installing VMWare Fusion 4, I created a new custom VM set up as Ubuntu 64-bit. This turned up in my Documents folder:

chef:~ vijayp$ cd ~/Documents/Virtual\ Machines.localized/
chef:Virtual Machines.localized vijayp$ ls
Ubuntu 64-bit.vmwarevm
chef:Virtual Machines.localized vijayp$ cd Ubuntu\ 64-bit.vmwarevm/
chef:Ubuntu 64-bit.vmwarevm vijayp$ ls
Ubuntu 64-bit-s001.vmdk Ubuntu 64-bit-s007.vmdk Ubuntu 64-bit.vmdk
Ubuntu 64-bit-s002.vmdk Ubuntu 64-bit-s008.vmdk Ubuntu 64-bit.vmsd
Ubuntu 64-bit-s003.vmdk Ubuntu 64-bit-s009.vmdk Ubuntu 64-bit.vmx
Ubuntu 64-bit-s004.vmdk Ubuntu 64-bit-s010.vmdk Ubuntu 64-bit.vmx.lck
Ubuntu 64-bit-s005.vmdk Ubuntu 64-bit-s011.vmdk Ubuntu 64-bit.vmxf
Ubuntu 64-bit-s006.vmdk Ubuntu 64-bit.plist vmware.log

VMWare has created a default disk that’s split into 11 pieces (the -s001 through -s011 .vmdk files). To access the physical drives, I used VirtualBox’s VBoxManage tool to create raw-disk .vmdk files that point at them:

chef:Ubuntu 64-bit.vmwarevm vijayp$ sudo VBoxManage internalcommands createrawvmdk -filename disk1.vmdk -rawdisk /dev/disk1
chef:Ubuntu 64-bit.vmwarevm vijayp$ sudo VBoxManage internalcommands createrawvmdk -filename disk3s2.vmdk -rawdisk /dev/disk3s2
chef:Ubuntu 64-bit.vmwarevm vijayp$ sudo chown $USER disk*.vmdk

Next, you have to edit the VM’s .vmx file manually to add the new disks and remove the default one. I’m not sure why the UI won’t let you select these vmdks, but it doesn’t. Make sure the VM is NOT RUNNING, then edit the file. The diffs are pretty trivial:

@@ -2,16 +2,20 @@
config.version = "8"
virtualHW.version = "8"
vcpu.hotadd = "TRUE"
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
+scsi1.present = "TRUE"
+scsi1.virtualDev = "lsilogic"
memsize = "1024"
mem.hotadd = "TRUE"
scsi0:0.present = "TRUE"
-scsi0:0.fileName = "Ubuntu 64-bit.vmdk"
+scsi0:0.fileName = "disk1.vmdk"
+scsi1:0.present = "TRUE"
+scsi1:0.fileName = "disk3s2.vmdk"
ide1:0.present = "TRUE"
-ide1:0.autodetect = "TRUE"
+ide1:0.fileName = "cdrom0"
ide1:0.deviceType = "cdrom-raw"
ethernet0.present = "TRUE"
ethernet0.connectionType = "nat"
ethernet0.virtualDev = "e1000"
ethernet0.wakeOnPcktRcv = "FALSE"
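If you’d rather script the edit than do it by hand, here is a minimal Python sketch that applies the same changes to .vmx text. It assumes every relevant line has the key = "value" form seen in the diff. It also appends new keys at the end of the file rather than next to related ones, on the untested assumption that VMware doesn’t care about key order. The function name and the trimmed sample are illustrative; keep a backup and only touch the file while the VM is shut down.

```python
import re
from typing import Dict, Optional

# Matches lines like: scsi0:0.fileName = "Ubuntu 64-bit.vmdk"
_VMX_LINE = re.compile(r'^\s*([^=\s]+)\s*=\s*"(.*)"\s*$')


def apply_vmx_changes(text: str, changes: Dict[str, Optional[str]]) -> str:
    """Set keys to new values (None removes the key); append keys not already present."""
    pending = dict(changes)
    out = []
    for line in text.splitlines():
        m = _VMX_LINE.match(line)
        if m and m.group(1) in pending:
            value = pending.pop(m.group(1))
            if value is not None:
                out.append(f'{m.group(1)} = "{value}"')
            continue  # line was replaced or deleted
        out.append(line)
    for key, value in pending.items():
        if value is not None:
            out.append(f'{key} = "{value}"')
    return "\n".join(out) + "\n"


changes = {
    "scsi1.present": "TRUE",
    "scsi1.virtualDev": "lsilogic",
    "scsi0:0.fileName": "disk1.vmdk",
    "scsi1:0.present": "TRUE",
    "scsi1:0.fileName": "disk3s2.vmdk",
    "ide1:0.autodetect": None,
    "ide1:0.fileName": "cdrom0",
}

original = '''scsi0.present = "TRUE"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "Ubuntu 64-bit.vmdk"
ide1:0.present = "TRUE"
ide1:0.autodetect = "TRUE"
ide1:0.deviceType = "cdrom-raw"
'''
print(apply_vmx_changes(original, changes))
```

In practice you would read the real .vmx with pathlib.Path(...).read_text() and write the result back the same way.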

Now you can delete the Ubuntu 64-bit*.vmdk files.

I still haven’t figured out how to set the UUID on these disks so Linux mounts them correctly, but it’s probably controlled by either ddb.uuid.image or ddb.longContentID in the vmdk file. But it boots, so I can get some work done. I’ll revisit the UUID stuff soon.