Acceptable Downtime

Brad Feld has a great post about the difficulty of quantifying how much unexpected downtime is acceptable for software sold as a service, and about the delicate ongoing balance between minimizing risk (investing in redundancy and infrastructure) and driving demand (new features).
At QuickBase, our umbrella term (coined by Jana, the GM) for these efforts was “Business Reasonable.” Vague? Yes. But the important idea behind it is that any notion of reliability and redundancy has to be defined relative to the kind of customers you have at the time, and how they’re using the product. So instead of reflexively falling back on axioms like five nines, you use your empathy and sense of the customer to try to answer the question: what sort of downtime will they consider reasonable? Just as Geoffrey Moore’s early adopters and visionaries are willing to overlook holes in a product’s functionality, those same customers are often, by temperament, willing to give away more “free passes” than later-stage customers.
During our periods of early rapid growth, we definitely used up one or two of those precious free passes. We all spent time calling customers to explain. And ultimately, after some incredible efforts by our engineering and operations team, we became a service our customers could depend on.