Kolton Andrus and on-stage co-host Ben Freeberg talk about Preparing For Disaster, which will be presented at TEDxAsburyPark on May 18, 2019.
The following is an excerpt from the interview.
Ben Freeberg: Welcome. I’m one of the on-stage hosts of TEDxAsburyPark. Today I’m here with Kolton Andrus, who will be a speaker at this year’s TEDxAsburyPark Conference on May 18, 2019. Welcome, Kolton.
Kolton Andrus: Thank you. It’s a pleasure.
Ben Freeberg: To start off, we’d love to hear a little bit about yourself, how you’re going to be pitching your idea, and what your talk is about.
Kolton Andrus: By trade I’m a Chaos Engineer, which sounds cool, but let me break that down a little bit. In my job, we do Chaos Engineering. It’s this counterintuitive idea that we want to go break our systems in order to make them stronger. The analogy I always use when I’m home for the holidays or with my family is the flu shot or another vaccine: if you went back 200 years and said, “Hey, I’m going to inject you with this disease. Is that cool?” chances are you would have gotten more than a little pushback.
That’s kind of where we are on the technical side when we run this by engineers and say, “Hey, we want to break our systems in order to find the weak spots and make them stronger. How do you feel about it?” Some get very excited, some are like, “Whoa, really? Is that a good idea?”
Ben Freeberg: So then who are you pitching in those different groups? Is it the engineers themselves or is it someone in management? Who gives the most pushback? Who’s the most receptive?
Kolton Andrus: The messaging is certainly for the engineers. My co-founder and I were both on-call engineers at Amazon, Netflix, and Salesforce, so we carried the pagers. When Amazon broke at 2:00 in the morning, I hopped on a call with a bunch of other people, figured out the problem, and fixed it. So that’s really who the messaging is for. We want to spare people that pain. We want to prevent those outages. We want to keep people from getting woken up at 2:00 a.m. We’re lazy engineers. We just want the system to work well.
Ben Freeberg: Are there any types of businesses in particular that have found your approach exceptionally insightful or engaging?
Kolton Andrus: We’ve certainly focused on the software industry in particular, and since we came from Amazon.com, the e-commerce world, the financial world, and the SaaS (Software as a Service) world make a lot of sense. Those are businesses that people expect to always be online, and when they’re down, they lose a lot of money. Take Amazon. If they’re not taking orders for a minute, they could be losing anywhere from $10,000 to more than $100,000, so it’s well worth the time and investment to try to prevent every minute of downtime, and the customer pain that comes with it.
Ben Freeberg: Definitely makes sense. Having spent time on that side of e-commerce, where do you see e-commerce more broadly (aside from Amazon) within the next few years?
Kolton Andrus: It’s been fun to watch the e-commerce space grow. Amazon definitely pushed the industry forward, and you’ve seen the rest of the market change its approach. I think this was a real competitive advantage for Amazon. They used it to offer a higher degree of quality and reliability than their competitors. If you go back 10, 15, or 20 years, people would wait 30 seconds to a minute for a web page to load.
I used to get these AOL (software update) discs in the mail. It would take an hour or two to download them. You contrast that with the world we live in today; people get frustrated if they need to wait more than a second or two for their pages to load, and so the bar has just risen. People have higher expectations and it’s more important that things work when we need them to.
Ben Freeberg: So true. With that understanding, let’s get back to your TEDxAsburyPark talk. What’s the title?
Kolton Andrus: Embracing Chaos is part of the title. We’re still working on it, but the gist is that our world has become much more complex. Whether it’s our software systems, our government, or our transportation, people are ever more reliant on this technology. When an airline has an outage, people are unable to travel for work or to see their loved ones. Outages can have a huge impact on society; think about how they affect medical technology, government, and people’s ability to get help, to get loans, to get financing.
So we live in this world where everything has to work, but it’s gotten much more complex. Chaos Engineering is really about taming that complexity. It’s not about causing chaos; that’s one of the common misconceptions. A lot of people think, “Oh, we’re going to chaotically affect our environment to understand how it happened.” It’s kind of the other way around. Our environment is already chaotic. We’re using this approach to really understand how the pieces fit together and how failures occur, so that we can make the system more stable and more reliable.
Ben Freeberg: So how could some of the less technical audience members apply that idea to their day-to-day or their business?
Kolton Andrus: In essence, we’re just using the scientific method. We have a hypothesis, okay? There was a big S3 (Amazon Simple Storage Service) outage a couple of years ago. S3 is where people store a lot of their documents and data, so a lot of people on the Internet had an outage, and for 2 hours things didn’t work. So that’s one of our hypotheses: “Hey, if we lose this ability to store data in the Cloud, how will our systems react?”
From that, there are some measurements, there’s some understanding of what we would do instead. Is there a way we could get around that? That thought process really gets us to the point where we’re diving into that complex system and we’re sussing out the side effects and the little details that might have prevented it.
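For technical readers, the hypothesis, measure, compare loop described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular chaos engineering tool or a real AWS API: `FakeObjectStore`, `load_document`, and `run_experiment` are hypothetical names, and the storage outage is simulated by a flag on an in-memory store.

```python
from dataclasses import dataclass, field
from typing import Optional


class StorageUnavailable(Exception):
    """Raised by the simulated store while the injected outage is active."""


@dataclass
class FakeObjectStore:
    """Hypothetical in-memory stand-in for a cloud object store such as S3."""
    outage: bool = False
    objects: dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> str:
        if self.outage:
            raise StorageUnavailable(key)
        return self.objects[key]


def load_document(store: FakeObjectStore, cache: dict[str, str], key: str) -> Optional[str]:
    """Hypothetical application code: read from the store, fall back to a local cache."""
    try:
        value = store.get(key)
        cache[key] = value  # remember the last good copy
        return value
    except StorageUnavailable:
        return cache.get(key)  # None means the user would have seen an error


def run_experiment(warm_keys: list[str], outage_keys: list[str],
                   threshold: float = 0.9) -> tuple[float, bool]:
    """Hypothesis: if object storage goes down, at least `threshold` of reads still succeed."""
    store = FakeObjectStore(objects={k: f"contents of {k}" for k in warm_keys + outage_keys})
    cache: dict[str, str] = {}

    # 1. Steady state: with storage healthy, reads succeed (and warm the cache).
    baseline = [load_document(store, cache, k) for k in warm_keys]
    if any(r is None for r in baseline):
        raise RuntimeError("system is not healthy before the experiment")

    # 2. Inject the failure: simulate the storage outage.
    store.outage = True

    # 3. Measure: how many reads still return content during the outage?
    results = [load_document(store, cache, k) for k in outage_keys]
    success_rate = sum(r is not None for r in results) / len(results)

    # 4. Compare the measurement against the hypothesis.
    return success_rate, success_rate >= threshold


rate, held = run_experiment(["a.txt", "b.txt", "c.txt"], ["a.txt", "b.txt", "c.txt", "d.txt"])
print(f"success rate during outage: {rate:.0%}, hypothesis held: {held}")
```

In this run the hypothesis fails: `d.txt` was never cached before the outage, so only three of four reads succeed. Surfacing that kind of detail, rather than proving the system works, is the point of running the experiment.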
Ben Freeberg: That makes sense. So why now? What’s going on with either our world from the technological standpoint, or just in terms of the social side, where you think that this is an important time to share this idea? I know you’ve been working on it for some time and you’re a bit ahead of the curve, but what’s going on today?
Kolton Andrus: I’ve been working on this idea for 10+ years, so I am a little bit ahead of the curve. On the software side, our software has become a lot more complex. There’s this concept of microservice architectures, and so what we have now is almost like a graph. There are all these points and all of these interconnected relationships. It’s not five or ten, it’s hundreds of these points.
So there’s an enormous number of ways things can interact and fail or have these side effects, and that complexity is a big part of it. At the same time, individuals and companies are running less of their own hardware in data centers. In the move to the Cloud, people are now trusting Amazon, Microsoft, and Google to host and run their infrastructure, but the truth is that at scale, failure happens often. If you have 10,000 machines, a failure that happens one out of 10,000 times could happen every day. Those failures are somewhat unavoidable, so we just need to be able to prepare for them.
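To make the 10,000-machine arithmetic concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each machine independently has a 1-in-10,000 chance of failing on any given day; `daily_failure_stats` is a hypothetical helper, and the rate is not a measured figure.

```python
import math


def daily_failure_stats(machines: int, p_fail: float) -> tuple[float, float]:
    """Return (expected failures per day, probability of at least one failure).

    Assumes each machine fails independently with probability p_fail per day
    (an illustrative assumption, not a measured rate).
    """
    expected = machines * p_fail
    # P(no machine fails) = (1 - p)^n, so P(at least one fails) is its complement.
    at_least_one = 1.0 - (1.0 - p_fail) ** machines
    return expected, at_least_one


for n in (1, 100, 10_000, 100_000):
    expected, at_least_one = daily_failure_stats(n, 1 / 10_000)
    print(f"{n:>7} machines: {expected:8.4f} expected failures/day, "
          f"P(at least one) = {at_least_one:.3f}")
```

At 10,000 machines the expected count is one failure per day, with roughly a 63% chance (about 1 - 1/e) that at least one occurs on any given day. At 100,000 machines, a day without a failure becomes very unlikely.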
Ben Freeberg: What are some of the key milestones in terms of partnerships or “lucky breaks” that propelled you forward and kept you guys going?
Kolton Andrus: We had the opportunity to take this idea and approach and build it at Amazon. Then we had the opportunity to take it to Netflix and build it there, and we saw a lot of value: lots of money saved and lots of engineering time saved. That really got us started. I also had the opportunity to get some VC funding, and it’s kind of a fun story. I got into an argument with a venture capitalist in the lobby of a conference about why I wasn’t going to take money and was going to bootstrap. But I have five kids and I live in California, so it would have been a bit of a financial burden to really bootstrap the kind of company we wanted to build.
So I think that was kind of our lucky break. It was an opportunity to find some people who really believed in what we were doing, who were willing to back us and support us and help us build the business and understand how to provide value to our customers.
Ben Freeberg: That’s great. Once you had a bit more discretionary spending, what were some of the big things you put it towards?
Kolton Andrus: It’s funny…we came from Amazon. One of the core values there is being frugal so we’re very thoughtful about how to spend our money. Again, with a large family, we were always budget-conscious. So part of it was just being able to work on it full time. We quit our day jobs. We were all-in on this company. We were going out and building the first version. It allowed us to go talk to the customers, to understand what pain they were facing, what kind of solution they needed.
Further down the road, it let us build a much bigger team. Our company is almost 50 people now, and we grew 4x to 5x last year, so that really lets us lean in. We’ve found the experts in the space, the people who have felt this pain, and recruited them to our cause to help us make our customers’ lives better.
Ben Freeberg: That’s very cool. On a personal note, what are some things that inspire you and propel you that you’d want to share with our listeners? Other TED Talks, books, or ideas?
Kolton Andrus: There are a few. Obviously there are a lot of great TED Talks, and it’s hard to pick just a few. I love Simon Sinek’s talk on the power of why: why people care about things. That’s a favorite of mine, and I enjoy his book.
On the technical side, Nassim Nicholas Taleb has written a few books: Antifragile: Things That Gain from Disorder; The Black Swan: The Impact of the Highly Improbable; and Skin in the Game: Hidden Asymmetries in Daily Life.
The crux of those books is that the people feeling the pain are the most motivated to fix it. We’ve seen this with SRE (Site Reliability Engineering) and the DevOps trend. When the engineers who write the software have to be on call for it and get paged when it breaks, they care a lot more about quality and about preventing those failures.
Another book I recommend a lot for learning the negotiation and business side is Never Split the Difference by Chris Voss. He used to work for the FBI as a lead hostage negotiator, so he really learned how to negotiate well when lives were on the line. He’s a very, very good writer. The book is stories about what he learned, how people think, and how to influence them in the right way. I make all of my sales team members read it.
Ben Freeberg: One more question. If our audience wants to have a bigger relationship with you and this idea, what are some easy ways they could do so?
Kolton Andrus: Twitter’s a good one for that kind of bite-sized content. I’m @KoltonAndrus.
Our website talks a bit about this idea of Chaos Engineering. We’ve actually spent a lot of time just teaching people about this concept and how to do it well and so we also have a public Slack community that people can join through our website.
Ben Freeberg: Kolton, thank you so much for taking the time to chat with us.
Just a reminder to get your tickets for the largest and highest-rated TEDx conference on the East Coast, TEDxAsburyPark. On May 18, 2019, you’ll get to hear our friend Kolton speak and share words of wisdom on chaos.