Rafal Studnicki - Keeping real-time auctions running during rollout.
Full Title: Keeping real-time auctions running during rollout. From white-knuckle to continuous deployments.

https://2023.elixirconf.com/presenters#speaker-rafal-studnicki-elixirconf-us-2023

Deploying an Elixir cluster that keeps stateful connections with clients and manages distributed state is usually much more challenging than deploying stateless services. At Whatnot, we learned this the hard way. Every deployment carried a big risk of data inconsistencies that were very disruptive to auctions in progress, which led to dissatisfied buyers and financial losses for sellers. Consequently, we limited deployments to off-peak hours.

In this talk, we will present a case study of how we drastically increased the reliability of our Elixir service. We did this by automatically verifying the system, under various conditions, against most of the problems we had been experiencing. We tested the deployments and locally simulated cases where nodes went up and down randomly. Having included these new tests in our CI pipeline, we gained enough confidence to deploy to production after every single commit at any time of the day.