With the leaves beginning to change, the days shortening, and Starbucks bringing back The One True latte flavor, autumn is clearly here. For some, this brings a mad scramble before the Black Friday “freeze.”
Black Friday Freeze (n): An annual event in which some e-commerce retailers and vendors freeze their website’s code or infrastructure to prevent anything from interrupting their slice of the shopping frenzy.
Money and reputation are on the line, and failure to acknowledge that fact is neither productive nor helpful. Let’s try to understand the risks that e-commerce retailers are facing, along with the tools, strategies, and modern technical practices they are using to mitigate those risks. We can swap a fear-driven “freeze” for a holistic, data-driven approach that allows your business to operate with agility across the technical and non-technical domains.
E-commerce is a wide bucket with many kinds of sites lumped into it. A few common ones I’ve seen are:
- Single Tenant Sites – where the site serves a single business. It may be running software like Spree, OpenCart, Magento, WordPress or something homegrown to process orders.
- Multi-Tenant Sites – where the site offers to host the online presence of other businesses, but offers them their own dedicated storefront like a single-tenant site would. These sites are normally provided by vendors like Wix, Shopify or BigCommerce.
- Platform Sites – where the site aggregates a collection of storefronts into a single end user experience. These sites make up some of the largest internet giants like Amazon or eBay and can also show up in small niches like TcgPlayer.
While the presentation and format may change from site to site, all the different site archetypes share a common goal: to take some form of electronic payment in exchange for goods or services. This is going to be most evident on Black Friday, the busiest shopping day.
For many e-commerce retailers, Black Friday represents the highest order volume and average order value they will see all year. For some retailers, this could be the difference between being in the red or the black. Many companies prepare for months for Black Friday with questions like:
- Do we need to increase staffing levels to handle the increased order volume?
- Do we need to bring on additional vendors?
- Are we going to lose customers if we can’t get them their products before Christmas?
- What kinds of marketing pushes should we make to attract potential customers?
- What is our conversion rate?
- How are we going to handle abandoned carts?
- How are we going to handle fraudulent purchases?
To freeze or not to freeze on Black Friday?
The answers to these questions drive business plans well in advance of Black Friday. For example, hiring more staff to handle the increased order volume, or holding more stock in the warehouse so orders reach customers quickly, requires time and capital outlay. Retailers spend this money months in advance, expecting to recoup it on or after Black Friday.
The truth for many e-commerce retailers is: to recover the capital we spent in preparation for Black Friday, our site must stay operational.
A number of retailers choose to “freeze” their website and infrastructure as their first, and possibly only, mitigation against technical changes that could negatively impact their operations. For weeks before Black Friday, and sometimes for weeks after, a company will prevent any change to its production system to keep the website as stable as possible. The idea is that if we don’t change the system once it has reached a known acceptable state, we eliminate the risk of that state being disrupted.
I take umbrage at the concept of a “freeze” as a whole. In the past, you could get away with locking your system down in that fashion, but as an industry we’ve moved past the point where that makes sense. A modern e-commerce website has too many components provided by too many vendors for the idea to hold up. If I’m trying to “freeze” my e-commerce site, am I able to:
- Prevent any changes that my credit card processor may make to their system?
- Prevent my hosting provider from performing potentially risky maintenance to their data center?
- Prevent my analytics vendor from pushing bad JavaScript, which breaks my checkout form?
- Prevent my shipping vendors’ systems from going down due to the Black Friday load?
- Say for certain that I won’t need to update my site to mitigate a previously unknown memory leak?
- Say for certain that no login will be changed during Black Friday?
Understanding risk through observability
The answer to many of these questions for all but the largest companies is “no.” Your company isn’t going to be able to force these demands onto anyone. For an e-commerce retailer, the environment your website runs in and your company operates in is inherently filled with this kind of technical risk. Having a plan to handle and react to these challenges is table stakes.
For any challenge, being able to quantify and understand the risk is a prerequisite for accepting it. When it comes to your retail website and the supporting infrastructure, observability is how you quantify, handle, and react to risks. With a modern observability platform, you can gain insight into the state of your running system and all its components, from the underlying infrastructure to the interactions on your customer’s device. This understanding lets you handle the risks of operating an e-commerce business without “freezing” in place for weeks or months.
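To make “quantify the risk” a little more concrete, here is a minimal sketch of the kind of check observability data makes possible. It is not how Instana or any particular platform works. The `Request` record, the function names, and the thresholds (1% errors, 800 ms at the 95th percentile) are all hypothetical values chosen for illustration; in practice the numbers would come from your own platform and your own business tolerances.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """One observed checkout request (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def error_rate(requests: List[Request]) -> float:
    """Fraction of requests that failed; 0.0 when nothing was observed."""
    if not requests:
        return 0.0
    failed = sum(1 for r in requests if not r.ok)
    return failed / len(requests)


def p95_latency_ms(requests: List[Request]) -> float:
    """95th-percentile duration using the nearest-rank method."""
    if not requests:
        return 0.0
    durations = sorted(r.duration_ms for r in requests)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(durations) + 99) // 100
    return durations[rank - 1]


def within_risk_budget(
    requests: List[Request],
    max_error_rate: float = 0.01,   # assumed tolerance, not a recommendation
    max_p95_ms: float = 800.0,      # assumed tolerance, not a recommendation
) -> bool:
    """True when both the error rate and the p95 latency are inside the budget."""
    return (
        error_rate(requests) <= max_error_rate
        and p95_latency_ms(requests) <= max_p95_ms
    )


if __name__ == "__main__":
    # A made-up window of traffic: mostly fast successes, one slow request, one failure.
    window = [Request(120, True)] * 98 + [Request(950, True), Request(300, False)]
    print("error rate:", error_rate(window))
    print("p95 latency (ms):", p95_latency_ms(window))
    print("within budget:", within_risk_budget(window))
```

The point of a check like this is that the decision to ship a change, or to roll one back, rests on measured behavior of the running system rather than on a calendar date.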
While it’s easy to talk about the problems, getting to an actionable solution is much harder. So let’s take a look at what you’re able to achieve with a modern observability platform like Instana on an e-commerce workload. To do this, we can set up a small “Single Tenant” example using our AutoTrace instrumentation for PHP and look at a copy of OpenCart. Doing this will give us:
- Full request and response tracing between the end user, the application, and any of its external services; this includes things like logs of every database query, the health of the infrastructure, and a mapping of all the services
- Monitoring of the end user device, including asset timings and JavaScript errors
- Business-relevant SLAs and dashboards generated based on application health and metrics
And we’ll do that in my next blog, so hurry back.