KubeCon / CloudNativeCon 2022 is wrapped, I’ve had a chance to recharge my batteries, and it’s time to talk about what we’ve learned and what we’re excited about.
If you aren’t already aware, KubeCon is a 3-day conference hosted by the Cloud Native Computing Foundation (CNCF). There are an additional two days of “co-located” events to round out the beginning of the week.
This year’s most exciting events for us were (of course) focused on observability — with Open Observability Day on Monday and a community-organized OTel Unplugged event on Tuesday.
Here are our top four Observability takeaways from the week:
Take-away #1: New data types are coming to OTel
Everybody is familiar with metrics, traces, and logs. But new data types are coming soon to OpenTelemetry — sort of.
Profiles will be a new data type used to represent stack traces. Unlike spans, they will be paired with a metric, and they will be designed to drill deeper than spans into the operations of a single service.
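Conceptually, a profile pairs stack traces with a measurement — for example, how many CPU samples landed in each call stack. The sketch below illustrates that shape in plain Python; the function names and sample counts are invented for illustration, not taken from any OTel profiling API.

```python
from collections import Counter

# A profile, conceptually: stack traces (call paths) paired with a
# measurement -- here, CPU sample counts inside a single service.
profile = Counter({
    ("main", "handle_request", "serialize_json"): 120,
    ("main", "handle_request", "query_db"): 480,
    ("main", "gc"): 35,
})

# Where a span would only tell you handle_request was slow, a profile
# tells you *which call path inside it* ate the time.
hottest_stack, samples = profile.most_common(1)[0]
print(" -> ".join(hottest_stack), samples)  # main -> handle_request -> query_db 480
```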
Events are the next exciting new data type. You may be asking yourself, “Aren’t logs already a form of event?” — and you’d be correct. I asked the same thing. What makes the new event data type noteworthy is its specific structure: events are structured logs that are designed to be compared (e.g. across different time windows).
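To see why structure matters, compare a free-form log line with the same information as a structured event. The event names and attributes below are hypothetical examples, not part of the OTel specification — the point is that a fixed shape makes events trivially groupable and comparable.

```python
from collections import Counter

# An unstructured log line: human-readable, but hard to compare programmatically.
raw_log = "2022-10-26T14:03:11Z checkout failed for user 42 (card_declined)"

# The same information as structured events: a fixed shape (name + attributes)
# that can be filtered, grouped, and compared across time windows.
events = [
    {"name": "checkout.failed", "timestamp": "2022-10-26T14:03:11Z",
     "attributes": {"user_id": 42, "reason": "card_declined"}},
    {"name": "checkout.failed", "timestamp": "2022-10-26T14:05:40Z",
     "attributes": {"user_id": 7, "reason": "timeout"}},
    {"name": "checkout.succeeded", "timestamp": "2022-10-26T14:06:02Z",
     "attributes": {"user_id": 42}},
]

# Because every event shares the same structure, "failure reasons in this
# window" is a one-liner instead of a pile of regexes over raw log text.
reasons = Counter(
    e["attributes"]["reason"] for e in events if e["name"] == "checkout.failed"
)
print(dict(reasons))  # {'card_declined': 1, 'timeout': 1}
```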
Take-away #2: Distributed tracing is really hard
Let’s talk about distributed tracing. It’s really important — arguably the “killer app” that Observability has brought to the table. But it’s also really difficult to do right. Why? Because to really get value out of distributed tracing it has to be end-to-end and without any gaps.
Most organizations have applications composed of myriad services spanning clouds and on-premises environments. Some run in containers, some in Kubernetes clusters, and others on VMs or bare metal. Getting end-to-end tracing across this patchwork is much more difficult than simply installing an Operator and moving on.
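The “no gaps” requirement boils down to context propagation: every hop must forward the trace context intact. The W3C Trace Context standard defines the `traceparent` header most tracing systems use for this. Below is a minimal, illustrative parser for that header (the helper function is our own sketch, not an OTel API) — any service that drops or mangles this header breaks the trace at that hop, and everything downstream becomes an orphaned trace.

```python
import re
from typing import Optional

# W3C Trace Context "traceparent" header: version-traceid-spanid-flags,
# all lowercase hex, e.g. "00-<32 hex>-<16 hex>-01".
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> Optional[dict]:
    """Return the trace context carried by a traceparent header, or None."""
    m = TRACEPARENT_RE.match(header)
    return m.groupdict() if m else None

# A well-formed header: the trace_id must be forwarded unchanged on every
# outbound call, or the end-to-end trace falls apart at that service.
ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
```

In practice an auto-instrumentation agent handles this for you — but only on the runtimes it supports, which is exactly why heterogeneous estates of containers, VMs, and bare metal make end-to-end tracing hard.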
Take-away #3: Observability best-practices are still elusive
In observability right now, the word “should” comes up in a lot of questions. We’re using brand-new toolchains to observe cutting-edge tech stacks, and nobody knows where all of the potential pitfalls are.
Even in the most ancient pillar of observability — logging — best practices remain elusive to many. Capture too much, and you’ll end up paying for it on your cloud bill. Capture too little, and you’ll risk missing the crucial data point that helps solve a production outage.
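One common middle path between those two failure modes is sampling: always keep high-severity records, and keep only a fraction of the noisy low-severity ones. Here is a minimal sketch using Python’s standard `logging` module — the `SampleDebugFilter` class and its `sample_rate` parameter are illustrative names of our own, not from any particular library.

```python
import logging
import random

class SampleDebugFilter(logging.Filter):
    """Pass all WARNING-and-above records; sample a fraction of the rest.

    A compromise between "capture everything" (paying for it on your
    cloud bill) and "capture too little" (missing the crucial data point).
    """

    def __init__(self, sample_rate: float = 0.1):
        super().__init__()
        self.sample_rate = sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # warnings and errors always get through
        return random.random() < self.sample_rate  # keep ~10% of the noise

logger = logging.getLogger("checkout")
logger.addFilter(SampleDebugFilter(sample_rate=0.1))
```

Sampling is no substitute for deciding what to log in the first place, but it puts a predictable ceiling on low-value volume while guaranteeing the records you’ll want during an outage survive.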
Take-away #4: Observability requires a practice
For real-world use cases at scale, observability is not as simple as choosing a vendor and installing an agent or plugin. It requires dedicated individuals keeping up with best practices, setting examples for their organizations, and working to fight the unrelenting decay of complex systems.
Organizations have begun to recognize these and other challenges. We’re starting to see a lot of horizontal Observability Teams as well as individual developers and team leads working to bring the benefits of observability to the entire software development process.