Tech Stack

ExaVault’s API-driven application is containerized using Docker. Most of the API is built in PHP, and some services run on Node.js. The FTP server is ProFTPD. The application also uses React for the user interface and Gluster for storage. It’s all built on CentOS.

Challenge

Gaining visibility into API performance and drilling down to specific customer issues.

Solution

Leveraging the Instana SDK for error tracking, debugging and alerting.

Results

  • MTTR reduced by 56.6%
  • 99.99% availability

“We’re mission-critical for a lot of companies,” explains David Ordal, CEO at ExaVault. The majority of ExaVault customers are performing automated, system-to-system file transfers, like moving data from a point-of-sale system to an analytics platform or an inventory management system. “If we go down, our customers start losing money,” Ordal says. The API handles an average of 35,000 requests per minute and over 50 million calls daily. While the file transfers are automated, parties on both sides of the transfer rely on these automations to make business decisions.

Not only are the stakes high for individual ExaVault customers, but each customer also uses ExaVault in a slightly different way, often creating custom functionality through the developer API. Most issues don’t affect ExaVault’s entire customer base. In fact, often only a single customer experiences a slowdown. If that happens, ExaVault’s team needs to be able to see what the customer is experiencing and debug the customer-specific problem.

Before moving to Instana, ExaVault was using a monitoring system that made getting customer-granular information nearly impossible. “We couldn’t tag transactions with their user ID, and then filter down into the specific customer issue,” Tom Fite, senior backend engineer at ExaVault, explains. Specific customer issues can be completely lost in averages: if a single customer is experiencing a slowdown, it won’t show up at all on a monitoring system that only gives a holistic view.
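To make the point about averages concrete, here is a minimal sketch in Python. The account IDs, latency values and threshold are invented for illustration and are not ExaVault data. It shows how one slow account barely moves the overall mean but stands out once latency is grouped by account.

```python
from statistics import mean


def overall_mean(samples):
    """Mean latency across all (account_id, latency_ms) samples."""
    return mean(latency for _, latency in samples)


def mean_by_account(samples):
    """Mean latency per account ID."""
    grouped = {}
    for account_id, latency in samples:
        grouped.setdefault(account_id, []).append(latency)
    return {account_id: mean(values) for account_id, values in grouped.items()}


def slow_accounts(samples, threshold_ms):
    """Accounts whose mean latency exceeds the threshold, sorted by ID."""
    per_account = mean_by_account(samples)
    return sorted(a for a, m in per_account.items() if m > threshold_ms)


# Hypothetical data: 99 accounts at 100 ms and one account at 3000 ms.
samples = [(f"acct-{i:03d}", 100) for i in range(99)] + [("acct-099", 3000)]

print(overall_mean(samples))         # 129: looks healthy in aggregate
print(slow_accounts(samples, 2000))  # ['acct-099']: visible only per account
```

In this made-up data set the aggregate view reports 129 ms, while the affected account is waiting three seconds per request.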

The search

When ExaVault started looking for a new monitoring solution, the top priority was the ability to break down metrics by account and see what ‘edge case’ customers were experiencing. Other top criteria were cost and user interface, both of which had been pain points with previous vendors.

“Some APM vendors are prohibitively expensive,” explains Fite. “Especially when you are talking about scaling your application and you have your monitoring running on more than a few boxes.”

ExaVault considered factors like stack traces, database calls, throughput, data retention policies and infrastructure monitoring. But a graphical user interface that makes sense to non-technical users was also a key reason to choose Instana.

“I’m a sucker for a good user interface,” Fite says. “But it can also help me explain to other people on our team, especially people who are less technically savvy than me, that we have fixed an issue.”

Using Instana

ExaVault uses Instana to monitor API performance and for error tracking, debugging and alerting. The most important metric ExaVault looks at on a day-to-day basis is latency. “We need to make sure every customer is having a good experience,” Fite says. “If a customer is waiting more than a couple seconds, they might leave.”

With Instana, though, Fite doesn’t have to actually look at the dashboard all day. Instead, Instana sends an alert to a dedicated Slack channel if anything is out of the ordinary.

When it comes to account-level monitoring, ExaVault uses the Instana SDK to assign metadata to each API call as it comes in. As a result, Fite can filter by a huge number of variables. The most common use case, though, is filtering by account or even by individual users in an account. “If a user is having a problem that we don’t see at the high level, we can drill down and really troubleshoot just looking at their information,” Fite says.
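The tag-and-filter idea can be sketched in a few lines of Python. This is not the Instana SDK and does not reproduce ExaVault’s code. The `ApiCall` class, the `tag_call` and `filter_calls` helpers, the tag names and the endpoints are all hypothetical, chosen only to show metadata being attached to each call and then used to narrow the view to one account or user.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    """A recorded API call (hypothetical structure, not an Instana type)."""
    endpoint: str
    duration_ms: float
    tags: dict = field(default_factory=dict)


def tag_call(call, account_id, user_id):
    """Attach account and user metadata to a call as it comes in."""
    call.tags["account_id"] = account_id
    call.tags["user_id"] = user_id
    return call


def filter_calls(calls, **criteria):
    """Return the calls whose tags match every given key/value pair."""
    return [
        call for call in calls
        if all(call.tags.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical traffic from two accounts.
calls = [
    tag_call(ApiCall("/upload", 180.0), "acct-1", "user-a"),
    tag_call(ApiCall("/upload", 2400.0), "acct-2", "user-b"),
    tag_call(ApiCall("/list", 95.0), "acct-2", "user-c"),
]

# Drill down to a single account, then to a single user within it.
for call in filter_calls(calls, account_id="acct-2"):
    print(call.endpoint, call.duration_ms, call.tags["user_id"])

slow_user_calls = filter_calls(calls, account_id="acct-2", user_id="user-b")
```

Because the filter accepts arbitrary key/value pairs, the same helper would work for any other metadata attached at request time.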

Since ExaVault started using Instana in April 2019, the mean time to resolution for customer-impacting bugs has dropped by 56.6 percent. In addition, the platform’s slowdowns and downtime have decreased substantially: uptime was 99.51% before and is now 99.99%. “We’re accomplishing the goal that we set out to do,” Fite explains. “The reason we were able to do that is we had better visibility into our problems.”
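For a sense of scale, the two uptime figures can be converted into downtime. The sketch below assumes each percentage is sustained over a full 365-day year, which the figures above do not state, so treat the results as illustrative rather than as ExaVault’s measured downtime.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_hours_per_year(uptime_percent):
    """Hours of downtime per year implied by an uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


before = downtime_hours_per_year(99.51)
after = downtime_hours_per_year(99.99)

print(f"{before:.1f} hours per year at 99.51% uptime")
print(f"{after * 60:.0f} minutes per year at 99.99% uptime")
```

Under that assumption, 99.51% corresponds to roughly 43 hours of downtime a year, and 99.99% to under an hour.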

In some cases, there were bugs ExaVault didn’t even know existed before using Instana. Within days of getting set up with Instana, ExaVault discovered a bug that caused the software to query the memory cache too frequently and not save correctly. Fixing the previously invisible bug immediately reduced the load on application servers.

The future

“Our tech debt has decreased because we’re able to get through stuff a lot faster,” Eddie Castillo, head of marketing at ExaVault, explains. “Our team is able to dedicate more time towards new features and road map planning, instead of smashing bugs all day.”

There are a few major projects on the horizon. Without the robust internal testing possible with Instana, Fite would be a lot more worried about the potential for bugs to slip through as the team deploys improvements to the API. “Instana is going to help us ensure that the changes work better than the current version,” Fite says.

ExaVault is also working on moving from a homegrown container orchestration system to Kubernetes. Lastly, ExaVault is excited to start using Instana’s deployment tracking to compare performance metrics before and after deployments in the future.

“With the upcoming roadmap, if we didn’t have these tools, it would be impossible to keep an eye on our tech stack,” Castillo says. “Tom used to have a million terminal windows open on his desktop. But having these tools in place, it gives us visibility as we diversify and add more complexity to our overall architecture.”