According to AWS:
“[Graviton2 is] custom built by Amazon Web Services using 64-bit Arm Neoverse cores to deliver the best price performance for your cloud workloads running in Amazon EC2”
At Instana we are looking at ARM with extreme interest, and we decided to port our PHP tracer to AWS Graviton2. I want to share what we learned in the process: what we needed to do, and the performance analysis we ran to validate the port and to compare the results with AWS’s own benchmark for PHP on Graviton.
What are ARM, RISC and CISC?
Advanced RISC Machine (ARM) is a family of Central Processing Unit (CPU) architectures. ARM instruction sets belong to the Reduced Instruction Set Computing (RISC) category of CPU architectures, as opposed to the Complex Instruction Set Computing (CISC) category, to which x86 and x64 belong.
Mobile computing (like the phone you hold in your hand) and embedded systems have historically leaned heavily towards ARM architectures. The simpler instruction sets lend themselves better to small System on a Chip (SoC) designs, that is, highly integrated chipsets designed to minimize physical size and energy consumption. However, x86 and x64 have been powering most of the personal computers and cloud servers out there, and the two go hand in hand: oversimplifying the argument a fair bit, developers like to minimize the cost and risk of deploying software (and the associated bugs) by deploying on hardware as similar as possible to the hardware they develop on. While many runtimes do a fairly good job of being portable (most Java apps will run similarly on JVMs built for different architectures) or offer cross-platform compilation (see, for example, Go), there is a price to pay for using different architectures at different stages, like development and production: more complex tooling and build systems, cognitive complexity, harder-to-troubleshoot bugs, and more.
Why ARM matters
Going back to AWS Graviton2, there are probably many reasons why AWS came up with their own CPU, which is no small undertaking by any standard. Optimization is certainly one, in terms of both the computing power provided and electricity consumption. There are probably also solid economies of scale and savings compared to buying the components from vendors. As a matter of fact, Amazon is far from the only cloud provider or large SaaS vendor running “on their own silicon” (meaning: using hardware built in-house).
Together with the release of ARM-based Apple laptops using Apple Silicon, the writing is on the wall: there is a pretty solid chance that, in the next few months and years, we will see much, much more ARM in the cloud and maybe even in private data centers than we do today.
You can expect more and more of Instana to become available to monitor ARM-based hosts, containers and processes in 2021. After all, our agent is written in highly-portable Java and has been supporting Linux hosts running on aarch64 (a general term for 64-bit ARM architectures) for years. Some of our tracers have architecture-specific components to them, like .NET Core and Python, but we think ARM is here to stay.
Porting Instana PHP extension to AWS Graviton2
All in all, porting the Instana PHP extension to AWS Graviton2 was a rather simple matter. Instana provides support for AutoTrace, tracing each and every request served by your PHP processes, the processes calling them, and dependencies like databases and other services, as well as AutoProfile, Instana’s production-ready, always-on, low-overhead profiler.
AutoProfile for PHP in action, showing you where your CPU cycles are spent.
Little coding effort, much build pipeline refactoring
There was virtually nothing we needed to change in the code of the tracer itself, so much so that our engineers had a build already running and passing a first battery of tests in less than an afternoon. Of course, we needed to invest quite some time in terms of build automation, integration tests and delivery of another build on an architecture unknown to our previous build.
The PHP build pipeline in Concourse CI, shaping those bits up.
Since the changes needed in the build pipeline to support a new architecture were bound to be expensive, we took the chance to move our build pipeline from a rather monolithic Jenkins job to a Concourse CI pipeline (we positively love the functional feeling of Concourse!). The pipeline runs on a GKE cluster, and builds and tests the x86 and x64 builds locally, while delegating the aarch64 builds and tests to Google Cloud Build.
Comparable binary size
One thing we wondered when getting into the porting effort was whether the size of the resulting binary would differ much. After all, aarch64 is a vastly different instruction set from x86. However, it turned out that the difference is negligible; let us compare the builds for the Instana PHP extension v1.14.0 on different architectures:
- x86-glibc: 642.81 KB
- x86-musl: 636.54 KB
- aarch64: 646.66 KB
The x86 architecture is built for two different flavors of the Standard C Library: glibc (the GNU C Library), in use in most Linux distributions like Debian and Red Hat derivatives, and musl for Alpine. For aarch64, we currently support Amazon Linux 2 and Ubuntu, which is itself glibc-based. We intend to support other operating systems, and especially Amazon Linux 2, but we first need to overcome some pesky business with mismatching glibc versions on AWS CodeBuild.
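To put the size figures listed above in perspective, the following minimal sketch computes each build's size relative to the x86-glibc build. The sizes are the ones quoted in this post; the function and variable names are illustrative only.

```python
# Sizes in KB, as listed above for the Instana PHP extension v1.14.0.
sizes_kb = {
    "x86-glibc": 642.81,
    "x86-musl": 636.54,
    "aarch64": 646.66,
}


def relative_difference(size_kb: float, baseline_kb: float) -> float:
    """Return the size difference relative to the baseline, in percent."""
    return (size_kb - baseline_kb) / baseline_kb * 100.0


baseline = sizes_kb["x86-glibc"]
for build, size in sizes_kb.items():
    # A positive value means the build is larger than x86-glibc.
    print(f"{build}: {relative_difference(size, baseline):+.2f}% vs x86-glibc")
```

With these numbers, the aarch64 build comes out about 0.6% larger than the x86-glibc one, and the musl build about 1% smaller.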
Performance of PHP on Graviton2
We ran a series of load tests using the same PHP application on PHP 7 running on:
- a Graviton-based m6g.xlarge EC2 host
- an amd64-based m5.xlarge EC2 host
All the hosts we used for the tests run the latest Amazon Linux 2 and have access to 4 vCPUs and 16 GB of RAM. Our tests are similar in spirit to those run by AWS, but we used a custom, simple web application, rather than the Zend Benchmark.
The source code of the application and our test scripts are available on GitHub. We used bombardier, a Go-based HTTP(S) benchmarking tool beloved at Instana, to fire 100 concurrent requests at the PHP servers for 60 seconds (./bombardier -d 60s -c 100 http://127.0.0.1/de/blog/).
We also took the opportunity to benchmark different combinations of Instana monitoring: no PHP monitoring at all, PHP AutoTrace activated, and both PHP AutoTrace and our shiny, new PHP AutoProfile activated. PHP AutoProfile is one of the latest additions to Instana’s profiling capabilities, and allows you to collect CPU, ThreadWait and Memory statistics from your PHP processes. It is designed to have extremely low overhead and to be always on in production.
We repeated each test 5 times and threw away the best and worst results in order to smooth out potential outliers, but the results were pretty close to one another. The aggregated results are as follows:
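As a rough sketch of how such a run could be scripted, the helper below assembles the bombardier invocation quoted above. The function name, its parameters and the default binary path are assumptions made for illustration; they are not taken from our actual test scripts.

```python
def bombardier_command(
    url: str,
    duration: str = "60s",
    connections: int = 100,
    binary: str = "./bombardier",  # assumed location of the bombardier binary
) -> list[str]:
    """Build the argument list for one bombardier run.

    Only the -d (duration) and -c (concurrent connections) flags from the
    command quoted in the post are used.
    """
    return [binary, "-d", duration, "-c", str(connections), url]


command = bombardier_command("http://127.0.0.1/de/blog/")
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once per repetition, which keeps the parameters of every run identical.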
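The outlier handling described above can be sketched as a trimmed mean. This assumes the three remaining runs are simply averaged, which is one plausible way to aggregate them; the run values below are made-up placeholders, not our measurements.

```python
def trimmed_mean(results: list[float]) -> float:
    """Average the runs after discarding the single best and worst result."""
    if len(results) < 3:
        raise ValueError("need at least three runs to drop the best and worst")
    kept = sorted(results)[1:-1]  # drop the lowest and the highest value
    return sum(kept) / len(kept)


# Hypothetical requests-per-second figures for five runs (NOT measured data).
example_runs = [1010.0, 990.0, 1000.0, 1200.0, 800.0]
print(trimmed_mean(example_runs))
```

Dropping the extremes means a single unusually slow or fast run, caused for example by a noisy neighbor on the host, does not skew the aggregate.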
|Architecture|No Instana monitoring|Instana AutoTrace|Instana AutoTrace and Instana AutoProfile|
4 vCPUs, 16 GB RAM
4 vCPUs, 16 GB RAM
The results are pretty interesting:
- PHP 7 is indeed faster on m6g.xlarge machines than on m5.xlarge.
- Instana AutoTrace on x86 is blazing fast, with barely a 1% reduction in throughput.
- Instana AutoTrace on Graviton2 has a slightly higher overhead than on x86, and we’ll look into bringing it on par with x86.
- The overhead of Instana AutoProfile is negligible in both scenarios, so much so that for the m5.xlarge case, the tests seem to run a little faster with AutoProfile than without, but it is actually within statistical error margins.
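The overhead figures discussed in the list above are reductions in throughput relative to the unmonitored baseline. A minimal sketch of that calculation follows; the function name and the sample throughput values are hypothetical and are not our benchmark results.

```python
def throughput_overhead(baseline_rps: float, monitored_rps: float) -> float:
    """Percentage reduction in throughput relative to the unmonitored baseline.

    A negative result means the monitored run was faster than the baseline.
    """
    if baseline_rps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_rps - monitored_rps) / baseline_rps * 100.0


# Hypothetical requests-per-second values, for illustration only.
print(f"{throughput_overhead(1000.0, 990.0):.1f}%")   # monitored run is slower
print(f"{throughput_overhead(1000.0, 1005.0):.1f}%")  # monitored run is faster
```

A slightly negative value, as in the second call, corresponds to the situation described above, where a run with more monitoring enabled appears marginally faster but the difference is within the noise between runs.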
AWS Graviton2 is an interesting development for ARM adoption in Cloud Computing. Instana ported its PHP AutoTrace and PHP AutoProfile technologies to AWS Graviton2 to offer the best-in-class PHP tracing and profiling in the brave, new ARM-based PHP world.
It is time for you to reap the benefits in terms of performance and price that AWS Graviton2 has to offer for your PHP workloads. And if you do not have Instana just yet, a free trial of Instana, no strings attached, is just one click away.
- It is seldom in IT that you make a computer do more stuff, and be faster at it than before. But it occasionally happens. When we ran load tests of our Java tracer on the Spring Music application, the Java tracer actually made the application faster because the structure of the Instana Java instrumentation helped the Java Virtual Machine perform better Just-In-Time (JIT) compilation. But that is a story for another blog post.