Redis, PHP, and Node.js performance on AWS Graviton2 (C6g/M6g) processors vs C5, M5, and A1 instances

M6g.large: AWS Graviton2 Processor, 2 vCPU, 8 GiB memory, up to 10 Gigabit networking performance — $56.21 (US-WEST-2 OREGON)

or

C6g.large: AWS Graviton2 Processor, 2 vCPU, 4 GiB memory, up to 10 Gigabit networking performance — $49.64 (US-WEST-2 OREGON)

M5.large: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (Skylake), 2 vCPU, 8 GiB memory, up to 10 Gigabit networking performance — $70.08 (US-WEST-2 OREGON)

A1.large: 2.3 GHz AWS Graviton Processor, 2 vCPU, 4 GiB memory, up to 10 Gigabit networking performance — $37.23 (US-WEST-2 OREGON)

C5.large: 3 GHz Intel Xeon Platinum 8124M, 2 vCPU, 4 GiB memory, up to 10 Gigabit networking performance — $62.05 (US-WEST-2 OREGON)

T3.micro: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 2 vCPU, 1 GiB memory — $7.59
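As a quick sanity check on the pricing above, the Arm-based C6g.large comes in exactly 20% below the C5.large. A minimal sketch using the us-west-2 prices quoted above:

```shell
# Percentage saved moving from C5.large ($62.05) to C6g.large ($49.64),
# using the us-west-2 prices listed above
awk 'BEGIN { c5 = 62.05; c6g = 49.64; printf "%.1f%% cheaper\n", (c5 - c6g) / c5 * 100 }'
```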

C5.large Redis performance

====== GET ======
100000 requests completed in 1.02 seconds
100 parallel clients
3 bytes payload
keep alive: 1
100.00% <= 1 milliseconds
97656.24 requests per second
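For reference, output blocks like the one above are what redis-benchmark prints for a GET-only run. A command along these lines (standard redis-benchmark flags; assumes a Redis server on localhost:6379) reproduces the shape of these results:

```shell
# GET-only benchmark: 100,000 requests, 100 parallel clients, 3-byte payload,
# keep-alive enabled
redis-benchmark -t get -n 100000 -c 100 -d 3 -k 1
```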

A1.large Redis performance

====== GET ======
100000 requests completed in 1.69 seconds
100 parallel clients
3 bytes payload
keep alive: 1
94.96% <= 1 milliseconds
100.00% <= 1 milliseconds
59066.75 requests per second

T3.large Redis performance

====== GET ======
100000 requests completed in 1.27 seconds
100 parallel clients
3 bytes payload
keep alive: 1
99.94% <= 1 milliseconds
100.00% <= 1 milliseconds
78926.60 requests per second

Azure B-series analog (Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz):

====== GET ======
100000 requests completed in 1.35 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 900 1 300 10 60 10000
host configuration "appendonly": no
multi-thread: no
67.37% <= 1 milliseconds
99.16% <= 2 milliseconds
99.92% <= 3 milliseconds
100.00% <= 3 milliseconds
74128.98 requests per second

C6g.large Redis performance

====== GET ======
100000 requests completed in 0.60 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save":
host configuration "appendonly": no
multi-thread: no
0.00% <= 0.2 milliseconds
67.49% <= 0.3 milliseconds
98.99% <= 0.4 milliseconds
99.33% <= 0.5 milliseconds
99.59% <= 0.6 milliseconds
99.69% <= 0.7 milliseconds
99.75% <= 0.8 milliseconds
99.90% <= 0.9 milliseconds
99.95% <= 1.1 milliseconds
100.00% <= 1.1 milliseconds
168067.22 requests per second
====== GET ======
100000 requests completed in 0.70 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 900 1 300 10 60 10000
host configuration "appendonly": no
multi-thread: no
0.00% <= 0.2 milliseconds
0.02% <= 0.3 milliseconds
98.83% <= 0.4 milliseconds
99.06% <= 0.5 milliseconds
99.17% <= 0.6 milliseconds
99.38% <= 0.7 milliseconds
99.44% <= 0.8 milliseconds
99.49% <= 0.9 milliseconds
99.65% <= 1.0 milliseconds
99.74% <= 1.1 milliseconds
99.85% <= 1.2 milliseconds
99.92% <= 1.3 milliseconds
100.00% <= 1.3 milliseconds
143061.52 requests per second
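Comparing the two C6g.large runs above gives a rough cost for the default RDB "save" schedule. A back-of-the-envelope sketch using the two throughput figures above:

```shell
# Throughput drop when the "save 900 1 300 10 60 10000" schedule is enabled,
# from the two GET runs above
awk 'BEGIN { nosave = 168067.22; save = 143061.52;
             printf "%.1f%% fewer requests per second\n", (nosave - save) / nosave * 100 }'
```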

Raspberry Pi 4 Model B Rev 1.2 Redis performance

====== PING_INLINE ======
100000 requests completed in 4.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
====== SET ======
100000 requests completed in 4.06 seconds
100 parallel clients
3 bytes payload
keep alive: 1
0.00% <= 1 milliseconds
39.47% <= 2 milliseconds
95.92% <= 3 milliseconds
99.10% <= 4 milliseconds
99.75% <= 5 milliseconds
99.87% <= 6 milliseconds
99.99% <= 7 milliseconds
100.00% <= 7 milliseconds
24606.30 requests per second
====== GET ======
100000 requests completed in 4.09 seconds
100 parallel clients
3 bytes payload
keep alive: 1
0.00% <= 1 milliseconds
35.71% <= 2 milliseconds
96.01% <= 3 milliseconds
99.31% <= 4 milliseconds
99.79% <= 5 milliseconds
99.90% <= 6 milliseconds
100.00% <= 6 milliseconds
24461.84 requests per second

Digital Ocean Droplet Redis performance (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz)

====== SET ======
100000 requests completed in 1.79 seconds
100 parallel clients
3 bytes payload
keep alive: 1
61.06% <= 1 milliseconds
97.56% <= 2 milliseconds
98.74% <= 3 milliseconds
99.27% <= 4 milliseconds
99.43% <= 5 milliseconds
99.46% <= 6 milliseconds
99.91% <= 7 milliseconds
100.00% <= 8 milliseconds
55928.41 requests per second
====== GET ======
100000 requests completed in 1.73 seconds
100 parallel clients
3 bytes payload
keep alive: 1
67.08% <= 1 milliseconds
99.12% <= 2 milliseconds
99.39% <= 3 milliseconds
99.90% <= 4 milliseconds
100.00% <= 5 milliseconds
57770.08 requests per second
[Chart: Redis ARM/Intel performance]

PHP simple-operations test

<?php
$start = microtime(TRUE);
/* Start of the code to profile */
for ($a = 0; $a < 10000000; $a++) {
    $b = $a * $a;
}
/* End of the code to profile */
$end = microtime(TRUE);
echo "The code took " . ($end - $start) . " seconds to complete.";

C6g.large PHP performance (ARM GRAVITON 2 AWS)

The code took 0.26189398765564 seconds to complete.
The code took 0.25753402709961 seconds to complete.
The code took 0.26207208633423 seconds to complete.
The code took 0.25553894042969 seconds to complete.
The code took 0.26356601715088 seconds to complete.
The code took 0.25494980812073 seconds to complete.
The code took 0.25964903831482 seconds to complete.
The code took 0.26054716110229 seconds to complete.
The code took 0.27059602737427 seconds to complete.
The code took 0.25764012336731 seconds to complete.
The code took 0.2580840587616 seconds to complete.
The code took 0.26006889343262 seconds to complete.

A1.large PHP performance (ARM GRAVITON 1 AWS)

The code took 0.38810396194458 seconds to complete.
The code took 0.38808822631836 seconds to complete.
The code took 0.38812589645386 seconds to complete.
The code took 0.38790202140808 seconds to complete.
The code took 0.38788604736328 seconds to complete.
The code took 0.38789796829224 seconds to complete.
The code took 0.387943983078 seconds to complete.
The code took 0.38829302787781 seconds to complete.
The code took 0.38884282112122 seconds to complete.
The code took 0.38802289962769 seconds to complete.

C5.large PHP performance

The code took 0.10848188400269 seconds to complete.
The code took 0.10874104499817 seconds to complete.
The code took 0.10939693450928 seconds to complete.
The code took 0.10821604728699 seconds to complete.
The code took 0.10948896408081 seconds to complete.
The code took 0.10880613327026 seconds to complete.
The code took 1.0755159854889 seconds to complete.
The code took 1.065523147583 seconds to complete.
The code took 1.0670900344849 seconds to complete.
The code took 1.0554091930389 seconds to complete.
[Chart: Digital Ocean 8168 vs AWS 8124M PHP performance]

Raspberry Pi 4 Model B 4GB PHP 7.3 performance test

The code took 0.65728902816772 seconds to complete.
The code took 0.65119695663452 seconds to complete.
The code took 0.65681409835815 seconds to complete.
The code took 0.65350794792175 seconds to complete.
The code took 0.68937587738037 seconds to complete.
The code took 0.13071990013123 seconds to complete.
The code took 0.13261103630066 seconds to complete.
The code took 0.1342921257019 seconds to complete.
The code took 0.13397622108459 seconds to complete.
The code took 0.13413882255554 seconds to complete.
The code took 0.13776922225952 seconds to complete.
The code took 0.13627886772156 seconds to complete.
The code took 0.13335800170898 seconds to complete.
The code took 0.13527703285217 seconds to complete.
The code took 0.13746809959412 seconds to complete.

T3a.micro PHP performance (AMD EPYC 7571)

The code took 0.14892196655273 seconds to complete.
The code took 0.14771103858948 seconds to complete.
The code took 0.15076804161072 seconds to complete.
The code took 0.14975309371948 seconds to complete.
The code took 0.15051293373108 seconds to complete.
The code took 0.15107202529907 seconds to complete.
The code took 0.14903211593628 seconds to complete.
The code took 0.15025115013123 seconds to complete.
The code took 0.15164184570312 seconds to complete.
The code took 0.15445518493652 seconds to complete.
The code took 0.15240597724915 seconds to complete.

T3.micro PHP performance

The code took 0.19494104385376 seconds to complete.
The code took 0.12496709823608 seconds to complete.
The code took 0.15236806869507 seconds to complete.
The code took 0.11733198165894 seconds to complete.
The code took 0.11597108840942 seconds to complete.
The code took 0.15004992485046 seconds to complete.
The code took 0.11619806289673 seconds to complete.
The code took 0.11625099182129 seconds to complete.
The code took 0.12474203109741 seconds to complete.
The code took 0.13158011436462 seconds to complete.

Completion time is not stable: 0.1159–0.1949 seconds.

The code took 0.13957500457764 seconds to complete.
The code took 0.1271390914917 seconds to complete.
The code took 0.12476801872253 seconds to complete.
The code took 0.13645815849304 seconds to complete.
The code took 0.12669396400452 seconds to complete.
The code took 0.14442014694214 seconds to complete.
The code took 0.12997102737427 seconds to complete.
The code took 0.12927913665771 seconds to complete.

R5.large PHP performance (Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz)

The code took 0.098772048950195 seconds to complete.
The code took 0.098747968673706 seconds to complete.
The code took 0.11380887031555 seconds to complete.
The code took 0.09879994392395 seconds to complete.
The code took 0.098779201507568 seconds to complete.
The code took 0.098815202713013 seconds to complete.
The code took 0.098834037780762 seconds to complete.

M5.xlarge PHP performance (Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz)

The code took 0.12364792823792 seconds to complete.
The code took 0.12148904800415 seconds to complete.
The code took 0.12157297134399 seconds to complete.
The code took 0.12155199050903 seconds to complete.
The code took 0.12163090705872 seconds to complete.
The code took 0.12223196029663 seconds to complete.
The code took 0.1344678401947 seconds to complete.
The code took 0.12149095535278 seconds to complete.
[Chart: PHP performance across AWS instances]

On the Redis benchmark the Arm cores beat the x86 processors (60–70% faster), while on the PHP benchmark they are about 2x slower.

What else can we check? The C5 uses an Intel processor with Hyper-Threading. If you are doing a small number of identical long computations, hyper-threading might actually make them slower.

Suppose your CPU could do a maximum of 3.5 GHz worth of PHP/Magento work, but since each PHP process has only one thread, it could only do one thing at a time, and the time required to switch from one thread/process to another is about the same.

What hyper-threading does is split the power of one physical core into two hardware threads, so that each thread (PHP, MySQL) gets about 1.75 GHz.

Let's test PHP performance under a full-thread load.

We have an M5.xlarge = 4 vCPU = 2 physical cores of Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, with 2 threads per core (4 threads total). Theoretically, that is 1.25 GHz per thread when 2 threads on the same core are doing long PHP executions.
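The per-thread arithmetic is simple (a naive sketch; real hyper-threading sharing is more dynamic than an even split):

```shell
# Naive even split of a 2.50 GHz core between 2 hardware threads
awk 'BEGIN { printf "%.2f GHz per thread\n", 2.50 / 2 }'
```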

We are running the same command 4 times:

php test.php & php test.php & php test.php & php test.php

Result

The code took 0.2168140411377 seconds to complete.
The code took 0.21747493743896 seconds to complete.
The code took 0.22052907943726 seconds to complete.
The code took 0.22053408622742 seconds to complete.

Single-command execution time is:

The code took 0.1217930316925 seconds to complete.

Run 2 threads:

$ php test.php & php test.php 
The code took 0.12186980247498 seconds to complete.
The code took 0.12197995185852 seconds to complete.

As a result, we can see that a multi-process load kills PHP performance, because the hardware threads split the performance of the same physical core.

PHP performance per virtual core


Magento Commerce Cloud Performance issue

This is a typical issue with Magento Commerce Cloud. The starter cloud plan has 4 virtual cores (2 threads per core on 2 CPUs) with everything running on them: unoptimized PHP Magento 2 core code, MySQL with heavy EAV queries, Redis, Elasticsearch, Java, HAProxy, Nginx, ZooKeeper, Magento's heavy crons, RabbitMQ, Docker, New Relic, a GlusterFS network file server, and more. All of these infrastructure elements run twice, because production and staging share the same instance. And all of that loads 2 outdated physical cores (M4 instance: 2.4 GHz Intel Xeon E5-2676 v3 (Haswell)) without auto-scaling. Imagine the load on that CPU from a single HTTP call or from Googlebot crawling.

No comment! How can this solution work properly? Yet Adobe/Magento gives a one-year free trial on this junk, after which you pay $1,500+ for resources with an original price of about $200.

Before starting, let's check the lscpu stats:

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2

This shows 2 threads per core, so hyper-threading is most likely enabled.

The following files show all of the logical CPUs and their HT pair relationships:

# grep -H . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list

To determine which CPUs should be disabled, the threads running on the same CPU core have to be identified. Check the files /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, where N is the logical CPU number. Each file contains the logical (HT) CPU numbers that share one physical core.

:~ $ grep -H . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0,2
/sys/devices/system/cpu/cpu1/topology/thread_siblings_list:1,3
/sys/devices/system/cpu/cpu2/topology/thread_siblings_list:0,2
/sys/devices/system/cpu/cpu3/topology/thread_siblings_list:1,3

This means that CPU0 and CPU2 are threads on the same core, and likewise CPU1 and CPU3. Individual logical HT CPUs can be turned off as needed for a specific application that is bound to a physical core.
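A quick way to count unique physical cores is to de-duplicate those sibling lists (a sketch that assumes a Linux sysfs layout like the one shown above):

```shell
# Logical CPUs vs unique physical cores
nproc
sort -u /sys/devices/system/cpu/cpu*/topology/thread_siblings_list 2>/dev/null | wc -l
```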

Alternatively, the following script disables logical CPUs 2 through 3:

#!/bin/bash
for i in {2..3}; do
  echo "Disabling logical HT core $i."
  echo 0 > /sys/devices/system/cpu/cpu${i}/online;
done

Run it from sudo user:

:~ $ sudo bash disable_ht.sh 
Disabling logical HT core 2.
Disabling logical HT core 3.

Run the test with 1 process:

php test.php 
The code took 0.12305116653442 seconds to complete.

Run the test with 2 processes:

php test.php & php test.php 
The code took 0.12328386306763 seconds to complete.
The code took 0.12351393699646 seconds to complete.

Run the test with 3 processes:

The code took 0.13314199447632 seconds to complete.
The code took 0.17887496948242 seconds to complete.
The code took 0.18733596801758 seconds to complete.

Run the test with 4 processes:

The code took 0.24601101875305 seconds to complete.
The code took 0.24705982208252 seconds to complete.
The code took 0.24740982055664 seconds to complete.
The code took 0.24883317947388 seconds to complete.

Hyper-threading improves multiprocessing on the same core by about 10–20%; however, a hardware thread does not work like a separate processor.

To enable hyper-threading again run:

#!/bin/bash
for i in {2..3}; do
  echo "Enabling logical HT core $i."
  echo 1 > /sys/devices/system/cpu/cpu${i}/online;
done
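To verify the result, the kernel exposes the list of online CPUs directly (standard Linux sysfs path):

```shell
# Prints the range of CPUs currently online, e.g. "0-3"
cat /sys/devices/system/cpu/online
```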

Redis Performance without hyper-threading:

====== GET ======
100000 requests completed in 0.96 seconds
100 parallel clients
3 bytes payload
keep alive: 1
100.00% <= 0 milliseconds
103950.10 requests per second

Redis Performance with hyper-threading:

====== GET ======
100000 requests completed in 1.00 seconds
100 parallel clients
3 bytes payload
keep alive: 1
99.95% <= 1 milliseconds
100.00% <= 1 milliseconds
99700.90 requests per second
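The Redis delta is small but measurable. A minimal sketch using the two throughput figures above:

```shell
# GET throughput gain with hyper-threading disabled, from the two runs above
awk 'BEGIN { noht = 103950.10; ht = 99700.90;
             printf "%.1f%% faster without HT\n", (noht - ht) / ht * 100 }'
```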

Let's spin up the C6g again.

2 processes:

php test.php & php test.php
The code took 0.26301407814026 seconds to complete.
The code took 0.26329898834229 seconds to complete.

4 processes:

php test.php & php test.php & php test.php & php test.php
The code took 0.50821781158447 seconds to complete.
The code took 0.51693296432495 seconds to complete.
The code took 0.53189015388489 seconds to complete.
The code took 0.53975009918213 seconds to complete.

Graviton2 Node.js v14.4.0 performance test

var start = Date.now() / 1000; /* Start of the code to profile */
for (var a = 0; a < 10000000; a++)
{
  var b = a * a;
}
/* End of the code to profile */
var end = Date.now() / 1000;
console.log("The code took " + (end - start) + " seconds to complete.");

M5.xlarge

node test.js 
The code took 0.10499999999999998 seconds to complete.

C6g.large

node test.js
The code took 0.137 seconds to complete.
Image for post
Image for post

Load on all CPUs

M5.xlarge, 4 vCPU (2 cores, 4 threads) ($140)

The code took 0.22199999999999998 seconds to complete.
The code took 0.22899999999999998 seconds to complete.
The code took 0.26 seconds to complete.
The code took 0.28200000000000003 seconds to complete.
The code took 0.276 seconds to complete.

C6g.large, 2 vCPU (2 cores, 2 threads) ($40)

2 processes

node test.js & node test.js
[3] 14716
The code took 0.1369999999999999 seconds to complete.
The code took 0.1389999999999999 seconds to complete.

4 processes

 node test.js & node test.js & node test.js & node test.js

The code took 0.277 seconds to complete.
The code took 0.27899999999999997 seconds to complete.
The code took 0.277 seconds to complete.
The code took 0.274 seconds to complete.

The Graviton2 processor is almost 3x cheaper for the Node.js workload, with almost the same performance under full load. And because of Node.js's non-blocking nature, raw code performance is less critical: much of the time is saved on I/O (network, file) operations anyway.
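Using the rough monthly prices quoted above ($140 for the M5.xlarge vs $40 for the C6g.large) at essentially equal 4-process throughput:

```shell
# Price ratio for roughly the same fully loaded Node.js throughput
awk 'BEGIN { printf "%.1fx cheaper\n", 140 / 40 }'
```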

With just an empty loop, the test shows the same performance:

var start = Date.now() / 1000; /* Start of the code to profile */
for (var a = 0; a < 10000000; a++)
{

}
/* End of the code to profile */
var end = Date.now() / 1000;
console.log("The code took " + (end - start) + " seconds to complete.");

Raspberry Pi 4 Model B Node.js 10.21 (installed from the official repo) performance

The code took 0.08500000000000002 seconds to complete.
The code took 0.08499999999999999 seconds to complete.
The code took 0.08500000000000008 seconds to complete.
The code took 0.08500000000000002 seconds to complete.
The code took 0.08500000000000008 seconds to complete.

Written by

Magento/APP Cloud Architect. Melting metal server infrastructure into cloud solutions.
