Cloud-Native System Performance - Series Introduction 💡

Photo by Florian Krumm / Unsplash

Web applications are built worldwide to provide online services to end-users. Developing and hosting these services from scratch takes hard work and talent, and it all begins with an idea.

But imagine, after putting in all that hard work, users complaining about the performance of the system - “It is too slow..”, “I wish I could get the response in this lifetime..”, “The product is good, but not really worth waiting..”, and so on. On the other hand, if you decide to give them the best performance by throwing resources at a poorly architected system, the infrastructure costs can soar.

We will also see why making the right trade-offs matters. Music concerts are filled with people wanting to enjoy their favorite acts live. There are many audio parameters associated with every line of input and output that runs across the stage, and each needs to be set at an optimum level. Blasting everything at full level drives people away from the concert. Of course, the sound is not the artist’s responsibility - it is the sound engineer who makes them sound good.

After all, a concert is a production system, much like an IT production environment. In IT, managing system performance essentially means managing the trade-offs well. Some choices look obvious, but at times making those obvious choices is not so obvious.

When it comes to architecting any IT system, performance is one of the key aspects. This blog post announces a series of posts dedicated to Cloud-Native System Performance. We will divide this series into three parts (excluding this introductory post).

  1. Cloud-Native System Performance - Series Introduction (This post)
  2. Cloud-Native System Performance - Improvising Compute
  3. Cloud-Native System Performance - Better Approaches To Storage & Memory
  4. Cloud-Native System Performance - Better Network Performance

I shall add to this list as and when required. For the sake of this post, let us take a look at some of the challenges posed in the above areas and a few pointers toward cloud-native solutions.

I have compiled this series into a FREE eBook with more details and deeper insights. Link below! (PDF & ePub formats)

Latency and throughput

But before we do that, let us understand why performance matters. The most significant impact of an underperforming system is on user experience. Attention spans have shrunk. You only have a few seconds to hook the audience with your offerings unless they are in desperate need of your service.

Even if they are desperate, you are not the only one providing this service on the open internet. In both cases, it is crucial for modern systems to respond within seconds.

Managing performance is like a double-edged sword. If you choose to ignore it, it will cost you your customers. If you overdo it, it will cost you a fortune.

In simple words, latency is the time the system takes between a request and its response. The word carries a notion of ‘delay’. It covers the round trip from the moment a user clicks on “submit” until they get the information.
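The definition above can be sketched in a few lines of Python. This is only an illustration, not a benchmarking tool: `fake_service` is a hypothetical stand-in for a real request/response round trip, and its 50 ms sleep is an arbitrary assumption.

```python
import time

def measure_latency(handler, request):
    """Return (response, elapsed_seconds) for one request/response round trip."""
    start = time.perf_counter()
    response = handler(request)
    elapsed = time.perf_counter() - start
    return response, elapsed

def fake_service(request):
    # Hypothetical service: sleeps to simulate network hops and processing.
    time.sleep(0.05)
    return f"processed:{request}"

response, latency = measure_latency(fake_service, "submit")
print(f"{response} in {latency * 1000:.1f} ms")
```

In a real system the same timing would wrap an actual network call, and a single sample would be replaced by many samples and their percentiles.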

Throughput, on the other hand, is the amount of processing done by a component in a given amount of time. It can be associated with any aspect of the system:

  1. A processor processes a certain number of instructions in a given time
  2. A network connection can transfer a certain amount of data from point A to B
  3. A database produces a join across a certain number of tables, rows, and columns in a given time

Every aspect of cloud-native system architecture deals with a certain throughput, which is directly tied to the amount of time required to perform a given task.
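As a rough sketch of the idea, throughput can be computed as units of work divided by elapsed time. The `square` task below is a hypothetical placeholder for any unit of work (an instruction, a transferred block, a query), and the numbers it prints depend entirely on the machine it runs on.

```python
import time

def measure_throughput(task, items):
    """Return the number of items processed per second by `task`."""
    start = time.perf_counter()
    for item in items:
        task(item)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed

def square(n):
    # Hypothetical unit of work.
    return n * n

ops_per_second = measure_throughput(square, list(range(100_000)))
print(f"{ops_per_second:,.0f} operations per second")
```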

Both latency and throughput deal with time. Every system inherently spends time traversing the route, processing the data, and returning a relevant response. Although today this time is measured in nanoseconds and milliseconds, it still has the potential to drive users away.

Web applications are not as simple as they seem on the surface. There are a lot of hops, both in public and private networks. There are many processing steps, memory IO, security gates, etc. that add to this time. In general, when the system takes a long time to respond, it is said to have high latency and low throughput.

The goal is the lowest latency and the highest throughput possible. For a fixed level of concurrency, the two are inversely related: the longer each request takes, the fewer requests complete in a given time. The table below summarizes the desired and undesired states.

             Desired    Not Desired
Latency      Low        High
Throughput   High       Low
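One way to make the relationship between the two concrete is Little's Law from queueing theory, which this post does not name: the average number of requests in flight equals throughput multiplied by latency. The sketch below assumes a fixed concurrency of 10 requests in flight and hypothetical latencies, purely for illustration.

```python
def throughput(concurrency, latency_seconds):
    """Little's Law rearranged: throughput = concurrency / latency."""
    return concurrency / latency_seconds

# Assumed values: 10 requests in flight, three hypothetical latencies.
for latency_ms in (50, 100, 200):
    rps = throughput(concurrency=10, latency_seconds=latency_ms / 1000)
    print(f"latency {latency_ms} ms -> {rps:.0f} requests/second")
```

With concurrency held constant, doubling the latency halves the throughput. Real systems can also raise throughput by adding concurrency, so the inverse relationship holds only under that assumption.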

Big picture

The diagram below represents a simple scenario with various components. A user uses their personal computer and requests to access certain data from a database within a private network. The request is routed via the internet to the web application within the private network, which hands it over to the business logic processing microservice cluster. Response travels the same path backward.

Tuning system performance requires architects and developers to think about and implement solutions with that mindset. It is not just a matter of following specific rules and then being sure that the system will deliver outstanding performance.

Writing better code alone is not enough. Normalizing the database tables alone for quick query results is not enough. Using high network bandwidth alone is not enough. Using extra CPU cores alone is not enough.

Moreover, the interdependencies between the above concepts can often cause failure at multiple points in exchange for success at one point. For example, improving the code to avoid context switching, implementing async routines, and using better algorithms all improve compute performance. However, if the infrastructure chosen for the database cannot cope with the increased load, long processing queues can build up there. Thus, we have merely shifted the performance problem from one component to another.

To improve the performance of any system, we are required to think about the big picture. Analysis and balanced improvement of all the components are essential to eliminate the bottlenecks; otherwise we end up transferring the bottleneck from one part of the system to another.
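The bottleneck-shifting effect can be shown with a toy model. The sketch below assumes a simple serial pipeline whose end-to-end throughput is capped by its slowest stage; the stage names and capacities (in requests per second) are made up for illustration.

```python
def bottleneck(stages):
    """Return (name, capacity) of the slowest stage in a serial pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]

# Hypothetical capacities in requests per second.
before = {"network": 900, "compute": 200, "database": 350}
after = dict(before, compute=800)  # compute tuned, nothing else changed

print(bottleneck(before))  # compute limits the system
print(bottleneck(after))   # the limit has moved to the database
```

Quadrupling compute capacity in this model raises overall throughput only from 200 to 350 requests per second, because the database becomes the new limit.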

This series addresses performance challenges in three main areas - compute, storage & memory, and network. The focus will be on achieving the desired state of latency and throughput. Below is a quick summary of each of the topics.

Compute

Challenges

  1. Efficient resource utilization
  2. Application design
  3. Context switching
  4. Queueing
  5. Heap memory
  6. Concurrency
  7. Deadlocks
  8. Garbage collectors
  9. Hardware


  1. VMs, Containers, Serverless
  2. Autoscaling
  3. Smaller process
  4. Avoid context switching
  5. Optimize queries
  6. Single-threaded model
  7. Thread pool size
  8. Addressing deadlocks
  9. Better hardware

Memory & Storage

Challenges

  1. Database schemas
  2. Data structures
  3. Disk IO
  4. Heap memory
  5. Buffer memory for databases

Solutions

  1. Connection pool
  2. Maximizing throughput
  3. Buffer memory optimization
  4. Sequential and indexed database
  5. Compute over storage
  6. Leverage caching
  7. Minimize lock contention
  8. Better hardware

Network

Challenges

  1. Caching
  2. Hardware
  3. Size of data in transit
  4. Encryption delays


  1. Caching
  2. Compression
  3. Encryption mechanism
  4. SSL caching
  5. Reverse proxy optimization
  6. Better hardware

I will update the list of topics above with more detail in due course. Meanwhile, stay tuned for further posts in this “Cloud Native System Performance” series.
