Load Testing
Performance testing that simulates expected user load on a system to measure response times, throughput, and resource use under normal and peak conditions.
What Is Load Testing?
Load testing is a type of performance testing that simulates realistic user traffic against an application to measure how it behaves under expected and peak load conditions. The goal is to identify performance bottlenecks, determine maximum capacity, and verify that the system meets its response time and throughput requirements before users encounter problems in production.
Unlike functional testing, which asks “does this feature work correctly?”, load testing asks “does this feature work correctly when 10,000 users are using it simultaneously?” A search endpoint that responds in 50 milliseconds for a single request might take 5 seconds when 500 users search at the same time. A database query that is instant on a test dataset might lock up when the production table has 10 million rows. Load testing reveals these performance cliffs before they cause outages.
Load testing covers several sub-categories. Baseline testing establishes normal performance under typical load. Stress testing pushes the system beyond normal capacity to find its breaking point. Soak testing (or endurance testing) runs sustained load over hours or days to detect memory leaks, connection pool exhaustion, and other time-dependent degradation. Spike testing simulates sudden traffic surges, such as a flash sale or viral social media event, to verify the system handles rapid scaling.
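These sub-categories differ mainly in the shape of the virtual-user schedule over time. As a tool-agnostic sketch, each profile below is a list of (duration_seconds, target_users) stages, in the same spirit as the staged ramp-up configuration most load testing tools provide; the specific numbers are illustrative, not recommendations.

```python
def vus_at(profile, t):
    """Linearly interpolate the virtual-user count at time t (seconds)
    for a staged profile of (duration_seconds, target_vus) tuples."""
    start_vus = 0
    elapsed = 0
    for duration, target in profile:
        if t <= elapsed + duration:
            frac = (t - elapsed) / duration
            return round(start_vus + frac * (target - start_vus))
        start_vus, elapsed = target, elapsed + duration
    return start_vus

baseline = [(120, 100), (600, 100)]           # ramp to normal load, then hold
spike    = [(10, 1000), (60, 1000), (10, 0)]  # sudden surge, brief hold, drop
soak     = [(300, 200), (8 * 3600, 200)]      # hold steady load for hours

print(vus_at(spike, 5))    # 500 -- halfway up the 10-second surge
print(vus_at(soak, 3600))  # 200 -- steady load one hour into the soak
```

The spike profile reaches full load in seconds while the soak profile holds a constant level for hours; the test scenarios themselves can be identical, with only the schedule changing.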
How It Works
Load testing tools generate virtual users (VUs) that simulate real user behavior by sending HTTP requests, interacting with APIs, and following predefined workflows. The tool measures response times, error rates, and throughput while gradually increasing the number of virtual users.
Here is an example using k6, a popular open-source load testing tool:
// load-test.js
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // ramp up to 100 users
    { duration: "5m", target: 100 }, // stay at 100 users
    { duration: "2m", target: 200 }, // ramp up to 200 users
    { duration: "5m", target: 200 }, // stay at 200 users
    { duration: "2m", target: 0 },   // ramp down to 0
  ],
  thresholds: {
    http_req_duration: ["p(95)<500"], // 95th percentile < 500ms
    http_req_failed: ["rate<0.01"],   // error rate < 1%
  },
};

export default function () {
  // Simulate a user browsing products
  const products = http.get("https://staging.example.com/api/products");
  check(products, {
    "products status is 200": (r) => r.status === 200,
    "products response time < 500ms": (r) => r.timings.duration < 500,
  });
  sleep(1); // Think time between actions

  // Simulate searching
  const search = http.get("https://staging.example.com/api/search?q=keyboard");
  check(search, {
    "search status is 200": (r) => r.status === 200,
  });
  sleep(2);

  // Simulate adding to cart
  const cart = http.post(
    "https://staging.example.com/api/cart",
    JSON.stringify({ productId: 42, quantity: 1 }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(cart, {
    "cart status is 201": (r) => r.status === 201,
  });
  sleep(1);
}
Running this with k6 run load-test.js produces output like:
scenarios: (100.00%) 1 scenario, 200 max VUs, 16m30s max duration
✓ products status is 200
✓ products response time < 500ms
✓ search status is 200
✓ cart status is 201
http_req_duration...: avg=127ms min=23ms p(90)=289ms p(95)=412ms
http_req_failed.....: 0.34%
http_reqs...........: 45231 47.12/s
vus.................: 200 min=0 max=200
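The p(90) and p(95) figures in that output are latency percentiles: the values below which 90% and 95% of all requests completed. A minimal sketch of how such a threshold is evaluated, using the nearest-rank method (real tools may interpolate slightly differently) and a small hypothetical sample echoing the output above:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value below which p% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based nearest-rank index
    return ordered[rank - 1]

# Hypothetical per-request latencies (ms) consistent with the sample output
latencies_ms = [23, 48, 95, 110, 127, 140, 180, 240, 289, 412]

print(percentile(latencies_ms, 90))        # 289
print(percentile(latencies_ms, 95))        # 412
print(percentile(latencies_ms, 95) < 500)  # True -- the p(95)<500 threshold passes
```

Percentiles matter more than averages in load testing: an average of 127ms can hide a long tail of slow requests, and it is the tail that users complain about.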
In Python, you can use Locust for load testing:
# locustfile.py
from locust import HttpUser, task, between

class ShopUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def browse_products(self):
        self.client.get("/api/products")

    @task(2)
    def search(self):
        self.client.get("/api/search?q=keyboard")

    @task(1)
    def add_to_cart(self):
        self.client.post("/api/cart", json={
            "productId": 42, "quantity": 1
        })
Running locust -f locustfile.py --host=https://staging.example.com starts a web interface where you can configure the number of users and ramp-up rate, then watch real-time charts of response times, request rates, and failures.
Why It Matters
Performance problems are among the most expensive bugs to fix in production. Slow page loads increase bounce rates: industry research (most famously Akamai's 2017 retail performance study) has found that a 100-millisecond delay in page load time can reduce conversion rates by as much as 7%. An application that crashes under peak load during a product launch or holiday sale can cost millions in lost revenue and damage brand reputation.
Load testing catches these problems before they reach users. By simulating realistic traffic patterns in a staging environment, teams can identify bottlenecks (slow database queries, under-provisioned servers, inefficient algorithms), determine scaling limits (the maximum users the system can serve within acceptable response times), and validate that autoscaling policies work correctly.
Load testing also provides data for capacity planning. When you know that your application handles 5,000 concurrent users with 200ms response times on the current infrastructure, you can estimate the infrastructure needed for projected growth. Without load testing data, capacity planning is guesswork.
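A back-of-the-envelope sketch of that capacity estimate, using hypothetical figures (the cluster size, growth forecast, and headroom are assumptions, and the calculation assumes near-linear horizontal scaling, which a load test should itself verify):

```python
import math

measured_capacity_users = 5000   # users served within the latency budget (from load tests)
current_servers = 4              # hypothetical current cluster size
projected_peak_users = 18000     # hypothetical growth forecast
headroom = 0.30                  # keep 30% spare capacity for surges

users_per_server = measured_capacity_users / current_servers       # 1250
effective_per_server = users_per_server * (1 - headroom)           # 875
needed_servers = math.ceil(projected_peak_users / effective_per_server)
print(needed_servers)  # 21
```

The point is not the arithmetic but the inputs: without a measured users-per-server figure from load testing, every number in this calculation is a guess.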
Best Practices
- Test with realistic scenarios. Virtual users should behave like real users: browse, search, pause (think time), and perform actions in realistic proportions. A load test where every user hammers the same endpoint is not realistic.
- Set clear performance budgets. Define acceptable thresholds before testing: “95th percentile response time under 500ms” and “error rate below 1%.” Fail the test if thresholds are exceeded, just as you would fail a functional test.
- Test against a production-like environment. Load testing results are only meaningful if the test environment closely matches production hardware, configuration, and data volume. Testing against a single-node staging server tells you nothing about production capacity.
- Run load tests regularly. Performance can regress with any code change. Include load tests in CI (at reduced scale) and run full-scale tests before major releases.
- Monitor infrastructure during tests. Collect CPU, memory, disk I/O, and network metrics alongside application response times. A high response time might be caused by CPU saturation, garbage collection pauses, or disk contention — infrastructure metrics reveal the root cause.
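The "clear performance budgets" practice can be enforced in CI by treating threshold violations as test failures. A minimal sketch, using the same budget as the k6 thresholds above; the `run` metrics here are placeholders standing in for values parsed from a real test run:

```python
BUDGET = {"p95_ms": 500, "error_rate": 0.01}

def check_budget(results, budget=BUDGET):
    """Return a list of budget violations (empty means the run passed)."""
    failures = []
    if results["p95_ms"] >= budget["p95_ms"]:
        failures.append(f"p95 {results['p95_ms']}ms >= {budget['p95_ms']}ms")
    if results["error_rate"] >= budget["error_rate"]:
        failures.append(f"error rate {results['error_rate']:.2%} over budget")
    return failures

run = {"p95_ms": 412, "error_rate": 0.0034}  # figures from the sample output
failures = check_budget(run)
assert not failures, "\n".join(failures)     # non-empty list fails the build
print("performance budget met")
```

Wiring this assertion into the pipeline makes a performance regression as visible as a failing unit test, instead of a chart someone has to remember to look at.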
Common Mistakes
- Testing with unrealistic data volumes. An API that is fast when the database has 1,000 rows might be slow with 10 million rows. Populate the test database with production-scale data to get realistic results.
- Ignoring think time. Virtual users that send requests with no delay between them simulate a denial-of-service attack, not real user behavior. Include realistic think times (1-5 seconds between actions) to model actual usage patterns.
- Only testing peak load. Many performance bugs manifest under sustained normal load, not just peak spikes. Memory leaks, connection pool exhaustion, and log file growth only appear after hours of continuous use. Include soak tests alongside spike tests.
- Not testing failure scenarios. Load tests should verify graceful degradation: What happens when the database is slow? When a downstream service is unavailable? When the cache fails? A system that crashes instead of degrading gracefully under partial failure is a significant production risk.
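The think-time mistake above has a simple quantitative explanation, which follows from Little's law: the request rate generated by N virtual users is roughly N divided by the sum of response time and think time. Removing think time turns a modest user count into a flood:

```python
def requests_per_second(users, response_s, think_s):
    """Steady-state request rate for a closed-loop load test (Little's law)."""
    return users / (response_s + think_s)

# 200 virtual users, 127ms average response time (as in the sample output)
print(round(requests_per_second(200, 0.127, 2.0)))  # 94   -- with 2s think time
print(round(requests_per_second(200, 0.127, 0.0)))  # 1575 -- with none: a DoS, not a test
```

This also sanity-checks the earlier k6 output: 200 VUs with roughly 4 seconds of total think time per iteration plus response times works out to about 47 requests per second, matching the reported rate.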