Rethinking Performance Testing: The Value of Gatling

Gatling and test performance

Performance Testing is a non-functional testing technique that observes how a system behaves under controlled load to measure its stability, response times, and throughput. Tools like Gatling let you validate critical requirements, identify infrastructure bottlenecks, and understand how a system reacts when traffic gets serious.


The Strategic Importance of Performance Testing

A system that passes every functional test but buckles under load in production is a defective system, even if no bug ever shows up in the functional test reports. That’s where the whole reasoning around performance testing starts.

Unlike functional tests, which check what the system does (e.g. “Does sending a request return status 200?”), performance tests check how the system behaves under pressure (e.g. “With 500 concurrent users, does the API respond in under 300ms?”).

The different shades of load

Different scenarios call for different load profiles:

  • Load Test: checks whether the system holds up under expected production traffic sustained over time.
  • Stress Test: pushes the system beyond its expected limits to find the breaking point and observe how it degrades.
  • Soak (Endurance) Test: a constant load applied for hours or days to surface memory leaks or slow performance degradation.
  • Spike Test: examines how the system reacts to sudden, violent traffic surges, typical of promotional events or flash sales.

Validating SLAs, sizing infrastructure capacity properly, and — the part that actually makes the difference in practice — catching regressions before they reach the end user: that’s where these tests pay back the investment.


Tooling Landscape: A Choice of Philosophy

There’s no shortage of open source options: from veterans like Apache JMeter to more recent ones like k6 (Go/JS) and Locust (Python), down to minimal command-line utilities like ab (ApacheBench), handy for quick smoke tests. When it comes down to choosing, though, the real comparison almost always boils down to two opposing philosophies: JMeter’s GUI-driven approach versus Gatling’s code-first one.

Comparison Table: JMeter vs Gatling

Feature | Apache JMeter | Gatling
Approach | GUI-based and XML | Code-first (Java, Kotlin, Scala)
Execution Model | One dedicated thread per virtual user | Asynchronous / Event-driven (Akka, Netty)
Configuration | Complex JMX files | Expressive, readable DSL
Scalability | Limited by JVM thread count | Very high (thousands of users with few threads)
CI/CD Integration | Possible, but cumbersome | Native (Maven, SBT, Gradle)
Strengths | Wide plugin ecosystem, maturity | Git versioning, code review, and raw performance

Focus on Gatling: Performance as Code

Gatling is an open source tool written primarily in Scala that runs on the JVM and uses Netty as its network engine. This combination lets it simulate tens of thousands of virtual users even from a single local machine, because threads aren’t tied one-to-one to virtual users the way they are in JMeter. For developers working in Java/Spring Boot, the fact that you can treat it as a library (with an official Maven plugin to boot) makes tests quick to configure and maintain.
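
To make the difference between the two execution models concrete, here is a minimal sketch that uses only the Java standard library. It is not how Gatling or Netty work internally; it only illustrates the idea that a waiting virtual user does not need a thread of its own. The class name EventDrivenUsers and the figures in main (10,000 users, 2 threads, a 100 ms wait) are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: models the event-driven idea, not Gatling's real engine.
final class EventDrivenUsers {

    // Runs `users` simulated users on `threads` threads and returns how many completed.
    static int run(int users, int threads, long waitMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(users);
        try {
            for (int i = 0; i < users; i++) {
                // Each "user" is a scheduled callback; no thread is parked while it waits.
                scheduler.schedule(() -> {
                    completed.incrementAndGet();
                    done.countDown();
                }, waitMillis, TimeUnit.MILLISECONDS);
            }
            // Bounded wait, so a problem cannot hang the caller forever.
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed users: " + run(10_000, 2, 100));
    }
}
```

With a thread-per-user model, the same run would need 10,000 threads sleeping at once; here two threads are enough because nothing blocks while a user waits.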

Anatomy of a Gatling test

A test (or Simulation) in Gatling is organized around four key concepts:

  1. Simulation: the entry-point class that ties all the pieces together.
  2. Scenario: the logical sequence of actions each virtual user runs (e.g. login → search → purchase), typically with random pauses between calls to mimic real user behavior.
  3. Injection: the load profile, i.e. the curve of users over time — for instance, a linear ramp from 0 to 50 users per second over 30 seconds, followed by a constant-load phase.
  4. Assertions: the acceptance criteria (e.g. average response time, maximum error percentage) that determine whether the build passes or fails.
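
The four concepts above can be sketched with Gatling’s Java DSL roughly as follows. This assumes Gatling 3.7 or later on the test classpath. The class name, the base URL, the /api/slots endpoint, the 60-second constant phase, and the thresholds are hypothetical values chosen for illustration, not taken from a real project.

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

// 1. Simulation: the entry-point class (hypothetical name).
public class SlotSearchSimulation extends Simulation {

    // Hypothetical base URL of the system under test.
    HttpProtocolBuilder httpProtocol = http
        .baseUrl("http://localhost:8080")
        .acceptHeader("application/json");

    // 2. Scenario: what each virtual user does.
    ScenarioBuilder searchSlots = scenario("Search available slots")
        .exec(http("search slots")
            .get("/api/slots")            // hypothetical endpoint
            .check(status().is(200)))
        .pause(1, 3);                     // random pause between 1 and 3 seconds

    {
        setUp(
            // 3. Injection: ramp from 0 to 50 users/s over 30 s, then hold 50 users/s.
            searchSlots.injectOpen(
                rampUsersPerSec(0).to(50).during(30),
                constantUsersPerSec(50).during(60)))
            .protocols(httpProtocol)
            // 4. Assertions: the run fails if either criterion is not met.
            .assertions(
                global().responseTime().mean().lt(300),        // milliseconds
                global().failedRequests().percent().lt(1.0));  // percent
    }
}
```

Because the assertions are part of the simulation, a failed threshold fails the build that runs it, which is what makes the acceptance criteria enforceable rather than advisory.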

Feeders: dynamic data to simulate real traffic

A detail that makes a real difference in result quality is how input data is injected. Reusing the same user over and over produces misleading results: errors unrelated to load (e.g. anti-brute-force lockouts) and database-side caches that skew response times. Gatling’s feeders solve the problem: a feeder reads a CSV or JSON file (for example, one with 1,000 user credentials) and hands each virtual user a record, supplying dynamic variables to the scenario. More complex scenarios combine multiple feeders (one for credentials, one for business-entity IDs) so that each virtual user operates on a different, plausible context.
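
As a sketch of how two feeders might be combined in Gatling’s Java DSL (assuming Gatling 3.7 or later, whose expression syntax is #{name}): the file names, column names, endpoints, and class name below are hypothetical.

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.FeederBuilder;
import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class FeederSimulation extends Simulation {

    // Hypothetical CSV files on the test classpath. The first line of each file
    // names the columns (assumed here: username,password and productId).
    FeederBuilder<String> credentials = csv("data/users.csv").circular();
    FeederBuilder<String> products = csv("data/products.csv").random();

    HttpProtocolBuilder httpProtocol = http.baseUrl("http://localhost:8080");

    ScenarioBuilder browse = scenario("Login and view product")
        .feed(credentials)   // puts username and password into this user's session
        .feed(products)      // puts productId into this user's session
        .exec(http("login")
            .post("/api/login")                       // hypothetical endpoint
            .formParam("username", "#{username}")
            .formParam("password", "#{password}")
            .check(status().is(200)))
        .exec(http("view product")
            .get("/api/products/#{productId}")        // hypothetical endpoint
            .check(status().is(200)));

    {
        setUp(browse.injectOpen(atOnceUsers(10))).protocols(httpProtocol);
    }
}
```

The strategy is a design choice: circular() cycles through the file and starts over at the end, while random() picks a record at random each time.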


Practical Case

Bitrock put Gatling into practice for one of its main clients. The repository is structured to encourage code reuse and collaboration between whoever writes the tests and whoever maintains the application. The concrete goal, in a recent project, was to verify that an application based on Spring Boot, Java, and MySQL would hold up under a modest average load (around 0.7 requests per second, derived from roughly 15 million business operations per year) but with violent peaks, on the order of 3,000–4,000 requests per minute during promotional events.
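
To put those figures on the same scale: 3,000 to 4,000 requests per minute is roughly 50 to 67 requests per second, about 70 to 95 times the 0.7 requests per second average. The short sketch below only reproduces that arithmetic; the class and method names are illustrative.

```java
// Illustrative arithmetic only; the input figures are the ones quoted in the text.
final class LoadFigures {

    // Converts a per-minute request rate to requests per second.
    static double perSecond(double requestsPerMinute) {
        return requestsPerMinute / 60.0;
    }

    // How many times the peak rate exceeds the average rate.
    static double peakToAverage(double peakPerSecond, double averagePerSecond) {
        return peakPerSecond / averagePerSecond;
    }

    public static void main(String[] args) {
        double average = 0.7;              // average requests per second
        double low = perSecond(3_000);     // lower bound of the peak
        double high = perSecond(4_000);    // upper bound of the peak
        System.out.println("peak req/s: " + low + " to " + high);
        System.out.println("peak/average: " + peakToAverage(low, average)
            + " to " + peakToAverage(high, average));
    }
}
```

A gap of this size between average and peak is why a spike profile says more about this system than a plain load test at the average rate.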

Methodology and Structure

The repo is laid out as follows:

  • simulations/: contains the classes specific to each load profile (Smoke, Load, Stress, Spike).
  • scenarios/: reusable call chains that model business flows.
  • resources/data/: CSV or JSON files (feeders) used to inject realistic data (credentials, product IDs) and prevent database caches from skewing the results.

A realistic scenario is rarely a single call: creating an appointment, for example, requires six API calls in sequence (login, search for available slots, reservation, confirmation, and so on), with values extracted from one response and reused in the following ones via Gatling’s session. Modeling these end-to-end flows is what makes the test representative of real traffic.
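
A shortened sketch of such a chained flow in Gatling’s Java DSL is shown here, with only three of the six calls. It assumes Gatling 3.7 or later. The endpoints, JSON field names, credentials, status codes, and class name are hypothetical and do not come from the real project.

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ChainBuilder;
import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class AppointmentSimulation extends Simulation {

    HttpProtocolBuilder httpProtocol = http
        .baseUrl("http://localhost:8080")           // hypothetical
        .contentTypeHeader("application/json");

    // Step 1: log in and save the token from the response into the session.
    ChainBuilder login = exec(http("login")
        .post("/api/login")
        .body(StringBody("{\"username\":\"user1\",\"password\":\"secret\"}"))
        .check(status().is(200))
        .check(jsonPath("$.token").saveAs("token")));

    // Step 2: search for slots and save the first slot id for the next call.
    ChainBuilder searchSlots = exec(http("search slots")
        .get("/api/slots")
        .header("Authorization", "Bearer #{token}")
        .check(jsonPath("$[0].id").saveAs("slotId")));

    // Step 3: reserve the slot found in step 2.
    ChainBuilder reserve = exec(http("reserve slot")
        .post("/api/slots/#{slotId}/reservations")
        .header("Authorization", "Bearer #{token}")
        .check(status().is(201)));

    ScenarioBuilder createAppointment = scenario("Create appointment")
        .exec(login).pause(1)
        .exec(searchSlots).pause(1)
        .exec(reserve);

    {
        setUp(createAppointment.injectOpen(atOnceUsers(1))).protocols(httpProtocol);
    }
}
```

Keeping each step as its own ChainBuilder is what makes the chains reusable across the Smoke, Load, Stress, and Spike simulations.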

The execution flow is fully automated on the build side: the mvn gatling:test command kicks off the simulation, collects metrics, and generates an interactive HTML report with detailed statistics on percentiles (p50, p95, p99) and assertion outcomes.
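
For readers less familiar with percentiles, the sketch below shows what p50, p95, and p99 mean, using the nearest-rank definition on a small, made-up set of response times. It is plain Java and not Gatling’s own computation, which may use a different estimation method; the class name and sample values are illustrative.

```java
import java.util.Arrays;

// Illustrative only: nearest-rank percentiles over made-up samples.
final class Percentiles {

    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Nine fast responses and one slow outlier, in milliseconds.
        long[] samples = {120, 95, 110, 130, 105, 100, 900, 115, 125, 98};
        System.out.println("p50 = " + percentile(samples, 50));
        System.out.println("p95 = " + percentile(samples, 95));
        System.out.println("p99 = " + percentile(samples, 99));
    }
}
```

In this sample the median is 110 ms while p95 and p99 both land on the 900 ms outlier, which is why high percentiles are more useful than averages for spotting the slow tail.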

Environments, analysis, and bottleneck identification

Tests are first run in the development environment for functional validation of scenarios and for relative before/after comparisons (e.g., adding an index on a table), and then in UAT, where the hardware configuration is closer to production and absolute numbers carry weight. When geographically distributed load needs to be simulated, Gatling integrates with platforms like BlazeMeter, which allows launching the same simulation from multiple regions (Europe, the United States, Australia…) and reading the output segmented by region of origin.

Gatling’s HTML report is the starting point, not the finish line. The real analysis comes from cross-referencing application logs, APMs, and observability dashboards (typically Grafana and Kibana): that’s where you identify the slowest SQL queries, saturated connection pools, or endpoints that exceed time limits. In the project mentioned above, for instance, the slot-search API turned out to be the most problematic — it concentrated most of the timeouts — and the MySQL database proved to be the main bottleneck, due to both indexing choices and the configuration of the connection pool.

Versioning and test data management

Scenarios are versioned in Git repositories, with one branch per application release: this ensures you always know which version of the tests corresponds to which version of the system under test. Data created by simulations (appointments, orders, various records) are usually negligible compared to the UAT database volume, but for more sensitive projects, it’s common practice to pair the tests with an external job — typically a Kubernetes job — that cleans up residual data before or after execution.

Today, tests are launched on specific occasions, usually in sessions coordinated with the other teams. Integration with the CI/CD pipeline for automated nightly runs, with response-time thresholds as pass/fail criteria, is the natural next step: technically, everything is already in place; it’s an organizational choice.


Conclusions

Adopting Gatling shifts when you deal with performance: no longer a one-off check before release, but something that can run on every push thanks to CI/CD integration. Performance regressions show up immediately, while they’re still cheap to fix.

The value we add at Bitrock isn’t in writing the scripts — that’s the easy part. It’s in defining scenarios that actually reflect how real users behave (not what we think they do), in tracking down the cause of non-obvious bottlenecks — saturated thread pools, queries whose execution plans change under load, exhausted DB connections — and in pointing out where it makes sense to invest in infrastructure and where a code change is enough.


Main Author: Federico Vidali, Software Engineer @ Bitrock
