Let’s build a single, clean demo project that you can use in training as:
- Live demo
- Hands-on lab
- Reference code kit
It will show:
- Micro-benchmarks with BenchmarkDotNet
- LINQ vs loops, Span vs array
- Allocations & GC impact
- Time complexity in action (O(N), O(N²))
- Metrics analysis (Mean, Error, StdDev, Allocated, Gen0, etc.)
- Async/await overhead and behavior
0. Lab Goals & Target Framework (.NET 10)
We’ll structure the project so it works with current .NET (8/9) and is ready for .NET 10.
- The Target Framework Moniker for .NET 10 will almost certainly be net10.0.
- Until the .NET 10 SDK is available on your machine, you can temporarily use net8.0 or net9.0.
- All code is “future-safe”: it uses no APIs that should break in .NET 10.
1. Step 1 – Create the Project
dotnet new console -n BenchmarkDotNetLab
cd BenchmarkDotNetLab
Open the .csproj and set TargetFramework:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<!-- For current SDKs, use net8.0 or net9.0 and later switch to net10.0 -->
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
If your SDK doesn’t yet support net10.0, swap to net8.0 or net9.0 for now. Everything else stays the same.
2. Step 2 – Add BenchmarkDotNet
Install the package:
dotnet add package BenchmarkDotNet
(This will pull the latest version, which supports modern .NET.)
3. Step 3 – Setup Benchmark Entry Point
We’ll use BenchmarkSwitcher so we can run all benchmarks or filter by class from the command line.
Program.cs
using BenchmarkDotNet.Running;
using System;
namespace BenchmarkDotNetLab
{
public class Program
{
public static void Main(string[] args)
{
// This lets you run: dotnet run -c Release -- --filter *AlgoBenchmarks*
var switcher = BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly);
switcher.Run(args);
}
}
}
4. Step 4 – Micro-Benchmarks + LINQ vs Loops + Span vs Array
Create a new file AlgoBenchmarks.cs:
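A minimal sketch of AlgoBenchmarks.cs, assuming a sorted array of 10,000 ints and a near-worst-case search target (both are illustrative choices; the method names match the sample output table below):

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AlgoBenchmarks
    {
        private int[] _data = Array.Empty<int>();
        private int _target;

        [GlobalSetup]
        public void Setup()
        {
            // Sorted data (required for BinarySearch), built once outside the measurement.
            _data = Enumerable.Range(0, 10_000).ToArray();
            _target = 9_999; // near-worst case for a linear scan
        }

        [Benchmark(Baseline = true)]
        public long Sum_ForLoop()
        {
            long sum = 0;
            for (int i = 0; i < _data.Length; i++) sum += _data[i];
            return sum;
        }

        [Benchmark]
        public long Sum_Linq() => _data.Sum(x => (long)x); // enumerator + delegate → allocations

        [Benchmark]
        public long SumEven_ArrayFor()
        {
            long sum = 0;
            for (int i = 0; i < _data.Length; i++)
                if (_data[i] % 2 == 0) sum += _data[i];
            return sum;
        }

        [Benchmark]
        public long SumEven_SpanFor()
        {
            long sum = 0;
            ReadOnlySpan<int> span = _data; // no copy, no heap allocation
            foreach (int value in span)
                if (value % 2 == 0) sum += value;
            return sum;
        }

        [Benchmark]
        public bool Contains_Linq() => _data.Any(v => v == _target); // O(N) scan + delegate allocation

        [Benchmark]
        public bool BinarySearch_Array() => Array.BinarySearch(_data, _target) >= 0; // O(log N), no allocation
    }
}
```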
How to run this lab
dotnet run -c Release -- --filter *AlgoBenchmarks*
You’ll see output like:
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---------------------------|-----------|---------|--------|------|----------|
| Sum_ForLoop (Baseline) | X ns | ... | ... | 0.0 | 0 B |
| Sum_Linq | Y ns | ... | ... | 0.1 | 240 B |
| SumEven_ArrayFor | ... | | | | 0 B |
| SumEven_SpanFor | ... | | | | 0 B |
| Contains_Linq | ... | | | | 224 B |
| BinarySearch_Array | ... | | | | 0 B |
How to understand during training
- Mean: average time per operation (lower = faster).
- Allocated: bytes allocated per operation.
- Show how:
- LINQ often allocates (closures, enumerators).
- Raw loops/Span can be faster and allocate 0 bytes.
- BinarySearch has better time complexity than Contains for large N.
5. Step 5 – Time Complexity Lab (O(N) vs O(N²))
Create ComplexityBenchmarks.cs:
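A minimal sketch of ComplexityBenchmarks.cs. The workloads themselves are illustrative assumptions (a one-pass even-number count for O(N), an all-pairs parity comparison for O(N²)); only the method names and the N values come from the table below:

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class ComplexityBenchmarks
    {
        // BenchmarkDotNet runs every benchmark once per value of N.
        [Params(100, 500, 1000)]
        public int N;

        private int[] _data = Array.Empty<int>();

        [GlobalSetup]
        public void Setup() => _data = Enumerable.Range(0, N).ToArray();

        [Benchmark(Baseline = true)]
        public int LinearScan()
        {
            // O(N): a single pass over the data
            int count = 0;
            for (int i = 0; i < _data.Length; i++)
                if (_data[i] % 2 == 0) count++;
            return count;
        }

        [Benchmark]
        public int Quadratic_ParityPairs()
        {
            // O(N²): every element is compared against every later element
            int pairs = 0;
            for (int i = 0; i < _data.Length; i++)
                for (int j = i + 1; j < _data.Length; j++)
                    if ((_data[i] & 1) == (_data[j] & 1)) pairs++;
            return pairs;
        }
    }
}
```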
How to run this lab
dotnet run -c Release -- --filter *ComplexityBenchmarks*
You’ll see rows for each value of N:
| Method | N | Mean |
|----------------------------|------|--------|
| LinearScan (Baseline) | 100 | A ns |
| Quadratic_ParityPairs | 100 | B ns |
| LinearScan (Baseline) | 500 | C ns |
| Quadratic_ParityPairs | 500 | D ns |
| LinearScan (Baseline) | 1000 | E ns |
| Quadratic_ParityPairs | 1000 | F ns |
How to talk about the time complexity equation
Pick LinearScan:
- For N=100 → Mean ≈ t₁
- For N=500 → Mean ≈ t₂
- For N=1000 → Mean ≈ t₃
Explain:
- If the algorithm is O(N), time grows roughly in proportion to N, so we can approximate: T(N) ≈ k * N
- Ratio example:
- N from 100 → 500 (×5), time ~×5
- N from 500 → 1000 (×2), time ~×2
Then Quadratic:
- O(N²) behaves more like: T(N) ≈ k * N²
- If N ×2 ⇒ time roughly ×4
- Ask participants to compute an approximate k:
k ≈ T(N) / N²
This makes Big-O real and visible.
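To make the k ≈ T(N) / N² exercise concrete, here is the arithmetic with hypothetical timings (the 12 µs at N=100 and 1.2 ms at N=1000 are invented numbers for illustration, not measurements):

```csharp
using System;

// Hypothetical Quadratic_ParityPairs timings, in nanoseconds (invented for the exercise).
double t100 = 12_000;      // Mean at N = 100
double t1000 = 1_200_000;  // Mean at N = 1000

// k ≈ T(N) / N² should come out roughly constant for an O(N²) algorithm.
double k100 = t100 / (100.0 * 100.0);      // 1.2 ns per inner-loop step
double k1000 = t1000 / (1000.0 * 1000.0);  // 1.2 again → consistent with O(N²)

// The constant lets participants predict the next run before measuring it:
double predictedT2000 = k1000 * 2000.0 * 2000.0; // ≈ 4,800,000 ns = 4.8 ms
Console.WriteLine($"k ≈ {k100:F2}, predicted T(2000) ≈ {predictedT2000 / 1_000_000:F1} ms");
```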
6. Step 6 – Allocations & GC Impact Lab
We already have [MemoryDiagnoser] on all classes, but let’s add a dedicated benchmark showing very heavy allocations vs optimized.
Create AllocationBenchmarks.cs:
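A minimal sketch of AllocationBenchmarks.cs, assuming the workload is building an N-character string (the appended character is an arbitrary choice):

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AllocationBenchmarks
    {
        [Params(1000)]
        public int N;

        [Benchmark(Baseline = true)]
        public string StringConcat_PlusOperator()
        {
            // Each += copies the whole string so far into a brand-new one:
            // O(N²) copying and ~N intermediate string allocations.
            string result = "";
            for (int i = 0; i < N; i++) result += "x";
            return result;
        }

        [Benchmark]
        public string StringConcat_StringBuilder()
        {
            // One resizable buffer; only the builder and the final string allocate.
            var sb = new StringBuilder(N);
            for (int i = 0; i < N; i++) sb.Append('x');
            return sb.ToString();
        }
    }
}
```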
How to run this lab
dotnet run -c Release -- --filter *AllocationBenchmarks*
Watch the Allocated and Gen0 columns:
| Method | N | Mean | Gen0 | Allocated |
|--------------------------------------|------|--------|--------|-----------|
| StringConcat_PlusOperator (Baseline) | 1000 | X µs | Y | Z KB |
| StringConcat_StringBuilder | 1000 | A µs | B | C KB |
How to interpret during training
- Allocated: total bytes allocated per operation.
- Gen0: approximate number of Gen0 GCs per operation.
- Show how:
- + concatenation allocates many intermediate strings → more allocation → more GC → slower.
- StringBuilder minimizes allocations → less GC → better throughput.
Connect to application performance:
- High allocations → more GC → pauses → higher latency in web APIs, services, batch jobs.
- Optimizing hot paths can drastically reduce GC overhead.
7. Step 7 – Async/Await Deep Dive Lab
Create AsyncBenchmarks.cs:
We’ll compare:
- Sync CPU-bound code
- CPU-bound wrapped in Task.Run (bad practice)
- “Fake I/O” with Task.Delay to show async cost vs benefits
- ValueTask vs Task
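A minimal sketch of AsyncBenchmarks.cs covering those four cases. The Fibonacci input (25) and the Task.Delay(10) fake-I/O latency are illustrative choices:

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AsyncBenchmarks
    {
        private const int FibInput = 25; // big enough to take measurable CPU time

        private static long Fib(int n) => n < 2 ? n : Fib(n - 1) + Fib(n - 2);

        [Benchmark(Baseline = true)]
        public long Fibonacci_Sync() => Fib(FibInput);

        [Benchmark]
        public Task<long> Fibonacci_TaskRun() =>
            Task.Run(() => Fib(FibInput)); // same CPU work + needless Task/scheduling overhead

        [Benchmark]
        public async Task<int> SimulatedIo_Task()
        {
            await Task.Delay(10); // "fake I/O": latency without CPU work
            return 42;
        }

        [Benchmark]
        public async ValueTask<int> SimulatedIo_ValueTask()
        {
            await Task.Delay(10);
            return 42;
        }
    }
}
```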
How to run this lab
dotnet run -c Release -- --filter *AsyncBenchmarks*
Interpretation:
- Fibonacci_Sync should be fastest, with no allocations (if there are no closures).
- Fibonacci_TaskRun:
- Higher Mean (async overhead + scheduling).
- Non-zero Allocated (Task object, state machine, closure).
- SimulatedIo_Task vs SimulatedIo_ValueTask:
- Similar latency (because of Task.Delay(10)), but ValueTask may have fewer allocations.
How to explain Async/Await deep dive
Key teaching points:
- Async is not “faster”; it helps scale I/O-bound workloads by freeing threads.
- For CPU-bound work, adding Task.Run introduces overhead without benefit.
- Each async method:
- Compiles to a state machine
- May allocate Task/state objects
- ValueTask can reduce allocations in high-throughput paths, but must be used carefully.
Connect to real apps:
- CPU-bound APIs: prefer synchronous code or dedicated worker threads.
- I/O-bound APIs: async is necessary to scale (database, HTTP, file I/O).
- Over-using async in very tight loops can harm performance.
8. Step 8 – Running Specific Labs in Training
You can now run each part independently during training:
- Micro + LINQ vs loops + Span:
dotnet run -c Release -- --filter *AlgoBenchmarks*
- Time Complexity O(N) vs O(N²):
dotnet run -c Release -- --filter *ComplexityBenchmarks*
- Allocations & GC:
dotnet run -c Release -- --filter *AllocationBenchmarks*
- Async/Await Deep Dive:
dotnet run -c Release -- --filter *AsyncBenchmarks*
Or run everything:
dotnet run -c Release
9. How to Read BenchmarkDotNet Metrics (For Students)
In each summary table, focus on:
- Mean: average time per operation. The main metric for latency.
- Error / StdDev: how “noisy” the measurement is.
- High StdDev → unstable environment (CPU throttling, background processes).
- Gen0/Gen1/Gen2: approximate GCs per 1,000 operations.
- More frequent GCs → more pauses → potential latency spikes.
- Allocated: bytes allocated per operation.
- One of the most important metrics for high-throughput systems (web APIs, microservices).
- Reducing allocated bytes reduces GC overhead and CPU usage.
Tie everything back to:
- Latency (how fast a single request completes)
- Throughput (how many requests/second)
- GC Pressure (how much CPU time is lost to GC)
- Scalability (how well the app handles larger workloads or N)
10. Do’s and Don’ts for This Lab (and Real Life)
✅ DOs
- ✅ Run benchmarks using the Release configuration: dotnet run -c Release
- ✅ Close other heavy apps (browsers, VMs) while benchmarking.
- ✅ Use [MemoryDiagnoser] on all training benchmarks.
- ✅ Use [GlobalSetup] for creating test data; don’t measure setup.
- ✅ Use [Params] to show time complexity behavior for different N.
- ✅ Compare a baseline method against alternatives (Baseline = true).
- ✅ Explain metrics (Mean, Allocated, Gen0) every time you show a table.
- ✅ Emphasize that benchmarks measure micro performance, not full system behavior.
❌ DON’Ts
- ❌ Don’t run benchmarks in Debug mode (JIT optimizations are disabled → meaningless results).
- ❌ Don’t benchmark I/O against a real network or disk in training demos (noise from the environment will dominate).
- ❌ Don’t include Console.WriteLine inside [Benchmark] methods (I/O destroys timings).
- ❌ Don’t allocate large objects in [GlobalSetup] for each benchmark run; use fields.
- ❌ Don’t assume async = faster. Show Task.Run overhead in the lab.
- ❌ Don’t trust a single run; mention that BDN uses multiple iterations + statistics.
I’m a DevOps/SRE/DevSecOps/Cloud Expert passionate about sharing knowledge and experiences. I have worked at Cotocus. I share tech blog at DevOps School, travel stories at Holiday Landmark, stock market tips at Stocks Mantra, health and fitness guidance at My Medic Plus, product reviews at TrueReviewNow , and SEO strategies at Wizbrand.