Let’s build a single, clean demo project that you can use in training as:
- Live demo
- Hands-on lab
- Reference code kit
It will show:
- Micro-benchmarks with BenchmarkDotNet
- LINQ vs loops, Span vs array
- Allocations & GC impact
- Time complexity in action (O(N), O(N²))
- Metrics analysis (Mean, Error, StdDev, Allocated, Gen0, etc.)
- Async/await overhead and behavior
0. Lab Goals & Target Framework (.NET 10)
We’ll structure the project so it works with current .NET (8/9) and is ready for .NET 10.
- The Target Framework Moniker for .NET 10 will almost certainly be net10.0.
- Until the .NET 10 SDK is available on your machine, you can temporarily use net8.0 or net9.0.
- All code is “future-safe”: it uses no APIs that should break in .NET 10.
1. Step 1 – Create the Project
dotnet new console -n BenchmarkDotNetLab
cd BenchmarkDotNetLab
Open the .csproj and set TargetFramework:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<!-- For current SDKs, use net8.0 or net9.0 and later switch to net10.0 -->
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
</Project>
If your SDK doesn’t yet support net10.0, swap to net8.0 or net9.0 for now. Everything else stays the same.
2. Step 2 – Add BenchmarkDotNet
Install the package:
dotnet add package BenchmarkDotNet
(This will pull the latest version, which supports modern .NET.)
3. Step 3 – Setup Benchmark Entry Point
We’ll use BenchmarkSwitcher so we can run all benchmarks or filter by class from the command line.
Program.cs
using BenchmarkDotNet.Running;
using System;
namespace BenchmarkDotNetLab
{
public class Program
{
public static void Main(string[] args)
{
// This lets you run: dotnet run -c Release -- --filter *AlgoBenchmarks*
var switcher = BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly);
switcher.Run(args);
}
}
}
4. Step 4 – Micro-Benchmarks + LINQ vs Loops + Span vs Array
Create a new file AlgoBenchmarks.cs:
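A minimal sketch of AlgoBenchmarks.cs, assuming a sorted array of 10,000 ints and a near-worst-case search target (both are illustrative choices; the method names match the sample output table below):

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AlgoBenchmarks
    {
        private int[] _data = Array.Empty<int>();
        private int _target;

        [GlobalSetup]
        public void Setup()
        {
            // Sorted data (required for BinarySearch), built once outside the measurement.
            _data = Enumerable.Range(0, 10_000).ToArray();
            _target = 9_999; // near-worst case for a linear scan
        }

        [Benchmark(Baseline = true)]
        public long Sum_ForLoop()
        {
            long sum = 0;
            for (int i = 0; i < _data.Length; i++) sum += _data[i];
            return sum;
        }

        [Benchmark]
        public long Sum_Linq() => _data.Sum(x => (long)x); // enumerator + delegate → allocations

        [Benchmark]
        public long SumEven_ArrayFor()
        {
            long sum = 0;
            for (int i = 0; i < _data.Length; i++)
                if (_data[i] % 2 == 0) sum += _data[i];
            return sum;
        }

        [Benchmark]
        public long SumEven_SpanFor()
        {
            long sum = 0;
            ReadOnlySpan<int> span = _data; // no copy, no heap allocation
            foreach (int value in span)
                if (value % 2 == 0) sum += value;
            return sum;
        }

        [Benchmark]
        public bool Contains_Linq() => _data.Any(v => v == _target); // O(N) scan + delegate allocation

        [Benchmark]
        public bool BinarySearch_Array() => Array.BinarySearch(_data, _target) >= 0; // O(log N), no allocation
    }
}
```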
How to run this lab
dotnet run -c Release -- --filter *AlgoBenchmarks*
You’ll see output like:
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---------------------------|-----------|---------|--------|------|----------|
| Sum_ForLoop (Baseline) | X ns | ... | ... | 0.0 | 0 B |
| Sum_Linq | Y ns | ... | ... | 0.1 | 240 B |
| SumEven_ArrayFor | ... | | | | 0 B |
| SumEven_SpanFor | ... | | | | 0 B |
| Contains_Linq | ... | | | | 224 B |
| BinarySearch_Array | ... | | | | 0 B |
How to understand during training
- Mean: average time per operation (lower = faster).
- Allocated: bytes allocated per operation.
- Show how:
- LINQ often allocates (closures, enumerators).
- Raw loops/Span can be faster and allocate 0 bytes.
- BinarySearch has better time complexity than Contains for large N.
5. Step 5 – Time Complexity Lab (O(N) vs O(N²))
Create ComplexityBenchmarks.cs:
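A minimal sketch of ComplexityBenchmarks.cs. The workloads themselves are illustrative assumptions (a one-pass even-number count for O(N), an all-pairs parity comparison for O(N²)); only the method names and the N values come from the table below:

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class ComplexityBenchmarks
    {
        // BenchmarkDotNet runs every benchmark once per value of N.
        [Params(100, 500, 1000)]
        public int N;

        private int[] _data = Array.Empty<int>();

        [GlobalSetup]
        public void Setup() => _data = Enumerable.Range(0, N).ToArray();

        [Benchmark(Baseline = true)]
        public int LinearScan()
        {
            // O(N): a single pass over the data
            int count = 0;
            for (int i = 0; i < _data.Length; i++)
                if (_data[i] % 2 == 0) count++;
            return count;
        }

        [Benchmark]
        public int Quadratic_ParityPairs()
        {
            // O(N²): every element is compared against every later element
            int pairs = 0;
            for (int i = 0; i < _data.Length; i++)
                for (int j = i + 1; j < _data.Length; j++)
                    if ((_data[i] & 1) == (_data[j] & 1)) pairs++;
            return pairs;
        }
    }
}
```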
How to run this lab
dotnet run -c Release -- --filter *ComplexityBenchmarks*
You’ll see rows for each value of N:
| Method | N | Mean |
|----------------------------|------|--------|
| LinearScan (Baseline) | 100 | A ns |
| Quadratic_ParityPairs | 100 | B ns |
| LinearScan (Baseline) | 500 | C ns |
| Quadratic_ParityPairs | 500 | D ns |
| LinearScan (Baseline) | 1000 | E ns |
| Quadratic_ParityPairs | 1000 | F ns |
How to talk about the time complexity equation
Pick LinearScan:
- For N=100 → Mean ≈ t₁
- For N=500 → Mean ≈ t₂
- For N=1000 → Mean ≈ t₃
Explain:
- If the algorithm is O(N), time grows roughly in proportion to N, so we can approximate: T(N) ≈ k * N
- Ratio example:
- N from 100 → 500 (×5), time ~×5
- N from 500 → 1000 (×2), time ~×2
Then Quadratic:
- O(N²) behaves more like: T(N) ≈ k * N²
- If N ×2 ⇒ time roughly ×4
- Ask participants to compute an approximate k:
k ≈ T(N) / N²
This makes Big-O real and visible.
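To make the k ≈ T(N) / N² exercise concrete, here is the arithmetic with hypothetical timings (the 12 µs at N=100 and 1.2 ms at N=1000 are invented numbers for illustration, not measurements):

```csharp
using System;

// Hypothetical Quadratic_ParityPairs timings, in nanoseconds (invented for the exercise).
double t100 = 12_000;      // Mean at N = 100
double t1000 = 1_200_000;  // Mean at N = 1000

// k ≈ T(N) / N² should come out roughly constant for an O(N²) algorithm.
double k100 = t100 / (100.0 * 100.0);      // 1.2 ns per inner-loop step
double k1000 = t1000 / (1000.0 * 1000.0);  // 1.2 again → consistent with O(N²)

// The constant lets participants predict the next run before measuring it:
double predictedT2000 = k1000 * 2000.0 * 2000.0; // ≈ 4,800,000 ns = 4.8 ms
Console.WriteLine($"k ≈ {k100:F2}, predicted T(2000) ≈ {predictedT2000 / 1_000_000:F1} ms");
```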
6. Step 6 – Allocations & GC Impact Lab
We already have [MemoryDiagnoser] on all classes, but let’s add a dedicated benchmark showing very heavy allocations vs optimized.
Create AllocationBenchmarks.cs:
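A minimal sketch of AllocationBenchmarks.cs, assuming the workload is building an N-character string (the appended character is an arbitrary choice):

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AllocationBenchmarks
    {
        [Params(1000)]
        public int N;

        [Benchmark(Baseline = true)]
        public string StringConcat_PlusOperator()
        {
            // Each += copies the whole string so far into a brand-new one:
            // O(N²) copying and ~N intermediate string allocations.
            string result = "";
            for (int i = 0; i < N; i++) result += "x";
            return result;
        }

        [Benchmark]
        public string StringConcat_StringBuilder()
        {
            // One resizable buffer; only the builder and the final string allocate.
            var sb = new StringBuilder(N);
            for (int i = 0; i < N; i++) sb.Append('x');
            return sb.ToString();
        }
    }
}
```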
How to run this lab
dotnet run -c Release -- --filter *AllocationBenchmarks*
Watch the Allocated and Gen0 columns:
| Method | N | Mean | Gen0 | Allocated |
|--------------------------------------|------|--------|--------|-----------|
| StringConcat_PlusOperator (Baseline) | 1000 | X µs | Y | Z KB |
| StringConcat_StringBuilder | 1000 | A µs | B | C KB |
How to interpret during training
- Allocated: total bytes allocated per operation.
- Gen0: approximate number of Gen0 GCs per operation.
- Show how:
- + concatenation allocates many intermediate strings → more allocation → more GC → slower.
- StringBuilder minimizes allocations → less GC → better throughput.
Connect to application performance:
- High allocations → more GC → pauses → higher latency in web APIs, services, batch jobs.
- Optimizing hot paths can drastically reduce GC overhead.
7. Step 7 – Async/Await Deep Dive Lab
Create AsyncBenchmarks.cs:
We’ll compare:
- Sync CPU-bound code
- CPU-bound wrapped in Task.Run (bad practice)
- “Fake I/O” with Task.Delay to show async cost vs benefits
- ValueTask vs Task
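A minimal sketch of AsyncBenchmarks.cs covering those four cases. The Fibonacci input (25) and the Task.Delay(10) fake-I/O latency are illustrative choices:

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetLab
{
    [MemoryDiagnoser]
    public class AsyncBenchmarks
    {
        private const int FibInput = 25; // big enough to take measurable CPU time

        private static long Fib(int n) => n < 2 ? n : Fib(n - 1) + Fib(n - 2);

        [Benchmark(Baseline = true)]
        public long Fibonacci_Sync() => Fib(FibInput);

        [Benchmark]
        public Task<long> Fibonacci_TaskRun() =>
            Task.Run(() => Fib(FibInput)); // same CPU work + needless Task/scheduling overhead

        [Benchmark]
        public async Task<int> SimulatedIo_Task()
        {
            await Task.Delay(10); // "fake I/O": latency without CPU work
            return 42;
        }

        [Benchmark]
        public async ValueTask<int> SimulatedIo_ValueTask()
        {
            await Task.Delay(10);
            return 42;
        }
    }
}
```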
How to run this lab
dotnet run -c Release -- --filter *AsyncBenchmarks*
Interpretation:
- Fibonacci_Sync should be fastest, with no allocations (if there are no closures).
- Fibonacci_TaskRun:
- Higher Mean (async overhead + scheduling).
- Non-zero Allocated (Task object, state machine, closure).
- SimulatedIo_Task vs SimulatedIo_ValueTask:
- Similar latency (because of Task.Delay(10)), but ValueTask may have fewer allocations.
How to explain Async/Await deep dive
Key teaching points:
- Async is not “faster”; it helps scale I/O-bound workloads by freeing threads.
- For CPU-bound work, adding Task.Run introduces overhead without benefit.
- Each async method:
- Compiles to a state machine
- May allocate Task/state objects
- ValueTask can reduce allocations in high-throughput paths, but must be used carefully.
Connect to real apps:
- CPU-bound APIs: prefer synchronous code or dedicated worker threads.
- I/O-bound APIs: async is necessary to scale (database, HTTP, file I/O).
- Over-using async in very tight loops can harm performance.
8. Step 8 – Running Specific Labs in Training
You can now run each part independently during training:
- Micro + LINQ vs loops + Span:
dotnet run -c Release -- --filter *AlgoBenchmarks*
- Time Complexity O(N) vs O(N²):
dotnet run -c Release -- --filter *ComplexityBenchmarks*
- Allocations & GC:
dotnet run -c Release -- --filter *AllocationBenchmarks*
- Async/Await Deep Dive:
dotnet run -c Release -- --filter *AsyncBenchmarks*
Or run everything:
dotnet run -c Release
9. How to Read BenchmarkDotNet Metrics (For Students)
In each summary table, focus on:
- Mean: average time per operation. The main metric for latency.
- Error / StdDev: how “noisy” the measurement is.
- High StdDev → unstable environment (CPU throttling, background processes).
- Gen0/Gen1/Gen2: approximate GCs per 1,000 operations.
- More frequent GCs → more pauses → potential latency spikes.
- Allocated: bytes allocated per operation.
- One of the most important metrics for high-throughput systems (web APIs, microservices).
- Reducing allocated bytes reduces GC overhead and CPU usage.
Tie everything back to:
- Latency (how fast a single request completes)
- Throughput (how many requests/second)
- GC Pressure (how much CPU time is lost to GC)
- Scalability (how well the app handles larger workloads or N)
10. Do’s and Don’ts for This Lab (and Real Life)
✅ DOs
- ✅ Run benchmarks using the Release configuration: dotnet run -c Release
- ✅ Close other heavy apps (browsers, VMs) while benchmarking.
- ✅ Use [MemoryDiagnoser] on all training benchmarks.
- ✅ Use [GlobalSetup] for creating test data; don’t measure setup.
- ✅ Use [Params] to show time complexity behavior for different N.
- ✅ Compare a baseline method against alternatives (Baseline = true).
- ✅ Explain metrics (Mean, Allocated, Gen0) every time you show a table.
- ✅ Emphasize that benchmarks measure micro performance, not full system behavior.
❌ DON’Ts
- ❌ Don’t run benchmarks in Debug mode (JIT optimizations are disabled → meaningless results).
- ❌ Don’t benchmark I/O against a real network or disk in training demos (noise from the environment will dominate).
- ❌ Don’t include Console.WriteLine inside [Benchmark] methods (I/O destroys timings).
- ❌ Don’t allocate large objects in [GlobalSetup] for each benchmark run; use fields.
- ❌ Don’t assume async = faster. Show Task.Run overhead in the lab.
- ❌ Don’t trust a single run; mention that BDN uses multiple iterations + statistics.
I’m a DevOps/SRE/DevSecOps/Cloud Expert passionate about sharing knowledge and experiences. I have worked at Cotocus. I share tech blog at DevOps School, travel stories at Holiday Landmark, stock market tips at Stocks Mantra, health and fitness guidance at My Medic Plus, product reviews at TrueReviewNow , and SEO strategies at Wizbrand.