
High-Performance Programming: Comparing the Performance of Lock-Free and Lock-Based Programming in C++

news 2025/4/21 23:49:22


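Lock-free programming in C++ is a concurrency technique that avoids traditional mutexes (such as std::mutex) for protecting shared resources. Instead, it relies on atomic operations and memory barriers to guarantee thread safety, with the goal of keeping throughput high and latency low.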
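The following example illustrates the performance difference between sharing data across threads with a lock and without one.
The code is the classic program in which several threads each add 1 to the same shared variable. The version that does not use a lock is as follows: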

#include <atomic>
#include <thread>
#include <vector>
#include <iostream>

std::atomic<int> counter(0);

void increment_counter(int num_increments) {
    for (int i = 0; i < num_increments; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}

int main() {
    const int num_threads = 4;
    const int increments_per_thread = 1000;

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(increment_counter, increments_per_thread);
    }
    for (auto& t : threads) {
        t.join();
    }

    std::cout << "Final counter value: " << counter.load(std::memory_order_relaxed) << '\n';
    return 0;
}

To measure the lock-free version, the following benchmark is written with Google Benchmark:

#include <benchmark/benchmark.h>
#include <atomic>
#include <vector>
#include <thread>

// BM_AtomicCounter is a benchmark function. It takes a benchmark::State
// object, which controls the benchmark and reports its state.
static void BM_AtomicCounter(benchmark::State& state) {
    // An atomic integer initialized to 0. std::atomic guarantees that each
    // individual operation on this variable is thread-safe.
    std::atomic<int> counter(0);

    // The lambda increment_counter takes num_increments, the number of
    // increments to perform, and captures counter by reference.
    auto increment_counter = [&counter](int num_increments) {
        for (int i = 0; i < num_increments; ++i) {
            // fetch_add atomically adds 1 to counter and returns the previous
            // value. std::memory_order_relaxed requests atomicity only, with
            // no synchronization or ordering guarantees.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    // Number of threads; adjust as needed.
    const int num_threads = 4;
    // Number of increments each thread performs. state.range(0) is the
    // argument supplied by the Range() call in the registration below.
    const int increments_per_thread = static_cast<int>(state.range(0));

    // Main benchmark loop; state decides how many iterations to run.
    for (auto _ : state) {
        // Pause the timer: resetting the state should not be measured.
        state.PauseTiming();
        // Reset counter so that every iteration starts from the same state.
        counter = 0;
        // Create num_threads threads, each of which increments counter
        // increments_per_thread times.
        std::vector<std::thread> threads;
        for (int i = 0; i < num_threads; ++i) {
            threads.emplace_back(increment_counter, increments_per_thread);
        }
        // Resume the timer. The threads start running as soon as they are
        // created, so part of their work may already be done at this point.
        state.ResumeTiming();
        // Wait for all threads to finish.
        for (auto& t : threads) {
            t.join();
        }
    }

    // Report the number of items processed so that Google Benchmark can
    // compute throughput: iterations x threads x increments per thread.
    state.SetItemsProcessed(state.iterations() * num_threads * increments_per_thread);
}

// Register BM_AtomicCounter as a benchmark. Range() sets the range of the
// argument, here from 2^10 to 2^20.
BENCHMARK(BM_AtomicCounter)->Range(1 << 10, 1 << 20);

// BENCHMARK_MAIN() expands to a main function that runs all registered
// benchmarks.
BENCHMARK_MAIN();

Next is a version that does the same thing but uses a lock for mutual exclusion:

#include <benchmark/benchmark.h>
#include <mutex>
#include <vector>
#include <thread>

static void BM_LockedCounter(benchmark::State& state) {
    // A plain int; every access is protected by mtx.
    int counter = 0;
    std::mutex mtx;

    auto increment_counter = [&]() {
        for (int i = 0; i < state.range(0); ++i) {
            // Hold the mutex for the duration of a single increment.
            std::lock_guard<std::mutex> lock(mtx);
            ++counter;
        }
    };

    const int num_threads = 4;

    for (auto _ : state) {
        // Exclude the reset and thread creation from the measurement.
        state.PauseTiming();
        counter = 0;
        std::vector<std::thread> threads;
        for (int i = 0; i < num_threads; ++i) {
            threads.emplace_back(increment_counter);
        }
        state.ResumeTiming();
        for (auto& t : threads) {
            t.join();
        }
    }

    // Ensure the final value is as expected
    if (counter != num_threads * state.range(0)) {
        state.SkipWithError("Counter value mismatch");
    }
}

// Register the function as a benchmark with ranges
BENCHMARK(BM_LockedCounter)->Range(1 << 10, 1 << 20);
BENCHMARK_MAIN();

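The two benchmark programs are saved in separate files, and each can be compiled with a command of the following form (depending on the platform, -lpthread may also be required at link time):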

g++ xxx.cpp -lbenchmark

Running the two benchmark programs produced the following results.
Without a lock:

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_AtomicCounter/1024        218907 ns        25363 ns        29741 items_per_second=161.494M/s
BM_AtomicCounter/4096        283931 ns        22764 ns        30315 items_per_second=719.728M/s
BM_AtomicCounter/32768      2175233 ns        31289 ns        10000 items_per_second=4.18913G/s
BM_AtomicCounter/262144    16887237 ns        46045 ns         1000 items_per_second=22.7731G/s
BM_AtomicCounter/1048576   65236758 ns        61318 ns          100 items_per_second=68.4021G/s

With a lock:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_LockedCounter/1024        289778 ns        23980 ns        10000
BM_LockedCounter/4096       1188268 ns        25827 ns        10000
BM_LockedCounter/32768     10438569 ns        45650 ns         1000
BM_LockedCounter/262144    81558166 ns        61394 ns          100
BM_LockedCounter/1048576  315529881 ns       119023 ns           10