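Extreme Optimizations by Baidu C++ Engineers (Concurrency)
The following article is from 百度Geek说 (Baidu Geek Talk). Author: 凯文神父
2. Why do we need concurrency?
3. Parallel execution within a single thread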
3.1 SIMD
3.2 OoOE (Out-of-Order Execution)
3.3 TMAM (Top-down Microarchitecture Analysis Method)
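int array[1024];  // assume already filled with input data
for (size_t i = 0; i < 1024; i += 2) {
    int a = array[i];
    int b = array[i + 1];
    for (size_t j = 0; j < 1024; ++j) {
        // each statement needs the result of the one before it,
        // so the inner loop forms a single serial dependency chain
        a = a + b;
        b = a + b;
    }
    array[i] = a;
    array[i + 1] = b;
}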
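int array[1024];  // assume already filled with input data
for (size_t i = 0; i < 1024; i += 4) {
    int a = array[i];
    int b = array[i + 1];
    int c = array[i + 2];
    int d = array[i + 3];
    for (size_t j = 0; j < 1024; ++j) {
        // (a, b) and (c, d) are two independent dependency chains,
        // so an out-of-order core can execute them side by side
        a = a + b;
        b = a + b;
        c = c + d;
        d = c + d;
    }
    array[i] = a;
    array[i + 1] = b;
    array[i + 2] = c;
    array[i + 3] = d;
}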
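The two loops above perform exactly the same arithmetic; the second one only interleaves two independent chains so the out-of-order core has more independent work in flight. A minimal self-contained sketch to check that equivalence, assuming hypothetical helper names one_chain and two_chains, uint32_t instead of int (so overflow wraps instead of being undefined behavior), and an input length that is a multiple of 4:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the first loop: one (a, b) dependency chain
// per outer iteration. uint32_t is an assumption so that overflow wraps.
std::vector<uint32_t> one_chain(std::vector<uint32_t> array, size_t inner) {
    for (size_t i = 0; i + 1 < array.size(); i += 2) {
        uint32_t a = array[i];
        uint32_t b = array[i + 1];
        for (size_t j = 0; j < inner; ++j) {
            a = a + b;  // needs the previous b
            b = a + b;  // needs the a computed one line earlier
        }
        array[i] = a;
        array[i + 1] = b;
    }
    return array;
}

// Hypothetical helper mirroring the second loop: same arithmetic, with two
// independent chains (a, b) and (c, d) interleaved in one inner loop.
// Assumes array.size() is a multiple of 4.
std::vector<uint32_t> two_chains(std::vector<uint32_t> array, size_t inner) {
    for (size_t i = 0; i + 3 < array.size(); i += 4) {
        uint32_t a = array[i], b = array[i + 1];
        uint32_t c = array[i + 2], d = array[i + 3];
        for (size_t j = 0; j < inner; ++j) {
            a = a + b;
            b = a + b;
            c = c + d;  // does not depend on a or b
            d = c + d;
        }
        array[i] = a;
        array[i + 1] = b;
        array[i + 2] = c;
        array[i + 3] = d;
    }
    return array;
}
```

Timing both functions on the same large input (for example with std::chrono::steady_clock) is one way to observe the effect. How large the gap is, if there is one, depends on the CPU, the compiler and the optimization level, since an optimizer is free to reorder or vectorize either loop.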
struct Line {
    char data[64];  // one 64-byte cache line
};
Line* lines[1024];  // pointers to many cache lines, stored in shuffled order
for (size_t i = 0; i < 1024; ++i) {
    Line* line = lines[i];
    for (size_t j = 0; j < 64; ++j) {
        line->data[j] += j;
    }
}
for (size_t i = 0; i < 1024; i += 2) {
    // touch several lines per iteration so their cache misses can overlap
    Line* line1 = lines[i];
    Line* line2 = lines[i + 1];
    // ...
    for (size_t j = 0; j < 64; ++j) {
        line1->data[j] += j;
        line2->data[j] += j;
        // ...
    }
}
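A self-contained sketch of the two access patterns above, under stated assumptions: Line is padded with alignas(64) on the assumption of 64-byte cache lines, it holds unsigned char instead of char so byte arithmetic wraps predictably, the shuffled order comes from std::shuffle, and the helper names shuffled_lines, touch_one_by_one and touch_two_at_a_time are hypothetical. Both functions apply the same updates, so they must leave the data in the same state; only the order in which lines are touched differs.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical: one Line per 64-byte cache line (64-byte lines assumed).
struct alignas(64) Line {
    unsigned char data[64];
};

// Collect pointers to every Line and shuffle them, so the next address
// is hard for the hardware prefetcher to predict.
std::vector<Line*> shuffled_lines(std::vector<Line>& storage, unsigned seed) {
    std::vector<Line*> lines;
    lines.reserve(storage.size());
    for (Line& l : storage) lines.push_back(&l);
    std::shuffle(lines.begin(), lines.end(), std::mt19937(seed));
    return lines;
}

// One line at a time, like the first loop above.
void touch_one_by_one(const std::vector<Line*>& lines) {
    for (Line* line : lines)
        for (size_t j = 0; j < 64; ++j)
            line->data[j] += static_cast<unsigned char>(j);
}

// Two lines per iteration: misses on line1 and line2 can be outstanding together.
void touch_two_at_a_time(const std::vector<Line*>& lines) {
    size_t i = 0;
    for (; i + 1 < lines.size(); i += 2) {
        Line* line1 = lines[i];
        Line* line2 = lines[i + 1];
        for (size_t j = 0; j < 64; ++j) {
            line1->data[j] += static_cast<unsigned char>(j);
            line2->data[j] += static_cast<unsigned char>(j);
        }
    }
    if (i < lines.size())  // leftover line when the count is odd
        for (size_t j = 0; j < 64; ++j)
            lines[i]->data[j] += static_cast<unsigned char>(j);
}
```

With a working set much larger than the last-level cache, the second version gives the memory system two independent misses to service at once; whether that shows up as a measurable speedup depends on the CPU.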
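3.4 Summary: concurrency within a single thread
4. Protecting critical sections in multithreaded concurrency
4.1 What is a critical section?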
4.1.1 Mutual Exclusion
4.1.2 Lock-Free
4.1.3 Wait-Free
4.2 Lock-free is not a silver bullet
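#include <cstddef>
#include <cstdint>

// Simulate the computational load of a critical section: a Fibonacci-style
// computation with a given number of steps on a single cache line
// (8 x uint64_t = 64 bytes)
uint64_t calc(uint64_t* sequence, size_t size) {
    size_t i;
    for (i = 0; i < size; ++i) {
        sequence[(i + 1) & 7] += sequence[i & 7];
    }
    return sequence[i & 7];
}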
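To make the load model concrete: the mask & 7 keeps every access inside eight uint64_t values, i.e. 64 bytes, and each step reads the value written by the previous step, so workload sets the length of a serial dependency chain. The sketch below repeats calc unchanged and adds a hypothetical Sequence wrapper aligned to 64 bytes, assuming a 64-byte cache line, so the array occupies exactly one line.

```cpp
#include <cstddef>
#include <cstdint>

// calc() exactly as above.
uint64_t calc(uint64_t* sequence, size_t size) {
    size_t i;
    for (i = 0; i < size; ++i) {
        sequence[(i + 1) & 7] += sequence[i & 7];  // depends on the previous step
    }
    return sequence[i & 7];
}

// Hypothetical wrapper: 8 x uint64_t aligned to 64 bytes fills exactly one
// cache line, assuming a 64-byte line size.
struct alignas(64) Sequence {
    uint64_t v[8];
};
static_assert(sizeof(Sequence) == 64, "Sequence should fill exactly one 64-byte line");
```

A short trace: starting from {1, 0, 0, 0, 0, 0, 0, 0}, the first seven steps carry the 1 forward into v[1] through v[7], and the eighth step wraps around and adds v[7] into v[0], so calc(v, 8) returns 2. On a fresh sequence, calc(v, 16) continues for eight more steps and returns 11.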
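// mutex, sum, atomic_sum, sequence and workload are declared elsewhere
{  // Mutual Exclusion: the whole computation runs while holding the lock
    ::std::lock_guard<::std::mutex> lock(mutex);
    sum += calc(sequence, workload);
}
{  // Lock Free / Atomic CAS: if another thread wins, current is reloaded
   // and the whole computation is redone before retrying
    auto current = atomic_sum.load(::std::memory_order_relaxed);
    auto next = current;
    do {
        next = current + calc(sequence, workload);
    } while (!atomic_sum.compare_exchange_weak(
                 current, next, ::std::memory_order_relaxed));
}
{  // Wait Free / Atomic Modify: compute once, then a single atomic add with no retry loop
    atomic_sum.fetch_add(calc(sequence, workload), ::std::memory_order_relaxed);
}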
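A minimal, self-contained harness for comparing the three snippets, under these assumptions: the names payload, Strategy and run are hypothetical, and every payload call starts from the same fixed seed in a local array (instead of a shared sequence) so that a CAS retry recomputes exactly the same value. That makes all three strategies produce the same total, which gives a correctness check that is independent of timing.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// calc() exactly as in the snippet above.
uint64_t calc(uint64_t* sequence, size_t size) {
    size_t i;
    for (i = 0; i < size; ++i) {
        sequence[(i + 1) & 7] += sequence[i & 7];
    }
    return sequence[i & 7];
}

// Hypothetical payload: a fresh, fixed seed on every call makes the result
// depend only on workload, even when a CAS retry recomputes it.
uint64_t payload(size_t workload) {
    alignas(64) std::array<uint64_t, 8> sequence{1, 1, 0, 0, 0, 0, 0, 0};
    return calc(sequence.data(), workload);
}

enum class Strategy { kMutex, kCas, kFetchAdd };

// Hypothetical driver: `threads` threads each add `iterations` payloads to one
// shared total using the chosen strategy; returns the final total.
uint64_t run(Strategy strategy, int threads, int iterations, size_t workload) {
    std::mutex mutex;
    uint64_t sum = 0;
    std::atomic<uint64_t> atomic_sum{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int k = 0; k < iterations; ++k) {
                if (strategy == Strategy::kMutex) {
                    std::lock_guard<std::mutex> lock(mutex);
                    sum += payload(workload);  // computed while holding the lock
                } else if (strategy == Strategy::kCas) {
                    uint64_t current = atomic_sum.load(std::memory_order_relaxed);
                    uint64_t next;
                    do {
                        next = current + payload(workload);  // redone after every failed CAS
                    } while (!atomic_sum.compare_exchange_weak(
                                 current, next, std::memory_order_relaxed));
                } else {
                    atomic_sum.fetch_add(payload(workload), std::memory_order_relaxed);
                }
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return strategy == Strategy::kMutex ? sum : atomic_sum.load();
}
```

Wrapping each run call in std::chrono::steady_clock timings while increasing workload is one way to check whether the CAS loop loses its edge as the critical section grows, since every failed CAS throws away a full calc. Because payload has no side effects, an optimizer may move it out of the lock or out of the retry loop, so a serious benchmark would need to guard against that; results will also vary with core count and CPU.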
4.3 Case study: optimizing a concurrent counter
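One widely used way to take a hot counter off a single shared cache line is to shard it: each writer increments its own cache-line-aligned slot, and a reader sums all slots. The sketch below illustrates that general technique and is not necessarily the design used in this case study; the class name ShardedCounter, the caller-supplied shard index and the 64-byte alignment are all assumptions.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sharded counter: one slot per cache line avoids false sharing
// between writers (64-byte lines assumed).
class ShardedCounter {
public:
    // shards must be > 0
    explicit ShardedCounter(size_t shards) : slots_(shards) {}

    // Writers touch only their own slot, so increments do not fight over one line.
    void add(size_t shard, uint64_t delta) {
        slots_[shard % slots_.size()].value.fetch_add(delta, std::memory_order_relaxed);
    }

    // Readers pay instead: they sum every slot. Under concurrent writes the
    // result is a snapshot that may miss in-flight increments.
    uint64_t read() const {
        uint64_t total = 0;
        for (const Slot& s : slots_) total += s.value.load(std::memory_order_relaxed);
        return total;
    }

private:
    struct alignas(64) Slot {
        std::atomic<uint64_t> value{0};
    };
    std::vector<Slot> slots_;
};
```

The trade-off is that writes become cheap and contention-free while reads cost O(number of shards), which suits counters that are updated far more often than they are read. In a real deployment each thread would typically be mapped to a fixed shard, for example through a thread-local index.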
4.4 Case study: optimizing a concurrent queue
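A common building block for fast concurrent queues is a bounded ring buffer whose head and tail indices live on separate cache lines, so the producer and the consumer do not keep invalidating each other's line. The sketch below is a single-producer/single-consumer version of that idea, offered as an illustration rather than the queue this case study optimizes; it assumes exactly one producer thread, exactly one consumer thread, a power-of-two capacity, and the hypothetical class name SpscQueue.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical SPSC ring buffer. Capacity must be a power of two (assumed),
// so `index & mask_` stays consistent even when the counters wrap around.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(size_t capacity) : buffer_(capacity), mask_(capacity - 1) {}

    // Producer thread only.
    bool push(const T& value) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t head = head_.load(std::memory_order_acquire);  // see slots freed by the consumer
        if (tail - head == buffer_.size()) return false;     // full
        buffer_[tail & mask_] = value;
        tail_.store(tail + 1, std::memory_order_release);    // publish the written slot
        return true;
    }

    // Consumer thread only.
    std::optional<T> pop() {
        size_t head = head_.load(std::memory_order_relaxed);
        size_t tail = tail_.load(std::memory_order_acquire);  // see values written by the producer
        if (head == tail) return std::nullopt;               // empty
        T value = buffer_[head & mask_];
        head_.store(head + 1, std::memory_order_release);    // hand the slot back
        return value;
    }

private:
    std::vector<T> buffer_;
    size_t mask_;
    alignas(64) std::atomic<size_t> head_{0};  // written only by the consumer
    alignas(64) std::atomic<size_t> tail_{0};  // written only by the producer
};
```

Because each index has a single writer, no CAS loop is needed: push and pop each finish in a bounded number of steps. Supporting multiple producers or consumers would require a different design, such as per-slot sequence numbers.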