
Windows C++ Parallel Programming: Writing a parallel_for Loop

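Example: Computing the Product of Two Matrices

The following example shows the matrix_multiply function, which computes the product of two square matrices.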

// Computes the product of two square matrices.
void matrix_multiply(double** m1, double** m2, double** result, size_t size)
{
    for (size_t i = 0; i < size; i++)
    {
        for (size_t j = 0; j < size; j++)
        {
            double temp = 0;
            // k is size_t (not int) to avoid a signed/unsigned comparison with size.
            for (size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    }
}

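As a quick sanity check of the triple loop, the sketch below runs the same serial routine on two hand-picked 2x2 matrices. The helper name multiply_2x2_example and the input values are hypothetical and chosen only so the result is easy to verify by hand; the routine is repeated so the snippet compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Serial routine from the article, repeated so this sketch compiles on its own.
void matrix_multiply(double** m1, double** m2, double** result, std::size_t size)
{
    for (std::size_t i = 0; i < size; i++)
    {
        for (std::size_t j = 0; j < size; j++)
        {
            double temp = 0;
            for (std::size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    }
}

// Hypothetical helper: multiplies two hand-picked 2x2 matrices and returns
// the product in row-major order.
std::array<double, 4> multiply_2x2_example()
{
    double a0[] = {1, 2}, a1[] = {3, 4};   // A = [1 2; 3 4]
    double b0[] = {5, 6}, b1[] = {7, 8};   // B = [5 6; 7 8]
    double r0[2] = {}, r1[2] = {};
    double* a[] = {a0, a1};
    double* b[] = {b0, b1};
    double* r[] = {r0, r1};

    matrix_multiply(a, b, r, 2);

    // Hand-computed top-left entry: 1*5 + 2*7 = 19.
    assert(r[0][0] == 19);
    return {r0[0], r0[1], r1[0], r1[1]};
}
```

The product is [19 22; 43 50]. Each entry comes from one full pass of the innermost k loop, which is why a single outer iteration (one row) costs size * size multiply-adds.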
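Example: Computing the Matrix Product in Parallel

The following example shows the parallel_matrix_multiply function, which uses the parallel_for algorithm to run the outer loop in parallel.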

// Computes the product of two square matrices in parallel.
void parallel_matrix_multiply(double** m1, double** m2, double** result, size_t size)
{
    // Each outer iteration i writes only row i of result, so iterations are independent.
    parallel_for (size_t(0), size, [&](size_t i)
    {
        for (size_t j = 0; j < size; j++)
        {
            double temp = 0;
            for (size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    });
}

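This example parallelizes only the outer loop, because each outer iteration performs enough work to benefit from parallel processing despite its overhead. If you parallelize the inner loop instead, you gain no performance, because the small amount of work done by each inner iteration cannot offset that overhead. Parallelizing only the outer loop is therefore the best way to maximize the benefits of concurrency on most systems.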
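To make the granularity argument concrete, here is a portable sketch that splits the outer loop's row range into contiguous chunks and hands each chunk to a std::thread. This is a swapped-in technique for illustration only: it is not how PPL implements parallel_for, whose scheduling and partitioning are more sophisticated and are not reproduced here. The names multiply_rows, threaded_matrix_multiply, and num_threads are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: computes rows [first, last) of the product.
void multiply_rows(double** m1, double** m2, double** result,
                   std::size_t size, std::size_t first, std::size_t last)
{
    for (std::size_t i = first; i < last; i++)
    {
        for (std::size_t j = 0; j < size; j++)
        {
            double temp = 0;
            for (std::size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    }
}

// Hypothetical stand-in for the outer parallel_for: one thread per chunk of rows.
// Threads never write the same row, so no locking is needed.
void threaded_matrix_multiply(double** m1, double** m2, double** result,
                              std::size_t size, std::size_t num_threads)
{
    assert(num_threads > 0);
    if (size == 0) return;

    // Never start more threads than there are rows.
    num_threads = std::min(num_threads, size);
    const std::size_t chunk = (size + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; t++)
    {
        const std::size_t first = std::min(size, t * chunk);
        const std::size_t last = std::min(size, first + chunk);
        workers.emplace_back(multiply_rows, m1, m2, result, size, first, last);
    }
    for (auto& w : workers)
    {
        w.join();
    }
}
```

In this sketch the cost of starting a thread is paid once per chunk of rows, and every row amortizes it over size * size multiply-adds. Applying the same split to the j loop would pay a comparable cost once per row while each unit of work shrinks to a single pass of the k loop, which is the imbalance the paragraph above describes.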

Example: The Complete parallel_for Loop Code

The following more complete example compares the performance of the matrix_multiply function with that of the parallel_matrix_multiply function.

// parallel-matrix-multiply.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <iostream>
#include <random>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
    __int64 begin = GetTickCount();
    f();
    return GetTickCount() - begin;
}

// Creates a square matrix with the given number of rows and columns.
double** create_matrix(size_t size);

// Frees the memory that was allocated for the given square matrix.
void destroy_matrix(double** m, size_t size);

// Initializes the given square matrix with values that are generated
// by the given generator function.
template <class Generator>
double** initialize_matrix(double** m, size_t size, Generator& gen);

// Computes the product of two square matrices.
void matrix_multiply(double** m1, double** m2, double** result, size_t size)
{
    for (size_t i = 0; i < size; i++)
    {
        for (size_t j = 0; j < size; j++)
        {
            double temp = 0;
            for (size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    }
}

// Computes the product of two square matrices in parallel.
void parallel_matrix_multiply(double** m1, double** m2, double** result, size_t size)
{
    parallel_for (size_t(0), size, [&](size_t i)
    {
        for (size_t j = 0; j < size; j++)
        {
            double temp = 0;
            for (size_t k = 0; k < size; k++)
            {
                temp += m1[i][k] * m2[k][j];
            }
            result[i][j] = temp;
        }
    });
}

int wmain()
{
    // The number of rows and columns in each matrix.
    // TODO: Change this value to experiment with serial 
    // versus parallel performance. 
    const size_t size = 750;

    // Create a random number generator.
    mt19937 gen(42);

    // Create and initialize the input matrices and the matrix that
    // holds the result.
    double** m1 = initialize_matrix(create_matrix(size), size, gen);
    double** m2 = initialize_matrix(create_matrix(size), size, gen);
    double** result = create_matrix(size);

    // Print to the console the time it takes to multiply the 
    // matrices serially.
    wcout << L"serial: " << time_call([&] {
        matrix_multiply(m1, m2, result, size);
    }) << endl;

    // Print to the console the time it takes to multiply the 
    // matrices in parallel.
    wcout << L"parallel: " << time_call([&] {
        parallel_matrix_multiply(m1, m2, result, size);
    }) << endl;

    // Free the memory that was allocated for the matrices.
    destroy_matrix(m1, size);
    destroy_matrix(m2, size);
    destroy_matrix(result, size);
}

// Creates a square matrix with the given number of rows and columns.
double** create_matrix(size_t size)
{
    double** m = new double*[size];
    for (size_t i = 0; i < size; ++i)
    {
        m[i] = new double[size];
    }
    return m;
}

// Frees the memory that was allocated for the given square matrix.
void destroy_matrix(double** m, size_t size)
{
    for (size_t i = 0; i < size; ++i)
    {
        delete[] m[i];
    }
    // The row-pointer array was allocated with new[], so it must be freed with delete[].
    delete[] m;
}

// Initializes the given square matrix with values that are generated
// by the given generator function.
template <class Generator>
double** initialize_matrix(double** m, size_t size, Generator& gen)
{
    for (size_t i = 0; i < size; ++i)
    {
        for (size_t j = 0; j < size; ++j)
        {
            m[i][j] = static_cast<double>(gen());
        }
    }
    return m;
}

Sample output on a four-core computer (times in milliseconds):

serial: 3853
parallel: 1311
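Read as a rough estimate, these two numbers give the following speedup and per-core efficiency. This is plain arithmetic on the single run reported above, so it should not be treated as a careful benchmark; the symbols S and E are introduced here only for illustration.

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{3853}{1311} \approx 2.94,
\qquad
E = \frac{S}{4} \approx 0.73
```

In other words, the parallel version runs about 2.9 times faster on four cores, which falls short of the ideal factor of 4.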
