Linux NVIDIA GPU Linpack Test
Environment
Operating system information
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
CPU information
lscpu
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 43 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 64
  On-line CPU(s) list: 0-63
Vendor ID: AuthenticAMD
  Model name: AMD EPYC 7542 32-Core Processor
    CPU family: 23
    Model: 49
    Thread(s) per core: 2
    Core(s) per socket: 32
    Socket(s): 1
    Stepping: 0
    Frequency boost: enabled
    CPU max MHz: 2900.0000
    CPU min MHz: 1500.0000
    BogoMIPS: 5800.17
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features:
  Virtualization: AMD-V
Caches (sum of all):
  L1d: 1 MiB (32 instances)
  L1i: 1 MiB (32 instances)
  L2: 16 MiB (32 instances)
  L3: 128 MiB (8 instances)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-63
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec rstack overflow: Mitigation; safe RET
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
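Install CUDA and the NVIDIA driver
Note: the CUDA installation steps below also install the NVIDIA driver, so there is no need to install the driver separately beforehand.
For installation instructions, see the NVIDIA website.
Choose the options that match your own operating system. This article uses Ubuntu 22.04, so the options were selected accordingly. The Installer Type is a matter of preference; this article uses deb (local).
The full installation steps are as follows: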
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6
sudo apt-get install -y nvidia-open
sudo apt-get install -y cuda-drivers
Verify
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
nvidia-smi
Mon Dec 16 11:32:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA TITAN V Off | 00000000:41:00.0 Off | N/A |
| 29% 43C P8 27W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA TITAN V Off | 00000000:42:00.0 Off | N/A |
| 28% 38C P8 27W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
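The two outputs above report different CUDA numbers: nvcc reports the installed toolkit release (12.6), while nvidia-smi reports the highest CUDA version the driver supports (12.7). As a rough sanity check, the toolkit release should not be newer than the driver's version. The sketch below parses both outputs; the function names toolkit_release, driver_cuda_version and version_le are hypothetical helpers introduced here for illustration, not part of any NVIDIA tool, and the comparison relies on GNU sort -V.

```shell
# Hypothetical helpers for illustration; not part of any NVIDIA tool.

# Read `nvcc --version` output on stdin and print the toolkit release, e.g. 12.6.
toolkit_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read `nvidia-smi` output on stdin and print the driver's CUDA version, e.g. 12.7.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed when version $1 is less than or equal to version $2 (GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example usage on a machine with the toolkit and driver installed:
#   tk=$(nvcc --version | toolkit_release)
#   drv=$(nvidia-smi | driver_cuda_version)
#   version_le "$tk" "$drv" || echo "toolkit $tk is newer than driver CUDA $drv"
```

With the outputs shown above, the check passes because 12.6 is not newer than 12.7.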
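Install Intel® oneAPI Math Kernel Library (oneMKL)
See the official website.
Select Download.
Select Linux, then Offline Installer, then APT, and run the commands listed under Prerequisites for First-Time Users and Install with APT.
The full installation steps are as follows: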
sudo apt update
sudo apt install -y gpg-agent wget
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-oneapi-mkl
sudo apt install intel-oneapi-mkl-devel
sudo tee /etc/ld.so.conf.d/intel-mkl.conf > /dev/null << EOF
/opt/intel/oneapi/mkl/2025.0/lib/intel64/
/opt/intel/oneapi/mkl/2025.0/lib/
/opt/intel/oneapi/compiler/2025.0/lib/
EOF
sudo ldconfig -v
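After refreshing the linker cache, it can help to confirm that the libraries HPL will link against are actually visible. The sketch below assumes the cache listing comes from ldconfig -p; missing_libs is a hypothetical helper introduced here, and the library names are derived from the -l flags used later in Make.CUDA.

```shell
# missing_libs NAME...: read `ldconfig -p` output on stdin and print every
# NAME that does not occur in it. Prints nothing when all names are found.
# Hypothetical helper for illustration. The body runs in a subshell so its
# variables do not leak into the calling shell.
missing_libs() (
    cache=$(cat)
    for lib in "$@"; do
        case "$cache" in
            *"$lib"*) ;;                    # found in the linker cache
            *) printf '%s\n' "$lib" ;;      # not found, report it
        esac
    done
)

# Example usage:
#   ldconfig -p | missing_libs libmkl_intel_lp64.so libmkl_intel_thread.so \
#       libmkl_core.so libiomp5.so
```

An empty result suggests the paths written to intel-mkl.conf were picked up; any printed name points to a directory that is missing from the configuration or differs on your system.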
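Install Open MPI
The packages are available on the official website.
This article uses Open MPI 2.1; download it from the official website.
All installation commands: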
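# This article downloads the package directly with wget
wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.6.tar.gz
# Unpack the source and build from inside the source directory
tar zxvf openmpi-2.1.6.tar.gz
cd openmpi-2.1.6
./configure --prefix=/usr/local/openmpi
make
# To speed up the build, use make -j [number of CPU cores], e.g. make -j 64
make install
# Return to the parent directory
cd ..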
Set the environment variables
MPI_HOME=/usr/local/openmpi
export PATH=${MPI_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${MPI_HOME}/lib:$LD_LIBRARY_PATH
export MANPATH=${MPI_HOME}/share/man:$MANPATH
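If these exports are placed in a shell startup file, they run on every login and can add the same directory repeatedly. Also, when LD_LIBRARY_PATH is empty, the export above leaves a trailing colon, which the dynamic loader treats as the current directory. A minimal sketch of a guard follows; prepend_path is a hypothetical helper introduced here, not an Open MPI utility. MANPATH is deliberately left out, because a trailing colon there is meaningful to man.

```shell
# prepend_path VAR DIR: put DIR at the front of the colon-separated variable
# named VAR, unless DIR is already listed. Does not leave a trailing ':' when
# VAR is empty or unset. Hypothetical helper for illustration; VAR must be a
# valid shell variable name because it is passed through eval.
prepend_path() {
    eval "_pp_cur=\${$1:-}"                 # current value of VAR (may be empty)
    case ":$_pp_cur:" in
        *":$2:"*) return 0 ;;               # already present, nothing to do
    esac
    if [ -n "$_pp_cur" ]; then
        eval "export $1=\"\$2:\$_pp_cur\""
    else
        eval "export $1=\"\$2\""
    fi
}

# Example usage with the install prefix from this article:
#   MPI_HOME=/usr/local/openmpi
#   prepend_path PATH "$MPI_HOME/bin"
#   prepend_path LD_LIBRARY_PATH "$MPI_HOME/lib"
```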
Test
cd openmpi-2.1.6/examples/
make
mpirun --allow-run-as-root -np 4 hello_c
Output
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host: titan-v1
  Local device: mlx5_0
  Local port: 1
  CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 2 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 1 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 0 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 3 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
[titan-v1:309715] 7 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[titan-v1:309715] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
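In the output above, the openib messages are mixed in with the greetings, so it is easy to miss whether every rank actually reported. The sketch below counts the distinct ranks in the hello_c output; count_hello_ranks is a hypothetical helper introduced here, and it assumes the greeting format shown above.

```shell
# count_hello_ranks: read mpirun output on stdin and print the number of
# distinct ranks that printed "Hello, world, I am <rank> of <size>, ...".
# Hypothetical helper for illustration; other lines (warnings) are ignored.
count_hello_ranks() {
    sed -n 's/^Hello, world, I am \([0-9][0-9]*\) of [0-9][0-9]*,.*/\1/p' \
        | sort -un | wc -l | tr -d ' '
}

# Example usage:
#   n=$(mpirun --allow-run-as-root -np 4 hello_c 2>&1 | count_hello_ranks)
#   [ "$n" = 4 ] && echo "all 4 ranks reported"
```

For the run shown above the count is 4, matching -np 4.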
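Install NVIDIA HPL
The package is available on the official website.
File shared via network drive: hpl-2.0_FERMI_v15.tgz
Link: https://pan.baidu.com/s/16lGx0uOX_IXzxalkfdjAOg?pwd=62pc Extraction code: 62pc
All installation commands:
Place the package in the /opt/ directory.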
tar zxvf hpl-2.0_FERMI_v15.tgz
cd hpl-2.0_FERMI_v15
Edit Make.CUDA
# Directory where HPL is located
TOPdir = /opt/hpl-2.0_FERMI_v15
# Open MPI installation directory
MPdir = /usr/local/openmpi
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi.so
# MKL library directory
LAdir = /opt/intel/oneapi/mkl/2025.0/lib/intel64/
# -L/opt/intel/oneapi/compiler/2025.0/lib/ was appended to the end of the next line
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -L/opt/intel/oneapi/compiler/2025.0/lib/
CC = /usr/local/openmpi/bin/mpicc
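The Make.CUDA changes above can also be applied non-interactively, which helps when repeating the setup on several machines. The following is a minimal sketch, assuming each variable is assigned on a single line that starts with its name and that GNU sed is available; set_make_var is a hypothetical helper introduced here. It writes the value without a trailing comment, since make keeps any whitespace that precedes a # in a variable value.

```shell
# set_make_var FILE VAR VALUE: replace the line in FILE that assigns VAR with
# "VAR = VALUE". Hypothetical helper for illustration. Assumes GNU sed (-i)
# and that VALUE contains no '|', '&' or '\' characters.
set_make_var() {
    sed -i "s|^$2[[:space:]]*=.*|$2 = $3|" "$1"
}

# Example usage from /opt/hpl-2.0_FERMI_v15 (single quotes keep $(...) literal):
#   set_make_var Make.CUDA TOPdir /opt/hpl-2.0_FERMI_v15
#   set_make_var Make.CUDA MPdir  /usr/local/openmpi
#   set_make_var Make.CUDA MPinc  '-I$(MPdir)/include'
#   set_make_var Make.CUDA MPlib  '$(MPdir)/lib/libmpi.so'
#   set_make_var Make.CUDA LAdir  /opt/intel/oneapi/mkl/2025.0/lib/intel64/
#   set_make_var Make.CUDA CC     /usr/local/openmpi/bin/mpicc
```

The pattern requires whitespace or = right after the name, so setting CC leaves a CCFLAGS line untouched.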
Build and install
make arch=CUDA clean_arch_all
make arch=CUDA
The modified Make.CUDA file is as follows:
cat Make.CUDA
#
# This is just a sample Make.
# The user may need to edit:
# 1.) TOPdir
# 2.) MPI variables (MPdir,MPinc,MPlib)
# 3.) MKL BLAS variables (LAdir, LAinc, LAlib)
# 4.) The Compiler and Compiler/Linker Options (CC,CCFLAGS)
##
# -- High Performance Computing Linpack Benchmark (HPL)
# HPL - 1.0a - January 20, 2004
# Antoine P. Petitet
# University of Tennessee, Knoxville
# Innovative Computing Laboratories
# (C) Copyright 2000-2004 All Rights Reserved
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product