
Linpack Testing on NVIDIA GPUs under Linux

news 2024/12/17 11:45:07

Environment

Operating system information

lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

CPU information

lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0-63
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7542 32-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  32
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         2900.0000
    CPU min MHz:         1500.0000
    BogoMIPS:            5800.17
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   1 MiB (32 instances)
  L1i:                   1 MiB (32 instances)
  L2:                    16 MiB (32 instances)
  L3:                    128 MiB (8 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-63
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec rstack overflow:  Mitigation; safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

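Installing CUDA and the NVIDIA driver

Note: the CUDA installation steps below also install the NVIDIA driver, so the driver does not need to be installed separately.

See the NVIDIA website for installation instructions.
Choose the options that match your own operating system. This article uses Ubuntu 22.04 on x86_64, so the matching target was selected. The Installer Type is a matter of preference; this article uses deb (local).
The complete installation steps are as follows: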

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6
sudo apt-get install -y nvidia-open
sudo apt-get install -y cuda-drivers

Verification

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

nvidia-smi
Mon Dec 16 11:32:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA TITAN V                 Off |   00000000:41:00.0 Off |                  N/A |
| 29%   43C    P8             27W /  250W |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA TITAN V                 Off |   00000000:42:00.0 Off |                  N/A |
| 28%   38C    P8             27W /  250W |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
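
The two outputs above can be cross-checked: the toolkit release printed by nvcc (12.6 here) should generally not be newer than the CUDA version that nvidia-smi reports for the driver (12.7 here). The following is a minimal sketch of that check, not part of the original steps. The helper names nvcc_release, smi_cuda_version and version_le are hypothetical, and sort -V assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are illustrative, not part of any NVIDIA tool.

# Extract the toolkit release (e.g. "12.6") from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract the driver's CUDA version (e.g. "12.7") from `nvidia-smi` output on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed when version $1 <= version $2 (version-aware comparison via sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only run the comparison when both tools are actually installed.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    toolkit=$(nvcc --version | nvcc_release)
    driver=$(nvidia-smi | smi_cuda_version)
    if version_le "$toolkit" "$driver"; then
        echo "OK: toolkit $toolkit <= driver CUDA $driver"
    else
        echo "WARNING: toolkit $toolkit is newer than driver CUDA $driver"
    fi
fi
```

The parsing functions read from stdin, so they can also be tried on saved output without a GPU present.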

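Installing Intel® oneAPI Math Kernel Library (oneMKL)

See the official website.
Select Download.
Select Linux > Offline Installer > APT, then run the commands listed under "Prerequisites for First-Time Users" and "Install with APT".

The complete installation steps are as follows: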

sudo apt update
sudo apt install -y gpg-agent wget

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

sudo apt update

sudo apt install intel-oneapi-mkl
sudo apt install intel-oneapi-mkl-devel

sudo tee /etc/ld.so.conf.d/intel-mkl.conf > /dev/null << EOF
/opt/intel/oneapi/mkl/2025.0/lib/intel64/
/opt/intel/oneapi/mkl/2025.0/lib/
/opt/intel/oneapi/compiler/2025.0/lib/
EOF

sudo ldconfig -v
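
After ldconfig has run, it can be worth confirming that the libraries the HPL link line needs later (mkl_intel_lp64, mkl_intel_thread, mkl_core and iomp5) are actually in the linker cache. The sketch below is one possible way to do that; missing_libs is a hypothetical helper name, and the check only looks for each name followed by ".so" in the output of ldconfig -p.

```shell
#!/usr/bin/env bash
# Sketch only: missing_libs is an illustrative helper, not an Intel or glibc tool.

# Read `ldconfig -p` output on stdin and print every library name given as an
# argument that does not appear in it.
missing_libs() {
    cache=$(cat)
    for lib in "$@"; do
        case "$cache" in
            *"$lib.so"*) ;;          # found in the linker cache
            *) echo "$lib" ;;        # not found: report it
        esac
    done
}

# Run the check only where ldconfig is available on PATH.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | missing_libs libmkl_intel_lp64 libmkl_intel_thread libmkl_core libiomp5
fi
```

No output means every listed library was found; any name that is printed points to a path missing from intel-mkl.conf.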

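Installing Open MPI

The installation packages are available on the official website.

This article uses Open MPI 2.1; download it from the official website.
All installation commands: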

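# This article downloads the package directly with wget
wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.6.tar.gz
# Unpack the archive and build inside the source directory
tar zxvf openmpi-2.1.6.tar.gz
cd openmpi-2.1.6
./configure --prefix=/usr/local/openmpi
make
# To speed up the build, use make -j [number of CPU cores], for example make -j 64
make install
# Return to the parent directory
cd ..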

Set environment variables

MPI_HOME=/usr/local/openmpi
export PATH=${MPI_HOME}/bin:$PATH
export LD_LIBRARY_PATH=${MPI_HOME}/lib:$LD_LIBRARY_PATH
export MANPATH=${MPI_HOME}/share/man:$MANPATH
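
One detail of the exports above: if LD_LIBRARY_PATH is empty beforehand, appending ":$LD_LIBRARY_PATH" leaves a trailing colon, and the dynamic linker treats an empty entry as the current directory. The sketch below shows one way to avoid that and to avoid duplicate entries when the lines are sourced repeatedly. prepend_unique is a hypothetical helper name, not something Open MPI provides.

```shell
#!/usr/bin/env bash
# Sketch only: prepend_unique is an illustrative helper.

# Print the colon-separated list $1 with directory $2 prepended,
# unless $2 is already present; never emit an empty entry.
prepend_unique() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s:%s\n' "$2" "$1"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

MPI_HOME=/usr/local/openmpi
PATH=$(prepend_unique "$PATH" "$MPI_HOME/bin")
LD_LIBRARY_PATH=$(prepend_unique "${LD_LIBRARY_PATH:-}" "$MPI_HOME/lib")
export PATH LD_LIBRARY_PATH
```

To make the settings survive new shells, the same lines could be placed in a startup file such as ~/.bashrc.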

Test

cd openmpi-2.1.6/examples/
make

mpirun --allow-run-as-root -np 4 hello_c

Output

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           titan-v1
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 2 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 1 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 0 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
Hello, world, I am 3 of 4, (Open MPI v2.1.6, package: Open MPI root@titan-v1 Distribution, ident: 2.1.6, repo rev: v2.1.5-29-gd9b9e59, Nov 28, 2018, 120)
[titan-v1:309715] 7 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[titan-v1:309715] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
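
In the output above all four ranks print their greeting, so the openib message is only a warning on this single-node run. As a hedged sketch, the openib BTL can be excluded with the MCA option --mca btl ^openib, which should avoid the warning, and the greetings can be counted automatically. all_ranks_reported is a hypothetical helper name, and the script assumes it is run from the examples directory where hello_c was built.

```shell
#!/usr/bin/env bash
# Sketch only: all_ranks_reported is an illustrative helper, not part of Open MPI.

# Read mpirun output on stdin; succeed only if exactly $1 distinct ranks
# printed the "Hello, world" greeting.
all_ranks_reported() {
    expected=$1
    seen=$(sed -n 's/^Hello, world, I am \([0-9][0-9]*\) of .*/\1/p' | sort -n | uniq | wc -l)
    [ "$seen" -eq "$expected" ]
}

# Run only when mpirun and the example binary are present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello_c ]; then
    # '^openib' excludes the openib BTL component from selection.
    if mpirun --allow-run-as-root --mca btl '^openib' -np 4 ./hello_c | all_ranks_reported 4; then
        echo "all 4 ranks reported"
    else
        echo "some ranks did not report"
    fi
fi
```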

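Installing NVIDIA HPL

The installation package is available on the official website.

File shared via Baidu Netdisk: hpl-2.0_FERMI_v15.tgz
Link: https://pan.baidu.com/s/16lGx0uOX_IXzxalkfdjAOg?pwd=62pc Extraction code: 62pc
All installation commands: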

Place the package in the /opt/ directory
tar zxvf hpl-2.0_FERMI_v15.tgz
cd hpl-2.0_FERMI_v15
Edit Make.CUDA
# Directory where HPL is located
TOPdir = /opt/hpl-2.0_FERMI_v15
# Open MPI installation directory
MPdir  = /usr/local/openmpi
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi.so
# MKL library directory
LAdir 	= /opt/intel/oneapi/mkl/2025.0/lib/intel64/
# -L/opt/intel/oneapi/compiler/2025.0/lib/ was appended to the end of the LAlib line
LAlib        = -L $(TOPdir)/src/cuda  -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -L/opt/intel/oneapi/compiler/2025.0/lib/
CC   = /usr/local/openmpi/bin/mpicc

Build and install

make arch=CUDA clean_arch_all
make arch=CUDA
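
Once the build finishes, a quick way to confirm that the MKL, OpenMP and CUDA libraries from LAlib resolve at run time is to inspect the binary with ldd. The sketch below assumes the executable ends up at bin/CUDA/xhpl relative to the HPL directory (the usual bin/<arch> layout; adjust if your tree differs). unresolved_libs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: unresolved_libs is an illustrative helper.

# Read `ldd` output on stdin and print the libraries reported as "not found".
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Assumed location of the built benchmark binary.
XHPL=bin/CUDA/xhpl

if [ -x "$XHPL" ]; then
    missing=$(ldd "$XHPL" | unresolved_libs)
    if [ -z "$missing" ]; then
        echo "all shared libraries resolved"
    else
        echo "unresolved libraries:"
        echo "$missing"
    fi
fi
```

Any library listed as unresolved usually means its directory is missing from the ld.so.conf.d entry or from LD_LIBRARY_PATH.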

The modified Make.CUDA file is as follows:
cat Make.CUDA

#
#     This is just a sample Make.
#     The user may need to edit:
#         1.) TOPdir
#         2.) MPI variables (MPdir,MPinc,MPlib)
#         3.) MKL BLAS variables (LAdir, LAinc, LAlib)
#         4.) The Compiler and Compiler/Linker Options (CC,CCFLAGS)
##
#  -- High Performance Computing Linpack Benchmark (HPL)
#     HPL - 1.0a - January 20, 2004
#     Antoine P. Petitet
#     University of Tennessee, Knoxville
#     Innovative Computing Laboratories
#     (C) Copyright 2000-2004 All Rights Reserved
#
#  -- Copyright notice and Licensing terms:
#
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:
#
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.
#
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution.
#
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:
#  This  product