
Huawei vs. NVIDIA: Data Center AI Compute Compared

news 2025/3/12 17:25:06

Huawei Ascend 910B

The Ascend 910B is a high-performance AI processor that Huawei introduced in 2023, positioned against the NVIDIA A100/A800. Its compute specifications, as cited here, are:

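Peak compute: half-precision (FP16) throughput of 256 TFLOPS (256 trillion floating-point operations per second).
Integer compute: INT8 throughput of 512 TOPS.
Single-precision compute: FP32 throughput of 128 TFLOPS.
Energy efficiency: a cited 5.2 TFLOPS/W, versus 4.7 TFLOPS/W for the NVIDIA A100, which would put the Ascend 910B ahead on efficiency. The basis for these per-watt figures is not stated.
Memory bandwidth: 768 GB/s.
Interconnect bandwidth: 600 GB/s chip-to-chip; card-to-card links use PCIe 4.0 x16, with a theoretical bandwidth of 31.5 GB/s.
Power: maximum power draw of 350 W.
AI compute comparison: after joint optimization by iFLYTEK and Huawei, the Ascend 910B reportedly matches the NVIDIA A100 in iFLYTEK's own workloads.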
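
As a rough cross-check of the list above, the sketch below recomputes two of its derived numbers from the raw figures: performance per watt (peak FP16 divided by maximum power) and the theoretical PCIe 4.0 x16 bandwidth. The function names are illustrative, and the A100 inputs (312 TFLOPS dense FP16, 400 W for the SXM part) are taken as assumptions from NVIDIA's published specifications.

```python
def tflops_per_watt(peak_tflops: float, power_watts: float) -> float:
    """Peak throughput divided by maximum power draw."""
    return peak_tflops / power_watts


def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int,
                        encoding: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s.

    Each lane carries one bit per transfer; with 128b/130b encoding
    (PCIe 3.0 and later), 128 of every 130 bits are payload.
    """
    return gt_per_s * lanes * encoding / 8


# Figures quoted in the article: peak FP16 TFLOPS and maximum power in watts.
ascend_910b = tflops_per_watt(256, 350)
# Assumption: dense FP16 Tensor Core throughput and SXM TDP for the A100.
a100_sxm = tflops_per_watt(312, 400)

print(f"Ascend 910B:  {ascend_910b:.2f} TFLOPS/W")
print(f"A100 SXM:     {a100_sxm:.2f} TFLOPS/W")
print(f"PCIe 4.0 x16: {pcie_bandwidth_gb_s(16, 16):.1f} GB/s")
```

The PCIe result agrees with the 31.5 GB/s quoted above. The per-watt results come out near 0.73 and 0.78 TFLOPS/W, well below the quoted 5.2 and 4.7 TFLOPS/W, so the quoted efficiency figures presumably rest on a different basis than peak FP16 over maximum power.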

NVIDIA A100

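Specification	A100 80GB PCIe	A100 80GB SXM
FP64	9.7 TFLOPS	9.7 TFLOPS
FP64 Tensor Core	19.5 TFLOPS	19.5 TFLOPS
FP32	19.5 TFLOPS	19.5 TFLOPS
Tensor Float 32 (TF32)	156 TFLOPS (312 TFLOPS with sparsity)	156 TFLOPS (312 TFLOPS with sparsity)
BFLOAT16 Tensor Core	312 TFLOPS (624 TFLOPS with sparsity)	312 TFLOPS (624 TFLOPS with sparsity)
FP16 Tensor Core	312 TFLOPS (624 TFLOPS with sparsity)	312 TFLOPS (624 TFLOPS with sparsity)
INT8 Tensor Core	624 TOPS (1248 TOPS with sparsity)	624 TOPS (1248 TOPS with sparsity)
GPU Memory	80 GB HBM2e	80 GB HBM2e
GPU Memory Bandwidth	1935 GB/s	2039 GB/s
TDP	300 W	400 W
Form factor	PCIe 4.0	SXM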
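
One way to read the compute and bandwidth rows together is the ratio of peak FLOPs to bytes of memory traffic. This ratio is a derived metric that the source does not state. The minimal sketch below assumes dense FP16 Tensor Core throughput of 312 TFLOPS for both variants, as in NVIDIA's datasheet without sparsity; the function name is illustrative.

```python
def flops_per_byte(peak_tflops: float, bandwidth_gb_s: float) -> float:
    """Peak floating-point operations available per byte of memory traffic."""
    return (peak_tflops * 1e12) / (bandwidth_gb_s * 1e9)


# Assumption: dense FP16 Tensor Core peak (312 TFLOPS) for both variants.
a100_pcie = flops_per_byte(312, 1935)
a100_sxm = flops_per_byte(312, 2039)

print(f"A100 80GB PCIe: {a100_pcie:.1f} FLOPs per byte")
print(f"A100 80GB SXM:  {a100_sxm:.1f} FLOPs per byte")
```

A kernel that performs fewer operations per byte than this ratio is limited by memory bandwidth rather than by peak compute, which is why the bandwidth row matters alongside the TFLOPS rows.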

NVIDIA H100

NVIDIA H100 Tensor Core GPU

Specification	H100 SXM	H100 NVL
FP64	34 TFLOPS	30 TFLOPS
FP64 Tensor Core	67 TFLOPS	60 TFLOPS
FP32	67 TFLOPS	60 TFLOPS
TF32 Tensor Core	989 TFLOPS	835 TFLOPS
BFLOAT16 Tensor Core	1979 TFLOPS	1671 TFLOPS
FP16 Tensor Core	1979 TFLOPS	1671 TFLOPS
FP8 Tensor Core	3958 TFLOPS	3341 TFLOPS
INT8 Tensor Core	3958 TOPS	3341 TOPS
GPU Memory	80GB	94GB
GPU Memory Bandwidth	3.35TB/s	3.9TB/s
TDP	700 W	400 W
Form factor	SXM	PCIe 5.0

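The PCIe-based NVIDIA H100 NVL, with its NVLink bridge, uses the Transformer Engine, NVLink, and 188 GB of HBM3 memory to deliver strong performance and easy scaling in any data center, bringing large language models into the mainstream. The 188 GB figure corresponds to a bridged pair of the 94 GB GPUs listed in the table. The Tensor Core figures in the table are NVIDIA's published values with sparsity; dense throughput is half as much.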

NVIDIA H200

NVIDIA H200 Tensor Core GPU

Specification	H200 SXM	H200 NVL
FP64	34 TFLOPS	30 TFLOPS
FP64 Tensor Core	67 TFLOPS	60 TFLOPS
FP32	67 TFLOPS	60 TFLOPS
TF32 Tensor Core	989 TFLOPS	835 TFLOPS
BFLOAT16 Tensor Core	1979 TFLOPS	1671 TFLOPS
FP16 Tensor Core	1979 TFLOPS	1671 TFLOPS
FP8 Tensor Core	3958 TFLOPS	3341 TFLOPS
INT8 Tensor Core	3958 TOPS	3341 TOPS
GPU Memory	141GB	141GB
GPU Memory Bandwidth	4.8TB/s	4.8TB/s
TDP	700 W	600 W
Form factor	SXM	PCIe 5.0

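Built on the NVIDIA Hopper architecture, the NVIDIA H200 is the first GPU to offer 141 GB of HBM3e memory with 4.8 TB/s of memory bandwidth. As with the H100, the Tensor Core figures in the table are NVIDIA's published values with sparsity.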
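
To see what changes between generations, the sketch below computes ratios between the SXM parts using figures from the tables in this article. The class and function names are illustrative. FP16 values are the with-sparsity figures, so they are compared like for like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    fp16_sparse_tflops: float  # FP16 Tensor Core, with sparsity
    memory_gb: float
    bandwidth_tb_s: float


A100 = GpuSpec("A100 80GB SXM", 624, 80, 2.039)
H100 = GpuSpec("H100 SXM", 1979, 80, 3.35)
H200 = GpuSpec("H200 SXM", 1979, 141, 4.8)


def ratios(new: GpuSpec, old: GpuSpec) -> dict:
    """Return new/old ratios for compute, memory capacity, and bandwidth."""
    return {
        "fp16": new.fp16_sparse_tflops / old.fp16_sparse_tflops,
        "memory": new.memory_gb / old.memory_gb,
        "bandwidth": new.bandwidth_tb_s / old.bandwidth_tb_s,
    }


for new, old in [(H100, A100), (H200, H100)]:
    r = ratios(new, old)
    print(f"{new.name} vs {old.name}: "
          f"FP16 x{r['fp16']:.2f}, memory x{r['memory']:.2f}, "
          f"bandwidth x{r['bandwidth']:.2f}")
```

On these numbers the H100 brings roughly 3.2x the FP16 throughput of the A100 at the same capacity, while the H200 keeps the H100's compute and raises memory capacity by about 1.76x and bandwidth by about 1.43x.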

NVIDIA GB200 & GB200 NVL72

Specification	GB200 NVL72	GB200
Configuration	36 Grace CPUs : 72 Blackwell GPUs	1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core	1440 PFLOPS	40 PFLOPS
FP8/FP6 Tensor Core	720 PFLOPS	20 PFLOPS
INT8 Tensor Core	720 POPS	20 POPS
FP16/BF16 Tensor Core	360 PFLOPS	10 PFLOPS
TF32 Tensor Core	180 PFLOPS	5 PFLOPS
FP32	6480 TFLOPS	180 TFLOPS
FP64	3240 TFLOPS	90 TFLOPS
FP64 Tensor Core	3240 TFLOPS	90 TFLOPS
GPU Memory	Up to 13.5 TB HBM3e	Up to 384 GB HBM3e
GPU Bandwidth	576 TB/s	16 TB/s
NVLink Bandwidth	130TB/s	3.6TB/s
CPU Core Count	2592 Arm Neoverse V2 cores	72 Arm Neoverse V2 cores
CPU Memory	Up to 17 TB LPDDR5X	Up to 480GB LPDDR5X
CPU Bandwidth	Up to 18.4 TB/s	Up to 512 GB/s

GB200 NVL72 architecture:

It combines 36 Grace Blackwell superchips, for a total of 72 Blackwell GPUs and 36 Grace CPUs, interconnected with fifth-generation NVLink.
Each Grace Blackwell superchip contains two high-performance NVIDIA Blackwell Tensor Core GPUs and one NVIDIA Grace CPU, connected via NVIDIA NVLink-C2C.
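
Because the rack is built from 36 identical superchips, the NVL72 column of the table should be roughly 36 times the GB200 column. The sketch below checks that with the per-superchip values from the table. The dictionary keys and the function name are illustrative.

```python
SUPERCHIPS_PER_RACK = 36

# Per-superchip figures from the GB200 column of the table.
gb200 = {
    "FP4 Tensor Core (PFLOPS)": 40,
    "FP8/FP6 Tensor Core (PFLOPS)": 20,
    "FP16/BF16 Tensor Core (PFLOPS)": 10,
    "TF32 Tensor Core (PFLOPS)": 5,
    "FP32 (TFLOPS)": 180,
    "FP64 (TFLOPS)": 90,
    "GPU bandwidth (TB/s)": 16,
    "NVLink bandwidth (TB/s)": 3.6,
    "CPU cores": 72,
    "GPUs": 2,
}


def scale_to_rack(per_superchip: dict, count: int = SUPERCHIPS_PER_RACK) -> dict:
    """Multiply every per-superchip figure by the number of superchips."""
    return {key: value * count for key, value in per_superchip.items()}


nvl72 = scale_to_rack(gb200)
for key, value in nvl72.items():
    print(f"{key}: {value:g}")
```

The scaled values reproduce the NVL72 column (1440 PFLOPS FP4, 6480 TFLOPS FP32, 576 TB/s, 2592 cores, 72 GPUs). NVLink bandwidth comes out at 129.6 TB/s, which the table rounds to 130 TB/s.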
