Boosting HPC workloads with Micron DDR5 and 4th Gen AMD EPYC processors

Krishna Yalamanchi, Sudharshan Vazhkudai | November 2022

The goal of the AMD and Micron collaboration is to deliver best-in-class user experiences across client and data center platforms. To this end, the two companies have a joint server lab in Austin, working to reduce the time needed to validate server memory and performing joint workload testing throughout validation and launch. In this blog, we look at some common HPC-workload benchmark results that use Micron DDR5 data center memory with 4th Gen AMD EPYC™ processors, both of which are now shipping.

High-performance computing (HPC) workloads have historically been the domain of some of the world’s fastest supercomputers. These are large-scale, data-intensive workloads split into millions of operations that are run in parallel and use terabytes of data. These complex workloads are dedicated to solving some of humankind’s most challenging problems: weather and climate simulations; seismic modeling; chemical, physics and biological analysis; and more.

With advances in computer architecture, these workloads have increasingly been hosted in very large “scale-out” clusters of high-performance servers. These clusters need the latest and greatest compute, fabric, memory and storage infrastructure to meet the scalability, low-latency and performance needs of such critical workloads. While server CPUs have improved in performance and throughput, the past several years have seen the bandwidth provided by DDR4 memory become a bottleneck. There is just not enough memory bandwidth to supply the growing number of high-performance cores.

[Figure: Micron DDR5 infographic]

Micron DDR5 memory and the new AMD Zen 4 server architecture of the 4th Gen AMD EPYC processors change this. Now, server CPUs and memory can be in much better balance to unlock performance and efficiency for the most demanding workloads. DDR5 memory helps organizations reach those insights faster, whether on premises or in the cloud. Consider the following proof points, generated while testing Micron DDR5 with the latest AMD Zen 4 96-core CPU on industry-standard HPC workload benchmarks. All of our test results show a performance improvement of about two times.

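Double the memory bandwidth with Micron DDR5 + 4th Gen AMD EPYC Processors using STREAM

STREAM¹ is a simple, well-known benchmark for measuring memory bandwidth in HPC computers. It captures the peak memory bandwidth of an HPC system.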

Software stack used for this workload

  • Alma 9 Linux kernel 5.14
  • stream.f, 11-29-2021 version

[Figure: bar chart showing that Micron DDR5 delivers more bandwidth]

Test setup

  • DDR4 system: 3rd Gen AMD EPYC processor with 64 cores and 3.7 GHz; DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor with 96 cores and 3.7 GHz; DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

  • Twice the memory bandwidth on the single-socket DDR5 system: 378 GB/s
  • This result means that customers can run larger artificial intelligence/machine learning (AI/ML) projects or do more HPC computations with increased memory bandwidth from DDR5.
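
To put the 378 GB/s figure in context, the short sketch below estimates the theoretical peak bandwidth of each system from its memory speed. It is only an illustration: the channel counts (8 for the DDR4 system, 12 for the DDR5 system) and the 8-byte data width per channel are assumptions that are not stated in this post, and the function name is made up for the example.

```python
def peak_bandwidth_gbs(transfers_mt_s, channels, bus_bytes=8):
    """Theoretical peak in GB/s: transfers/s x bytes per transfer x channels."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9

# Channel counts are assumptions for illustration, not taken from the post.
ddr4_peak = peak_bandwidth_gbs(3200, channels=8)
ddr5_peak = peak_bandwidth_gbs(4800, channels=12)

measured_ddr5 = 378.0  # GB/s, the STREAM result reported above

print(f"DDR4 theoretical peak: {ddr4_peak:.1f} GB/s")
print(f"DDR5 theoretical peak: {ddr5_peak:.1f} GB/s")
print(f"Peak ratio DDR5/DDR4:  {ddr5_peak / ddr4_peak:.2f}x")
print(f"Measured DDR5 share of peak: {measured_ddr5 / ddr5_peak:.0%}")
```

Under these assumptions the theoretical peaks come to 204.8 GB/s and 460.8 GB/s, a 2.25x ratio, and the measured 378 GB/s is roughly 82% of the DDR5 peak.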

[Figure: bar chart showing the relative gains of DDR5 over DDR4]

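Weather Research and Forecasting (WRF)⁴ runs two times faster with Micron DDR5

This HPC workload code is used by the weather and climate community, and the model is widely applied in meteorology. WRF typically performs well on traditional HPC architectures that support high floating-point processing, high memory bandwidth and a low-latency network. For this effort, the continental United States (CONUS) at 2.5-km lateral resolution was chosen.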

Software stack used for this workload

  • Alma 9 Linux kernel 5.14
  • WRF 2.3.5 & 4.3.3
  • Open MPI v4.1.1

Test setup

  • DDR4 system: 3rd Gen AMD EPYC processor with 64 cores and 3.7 GHz; DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor with 96 cores and 3.7 GHz; DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

  • We were able to execute 1.3567 time steps per second using Micron DDR5 and 4th Gen AMD EPYC Processors as compared to 2.8533 time steps per second.
  • Faster execution time means weather forecasters can either choose bigger datasets or run more models. Both efforts improve forecasting.

OpenFOAM⁵ runs two times faster with Micron DDR5

OpenFOAM is an open-source HPC workload for computational fluid dynamics (CFD), widely used across industries to reduce development time and cost. It simulates physical interactions in applications ranging from consumer product design to aerospace design. One of the simulations included in the data set is a motorbike turbulence simulation. For this model, OpenFOAM calculates steady air flow around a motorcycle and rider. OpenFOAM load balances calculations according to the number of processes specified by the user, then decomposes the mesh into parts for solving. After the solve is complete, the mesh and solution are recomposed into a single domain.
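
The decompose-and-solve idea described above can be illustrated with a toy block decomposition of a structured grid. This is a minimal sketch, not OpenFOAM's implementation: the helper names and the 8 x 4 x 3 process layout (96 processes) are hypothetical, and the grid dimensions are the 600 x 240 x 240 motorbike mesh size listed in the software stack for this workload.

```python
from itertools import product

def split_axis(n_cells, n_parts):
    """Split n_cells into n_parts contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(n_cells, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size))
        start += size
    return ranges

def decompose(shape, parts):
    """Return one block per process; each block is a tuple of (start, stop) per axis."""
    axis_ranges = [split_axis(n, p) for n, p in zip(shape, parts)]
    return list(product(*axis_ranges))

def cells_in(block):
    """Number of grid cells inside one block."""
    count = 1
    for start, stop in block:
        count *= stop - start
    return count

shape = (600, 240, 240)  # motorbike mesh size from the software stack list
parts = (8, 4, 3)        # hypothetical layout: 96 processes, one per core

blocks = decompose(shape, parts)
sizes = [cells_in(b) for b in blocks]
print(len(blocks), "blocks,", min(sizes), "to", max(sizes), "cells each")

# Recomposition check: the blocks together cover every cell exactly once.
assert sum(sizes) == shape[0] * shape[1] * shape[2]
```

With this layout every process receives the same number of cells, which is the load-balancing goal; a real unstructured mesh would need a graph partitioner rather than a simple axis split.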

Software stack used for this workload

  • OpenFOAM CFD software (v8), motorbike mesh size 600 x 240 x 240
  • Alma 9 Linux kernel 5.14
  • Open MPI v4.1.1

Test setup

  • DDR4 system: 3rd Gen AMD EPYC processor with 64 cores and 3.7 GHz; DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor with 96 cores and 3.7 GHz; DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

Our testing shows a 2.4x relative gain for OpenFOAM, which is seen as among the top 5 HPC software platforms with a large open-source community. The software is widely used in universities and R&D centers, and its highly parallel nature takes advantage of both memory (increased bandwidth) and CPU features like denser cores.
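
The relative gain can be reproduced from the runtimes listed in footnote 5. The sketch below simply divides the DDR4 runtime by the DDR5 runtime for each of the three variants; the dictionary layout and variable names are illustrative only.

```python
# Runtimes in seconds from footnote 5: variant -> (DDR4 system, DDR5 system)
runtimes_s = {
    "1004040": (1144, 478),
    "1084646": (1633, 698),
    "1305252": (2522, 1091),
}

# Speedup = DDR4 runtime / DDR5 runtime (higher means DDR5 finished sooner)
speedups = {name: ddr4 / ddr5 for name, (ddr4, ddr5) in runtimes_s.items()}

for name, gain in speedups.items():
    print(f"{name}: {gain:.2f}x")

best = max(speedups.values())
print(f"Best relative gain: {best:.1f}x")
```

The three variants come out at about 2.39x, 2.34x and 2.31x, so the 2.4x figure corresponds to the smallest variant, rounded to one decimal place.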

Molecular dynamics⁶ runs two times faster with Micron DDR5

CP2K is an open-source quantum chemistry tool that can be used for a number of applications, including the simulation of solid-state and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane wave approaches GPW and GAPW. The example that we looked at was linear-scaling density functional theory (DFT) of water (H2O) consisting of 6144 atoms in a 39-cubic-angstrom box (2048 water molecules in total).

Software stack used for this workload

  • H2O-DFT-LS.NREP4 & H2O-DFT-LS
  • Alma 9 Linux kernel 5.14

Test setup

  • DDR4 system: 3rd Gen AMD EPYC processor with 64 cores and 3.7 GHz; DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor with 96 cores and 3.7 GHz; DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

Our testing shows a 2.1x relative gain for molecular dynamics, which scales well with more cores and more memory bandwidth.

Summary

The results above are just a beginning, and only a few examples of HPC workloads. The ability to better match high-performance, high-bandwidth memory with the incredible performance offered by new server processors such as the 4th Gen AMD EPYC Processors stands to be a watershed moment for HPC customers. We can expect to see many more such proof points that demonstrate how enterprise data center and cloud operators can use Micron DDR5 on these new platforms to unlock new levels of performance and efficiency. We look forward to sharing these with you in the coming months. To learn more about Micron DDR5 and its benefits for data center workloads, visit micron.com/ddr5.

1. Our STREAM benchmark was set to a vector size of 2.5 billion (STREAM benchmark - AMD) and run on a one-CPU system
2. AMD DDR4 system is an AMD EPYC 7763 64 core with DDR4-3200 MHz fully populated with 64GB RDIMMs
3. AMD DDR5 system is an AMD EPYC 9654 96 core with DDR5-4800 MHz fully populated with 64GB RDIMMs
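4. WRF with 12.5-km CONUS ran for 929 seconds on the DDR4 system and 287 seconds on the DDR5 system while counting storage I/O as well. The example above is from WRF 2.5-km CONUS, which ran at 2.8533 time steps per second and 1.3567 time steps per second.
5. For OpenFOAM, we ran three variants:
5a. 1004040 runtimes = 1,144 seconds on DDR4 system and 478 seconds on the DDR5 system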
5b. 1084646 runtimes = 1,633 seconds on DDR4 system and 698 seconds on the DDR5 system
5c. 1305252 runtimes = 2,522 seconds on DDR4 system and 1,091 seconds on the DDR5 system
6. The molecular dynamics workload ran for 2,519 seconds on the DDR4 system and 242 seconds on the DDR5 system

Senior Manager, Ecosystem Enablement

Krishna Yalamanchi

Krishna is a Senior Ecosystem Development Manager, focusing on DDR5 and CXL solutions. Previously, Krishna led the SAP HANA migration for Intel IT and launched 3rd and 4th generation Intel Xeon for SAP workloads via the partner ecosystem of SIs, OEMs and cloud service providers.

Director, Workload Analytics

Sudharshan Vazhkudai

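Dr. Sudharshan S. Vazhkudai is the director of system architecture/workload analytics at Micron. He leads teams spread across Austin and Hyderabad, India, focused on understanding the composability of the memory/storage (DDR, CXL, HBM and NVMe) product hierarchy and optimizing system architectures for data center workloads.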