IBM Brings the Speed of Light to the Generative AI Era With Optics Breakthrough
YORKTOWN HEIGHTS, N.Y. – Dec. 9, 2024: IBM (NYSE: IBM) has unveiled breakthrough research in optics technology that could dramatically improve how data centers train and run generative AI models. Researchers have pioneered a new process for co-packaged optics (CPO), the next generation of optics technology, to enable connectivity within data centers at the speed of light, complementing existing short-reach electrical wires. By designing and assembling the first publicly announced successful polymer optical waveguide (PWG) to power this technology, IBM researchers have shown how CPO will redefine the way the computing industry transmits high-bandwidth data between chips, circuit boards, and servers.
Today, fiber optic technology carries data at high speeds across long distances, managing nearly all the world's commerce and communications traffic with light instead of electricity. Although data centers use fiber optics for their external communications networks, racks in data centers still predominantly run communications on copper-based electrical wires. These wires connect GPU accelerators that may spend more than half of their time idle, waiting for data from other devices in a large, distributed training process, downtime that incurs significant cost and wasted energy.
IBM researchers have demonstrated a way to bring optics' speed and capacity inside data centers. In a technical paper, IBM introduces a new CPO prototype module that can enable high-speed optical connectivity. This technology could significantly increase the bandwidth of data center communications, minimizing GPU downtime while drastically accelerating AI processing. This research innovation, as described, would enable:
- Lower costs for scaling generative AI through a more than 5x reduction in interconnect energy consumption compared to mid-range electrical interconnects [1], while extending the length of data center interconnect cables from one to hundreds of meters.
- Faster AI model training, enabling developers to train a Large Language Model (LLM) up to five times faster with CPO than with conventional electrical wiring. CPO could reduce the time it takes to train a standard LLM from three months to three weeks, with performance gains increasing as larger models and more GPUs are used.[2] (See the back-of-the-envelope sketch after this list.)
- Dramatically increased energy efficiency for data centers, saving the energy equivalent of 5,000 U.S. homes' annual power consumption per AI model trained.[3]
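For readers who want to check the headline numbers against the footnoted assumptions, the minimal sketch below reproduces the arithmetic: the more-than-5x energy reduction follows from the drop from roughly five to under one picojoule per bit [1], and a 5x training speedup turns a roughly three-month run into about three weeks. No figures beyond those quoted in this release are used.

```python
# Back-of-the-envelope check of the interconnect figures quoted above.
# Inputs are taken from footnote [1] (about 5 pJ/bit for mid-range electrical
# links vs. less than 1 pJ/bit for CPO) and the quoted ~5x training speedup.

electrical_pj_per_bit = 5.0   # mid-range electrical interconnect, per footnote [1]
optical_pj_per_bit = 1.0      # co-packaged optics, "less than one" pJ/bit, per footnote [1]

power_reduction = electrical_pj_per_bit / optical_pj_per_bit
print(f"Interconnect energy reduction: more than {power_reduction:.0f}x")

training_months_electrical = 3.0   # quoted baseline for a standard LLM
speedup = 5.0                      # quoted CPO training speedup
weeks_per_month = 52 / 12          # ~4.33 weeks per month

training_weeks_cpo = training_months_electrical * weeks_per_month / speedup
print(f"Estimated CPO training time: ~{training_weeks_cpo:.1f} weeks (about three weeks)")
```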
"As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can make these data centers future-proof," said Dario Gil, SVP and Director of Research at IBM. "With this breakthrough, tomorrow's chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future."
Eighty times faster bandwidth than today's chip-to-chip communication
In recent years, advances in chip technology have densely packed transistors onto a chip; IBM's 2 nanometer node chip technology can contain more than 50 billion transistors. CPO technology aims to scale the interconnection density between accelerators by enabling chipmakers to add optical pathways connecting chips on an electronic module beyond the limits of today's electrical pathways. IBM's paper outlines how these new high bandwidth density optical structures, coupled with transmitting multiple wavelengths per optical channel, have the potential to boost bandwidth between chips as much as 80 times compared to electrical connections.
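As a rough illustration of how an "up to 80 times" gain can arise from the two levers named above, more optical channels at the module edge and multiple wavelengths per channel, the sketch below multiplies assumed channel counts, wavelength counts, and per-wavelength data rates. All of the specific values are hypothetical placeholders chosen only to show the scaling, not figures from IBM's paper.

```python
# Illustrative chip-to-chip bandwidth scaling (all inputs are hypothetical).
# Aggregate bandwidth ~= lanes x wavelengths per lane x data rate per wavelength.

def aggregate_bandwidth_tbps(lanes: int, wavelengths_per_lane: int, gbps_per_wavelength: float) -> float:
    """Aggregate bandwidth in Tb/s for a given lane/wavelength configuration."""
    return lanes * wavelengths_per_lane * gbps_per_wavelength / 1000.0

# Hypothetical electrical baseline: a single signal per lane.
electrical = aggregate_bandwidth_tbps(lanes=100, wavelengths_per_lane=1, gbps_per_wavelength=100)

# Hypothetical CPO configuration: ~6x the edge density plus several wavelengths
# multiplexed onto each optical channel, each running at a higher line rate.
optical = aggregate_bandwidth_tbps(lanes=600, wavelengths_per_lane=8, gbps_per_wavelength=160)

print(f"Electrical baseline: {electrical:.0f} Tb/s")
print(f"CPO configuration:  {optical:.0f} Tb/s")
print(f"Bandwidth gain:     ~{optical / electrical:.0f}x")  # ~77x with these illustrative inputs
```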
IBM's innovation, as described, would enable chipmakers to add six times as many optical fibers at the edge of a silicon photonics chip, a measure known as "beachfront density," compared to the current state-of-the-art CPO technology. Each fiber, about three times the width of a human hair, could span centimeters to hundreds of meters in length and transmit terabits of data per second. The IBM team assembled a high-density PWG with optical channels at a 50-micrometer pitch, adiabatically coupled to silicon photonics waveguides, using standard assembly packaging processes.
The paper additionally indicates that these CPO modules with PWGs at a 50-micrometer pitch are the first to pass all stress tests required for manufacturing. Components are subjected to high-humidity environments and temperatures ranging from -40°C to 125°C, as well as mechanical durability testing to confirm that the optical interconnects can bend without breaking or losing data. Moreover, researchers have demonstrated PWG technology down to an 18-micrometer pitch; stacking four PWGs would allow for up to 128 channels of connectivity at that pitch.
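The channel counts above follow from simple pitch arithmetic, sketched below: 128 channels across four stacked PWGs implies 32 channels per waveguide layer, and tightening the pitch from 50 micrometers to 18 micrometers nearly triples the channels that fit along a millimeter of chip edge. The edge-width figure is derived here only for illustration and is not quoted in the release.

```python
# Pitch arithmetic behind the channel counts quoted above.
PITCH_DEMONSTRATED_UM = 18   # demonstrated PWG channel pitch (micrometers)
PITCH_ASSEMBLED_UM = 50      # channel pitch of the assembled CPO module's PWG
STACKED_PWGS = 4
TOTAL_CHANNELS = 128         # quoted total for four stacked PWGs at 18 um pitch

channels_per_pwg = TOTAL_CHANNELS // STACKED_PWGS                   # 32 channels per layer
edge_width_mm = channels_per_pwg * PITCH_DEMONSTRATED_UM / 1000.0   # ~0.58 mm of chip edge per layer

print(f"Channels per PWG layer: {channels_per_pwg}")
print(f"Approximate edge width per layer at 18 um pitch: {edge_width_mm:.2f} mm")

# Channels that fit along one millimeter (1000 um) of chip edge at each pitch.
for pitch_um in (PITCH_ASSEMBLED_UM, PITCH_DEMONSTRATED_UM):
    print(f"{pitch_um} um pitch: ~{1000 // pitch_um} channels per mm of beachfront")
```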
IBM's continued leadership in semiconductor R&D
CPO technology enables a new pathway to meet AI's increasing performance demands, with the potential to shift off-module communications from electrical to optical. It continues IBM's history of leadership in semiconductor innovation, which also includes the first 2 nm node chip technology, the first implementation of 7 nm and 5 nm process technologies, nanosheet transistors, vertical transistors (VTFET), single-cell DRAM, and chemically amplified photoresists.
Researchers completed design, modeling, and simulation work for CPO in Albany, New York, which the U.S. Department of Commerce recently selected as the home of America's first National Semiconductor Technology Center (NSTC), the NSTC EUV Accelerator. Researchers assembled prototypes and tested modules at IBM's facility in Bromont, Quebec, one of North America's largest chip assembly and test sites. Part of the Northeast Semiconductor Corridor between the United States and Canada, IBM's Bromont fab has led the world in chip packaging for decades.
About IBM
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. More than 4,000 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
[1] A reduction from five to less than one picojoule per bit.
[2] Figures based on training a 70 billion parameter LLM using industry-standard GPUs and interconnects.
[3] Figures based on training a large LLM (such as GPT-4) using industry-standard GPUs and interconnects.
Media Contacts:
Bethany Hill McCarthy
IBM Research
bethany@ibm.com
Willa Hahn
IBM Research
willa.hahn@ibm.com