IBM Brings the Speed of Light to the Generative AI Era With Optics Breakthrough
YORKTOWN HEIGHTS, N.Y. – Dec. 9, 2024: IBM (NYSE: IBM) has unveiled breakthrough research in optics technology that could dramatically improve how data centers train and run generative AI models. Researchers have pioneered a new process for co-packaged optics (CPO), the next generation of optics technology, to enable connectivity within data centers at the speed of light, complementing existing short-reach electrical wires. By designing and assembling the first publicly announced successful polymer optical waveguide (PWG) to power this technology, IBM researchers have shown how CPO will redefine the way the computing industry transmits high-bandwidth data between chips, circuit boards, and servers.
Today, fiber optic technology carries data at high speeds across long distances, managing nearly all the world's commerce and communications traffic with light instead of electricity. Although data centers use fiber optics for their external communications networks, racks in data centers still predominantly run communications on copper-based electrical wires. These wires connect GPU accelerators that may spend more than half of their time idle, waiting for data from other devices in a large, distributed training process, which can incur significant costs and wasted energy.
IBM researchers have demonstrated a way to bring optics' speed and capacity inside data centers. In a technical paper, IBM introduces a new CPO prototype module that can enable high-speed optical connectivity. This technology could significantly increase the bandwidth of data center communications, minimizing GPU downtime while drastically accelerating AI processing. This research innovation, as described, would enable:
- Lower costs for scaling generative AI through a more than 5x reduction in energy consumption compared to mid-range electrical interconnects [1], while extending the length of data center interconnect cables from one meter to hundreds of meters (see the illustrative energy sketch after this list).
- Faster AI model training, enabling developers to train a Large Language Model (LLM) up to five times faster with CPO than with conventional electrical wiring. CPO could reduce the time it takes to train a standard LLM from three months to three weeks, with performance gains increasing as models grow larger and use more GPUs.[2]
- Dramatically increased energy efficiency for data centers, saving the energy equivalent of 5,000 U.S. homes' annual power consumption per AI model trained.[3]
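As a rough, hedged illustration of the energy figures above, the sketch below applies the per-bit values from footnote [1] to an assumed volume of interconnect traffic; the traffic figure is hypothetical and chosen only to make the comparison concrete, not taken from IBM's paper.

```python
# Back-of-the-envelope illustration of footnote [1]:
# 5 pJ/bit (mid-range electrical) vs. less than 1 pJ/bit (co-packaged optics).
# The traffic volume is an assumed figure, not from IBM's paper.

ELECTRICAL_PJ_PER_BIT = 5.0   # mid-range electrical interconnect (footnote [1])
OPTICAL_PJ_PER_BIT = 1.0      # co-packaged optics, upper bound (footnote [1])
ASSUMED_TRAFFIC_BITS = 1e21   # hypothetical interconnect traffic for one training run

def interconnect_energy_kwh(pj_per_bit: float, bits: float) -> float:
    """Energy in kilowatt-hours to move `bits` at a given picojoule-per-bit cost."""
    joules = pj_per_bit * 1e-12 * bits
    return joules / 3.6e6      # 1 kWh = 3.6e6 J

electrical_kwh = interconnect_energy_kwh(ELECTRICAL_PJ_PER_BIT, ASSUMED_TRAFFIC_BITS)
optical_kwh = interconnect_energy_kwh(OPTICAL_PJ_PER_BIT, ASSUMED_TRAFFIC_BITS)

print(f"electrical: {electrical_kwh:,.0f} kWh")
print(f"optical:    {optical_kwh:,.0f} kWh")
print(f"reduction:  {electrical_kwh / optical_kwh:.1f}x")  # 5x at the 1 pJ/bit bound, more below it
```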
"As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can make these data centers future-proof," said Dario Gil, SVP and Director of Research at IBM. "With this breakthrough, tomorrow's chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future."
Eighty times faster bandwidth than today's chip-to-chip communication
In recent years, advances in chip technology have densely packed transistors onto a chip; IBM's 2 nanometer node chip technology can contain more than 50 billion transistors. CPO technology aims to scale the interconnection density between accelerators by enabling chipmakers to add optical pathways connecting chips on an electronic module beyond the limits of today's electrical pathways. IBM's paper outlines how these new high bandwidth density optical structures, coupled with transmitting multiple wavelengths per optical channel, have the potential to boost bandwidth between chips as much as 80 times compared to electrical connections.
IBM's innovation, as described, would enable chipmakers to add six times as many optical fibers at the edge of a silicon photonics chip, called "beachfront density," compared to the current state-of-the-art CPO technology. Each fiber, about three times the width of a human hair, could span centimeters to hundreds of meters in length and transmit terabits of data per second. The IBM team assembled a high-density PWG with optical channels at a 50-micrometer pitch, adiabatically coupled to silicon photonics waveguides, using standard packaging and assembly processes.
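A minimal sketch of how channel pitch and wavelength multiplexing compound into aggregate chip-edge bandwidth follows; only the 50-micrometer pitch comes from the text above, while the edge length, wavelength count, and per-wavelength line rate are hypothetical assumptions for illustration.

```python
# Illustrative "beachfront" bandwidth arithmetic under assumed parameters.

PITCH_UM = 50.0                 # optical channel pitch demonstrated in the PWG
EDGE_MM = 10.0                  # hypothetical chip-edge length devoted to optics
WAVELENGTHS_PER_CHANNEL = 8     # hypothetical WDM wavelengths per channel
GBPS_PER_WAVELENGTH = 100.0     # hypothetical per-wavelength line rate

channels = int(EDGE_MM * 1_000 / PITCH_UM)   # channels that fit along that edge
aggregate_tbps = channels * WAVELENGTHS_PER_CHANNEL * GBPS_PER_WAVELENGTH / 1_000

print(f"{channels} channels along a {EDGE_MM:.0f} mm edge at {PITCH_UM:.0f} um pitch")
print(f"~{aggregate_tbps:.0f} Tb/s aggregate, per PWG layer, before stacking")
```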
The paper additionally indicates that these CPO modules with a PWG at 50-micrometer pitch are the first to pass all stress tests required for manufacturing. Components are subjected to high-humidity environments and temperatures ranging from -40°C to 125°C, as well as mechanical durability testing to confirm that optical interconnects can bend without breaking or losing data. Moreover, researchers have demonstrated PWG technology down to an 18-micrometer pitch. Stacking four PWGs would allow for up to 128 channels of connectivity at that pitch.
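The channel count implied above works out to 32 channels per PWG layer; a small sketch of that arithmetic, with the edge-width estimate following directly from the 18-micrometer pitch:

```python
# Channel-count arithmetic implied by the paragraph above: four stacked PWGs
# providing 128 channels means 32 channels per layer, and at an 18 um pitch
# each layer needs well under a millimeter of chip edge.

STACKED_PWGS = 4
TOTAL_CHANNELS = 128
PITCH_UM = 18.0

channels_per_pwg = TOTAL_CHANNELS // STACKED_PWGS   # 32 channels per layer
edge_um_per_layer = channels_per_pwg * PITCH_UM     # ~576 um of beachfront per layer

print(f"{channels_per_pwg} channels per PWG, ~{edge_um_per_layer:.0f} um of edge per layer")
```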
IBM's continued leadership in semiconductor R&D
CPO technology enables a new pathway to meet AI's increasing performance demands, with the potential to shift off-module communications from electrical to optical. It continues IBM's history of leadership in semiconductor innovation, which also includes the first 2 nm node chip technology, the first implementation of 7 nm and 5 nm process technologies, nanosheet transistors, vertical transistors (VTFET), single-cell DRAM, and chemically amplified photoresists.
Researchers completed design, modeling, and simulation work for CPO in Albany, New York, which the U.S. Department of Commerce recently selected as the home of America's first National Semiconductor Technology Center (NSTC), the NSTC EUV Accelerator. Researchers assembled prototypes and tested modules at IBM's facility in Bromont, Quebec, one of North America's largest chip assembly and test sites. Part of the Northeast Semiconductor Corridor between the United States and Canada, IBM's Bromont fab has led the world in chip packaging for decades.
About IBM
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. More than 4,000 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit www.ibm.com for more information.
[1] A reduction from five to less than one picojoule per bit.
[2] Figures based on training a 70 billion parameter LLM using industry-standard GPUs and interconnects.
[3] Figures based on training a large LLM (such as GPT-4) using industry-standard GPUs and interconnects.
Media Contacts:
Bethany Hill McCarthy
IBM Research
bethany@ibm.com
Willa Hahn
IBM Research
willa.hahn@ibm.com