Starburst Announces 100GB/second Streaming Ingest From Apache Kafka to Apache Iceberg Tables

PR Newswire ·  10/24 08:00

Go from data ingestion to blazing-fast SQL analytics in near real-time with the Starburst Open Hybrid Lakehouse

BOSTON, Oct. 24, 2024 /PRNewswire/ -- Starburst, the Trino company, today announced a range of new capabilities for Galaxy, its Trino-based open hybrid lakehouse platform: the general availability of fully managed streaming ingestion from Apache Kafka to Apache Iceberg tables; the public preview of fully managed ingestion from files landing in Amazon Web Services (AWS) S3 to Iceberg tables; and multiple enhancements to the platform's performance and price-performance. Galaxy customers can now configure and ingest data at a verified scale of up to 100GB/second per Iceberg table at leading price-performance. In addition, Galaxy users can now benefit from faster and more accurate auto-scaling of resources, simplified policy-based routing of user queries, and enhanced performance through improved automatic caching and indexing.

Businesses that need data available for analytics in their cloud data lake with minimal delay have traditionally built complex ingestion systems, cobbling together multiple tools and writing custom software to stream data into the lake. Alternatively, these organizations may rely on incomplete solutions that handle only the ingestion process. Both approaches tend to be fragile, difficult to scale, costly to maintain, and solve only part of the problem. After the data lands in the lake, it still needs to be transformed and optimized for efficient querying, which requires even more code, pipelines, tools, and complexity. In addition, the pressure to optimize costs across analytics functions is increasing. CIOs are looking for ways to reduce operational overhead relative to traditional lakehouses and legacy data warehouses while maintaining control of their data and analytics stack.

"As businesses strive to perform analytics on real-time data, they seek frictionless solutions for continuous data ingestion. They also prioritize open standards like Apache Iceberg to future-proof their environments amid rapidly evolving technologies. Furthermore, reducing complexity and simplifying architectures is critical, helping organizations optimize IT investments and avoid unnecessary costs associated with integrating disparate systems," said Sanjeev Mohan, Principal and Founder of SanjMo. "Starburst's latest announcements are significant because they address these exact needs—delivering improved price performance, simplicity, and efficient elastic scaling for modern data workloads."

Streaming Ingest from Kafka (general availability) - Starburst now enables the easy creation of fully managed ingestion pipelines for Kafka topics at a verified scale of up to 100GB/second, at half the cost of alternative solutions. Configuration takes minutes: select the Kafka topic, the auto-generated table schema, and the location of the resulting Iceberg table.

  • Starburst Galaxy's streaming ingestion is serverless and does the heavy lifting without any manual configuration, tuning, or additional tools required by the customer. Galaxy automatically ingests incoming messages from Kafka topics into managed Iceberg tables in S3, compacts and transforms the data, applies the necessary governance, and makes it available to query within about one minute.
  • Starburst's streaming ingestion can connect to Kafka-compliant systems, including Confluent Cloud, Amazon Managed Streaming for Apache Kafka (MSK), and Apache Kafka.
  • Starburst guarantees exactly-once delivery: no message is read twice and no message is missed, which keeps the ingested data accurate.
  • It is built for massive scale and has been tested to ingest 100 gigabytes of streaming data per second.
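
The exactly-once guarantee in the list above can be illustrated with a minimal sketch. It is not Starburst's implementation, which the announcement does not describe; it only shows one common way to get the effect, assuming the sink records the last written offset per Kafka partition in the same atomic commit as the rows. The names Message, TableSink, and commit are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A Kafka-style record identified by its partition and offset."""
    partition: int
    offset: int
    value: str


@dataclass
class TableSink:
    """Hypothetical stand-in for a table whose commits are atomic."""
    rows: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # partition -> last offset written

    def commit(self, batch):
        """Write a batch and return how many rows were actually added."""
        new_rows = []
        new_offsets = dict(self.committed)
        for msg in sorted(batch, key=lambda m: (m.partition, m.offset)):
            # Skip anything at or below the committed offset of its partition,
            # so a redelivered message cannot produce a duplicate row.
            if msg.offset > new_offsets.get(msg.partition, -1):
                new_rows.append(msg.value)
                new_offsets[msg.partition] = msg.offset
        # Rows and offsets are replaced together, modelling one atomic commit.
        self.rows = self.rows + new_rows
        self.committed = new_offsets
        return len(new_rows)


sink = TableSink()
batch = [Message(0, 0, "a"), Message(0, 1, "b"), Message(1, 0, "c")]
first = sink.commit(batch)
# Simulate a retry after a failure: the same batch is redelivered
# together with one message that has not been seen before.
second = sink.commit(batch + [Message(0, 2, "d")])
print(first, second, sink.rows)  # 3 1 ['a', 'b', 'c', 'd']
```

Because the offsets travel with the data, a retry after a crash either repeats a commit that never happened or is filtered out entirely, so nothing is lost and nothing is written twice.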

Ingest from Files landing in S3 (public preview) - Additionally, Starburst is expanding its ingestion capabilities by introducing file loading, offering customers a powerful, automated alternative to DIY or off-the-shelf solutions. This feature reads and parses files and writes their records directly into Iceberg tables, then uses the new ingestion capabilities to automatically optimize those tables for read performance through compaction, snapshot retention, orphaned file removal, and statistics collection. The public preview of file loading will be available in November 2024.

Enhanced Auto Scaling (general availability) - Starburst makes auto scaling smarter in Starburst Galaxy. In environments with many concurrent users, demand for compute resources can fluctuate sharply. The enhanced Auto Scaling monitors both active and pending queries to determine how much compute each query needs, and allocates it up to 50% faster. Not only does enhanced Auto Scaling provision additional compute resources faster, but it can also automatically reactivate draining worker nodes, improving the efficiency of resource utilization.
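
As a rough illustration of the scaling behavior described above, the following sketch sizes a cluster from active plus pending queries and reuses draining workers before provisioning new ones. It is an assumption-laden toy, not Galaxy's algorithm: the function plan_scaling, the queries_per_worker ratio, and the max_workers cap are all hypothetical.

```python
import math


def plan_scaling(active_queries, pending_queries, running_workers,
                 draining_workers, queries_per_worker=4, max_workers=20):
    """Return (reactivate, provision) worker counts for the current demand."""
    # Count pending queries as well as active ones, so the plan reacts to
    # queued work instead of waiting for it to start running.
    demand = active_queries + pending_queries
    target = min(max_workers, math.ceil(demand / queries_per_worker))
    shortfall = max(0, target - running_workers)
    # Prefer draining workers: they are already started, so bringing them
    # back is assumed to be cheaper than provisioning fresh nodes.
    reactivate = min(shortfall, draining_workers)
    provision = shortfall - reactivate
    return reactivate, provision


# 24 queries at 4 per worker need 6 workers; 3 are running and 2 are draining.
busy = plan_scaling(active_queries=10, pending_queries=14,
                    running_workers=3, draining_workers=2)
# 2 queries need 1 worker and 3 are already running, so nothing is added.
quiet = plan_scaling(active_queries=2, pending_queries=0,
                     running_workers=3, draining_workers=1)
print(busy, quiet)  # (2, 1) (0, 0)
```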

Next Gen Caching (private preview) - Data engineers undertake various labor-intensive data preparation tasks. Starburst Warp Speed helps automate some of those tasks. Still, as business needs evolve and teams turn to a semantic layer approach with tools like dbt, data engineers struggle to provide fast query performance, scalability, and stability for BI and dashboarding without significant overhead. The next-generation caching in Starburst Galaxy applies Warp Speed's smart indexing and caching capabilities to intermediate workload results. Warp Speed will now be able to identify patterns of similar subqueries across different workloads, improving performance by up to 62% compared to non-accelerated queries.

User Role Based Routing (private preview) - Previously, users would spend too much effort determining which queries were appropriate for different cluster types. Also, administrators weren't able to assign groups of users to a cluster via roles and privileges. With User Role Based Routing, Starburst now supports the easy allocation of resources by cluster type. Customers can programmatically route queries to the appropriate Galaxy cluster based on a predefined set of rules. Users can send all queries to a single URL, which will route the queries based on the user's role, minimizing human intervention while improving what is already industry-leading price-performance against other leading cloud data warehouses and lakehouses.
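
The idea of routing by role through a single entry point can be sketched as an ordered rule list. This is only an illustration under stated assumptions: the role names, cluster names, and the route_query function are hypothetical, and the announcement does not specify how Galaxy expresses or evaluates its rules.

```python
# Ordered rules: the first role a user holds decides the cluster.
# All role and cluster names here are made up for illustration.
ROUTING_RULES = [
    ("data_engineer", "etl-cluster"),
    ("bi_analyst", "dashboard-cluster"),
]
DEFAULT_CLUSTER = "adhoc-cluster"


def route_query(user_roles, rules=ROUTING_RULES, default=DEFAULT_CLUSTER):
    """Pick a target cluster for a query from the roles of the submitting user."""
    for role, cluster in rules:
        if role in user_roles:
            return cluster
    # No rule matched, so fall back to a shared default cluster.
    return default


analyst = route_query({"bi_analyst"})
both = route_query({"bi_analyst", "data_engineer"})
other = route_query({"intern"})
print(analyst, both, other)  # dashboard-cluster etl-cluster adhoc-cluster
```

Keeping the rules in priority order makes the outcome deterministic for users who hold several roles, which is one simple way to avoid per-query decisions by the user.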

"With our new ingestion capabilities to Iceberg, customers don't have to worry about how fast or how much data they need to land in their data lake. At 100GB/second, Galaxy's ingestion can handle the scale of the most demanding use cases. Because it is so easy to configure and cost-effective to operate, customers don't have to artificially limit the number of up-to-date, fresh tables in their lake, enabling them to make the most informed business decisions," said Tobias Ternstrom, Starburst's Chief Product Officer.

Supporting Resources

For more information, read Starburst's Icehouse launch blog.
Download an image of the Starburst Open Data Lakehouse here.

About Starburst

Starburst, the Open Hybrid Lakehouse, is the leading end-to-end data platform to securely access, analyze, and share data for analytics and AI across hybrid, on-premises, and multi-cloud environments. As the leaders in Trino, a modern open-source SQL engine, Starburst empowers the most data-intensive and security-conscious organizations like Comcast, Halliburton, Vectra, EMIS Health, and 7 of the top 10 global banks to democratize data access, enhance analytics performance, and improve architecture optionality. With the Open Hybrid Lakehouse from Starburst, enterprises globally can easily discover and use all their relevant business data to power new applications and analytics across risk mitigation, supply chain, customer experiences, product optimization, streaming, and more.

For additional information, please visit

SOURCE Starburst
