Google's New TASP Method Speeds Up Long-Context Large Language Models
Researchers at Google have developed a new method, TASP, to address communication bottlenecks in large language models dealing with long contexts. This innovation promises substantial speedups and enhanced scalability.
TASP introduces topology-aware sequence partitioning for parallelizing Transformer training over long sequences. Unlike prior sequence-parallel methods, which split the sequence into contiguous blocks, TASP permits non-contiguous partitioning, giving it greater flexibility to balance the workload across devices.
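To see why non-contiguous partitioning helps with workload balancing, consider causal attention: tokens later in the sequence attend to more keys, so contiguous chunks leave the devices holding the tail of the sequence with far more work. A minimal sketch of the general idea (this is an illustration of non-contiguous "zigzag-style" partitioning, not Google's TASP implementation; the function names are ours):

```python
def zigzag_assignment(num_devices: int):
    """Split the sequence into 2*num_devices contiguous chunks, then give
    each device one early and one late chunk (a non-contiguous partition)."""
    return [(i, 2 * num_devices - 1 - i) for i in range(num_devices)]

def causal_cost(chunk_idx: int, chunk_len: int) -> int:
    """Approximate causal-attention cost of one chunk: each query attends
    to all keys up to and including its own position (query-key pairs)."""
    start = chunk_idx * chunk_len
    return sum(start + q + 1 for q in range(chunk_len))

if __name__ == "__main__":
    devices, chunk_len = 4, 8  # toy setup: 4 devices, 32-token sequence

    # Non-contiguous (zigzag) partition: per-device costs come out equal.
    zigzag = zigzag_assignment(devices)
    zz_costs = [causal_cost(a, chunk_len) + causal_cost(b, chunk_len)
                for a, b in zigzag]
    print("zigzag costs:    ", zz_costs)

    # Contiguous partition of the same chunks: heavily imbalanced.
    contiguous = [(2 * d, 2 * d + 1) for d in range(devices)]
    ct_costs = [causal_cost(a, chunk_len) + causal_cost(b, chunk_len)
                for a, b in contiguous]
    print("contiguous costs:", ct_costs)
```

In this toy model the zigzag pairing makes every device's cost identical, while the contiguous split grows linearly across devices; TASP generalizes this kind of flexibility and additionally matches the partition to the network topology.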
To minimize communication overhead, TASP chooses the partitioning based on the network topology and interconnect bandwidth. At a batch size of 48, it delivers speedups of 1.3x to 2.4x for sequence lengths from 10K to 50K tokens, and in experiments on NVIDIA H100 and AMD MI300X systems it reaches up to a 3.58x speedup over Ring Attention and its variant.
By decomposing both the network topology and the communication primitives, TASP makes fuller use of the accelerators' communication capacity and achieves significant memory savings. It also improves the compute-to-communication ratio, balancing the workload and raising overall performance. Combining TASP with computation-oriented optimizations, such as sparse attention, could yield further gains.
TASP, a novel approach by Google researchers, significantly boosts the speed and scalability of long-context large language models. By optimizing partitioning based on network topology and allowing non-contiguous partitioning, TASP minimizes communication overhead and maximizes performance. Future integration with computation-oriented optimizations promises even greater efficiency.