Differential Transformer

Oct 18, 2024 · 9m 49s
Description

🎧 Differential Transformer

The paper introduces the Differential Transformer, a new architecture for large language models (LLMs) that aims to improve their ability to focus on relevant information within long sequences. It achieves this by introducing a differential attention mechanism which calculates attention scores as the difference between two separate softmax attention maps, effectively canceling out noise and promoting sparse attention patterns. This enhanced focus on relevant context leads to improvements in various tasks, including long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reducing activation outliers. The paper provides experimental evidence to support these claims, showcasing the Differential Transformer's superiority over traditional Transformers in several scenarios.
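To make the core idea concrete, here is a minimal sketch of the differential attention computation described above: two separate softmax attention maps whose difference forms the final attention weights. This assumes PyTorch; the function name, tensor shapes, and the fixed scalar `lam` are illustrative only (in the paper, λ is a learnable, reparameterized quantity and the per-head outputs are additionally normalized, details omitted here).

```python
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    """Sketch of differential attention: the attention map is the
    difference of two softmax maps, which cancels common-mode noise
    and yields sparser attention patterns.

    q1, k1, q2, k2: (batch, seq, d) query/key projections from two
    separate projection matrices; v: (batch, seq, d_v) values.
    """
    d = q1.size(-1)
    # Standard scaled dot-product softmax map, computed twice.
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
    # Subtracting the second map removes attention mass that both
    # maps assign to irrelevant context.
    return (a1 - lam * a2) @ v

# Toy usage; real models would apply learned projections to x first.
x = torch.randn(2, 16, 64)  # (batch, seq, d)
out = diff_attention(x, x, x, x, x)
```

Note that the difference of two softmax maps can assign negative weight to some positions, which is what lets the mechanism actively suppress, rather than merely down-weight, distracting context.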

📎 Link to paper
Information
Author Shahriar Shariati
Organization Shahriar Shariati
Website -
Tags
