A daily podcast about recently published articles in the LLM field.
22 OCT. 2024 · 🤔 Revealing the Barriers of Language Agents in Planning
This research paper examines the challenges faced by language agents in planning tasks. The authors explore the reasons behind the shortcomings of these agents, particularly their limited understanding of constraints and their diminishing ability to focus on goals as the planning horizon lengthens. They investigate two common strategies for improving planning performance: episodic memory updating and parametric memory updating. The study concludes that these strategies, while offering some improvements, primarily function as shortcut learning mechanisms, falling short of achieving human-level planning abilities.
📎 https://arxiv.org/abs/2410.12409
21 OCT. 2024 · 🔀 Intelligence at the Edge of Chaos
This research investigates how intelligent behavior emerges in artificial systems by studying the connection between the complexity of rule-based systems and the abilities of models trained to predict these rules. The researchers used elementary cellular automata (ECA), simple one-dimensional systems with varying complexity, to train large language models (LLMs). Their results show that models trained on more complex ECAs demonstrate greater intelligence, excelling in reasoning and chess move prediction tasks. A key finding is the importance of training at a "sweet spot" of complexity—known as the "edge of chaos"—where systems are structured yet difficult to predict, fostering intelligent behavior. Additionally, models trained on complex rules develop sophisticated solutions by incorporating information from previous states, which improves their ability to generalize and perform well on various tasks.
📎 https://arxiv.org/abs/2410.02536v2
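To give a feel for the rule systems this paper trains on, here is a minimal sketch of one elementary cellular automaton update step in Python. The rule number, grid size, and wrap-around boundary are illustrative choices, not the paper's exact setup:

```python
# Minimal elementary cellular automaton (ECA) sketch.
# Each cell's next state is looked up from the rule number, using the
# 3-bit neighborhood (left neighbor, self, right neighbor) as the bit index.

def eca_step(cells, rule):
    """Apply one ECA update with wrap-around boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        neighborhood = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((rule >> neighborhood) & 1)
    return out

# Rule 110 is a classic example of behavior at the "edge of chaos":
# structured, yet hard to predict.
state = [0] * 16
state[8] = 1
for _ in range(5):
    state = eca_step(state, 110)
print(state)
```

Varying the rule number sweeps the system from trivial (e.g. rule 0) to chaotic, which is exactly the complexity axis the authors correlate with downstream model ability.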
20 OCT. 2024 · 🗓 Inference Scaling for Long-Context Retrieval Augmented Generation
This research paper explores the effectiveness of inference scaling for retrieval augmented generation (RAG), a technique that enhances large language models (LLMs) by incorporating external knowledge. The authors introduce two strategies, demonstration-based RAG (DRAG) and iterative demonstration-based RAG (IterDRAG), for effectively scaling inference computation. They demonstrate that increasing inference computation, when optimally allocated, leads to nearly linear gains in RAG performance. Furthermore, they develop a computation allocation model to predict the optimal test-time compute allocation for various tasks and scenarios, showcasing its effectiveness in achieving performance gains and aligning with experimental results.
📎 https://arxiv.org/abs/2410.04343
19 OCT. 2024 · 🤝 Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
This paper presents a new method called MODEL SWARMS, a collaborative search algorithm for adapting large language models (LLMs) using swarm intelligence. The researchers propose viewing each LLM expert as a "particle" in a swarm and use particle swarm optimization (PSO) to collaboratively search the weight space for optimized models. This approach allows LLMs to adapt to a variety of objectives, including single tasks, multi-task domains, reward models, and human interests, without requiring large amounts of training data. Extensive experiments demonstrate that MODEL SWARMS outperforms existing model composition baselines and enables the discovery of previously unseen capabilities in LLMs.
📎 https://arxiv.org/abs/2410.11163
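The core mechanism is classic particle swarm optimization applied to weight vectors. The sketch below is a generic PSO loop, not the paper's implementation: the toy utility function (negative distance to a target vector) stands in for the task, reward-model, or human-preference objectives that MODEL SWARMS actually optimizes:

```python
# Hedged sketch of the MODEL SWARMS idea: each "expert" is a particle
# (here a plain weight vector), and the swarm searches weight space for
# whatever a utility function rewards.

import random

random.seed(0)  # for a reproducible toy run

def pso(particles, utility, steps=50, inertia=0.5, c_personal=1.5, c_global=1.5):
    velocities = [[0.0] * len(p) for p in particles]
    personal_best = [list(p) for p in particles]
    global_best = max(personal_best, key=utility)
    for _ in range(steps):
        for i, p in enumerate(particles):
            for d in range(len(p)):
                r1, r2 = random.random(), random.random()
                # Velocity blends momentum, pull toward the particle's own
                # best-so-far, and pull toward the swarm's best-so-far.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - p[d])
                                    + c_global * r2 * (global_best[d] - p[d]))
                p[d] += velocities[i][d]
            if utility(p) > utility(personal_best[i]):
                personal_best[i] = list(p)
        global_best = max(personal_best, key=utility)
    return global_best

# Toy run: four 3-dimensional "experts" searching for the point (1, 2, 3).
target = [1.0, 2.0, 3.0]
utility = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
swarm = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(4)]
best = pso(swarm, utility)
```

Note that nothing here requires gradients or training data, which is what lets the method adapt experts from only a utility signal.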
18 OCT. 2024 · 🤖 Agent-as-a-Judge: Evaluate Agents with Agents
The paper details a new framework for evaluating agentic systems, called Agent-as-a-Judge, which uses agentic systems themselves to assess performance. To test this framework, the authors created DevAI, a benchmark dataset consisting of 55 realistic automated AI development tasks. They compared Agent-as-a-Judge to LLM-as-a-Judge and Human-as-a-Judge on DevAI, finding that Agent-as-a-Judge outperforms both, aligning closely with human evaluations. The authors also discuss the benefits of Agent-as-a-Judge for providing intermediate feedback and creating a flywheel effect, where both the judge and evaluated agents improve through an iterative process.
📎 https://arxiv.org/abs/2410.10934v1
🤗 https://huggingface.co/DEVAI-benchmark
18 OCT. 2024 · ⚖️ First-Person Fairness in Chatbots
This paper from OpenAI examines potential bias in chatbot systems like ChatGPT, specifically focusing on how a user's name, which can be associated with demographic attributes, influences the chatbot's responses. The authors propose a privacy-preserving method to measure user name bias across a large dataset of real-world chatbot interactions. They identify several instances of bias, demonstrating that chatbot responses can show a tendency towards creating protagonists whose gender matches the user's likely gender and that users with female-associated names receive responses with friendlier and simpler language more often. The study also finds that post-training interventions like reinforcement learning can significantly mitigate harmful stereotypes.
📎 https://cdn.openai.com/papers/first-person-fairness-in-chatbots.pdf
🌐 https://openai.com/index/evaluating-fairness-in-chatgpt/
18 OCT. 2024 · 🤔 Thinking LLMs: General Instruction Following with Thought Generation
This research paper explores the concept of "Thinking LLMs," or large language models that can generate internal thoughts before responding to user prompts. The authors propose a training method called Thought Preference Optimization (TPO) which uses an iterative process to encourage LLMs to develop thinking abilities. TPO leverages an existing judge model that evaluates responses, implicitly guiding the model to improve its thoughts based on the quality of the resulting responses. The study demonstrates that Thinking LLMs can outperform standard LLMs on various general instruction-following tasks, including those not typically associated with reasoning, such as marketing and health. The research highlights the potential for Thinking LLMs to expand the capabilities of these models beyond traditional reasoning and problem-solving domains.
📎 https://arxiv.org/abs/2410.10630
18 OCT. 2024 · 🔋 Addition is All You Need for Energy-efficient Language Models
This research paper introduces a novel algorithm called Linear-Complexity Multiplication (L-Mul) that aims to make language models more energy-efficient. L-Mul replaces computationally expensive floating-point multiplications with integer addition operations, significantly reducing energy consumption. The authors demonstrate that L-Mul achieves high precision, even surpassing 8-bit floating-point multiplications in certain cases. They evaluate L-Mul on various benchmarks, including natural language, vision, and mathematics tasks, showing that L-Mul can be effectively implemented in attention mechanisms without compromising performance, leading to significant energy savings in model deployment. The authors conclude that L-Mul holds great potential for creating more energy-efficient and cost-effective AI systems.
📎 https://arxiv.org/abs/2410.00907
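The trick can be seen in a back-of-the-envelope sketch: a product of two floats (1+fx)·2^ex · (1+fy)·2^ey is approximated by replacing the mantissa multiplication with additions, 1 + fx + fy + 2^-l. This Python demo is only illustrative: the paper's choice of the correction exponent l depends on the mantissa bit width of the low-precision format, whereas l=4 here is a fixed stand-in, and full-precision arithmetic is used to simulate the effect:

```python
# Sketch of the approximation behind L-Mul (illustrative, positive inputs only):
# drop the fx*fy cross term of the mantissa product and add a small fixed
# correction 2**-l instead, so only additions remain.

import math

def l_mul(x, y, l=4):
    mx, ex = math.frexp(x)   # x = mx * 2**ex with mx in [0.5, 1)
    my, ey = math.frexp(y)
    # Rewrite as (1 + f) * 2**(e-1) so each fraction f is in [0, 1).
    fx, fy = 2 * mx - 1, 2 * my - 1
    approx_mantissa = 1 + fx + fy + 2 ** -l   # additions only
    return approx_mantissa * 2 ** (ex + ey - 2)

exact = 3.7 * 2.9
approx = l_mul(3.7, 2.9)
rel_err = abs(approx - exact) / exact
print(approx, exact, rel_err)
```

The point of the paper is that in low-precision formats this approximation is competitive with 8-bit floating-point multiplication while costing far less energy; the crude double-precision demo above only shows the shape of the error, not the paper's precision results.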
18 OCT. 2024 · 🤖 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
The paper introduces MLE-bench, a benchmark designed to evaluate AI agents' ability to perform machine learning engineering tasks. The benchmark comprises 75 Kaggle competitions, each requiring agents to solve real-world problems involving data preparation, model training, and code debugging. Researchers evaluated several cutting-edge language models on MLE-bench, with the best-performing setup achieving at least a bronze medal in 16.9% of the competitions. The paper investigates various factors influencing performance, such as resource scaling and contamination from pre-training, and concludes that while current agents demonstrate promising capabilities, significant challenges remain.
📎 https://arxiv.org/abs/2410.07095
18 OCT. 2024 · 📈 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
This paper explores the challenges and opportunities of using long-context language models (LLMs) in retrieval-augmented generation (RAG) systems. While increasing the number of retrieved passages initially improves performance, the authors find that it eventually degrades due to the introduction of irrelevant information, or "hard negatives." To address this, the paper proposes three methods for enhancing the robustness of RAG with long-context LLMs: retrieval reordering, RAG-specific implicit LLM fine-tuning, and RAG-oriented LLM fine-tuning with intermediate reasoning. The paper also investigates the impact of various factors related to data distribution, retriever selection, and training context length on the effectiveness of RAG-specific tuning.
📎 https://arxiv.org/abs/2410.05983
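Of the three methods, retrieval reordering is the simplest to picture: long-context models attend best to the start and end of the prompt ("lost in the middle"), so the highest-scoring passages are placed at both edges and the weaker ones, including likely hard negatives, end up in the middle. The alternating front/back scheme below is one plausible way to do this, not necessarily the paper's exact ordering rule:

```python
# Sketch of retrieval reordering: rank passages by retrieval score, then
# alternate them between the front and the back of the context so the
# least relevant passages land in the middle.

def reorder(passages):
    """passages: list of (score, text), higher score = more relevant."""
    ranked = sorted(passages, key=lambda p: p[0], reverse=True)
    front, back = [], []
    for i, (_, text) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]

docs = [(0.9, "A"), (0.2, "D"), (0.7, "B"), (0.5, "C")]
print(reorder(docs))  # top-ranked passages sit at the edges of the context
```

This costs nothing at training time, which is why the paper pairs it with the two fine-tuning methods rather than treating them as alternatives.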
Information
Author | Shahriar Shariati
Organization | Shahriar Shariati
Categories | Technology, Mathematics, Tech News
Website | -
Email | shahriarshm81@gmail.com
Copyright 2024 - Spreaker Inc. an iHeartMedia Company