MLE-bench
Description
🤖 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
The paper introduces MLE-bench, a benchmark designed to evaluate AI agents' ability to perform machine learning engineering tasks. The benchmark comprises 75 Kaggle competitions, each requiring agents to solve real-world problems involving data preparation, model training, and code debugging. Researchers evaluated several cutting-edge language models on MLE-bench, with the best-performing setup achieving at least a bronze medal in 16.9% of the competitions. The paper investigates various factors influencing performance, such as resource scaling and contamination from pre-training, and concludes that while current agents demonstrate promising capabilities, significant challenges remain.
📎 Link to paper
Information
Author | Shahriar Shariati |
Organization | Shahriar Shariati |
Website | - |
Tags |
Copyright 2024 - Spreaker Inc. an iHeartMedia Company