Recent research has shown that Large Language Models (LLMs) can utilize external
tools to improve their contextual processing
abilities, moving beyond the pure language
modeling paradigm and paving the way for
Artificial General Intelligence. Despite this,
there has been no systematic evaluation demonstrating the efficacy of LLMs in using
tools to respond to human instructions. This
paper presents API-Bank, the first benchmark
tailored for Tool-Augmented LLMs. API-Bank includes 53 commonly used API tools,
a complete Tool-Augmented LLM workflow,
and 264 annotated dialogues that encompass
a total of 568 API calls. These resources have
been designed to thoroughly evaluate LLMs’
ability to plan step-by-step API calls, retrieve
relevant APIs, and correctly execute API calls
to meet human needs. The experimental results show that GPT-3.5 exhibits an emergent ability to
use tools relative to GPT-3, while GPT-4
demonstrates stronger planning performance. Nevertheless, there remains considerable scope for further improvement when compared to human
performance. Additionally, detailed error analysis and case studies demonstrate the feasibility of Tool-Augmented LLMs for daily use, as
well as the primary challenges that future research needs to address.