OpenCompass / opencompass


👋 Join us on Discord and WeChat

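📣 2023 Annual Leaderboard Plan

Over the past year we have had the privilege of witnessing, together with the community, the tremendous progress of artificial general intelligence, and we are delighted that OpenCompass has been able to help so many large-model developers and users.

We are announcing the OpenCompass 2023 annual large-model leaderboard plan. We expect to release the annual leaderboard in January 2024, systematically evaluating large models across multiple capability dimensions, including language, knowledge, reasoning, creation, long context, and agents.

At that time, we will release capability leaderboards for both open-source models and commercial API models, aiming to give the industry a comprehensive, objective, and neutral reference.

We sincerely invite all kinds of large models to join the OpenCompass evaluation system and showcase their strengths in various domains. We also welcome researchers and developers to share their comments and suggestions so that we can advance the field together. If you have any questions or needs, please feel free to contact us. In addition, the related evaluation content, performance data, and evaluation methods will be open-sourced along with the leaderboard release.

We provide some of the sample questions used in this evaluation; for details, see CompassBench 2023.

Let us look forward together to the release of the OpenCompass 2023 annual large-model leaderboard and to the models' performance on it!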

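🧭 Welcome

to OpenCompass!

Just as a compass guides us on a journey, we hope OpenCompass can guide you through the fog of evaluating large language models. OpenCompass offers rich algorithm and feature support, and we hope it helps the community evaluate NLP models fairly, comprehensively, and more conveniently.

🚩🚩🚩 Join OpenCompass! We are currently hiring full-time researchers/engineers and interns. If you are passionate about LLMs and OpenCompass, please feel free to reach out to us by email. We look forward to talking with you!

🔥🔥🔥 OpenCompass has been officially recommended by Meta AI as a standard evaluation tool for large models. See Llama's getting-started guide for more information.

Note
We have officially launched the OpenCompass collaboration plan and invite community users to contribute more representative and trustworthy objective evaluation datasets to OpenCompass! See the Issue for more datasets. Let's work together to build a powerful and easy-to-use large-model evaluation platform!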

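🚀 What's New

  • [2024.01.17] We added evaluation support for InternLM2 and InternLM2-Chat. InternLM2 shows very strong performance in these tests; give it a try! 🔥🔥🔥
  • [2024.01.17] We added a multi-needle version of the needle-in-a-haystack test; more information is available here. 🔥🔥🔥
  • [2023.12.28] We added seamless evaluation of all models developed with LLaMA2-Accessory, a powerful toolkit for LLM development! 🔥🔥🔥
  • [2023.12.22] We open-sourced T-Eval for evaluating the tool-calling ability of large language models. Visit the official T-Eval Leaderboard for more information! 🔥🔥🔥
  • [2023.12.10] We open-sourced VLMEvalKit, a multimodal evaluation framework that currently supports 20+ multimodal large models and 7 multimodal benchmarks, including the MMBench series.
  • [2023.12.10] We added support for Mixtral-8x7B-32K, the MoE model from Mistral AI. See MixtralKit for more details on inference and evaluation.

More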

✨ Introduction

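OpenCompass is a one-stop platform for large-model evaluation. Its main features are:

  • Open-source and reproducible: provides a fair, open, and reproducible evaluation scheme for large models.

  • Comprehensive capability dimensions: a five-dimension design with 70+ datasets and about 400,000 questions to evaluate model capabilities comprehensively.

  • Extensive model support: 20+ HuggingFace and API models are already supported.

  • Efficient distributed evaluation: a single command handles task partitioning and distributed evaluation, completing a full evaluation of a hundred-billion-parameter model in just a few hours.

  • Diverse evaluation paradigms: supports zero-shot, few-shot, and chain-of-thought evaluation, combined with standard or dialogue-style prompt templates, to easily elicit the best performance from each model.

  • Flexible extension: want to add new models or datasets, customize a more advanced task partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass can be extended easily!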
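
To make the zero-shot, few-shot, and chain-of-thought paradigms concrete, here is a minimal sketch of how the three prompt styles differ. The `build_prompt` helper and its "Q:/A:" layout are hypothetical and only illustrate the idea; they are not OpenCompass's prompt-template API.

```python
from typing import Sequence, Tuple


def build_prompt(question: str,
                 examples: Sequence[Tuple[str, str]] = (),
                 chain_of_thought: bool = False) -> str:
    """Assemble a plain-text prompt (illustrative helper, not an OpenCompass API)."""
    parts = []
    # Few-shot: prepend solved examples. With no examples this is zero-shot.
    for ex_question, ex_answer in examples:
        parts.append(f"Q: {ex_question}\nA: {ex_answer}")
    # Chain-of-thought: nudge the model to reason before it answers.
    suffix = " Let's think step by step." if chain_of_thought else ""
    parts.append(f"Q: {question}\nA:{suffix}")
    return "\n\n".join(parts)


zero_shot = build_prompt("What is 2 + 3?")
few_shot = build_prompt("What is 2 + 3?", examples=[("What is 1 + 1?", "2")])
cot = build_prompt("What is 2 + 3?", chain_of_thought=True)
```

The only difference between the paradigms in this sketch is what surrounds the question: nothing (zero-shot), worked examples (few-shot), or a reasoning cue (chain-of-thought).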

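📊 Leaderboard

We will progressively publish detailed performance leaderboards for open-source and API models; see the OpenCompass Leaderboard. To have a model evaluated, send its repository URL or a standard API interface to opencompass@pjlab.org.cn.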

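🛠️ Installation

The steps below show how to install OpenCompass quickly and prepare the datasets.

💻 Environment Setup

GPU environment for open-source models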

conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .

CPU environment for API model evaluation

conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# To use the API models, run `pip install -r requirements/api.txt` to install their dependencies
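
After creating either environment, a quick sanity check can confirm that the interpreter and key packages are in place. The sketch below is an assumption-laden convenience, not part of OpenCompass: the `check_environment` helper is hypothetical, and treating Python 3.10 as a minimum is inferred only from the `python=3.10` pin in the commands above.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10), packages=("torch",)):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: the minimum version and package list are assumptions.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {found} is older than {wanted}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not importable")
    return problems
```

Calling `check_environment()` inside the activated `opencompass` environment should return an empty list if PyTorch was installed by the conda command.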

📂 Data Preparation

# Download the datasets into data/
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
unzip OpenCompassData-core-20231110.zip
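
If `wget` or `unzip` is unavailable, the same two steps can be sketched with the Python standard library. The URL is the one given above; the `download` and `extract` helpers are illustrative names, and the sketch assumes the archive should be unpacked in the repository root just as the `unzip` command does.

```python
import urllib.request
import zipfile
from pathlib import Path

# Release archive named in the commands above.
DATA_URL = ("https://github.com/open-compass/opencompass/releases/download/"
            "0.1.8.rc1/OpenCompassData-core-20231110.zip")


def download(url: str, dest: Path) -> Path:
    """Fetch `url` to `dest` unless the file is already present."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, target_dir: Path) -> list:
    """Unpack `archive` into `target_dir` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

A typical call would be `extract(download(DATA_URL, Path("OpenCompassData-core-20231110.zip")), Path("."))`, run from the `opencompass` directory.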

Some third-party features, such as Humaneval and Llama, may require additional steps to work properly; see the installation guide for details.

🏗️ Evaluation

After installing OpenCompass and preparing the datasets as described above, you can evaluate the LLaMA-7b model on the MMLU and C-Eval datasets with the following command:

python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
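
When evaluations are launched from a script, the command above can be assembled programmatically. This is a minimal sketch: `build_eval_command` is a hypothetical helper, it uses only the `--models` and `--datasets` flags shown above, and passing several names to `--models` is an assumption modelled on how `--datasets` is used.

```python
import subprocess
import sys
from typing import List, Sequence


def build_eval_command(models: Sequence[str], datasets: Sequence[str]) -> List[str]:
    """Build the argument list for run.py from model and dataset config names."""
    if not models or not datasets:
        raise ValueError("at least one model and one dataset are required")
    return [sys.executable, "run.py", "--models", *models, "--datasets", *datasets]


cmd = build_eval_command(["hf_llama_7b"], ["mmlu_ppl", "ceval_ppl"])
# Run from inside the opencompass checkout to start the evaluation:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `sys.executable` keeps the run inside the currently active environment.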

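OpenCompass predefines configurations for many models and datasets. You can list all available model and dataset configurations with the tools script.

# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu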
python tools/list_configs.py llama mmlu
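
The keyword filtering in the second command can be pictured as follows. This is only a sketch of the idea under an assumed rule (case-insensitive substring match, where any keyword is enough); the actual matching logic of `tools/list_configs.py` may differ, and `filter_configs` is a hypothetical name.

```python
from typing import Iterable, List


def filter_configs(names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep names containing at least one pattern; with no patterns, keep all."""
    lowered = [pattern.lower() for pattern in patterns]
    if not lowered:
        return sorted(names)
    return sorted(name for name in names
                  if any(pattern in name.lower() for pattern in lowered))


# Config names taken from the commands in this section.
available = ["hf_llama_7b", "mmlu_ppl", "ceval_ppl"]
matches = filter_configs(available, ["llama", "mmlu"])
```

Here `matches` keeps `hf_llama_7b` and `mmlu_ppl` and drops `ceval_ppl`, which contains neither keyword.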

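You can also evaluate other HuggingFace models from the command line. Again taking LLaMA-7b as an example:

python run.py --datasets ceval_ppl mmlu_ppl \
--hf-path huggyllama/llama-7b \  # HuggingFace model path
--model-kwargs device_map='auto' \  # arguments for constructing the model
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \  # arguments for constructing the tokenizer
--max-out-len 100 \  # maximum number of generated tokens
--max-seq-len 2048 \  # maximum sequence length the model can accept
--batch-size 8 \  # batch size
--no-batch-padding \  # disable batch padding and run inference in a for loop to avoid accuracy loss
--num-gpus 1  # minimum number of GPUs required to run the model

Note
To run the command above, you must first delete every comment starting with #.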
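
Removing the comments by hand is error-prone, so the clean-up described in the note can be sketched in a few lines. The helper names are hypothetical, the sample command is a shortened copy of the one above, and the sketch assumes no argument value itself contains whitespace followed by `#`.

```python
import re
import shlex

# Shortened copy of the annotated command, used only for illustration.
COMMAND = r"""python run.py --datasets ceval_ppl mmlu_ppl \
--hf-path huggyllama/llama-7b \  # HuggingFace model path
--max-out-len 100 \  # maximum number of generated tokens
--num-gpus 1  # minimum number of GPUs
"""


def strip_comments(command: str) -> str:
    """Remove trailing '# ...' comments so each line ends with its backslash."""
    cleaned = []
    for line in command.splitlines():
        # Drop whitespace, '#', and the rest of the line; then trim the end.
        cleaned.append(re.sub(r"\s+#.*$", "", line).rstrip())
    return "\n".join(cleaned)


def to_argv(command: str) -> list:
    """Join continuation lines and split the result with shell-like rules."""
    joined = strip_comments(command).replace("\\\n", " ")
    return shlex.split(joined)


argv = to_argv(COMMAND)
```

The resulting `argv` list can be passed to `subprocess.run`, or `strip_comments(COMMAND)` can be pasted straight into a shell.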

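Through the command line or configuration files, OpenCompass also supports evaluating API models or custom models, as well as more diverse evaluation strategies. Read the Quick Start guide to learn how to run an evaluation task.

For more tutorials, see our documentation.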

📖 Dataset Support

  • Language
    • Word Definition: WiC, SummEdits
    • Idiom Learning: CHID
    • Semantic Similarity: AFQMC, BUSTM
    • Coreference Resolution: CLUEWSC, WSC, WinoGrande
    • Translation: Flores, IWSLT2017
    • Multi-language Question Answering: TyDi-QA, XCOPA
    • Multi-language Summary: XLSum
  • Knowledge
    • Knowledge Question Answering: BoolQ, CommonSenseQA, NaturalQuestions, TriviaQA
  • Reasoning
    • Textual Entailment: CMNLI, OCNLI, OCNLI_FC, AX-b, AX-g, CB, RTE, ANLI
    • Commonsense Reasoning: StoryCloze, COPA, ReCoRD, HellaSwag, PIQA, SIQA
    • Mathematical Reasoning: MATH, GSM8K
    • Theorem Application: TheoremQA, StrategyQA, SciBench
    • Comprehensive Reasoning: BBH
  • Examination
    • Junior High, High School, University, and Professional Examinations: C-Eval, AGIEval, MMLU, GAOKAO-Bench, CMMLU, ARC, Xiezhi
    • Medical Examinations: CMB
  • Understanding
    • Reading Comprehension: C3, CMRC, DRCD, MultiRC, RACE, DROP, OpenBookQA, SQuAD2.0
    • Content Summary: CSL, LCSTS, XSum, SummScreen
    • Content Analysis: EPRSTMT, LAMBADA, TNEWS
  • Long Context
    • Long Context Understanding: LEval, LongBench, GovReports, NarrativeQA, Qasper
  • Safety
    • Safety: CivilComments, CrowsPairs, CValues, JigsawMultilingual, TruthfulQA
    • Robustness: AdvGLUE
  • Code
    • Code: HumanEval, HumanEvalX, MBPP, APPs, DS1000

📖 Model Support

Open-source Models    API Models
  • OpenAI
  • Claude
  • ZhipuAI(ChatGLM)
  • Baichuan
  • ByteDance(YunQue)
  • Huawei(PanGu)
  • 360
  • Baidu(ERNIEBot)
  • MiniMax(ABAB-Chat)
  • SenseTime(nova)
  • Xunfei(Spark)
  • ……

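🔜 Roadmap

  • Subjective evaluation
    • Release a subjective evaluation leaderboard
    • Release subjective evaluation datasets
  • Long context
    • Support a wide range of long-context evaluation benchmarks
    • Release a long-context evaluation leaderboard
  • Coding ability
    • Release a coding evaluation leaderboard
    • Provide evaluation services for languages other than Python
  • Agents
    • Support a rich set of agent frameworks
    • Provide an agent evaluation leaderboard
  • Robustness
    • Support various attack methods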

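👷‍♂️ Contributing

We thank all contributors for their efforts to improve and enhance OpenCompass. See the contributing guide for how to take part in the project.

🤝 Acknowledgements

Some of the code in this project is adapted from OpenICL.

Some of the datasets and prompt implementations in this project are adapted from chain-of-thought-hub and instruct-eval.

🖊️ Citation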

@misc{2023opencompass,
    title={OpenCompass: A Universal Evaluation Platform for Foundation Models},
    author={OpenCompass Contributors},
    howpublished = {\url{https://github.com/open-compass/opencompass}},
    year={2023}
}

Copyright 2020 OpenCompass Authors. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
