FunASR: A Fundamental End-to-End Speech Recognition Toolkit

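FunASR aims to build a bridge between academic research and industrial applications of speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on ModelScope, it makes it easier for researchers and developers to research and productionize speech recognition models, and it helps the speech recognition ecosystem grow. Make speech recognition more fun!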

Highlights

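  • FunASR is a fundamental speech recognition toolkit that offers a range of features, including automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization and multi-talker ASR. FunASR provides convenient scripts and tutorials that support inference and fine-tuning of pre-trained models.
  • We have released on ModelScope a large number of models trained on open-source datasets or on massive amounts of industrial data; see our model zoo for the details of each model. The representative Paraformer model, a non-autoregressive end-to-end speech recognition model, offers high accuracy, high efficiency and easy deployment, and supports quickly building speech recognition services. For details, see the (service deployment document).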

What's New

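  • 2023/10/13: SlideSpeech: a large-scale multimodal audio-visual corpus, drawn mainly from online meetings and online courses, containing a large number of slides synchronized in real time with the speaker's speech.
  • 2023/10/10: The Paraformer-long-Spk model is released; on top of long-audio speech recognition, it assigns a speaker label to each sentence.
  • 2023/10/07: FunCodec: FunCodec provides open-source models and training tools for discrete audio coding, as well as for speech recognition, speech synthesis and other tasks built on discrete codes.
  • 2023/09/01: Version 2.0 (CPU) of the Mandarin offline file transcription service is released, adding support for ffmpeg, timestamps and hotword models. For details, see (one-click deployment document).
  • 2023/08/07: The CPU version of the one-click deployment of the Mandarin real-time transcription service is released. For details, see (one-click deployment document).
  • 2023/07/17: BAT, a low-latency, low-memory-consumption RNN-T model, is released. For details, see (BAT).
  • 2023/07/03: The CPU version of the one-click deployment of the Mandarin offline file transcription service is released. For details, see (one-click deployment document).
  • 2023/06/26: The results of the ASRU 2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 have been announced. For details, see (M2MeT2.0).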

Installation

For the FunASR installation tutorial, see (Installation).

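Service Deployment

FunASR supports deploying pre-trained or further fine-tuned models as services. The CPU version of the one-click deployment of the Mandarin offline file transcription service has been released; for details, see (one-click deployment document). For more information on service deployment, see (service deployment document).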

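Quick Start

Quick start tutorial (getting-started guide for new users)

FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data; for details, see (modelscope_egs). It also supports training and fine-tuning of models on academic standard datasets; for details, see (egs). The models cover speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization and multi-talker conversational speech recognition; for a detailed list of models, see the model zoo.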
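
The ModelScope inference path described above can be sketched in Python as follows. This is a minimal, non-authoritative sketch that assumes the `modelscope` and `funasr` packages are installed; `pipeline`, `Tasks.auto_speech_recognition` and the `audio_in` keyword follow FunASR's 2023-era ModelScope examples and may differ in other versions, and the helper names `build_asr_pipeline` and `transcribe` are hypothetical. The model ID is the Paraformer-large Mandarin model published on ModelScope.

```python
# Minimal sketch, assuming `modelscope` and `funasr` are installed (2023-era versions).
# `build_asr_pipeline` and `transcribe` are hypothetical helpers, not FunASR APIs.

# Paraformer-large Mandarin model (expects 16 kHz audio) published on ModelScope.
MODEL_ID = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"


def build_asr_pipeline(model_id=MODEL_ID):
    """Create a ModelScope ASR pipeline; the model is downloaded on first use."""
    # Imported lazily so this file can be loaded without modelscope installed.
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    return pipeline(task=Tasks.auto_speech_recognition, model=model_id)


def transcribe(audio, asr=None):
    """Recognize a local WAV path or URL; `asr` lets callers reuse or stub a pipeline."""
    if not audio:
        raise ValueError("audio must be a non-empty file path or URL")
    if asr is None:
        asr = build_asr_pipeline()
    # `audio_in` is the keyword used in FunASR's 2023 ModelScope examples;
    # other releases may expect a different argument name.
    return asr(audio_in=audio)
```

For example, `print(transcribe("path/to/your_16k_audio.wav"))` would build the pipeline, download the model on first use and print whatever result the pipeline returns. Passing an already-built pipeline through `asr` avoids reloading the model for every file and makes the helper easy to test without downloading anything.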

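Contact Us

If you run into problems while using FunASR, you can open an issue directly on the GitHub page. Speech enthusiasts are welcome to scan the DingTalk group or WeChat group QR code below to join the community group for discussion.

DingTalk group    WeChat group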

Community Contributors

For the list of contributors, see (Acknowledgements).

License

This project is licensed under The MIT License. For the model license, see (Model License).

Citations

@inproceedings{gao2023funasr,
  author={Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  year={2023},
  booktitle={INTERSPEECH},
}
@inproceedings{An2023bat,
  author={Keyu An and Xian Shi and Shiliang Zhang},
  title={BAT: Boundary aware transducer for memory-efficient and low-latency ASR},
  year={2023},
  booktitle={INTERSPEECH},
}
@inproceedings{gao22b_interspeech,
  author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={2063--2067},
  doi={10.21437/Interspeech.2022-9996}
}
[//]: # (<div align="left"><img src="docs/images/funasr_logo.jpg" width="400"/></div>)

([Simplified Chinese](./README_zh.md)|English)

# FunASR: A Fundamental End-to-End Speech Recognition Toolkit

<p align="left">
    <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-brightgreen.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/Python->=3.7,<=3.10-aff.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/Pytorch-%3E%3D1.11-blue"></a>
</p>

<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications in speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), it lets researchers and developers research and productionize speech recognition models more conveniently, and it promotes the development of the speech recognition ecosystem. ASR for Fun!

[**Highlights**](#highlights) | [**News**](https://github.com/alibaba-damo-academy/FunASR#whats-new) | [**Installation**](#installation) | [**Quick Start**](#quick-start) | [**Runtime**](./funasr/runtime/readme.md) | [**Model Zoo**](./docs/model_zoo/modelscope_models.md) | [**Contact**](#contact)

<a name="highlights"></a>
## Highlights
- FunASR is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR. FunASR provides convenient scripts and tutorials, supporting inference and fine-tuning of pre-trained models.
- We have released a vast collection of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), which can be accessed through our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md). The representative [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), a non-autoregressive end-to-end speech recognition model, offers high accuracy, high efficiency and convenient deployment, supporting the rapid construction of speech recognition services. For more details on service deployment, please refer to the [service deployment document](funasr/runtime/readme_cn.md).

<a name="whats-new"></a>
## What's new
- 2023/10/13: [SlideSpeech](https://slidespeech.github.io/): A large-scale multi-modal audio-visual corpus with a significant amount of real-time synchronized slides.
- 2023/10/10: The ASR-SpeakersDiarization combined pipeline [speech_campplus_speaker-diarization_common](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) is now released. Try the model to get recognition results with speaker information.
- 2023/10/07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
- 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamp, and hotword models. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md)).
- 2023/07/17: BAT, a low-latency and low-memory-consumption RNN-T model, has been released. For more details, please refer to ([BAT](egs/aishell/bat)).
- 2023/07/03: The offline file transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
- 2023/06/26: The ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 has concluded and the results have been announced. For more details, please refer to ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2/index.html)).

<a name="Installation"></a>
## Installation

Please refer to the [installation docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/installation.html).

## Deployment Service

FunASR supports deploying pre-trained or further fine-tuned models as a service. The CPU version of the Mandarin offline file transcription service has been released; details can be found in the [docs](funasr/runtime/docs/SDK_tutorial.md). More detailed information about service deployment can be found in the [deployment roadmap](funasr/runtime/readme_cn.md).

<a name="quick-start"></a>
## Quick Start

Quick start for new users ([tutorial](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start.html))

FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data. For more details, please refer to ([modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)). It also supports training and fine-tuning of models on academic standard datasets. For more details, please refer to ([egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)). The models include speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-party conversation speech recognition. For a detailed list of models, please refer to the [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md).

<a name="Community Communication"></a>
## Community Communication
If you encounter problems, you can raise Issues directly on the GitHub page.

You can also scan the following DingTalk group or WeChat group QR code to join the community group for communication and discussion.

|DingTalk group | WeChat group |
|:---:|:-----------------------------------------------------:|
|<div align="left"><img src="docs/images/dingding.jpg" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> |

## Contributors

| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div> | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|

The contributors can be found in the [contributors list](./Acknowledge.md).

## License
This project is licensed under [The MIT License](https://opensource.org/licenses/MIT). FunASR also contains various third-party components and some code modified from other repos under other open source licenses. The use of the pretrained models is subject to the [model license](./MODEL_LICENSE).

## Citations

```bibtex
@inproceedings{gao2023funasr,
  author={Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  year={2023},
  booktitle={INTERSPEECH},
}
@inproceedings{An2023bat,
  author={Keyu An and Xian Shi and Shiliang Zhang},
  title={BAT: Boundary aware transducer for memory-efficient and low-latency ASR},
  year={2023},
  booktitle={INTERSPEECH},
}
@inproceedings{wang2023told,
  author={Jiaming Wang and Zhihao Du and Shiliang Zhang},
  title={{TOLD:} {A} Novel Two-Stage Overlap-Aware Framework for Speaker Diarization},
  year={2023},
  booktitle={ICASSP},
}
@inproceedings{gao22b_interspeech,
  author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={2063--2067},
  doi={10.21437/Interspeech.2022-9996}
}
```

About

Personal mirror of FunASR: https://github.com/alibaba-damo-academy/FunASR