Text classification: after fine-tuning on text pairs, the model can classify the relationship of a text pair given the pair and a list of candidate labels. The two texts in a pair are separated by `-`.
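A quick illustration of the pair format, using one of the sample sentences from the examples below and plain Python string handling:

```python
# A text pair as expected by the model: premise and hypothesis joined by "-".
pair = "The new rights are nice enough-Everyone really likes the newest benefits"

# Split on the first "-" only, so hyphens inside the texts are preserved.
premise, hypothesis = pair.split("-", 1)
print(premise)     # The new rights are nice enough
print(hypothesis)  # Everyone really likes the newest benefits
```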
Related paper: Jacob Devlin, Ming-Wei Chang, et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019.
model | type | datasets | Top1-accuracy | stage | example |
---|---|---|---|---|---|
bert | txtcls_bert_base_uncased | Mnli | 30.9% | pretrain | -- |
txtcls | txtcls_bert_base_uncased_mnli | Mnli | 84.8% | finetune / eval / predict | link / link / link |
Dataset directory layout:
└─mnli
├─dev
├─test
└─train
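The layout above can be created with a short snippet (the `mnli` root path is an assumption; adjust it to your workspace):

```python
import os

# Create the mnli dataset skeleton: train/, dev/ and test/ subdirectories.
for split in ("train", "dev", "test"):
    os.makedirs(os.path.join("mnli", split), exist_ok=True)
```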
Generating the dataset files in TFRecord format requires the developer to clone the project in advance.
Please refer to Launching with Scripts.
Script-based test runs:
# finetune
python run_mindformer.py --config ./configs/txtcls/run_txtcls_bert_base_uncased.yaml --run_mode finetune --load_checkpoint txtcls_bert_base_uncased
# evaluate
python run_mindformer.py --config ./configs/txtcls/run_txtcls_bert_base_uncased.yaml --run_mode eval --load_checkpoint txtcls_bert_base_uncased_mnli
# predict
python run_mindformer.py --config ./configs/txtcls/run_txtcls_bert_base_uncased.yaml --run_mode predict --load_checkpoint txtcls_bert_base_uncased_mnli --predict_data [TEXT]
import mindspore; mindspore.set_context(mode=0, device_id=0)
from mindformers import MindFormerBook
from mindformers.trainer import Trainer
# Show the list of models supported by Trainer for this task
MindFormerBook.show_trainer_support_model_list("text_classification")
# INFO - Trainer support model list for txt_classification task is:
# INFO - ['txtcls_bert_base_uncased']
# INFO - -------------------------------------
# Initialize the trainer
trainer = Trainer(task='text_classification',
                  model='txtcls_bert_base_uncased',
                  train_dataset='./mnli/train',
                  eval_dataset='./mnli/eval')
# Test data: input_data holds two test cases, i.e. two text pairs;
# the two texts in each pair are separated by "-"
input_data = ["The new rights are nice enough-Everyone really likes the newest benefits ",
              "i don't know um do you do a lot of camping-I know exactly."]
# Option 1: fine-tune from the existing pretrained weights, then use the
# fine-tuned weights for evaluation and inference
trainer.train(resume_or_finetune_from_checkpoint="txtcls_bert_base_uncased",
              do_finetune=True)
trainer.evaluate(eval_checkpoint=True)
trainer.predict(predict_checkpoint=True, input_data=input_data, top_k=1)
# Option 2: download the trained weights from OBS, then run evaluation and inference
trainer.evaluate()
# INFO - Top1 Accuracy=84.8%
trainer.predict(input_data=input_data, top_k=1)
# INFO - output result is [[{'label': 'neutral', 'score': 0.9714198708534241}],
# [{'label': 'contradiction', 'score': 0.9967639446258545}]]
import mindspore; mindspore.set_context(mode=0, device_id=0)
from mindformers.pipeline import TextClassificationPipeline
from mindformers import AutoTokenizer, BertForMultipleChoice, AutoConfig
input_data = ["The new rights are nice enough-Everyone really likes the newest benefits ",
              "i don't know um do you do a lot of camping-I know exactly."]
tokenizer = AutoTokenizer.from_pretrained('txtcls_bert_base_uncased_mnli')
txtcls_mnli_config = AutoConfig.from_pretrained('txtcls_bert_base_uncased_mnli')
# The bert model requires a batch_size parameter when it is created, and the
# pipeline processes samples one by one, so batch_size is set to 1.
txtcls_mnli_config.batch_size = 1
model = BertForMultipleChoice(txtcls_mnli_config)
txtcls_pipeline = TextClassificationPipeline(task='text_classification',
                                             model=model,
                                             tokenizer=tokenizer,
                                             max_length=model.config.seq_length,
                                             padding="max_length")
results = txtcls_pipeline(input_data, top_k=1)
print(results)
# Output
# [[{'label': 'neutral', 'score': 0.9714198708534241}], [{'label': 'contradiction', 'score': 0.9967639446258545}]]
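The pipeline returns one ranked list of `{label, score}` dicts per input sample; with `top_k=1` each inner list holds a single entry, so extracting the best label per sample is a one-liner. The values below are copied from the output shown above:

```python
# Pipeline output shown above: one ranked list per input sample.
results = [[{'label': 'neutral', 'score': 0.9714198708534241}],
           [{'label': 'contradiction', 'score': 0.9967639446258545}]]

# With top_k=1, the best prediction is the first (only) entry of each list.
top_labels = [sample[0]['label'] for sample in results]
print(top_labels)  # ['neutral', 'contradiction']
```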