Mask2Former adopts the same meta architecture as MaskFormer, with our proposed Transformer decoder replacing the standard one. The key components of our Transformer decoder include a masked attention operator, which extracts localized features by constraining cross-attention to within the foreground region of the predicted mask for each query, instead of attending to the full feature map. To handle small objects, we propose an efficient multi-scale strategy to utilize high-resolution features. It feeds successive feature maps from the pixel decoder’s feature pyramid into successive Transformer decoder layers in a round-robin fashion. Finally, we incorporate optimization improvements that boost model performance without introducing additional computation.
# Install mesa-libGL
yum install mesa-libGL -y
# Install requirements
pip3 install urllib3==1.26.6
pip3 install 'git+https://github.com/facebookresearch/detectron2.git@d779ea63faa54fe42b9b4c280365eaafccb280d6'
pip3 install cityscapesscripts
pip3 install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd -
Go to visit Cityscapes official website, then choose 'Download' to download the Cityscapes dataset.
Specify /path/to/cityscapes
to your Cityscapes path in later training process, the unzipped dataset path structure should look like:
cityscapes/
├── gtFine
│ ├── test
│ ├── train
│ │ ├── aachen
│ │ └── bochum
│ └── val
│ ├── frankfurt
│ ├── lindau
│ └── munster
└── leftImg8bit
├── train
│ ├── aachen
│ └── bochum
└── val
├── frankfurt
├── lindau
└── munster
DETECTRON2_DATASETS=/path/to/cityscapes/ python3 train_net.py --num-gpus 8 --config-file configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml 1> train_mask2former.log 2> train_mask2former_error.log & tail -f train_mask2former.log
GPUs | fps | IoU Score Average | nIoU Score Average |
---|---|---|---|
BI-V100×8 | 11.52 | 0.795 | 0.624 |
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。