Abstract
We present AR-Net, an efficient semantic segmentation pipeline for unstructured terrains. Applications such as autonomous navigation require an accurate and efficient understanding of unstructured scenes in outdoor and urban environments. Given RGB images as input, AR-Net uses an encoder backbone to extract multi-scale features and a novel Attention-Regulation layer in the decoder to predict pixel-level segmentation results. AR-Net achieves superior segmentation performance and fast inference on two real-world outdoor terrain datasets. We also provide detailed ablation studies and an analysis of model parameter selection.