Few-shot Learning for Animal Detection and Segmentation

Camouflaged object detection and segmentation is a new and challenging research topic in computer vision. Data on concealed objects, such as camouflaged animals in natural scenes, is seriously lacking. In this paper, we address the problem of few-shot learning for camouflaged object detection and segmentation. To this end, we first collect a new dataset, CAMO-FS, as a benchmark. Because camouflaged instances are hard to recognize due to their similarity to their surroundings, we guide our models to learn camouflaged features that strongly distinguish the instances from the background. We propose FS-CDIS, a framework that efficiently detects and segments camouflaged instances via two loss functions that contribute to the training process. First, the instance triplet loss, which differentiates the anchor (the mean of all camouflaged foreground points) from the background points, operates at the instance level. Second, to consolidate generalization at the class level, we present instance memory storage, which stores camouflaged features of the same category and allows the model to capture further class-level information during learning. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the newly collected dataset.
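As a rough illustration of the instance triplet loss described above, the following minimal sketch treats the mean of the foreground feature points as the anchor and penalizes background points that lie too close to it. The choice of foreground points as positives, the Euclidean distance, the `margin` value, and all function names are assumptions made for illustration; this is not the FS-CDIS implementation.

```python
import math


def mean_vector(points):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def instance_triplet_loss(foreground, background, margin=1.0):
    """Hinge-style triplet loss averaged over all (foreground, background) pairs.

    Assumptions (not taken from the paper): the anchor is the mean of the
    foreground features, positives are the foreground points themselves,
    negatives are the background points, and `margin` is a hypothetical
    hyperparameter.
    """
    anchor = mean_vector(foreground)
    total = 0.0
    for positive in foreground:
        d_pos = euclidean(anchor, positive)
        for negative in background:
            d_neg = euclidean(anchor, negative)
            # The term is zero once the background point is at least
            # `margin` farther from the anchor than the foreground point.
            total += max(0.0, d_pos - d_neg + margin)
    return total / (len(foreground) * len(background))


# Toy 2-D features: two foreground points and one background point.
fg = [[0.0, 0.0], [2.0, 0.0]]
far_loss = instance_triplet_loss(fg, [[1.0, 10.0]])   # background far from the anchor
near_loss = instance_triplet_loss(fg, [[1.0, 0.0]])   # background sitting on the anchor
```

In this toy setup the loss vanishes when the background point is far from the anchor and grows when it coincides with it, which is the behavior a loss of this kind is meant to encourage.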

Thanh-Danh Nguyen*, Anh-Khoa Nguyen Vu*, Nhat-Duy Nguyen*, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Minh-Triet Tran, and Tam V. Nguyen†, “The Art of Camouflage: Few-shot Learning for Animal Detection and Segmentation”, IEEE Access, Jul 2024. IF = 3.4 (SCIE) [DOI, ArXiv]


Conditional Data Synthesis for Scene Understanding

Scene understanding at the instance level is an essential task in computer vision to support modern Advanced Driver Assistance Systems. Existing solutions rely on abundant annotated training data. However, instance-level annotation is costly because of the heavy manual effort involved. In this work, we address this problem by introducing InstSynth, an advanced framework that leverages instance-wise annotations as conditions to enrich the training data. Existing methods focus on semantic segmentation and use prompts to synthesize image-annotation pairs, which tends to produce unrealistic results. Our proposal utilizes the strength of large generative models to synthesize instance data with prompt-guided and mask-based mechanisms to boost the performance of instance-level scene understanding models. Empirically, we improve the performance of the latest instance segmentation architectures, FastInst and OneFormer, by 14.49% and 11.59% AP, respectively, on the Cityscapes benchmark. Accordingly, we construct an instance-level synthesized dataset, dubbed IS-Cityscapes, with over 4× the number of instances of the vanilla Cityscapes.


Thanh-Danh Nguyen, Bich-Nga Pham, Trong-Tai Dam Vu, Vinh-Tiep Nguyen†, Thanh Duc Ngo, and Tam V. Nguyen. “InstSynth: Instance-wise Prompt-guided Style Masked Conditional Data Synthesis for Scene Understanding.” 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR). IEEE, 2024. (Scopus) [DOI]

Nighttime Scene Understanding

Semantic segmentation plays a crucial role in traffic scene understanding, especially in nighttime conditions. This paper tackles the task of semantic segmentation on nighttime scenes. The biggest challenge of this task is the lack of annotated nighttime images to train a deep learning-based scene parser. Existing annotated datasets are abundant for daytime conditions but scarce for nighttime because of the high annotation cost. Thus, we propose a novel Label Transfer Scene Parser (LTSP) framework for nighttime scene semantic segmentation that leverages daytime annotation transfer. Our framework performs segmentation in the dark without training on real annotated nighttime data. In particular, we translate daytime images to nighttime conditions to efficiently obtain more annotated data. In addition, we use pseudo-labels inferred from unlabeled nighttime scenes to further train the scene parser. The novelty of our work is the ability to perform nighttime segmentation using daytime annotated labels and synthetic nighttime versions of the same set of images. Extensive experiments demonstrate the improvement and efficiency of our scene parser over state-of-the-art methods that follow a similar semi-supervised approach on the Nighttime Driving Test benchmark. Notably, our proposed method uses only one tenth of the labeled and unlabeled data used by previous methods.
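To make the pseudo-labeling step more concrete, here is a minimal sketch of one common way to turn a parser's per-pixel class probabilities into pseudo-labels: take the most likely class and keep it only when the prediction is confident. The confidence threshold, the `IGNORE_LABEL` value of 255, and the function name are assumptions for illustration; the abstract does not state how LTSP selects its pseudo-labels.

```python
IGNORE_LABEL = 255  # assumed "ignore" index, excluded from the training loss


def make_pseudo_labels(prob_map, threshold=0.9):
    """Convert an H x W x C nested list of class probabilities into H x W labels.

    Each pixel receives its most likely class if that class's probability is
    at least `threshold` (a hypothetical hyperparameter); otherwise it is
    marked IGNORE_LABEL so it does not contribute to further training.
    """
    labels = []
    for row in prob_map:
        label_row = []
        for probs in row:
            best = max(range(len(probs)), key=probs.__getitem__)
            label_row.append(best if probs[best] >= threshold else IGNORE_LABEL)
        labels.append(label_row)
    return labels


# Toy 2 x 2 "image" with two classes per pixel.
probabilities = [
    [[0.95, 0.05], [0.60, 0.40]],
    [[0.10, 0.90], [0.50, 0.50]],
]
pseudo = make_pseudo_labels(probabilities)
```

Only the two confident pixels receive a class label here; the uncertain ones are ignored, which limits the amount of label noise fed back into the scene parser.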


Thanh-Danh Nguyen, Nguyen Phan, Tam V. Nguyen†, Vinh-Tiep Nguyen, and Minh-Triet Tran, “Nighttime Scene Understanding with Label Transfer Scene Parser”, Image and Vision Computing, Sep 2024. [DOI]


Contour Emphasis for Camouflage Instance Segmentation

Understanding camouflage images at the instance level is a challenging task in computer vision. Since camouflaged instances have colors and textures similar to the background, the key to distinguishing them in images lies in their contours. The contours separate the instance from the background, so recognizing these contours should break the camouflage mechanism. To this end, we address the problem of camouflage instance segmentation via the Contour Emphasis approach. We improve the ability of segmentation models by enhancing the contours of camouflaged instances. We propose the CE-OST framework, which employs the well-known Transformer-based architecture in a one-stage manner to boost the performance of camouflaged instance segmentation. Extensive experiments demonstrate our contributions over state-of-the-art baselines on different benchmarks, namely CAMO++, COD10K, and NC4K.
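For readers unfamiliar with what an instance contour looks like in practice, the sketch below extracts the boundary pixels of a binary instance mask: a foreground pixel counts as contour if any of its four neighbors is background or lies outside the image. The 4-neighborhood rule and the function name are assumptions chosen for illustration; this only shows the kind of contour signal being emphasized and is not the CE-OST method itself.

```python
def mask_contour(mask):
    """Return a binary map marking the boundary pixels of a binary mask.

    `mask` is an H x W nested list of 0/1 values. A foreground pixel is a
    contour pixel when at least one 4-connected neighbor is background or
    falls outside the image.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    contour = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    contour[y][x] = 1
                    break
    return contour


# A 3 x 3 instance centered in a 5 x 5 image.
instance_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = mask_contour(instance_mask)
```

In this example every instance pixel except the center one is marked as contour, since only the center pixel is fully surrounded by foreground.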


Thanh-Danh Nguyen, Duc-Tuan Luu, Vinh-Tiep Nguyen†, and Thanh Duc Ngo, “CE-OST: Contour Emphasis for One-Stage Transformer-based Camouflage Instance Segmentation.” 2023 International Conference on Multimedia Analysis and Pattern Recognition (MAPR). IEEE, 2023. (Scopus) [DOI]