In this paper we present novel methods for automatically annotating images with relationship and position tags derived from mask and bounding box data. A Mask Region-based Convolutional Neural Network (Mask R-CNN) serves as the foundation of the object detection process. Relationships are found by manipulating the bounding box and mask segmentation outputs of the Mask R-CNN. The absolute positions (the positions of objects relative to the image) and the relative positions (the positions of objects relative to other objects) are then attached to the images as annotations, which are output to assist with retrieving those images through keyword searches. Programs were developed in Python to perform the image analysis and to streamline the manual annotation of images for testing. Image annotations that specify the relative locations of objects in an image enable more nuanced searches, allowing finer filtering of images that contain common objects. Because the masks are manipulated as boolean matrices, the processing is fast. The approach is also model agnostic, working with any model that outputs boolean mask and bounding box data.
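To make the idea concrete, the following is a minimal sketch of how relative-position tags might be derived from bounding boxes, and how boolean masks allow a fast overlap test. The tag names, thresholds, and function signatures here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def relative_position(box_a, box_b):
    """Infer coarse relative-position tags for object A with respect to
    object B from bounding boxes given as (x_min, y_min, x_max, y_max),
    in image coordinates where y grows downward.
    Hypothetical tag set; the paper's own vocabulary may differ."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    tags = []
    if ax1 <= bx0:
        tags.append("left of")
    elif ax0 >= bx1:
        tags.append("right of")
    if ay1 <= by0:
        tags.append("above")
    elif ay0 >= by1:
        tags.append("below")
    if not tags:
        tags.append("overlapping")
    return tags

def mask_overlap(mask_a, mask_b):
    """Fast overlap test on boolean segmentation masks: an element-wise
    AND of the two matrices, checked for any True entry."""
    return bool(np.logical_and(mask_a, mask_b).any())
```

Because the mask comparison is a vectorized boolean operation, it scales with image size rather than object complexity, which is consistent with the speed claim in the abstract.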
Villanueva, Jaime M. Jr; Subramanian, Anantharam; Ahir, Vishal; and Pollock, Andrew
"Mapping Relationships and Positions of Objects in Images Using Mask and Bounding Box Data,"
SMU Data Science Review: Vol. 2, No. 3, Article 11.
Available at: https://scholar.smu.edu/datasciencereview/vol2/iss3/11
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.