Localized assistive scene understanding using deep learning and the IoT
In this paper, we propose a system for localized scene understanding to assist people with visual disabilities. Our system determines the user's indoor location using WiFi fingerprinting and synthesizes a real-time description of the surrounding environment. The description is synthesized from prior information about the environment, real-time information obtained through deep-learning-based object detection and localization, and sensory information collected from IoT sensors. Our system can be activated automatically or on demand, as configured. On-demand activation happens by issuing a voice command to the environment's smart speaker or the user's mobile phone; alternatively, the system can be set to activate automatically when it detects a change in the user's environment. When triggered, the system captures an image using a stationary camera attached to the environment and offloads the image to our server, which identifies objects and their approximate locations in the environment. The server uses deep learning to localize persons, pets, and furniture; a prior mapping of the environment then converts the detected image-domain pixel coordinates into real-world relative locations. IoT sensors additionally provide the environment's temperature, humidity level, and light intensity. With this information available, we fuse a fixed description of the environment's permanent features with a dynamic description derived from the localized objects and sensor data. Finally, text-to-speech converts the textual description into an audio signal played on the user's Bluetooth headset or the environment's smart speaker. Our results show that our system can be an effective tool in helping the visually challenged navigate unknown environments using increasingly available smart home technologies.
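The mapping from image-domain pixel coordinates to real-world relative locations can be sketched as a planar homography from the camera image to the room's floor plane. This is a minimal illustration, not the paper's implementation: the matrix `H`, the helper `pixel_to_room`, and the sample coordinates are all hypothetical, standing in for the prior environment mapping described above.

```python
import numpy as np

# Hypothetical prior calibration: a homography H mapping floor-plane pixel
# coordinates in the stationary camera's image to room coordinates in meters.
# In practice H would be estimated once, from four or more known point
# correspondences between image pixels and measured floor positions.
H = np.array([
    [0.010, 0.000, -3.2],
    [0.000, 0.012, -2.4],
    [0.000, 0.000,  1.0],
])

def pixel_to_room(u: float, v: float, H: np.ndarray) -> tuple:
    """Map an image pixel (u, v) to a room-plane position (x, y) in meters."""
    p = H @ np.array([u, v, 1.0])   # apply the homography in homogeneous form
    return (p[0] / p[2], p[1] / p[2])  # de-homogenize

# Bottom-center of a detected person's bounding box (illustrative values);
# the result feeds the dynamic part of the synthesized scene description.
x, y = pixel_to_room(320.0, 400.0, H)
print(f"Detected person at roughly ({x:.2f} m, {y:.2f} m) from the room origin")
```

A detector's bounding-box bottom-center is a common choice of reference pixel here, since it approximates the object's contact point with the floor, which is the plane the homography is valid for.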