Camera sensors in the automotive setting are highly prone to soiling: mounted outside the car body, they are directly exposed to environmental contaminants such as mud, water, dust, sand, and snow. For both autonomous driving and advanced driver assistance systems, it is critical to detect soiling on the camera lens and trigger an automatic cleaning system, so as to maintain robust scene understanding and ensure safety. Equally important is pedestrian pose estimation, one of the key tasks for activity recognition in automated driving. However, estimating the pose of occluded pedestrians (e.g., occluded by soiling) is extremely challenging because no suitable dataset is available.
Given the lack of datasets for pedestrian pose estimation in automotive scenes, we propose to resolve this bottleneck by learning human poses from a non-automotive distribution and transferring that knowledge to accurately estimate pedestrian poses. A further challenge is that image quality degrades severely in nighttime driving scenarios and under soiled conditions, which makes vision-based tasks such as pedestrian detection and pose estimation immensely difficult. Achieving good performance on these tasks in adverse weather, poor lighting, and under camera soiling is a significant real-world problem at any level of autonomous driving. It can be addressed by exploiting multiple complementary sensors and applying fusion algorithms to improve reliability; the quality of a thermal camera image, for example, remains unaffected by lighting conditions. Here, we will explore avenues for the efficient fusion of image and thermal (or lidar) data for pedestrian detection. Pose estimation at night using multimodal data is even more challenging, as pose annotation of pedestrians in dark scenes has not yet been addressed in the literature.
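To make the fusion idea above concrete, the sketch below shows one simple late-fusion scheme: detections from an RGB detector and a thermal detector are matched by IoU, matched boxes have their confidences averaged, and unmatched boxes from either modality are kept. This is purely an illustration under our own assumptions (the function names, box format, and weighting are hypothetical), not the specific fusion algorithm proposed in this work.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_detections(rgb_dets, thermal_dets, iou_thresh=0.5, w_rgb=0.5):
    """Late fusion of per-modality detections, each a list of
    ((x1, y1, x2, y2), confidence) tuples.

    Matched pairs (IoU >= iou_thresh) get a weighted-average confidence;
    unmatched detections from either modality are passed through, so a
    pedestrian visible only in thermal (e.g., at night) is not lost.
    """
    fused, used_thermal = [], set()
    for box_r, conf_r in rgb_dets:
        best_j, best_iou = None, iou_thresh
        for j, (box_t, _) in enumerate(thermal_dets):
            if j in used_thermal:
                continue
            v = iou(box_r, box_t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            used_thermal.add(best_j)
            conf_t = thermal_dets[best_j][1]
            fused.append((box_r, w_rgb * conf_r + (1.0 - w_rgb) * conf_t))
        else:
            fused.append((box_r, conf_r))
    for j, (box_t, conf_t) in enumerate(thermal_dets):
        if j not in used_thermal:
            fused.append((box_t, conf_t))
    return fused
```

In a real pipeline, the confidence weights would typically be learned or adapted to conditions (e.g., down-weighting the RGB stream at night or when the soiling detector reports a degraded lens), and fusion could equally be done earlier, at the feature level.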