Step into the generation of synthetic haze and fog in video content using Deep Learning. Let's discover why there is a need for adding artificial natural phenomena to media content and the challenges it involves, and take a general overview of a hazing tool created to address this need.
The Problem of Haze in Real World Applications
Haze and fog are visually captivating natural phenomena: harmless, they let you enjoy a different perspective of your surroundings, bringing to mind your favorite mystery or horror movies… until you are in your car trying to drive with visibility of barely a few meters.
Haze or fog removal is a critical task for improving visibility in scenarios such as automotive, drones, and smart cities. Moreover, its area of application extends to use cases such as underwater exploration, where the blurring caused by the water shares similar physical characteristics with haze or mist.
The first step in developing a reliable haze removal model, as with any model development, is getting a good dataset to train it on. Unfortunately, obtaining pairs of images with and without haze for training is not a trivial task. Even if you set a camera perfectly still outside your house and capture images with and without haze, the captures are not temporally synchronized, which means that factors such as lighting will differ between them. To overcome this problem, a hazing tool is necessary to simulate haze in real images.
In this blog, we will go over the details of adding haze to an image using Deep Learning models, considering physical factors such as the optical model of haze and the estimation of depth in color images. For clarity, we will use the terms haze and fog interchangeably, since the tool treats them in the same way.
Why a Hazing Tool?
There has been some research on the problem of adding fog or haze to images of clear scenes. This is done to generate datasets used to train and evaluate fog/haze removal models, as well as to augment data for other models. It is necessary because capturing perfectly aligned real scenes with and without degrading weather conditions such as fog, haze, rain, or snow is costly and practically impossible.
Work on images is usually based on the standard optical model [1]:

I(x) = R(x) · t(x) + L · (1 − t(x)),    with    t(x) = e^(−β · d(x))

which is extensively used in the literature. Here, R(x) is the original pixel value at location x; t(x) is the transmission map, which decays exponentially with the scene depth d(x) at a rate set by the scattering coefficient β (larger β means denser fog); and L is the atmospheric light, that is, the color and intensity of the haze/fog. I(x) is the final foggy pixel value at location x.
Note that L is a single value, applied to all pixels x. However, for a more realistic representation of fog/haze, an entire array could be used to specify a particular L value per pixel.
Once the depth is obtained, this optical model can easily be used to generate well-paired clean and foggy images for a dataset. This has been done in many studies, and several datasets of such images exist. However, work on video is limited; a notable exception is the REVIDE dataset [2], which uses a haze machine and a robot to capture indoor data.
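As a minimal sketch of this step (the function name, array layout, and the default values for L and β are our own illustrative choices, not necessarily the tool's actual implementation), applying the optical model to a clean frame and its depth map in Python could look like this:

import numpy as np

def add_haze(clean, depth, L=0.9, beta=0.05):
    """Apply the standard optical haze model I = R*t + L*(1 - t).

    clean : HxWx3 float array in [0, 1], the clear frame R(x).
    depth : HxW float array, the estimated scene depth d(x).
    L     : scalar (or HxWx3 array) atmospheric light.
    beta  : scattering coefficient; larger values give denser fog.
    """
    t = np.exp(-beta * depth)          # transmission map t(x) = e^(-beta * d(x))
    t = t[..., np.newaxis]             # broadcast over the color channels
    hazy = clean * t + L * (1.0 - t)   # blend scene radiance with atmospheric light
    return np.clip(hazy, 0.0, 1.0)

With a per-pixel atmospheric light, the same function works unchanged, since the blend also broadcasts over an HxWx3 array for L.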
This is where a hazing tool fills the gap: it makes it easy to generate hazy images from the clear scenes in video files. This way, it is possible to generate datasets for further model training or image processing.
Making a Hazed Video
In the context of the optical haze model, it is necessary to obtain the depth map of the images, which represents the distance from the camera to the surface depicted by each pixel. Luckily for us, there are several single-frame, single-camera (monocular) depth estimators that solve this task. They usually work better for outdoor scenes; however, they are not perfect and can have trouble ensuring temporal consistency.
We tested two options for depth estimation in our tool: the sc_depth_pl model [3], which estimates the depth of each frame independently (i.e., it does not consider any temporal consistency), and the gcvd model [4], which is designed to process videos and enforces some level of temporal consistency at the cost of higher computation.
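To give a feel for what per-frame (temporally independent) depth estimation looks like in practice, here is a rough sketch that uses the publicly available MiDaS model from torch.hub as a stand-in estimator; our tool uses sc_depth_pl or gcvd instead, so the model choice, file names, and frame loop below are illustrative assumptions only:

import cv2
import torch

# Stand-in per-frame depth estimator (MiDaS via torch.hub). The actual tool
# relies on sc_depth_pl or gcvd; this only illustrates the per-frame idea.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

cap = cv2.VideoCapture("video_example.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        # MiDaS outputs relative inverse depth (larger = closer), one frame at a time.
        depth = midas(transform(rgb)).squeeze().cpu().numpy()
    depth_u8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imwrite(f"depth_{frame_idx:05d}.png", depth_u8)
    frame_idx += 1
cap.release()

Because each frame is processed independently, some flickering between consecutive depth maps is expected, which is exactly the temporal consistency problem mentioned above.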
In addition to the depth map, we also have to estimate the atmospheric light L. Estimating the atmospheric light in an image, with or without haze, remains an open problem and an active area of research. In general, the direction, color, and intensity of the atmospheric light in a scene should be estimated.
Most estimation approaches for atmospheric light rely on the Dark Channel Prior (DCP). We chose to provide several options to estimate the atmospheric light, which lets us vary its color and improve the scattering of light in the presence of light sources, roughly based on the work in [5]. This offers a wide range of possibilities to experiment with and generate different scenarios to populate the training sets of dehazing models.
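As an illustration of the DCP idea (this follows the classic dark channel prior recipe and is not necessarily the exact variant implemented in our tool; the patch size and percentile are illustrative assumptions), the atmospheric light can be estimated from the brightest pixels of the dark channel:

import cv2
import numpy as np

def estimate_atmospheric_light(img, patch=15, percentile=0.001):
    """Estimate the atmospheric light L with the dark channel prior.

    img : HxWx3 float array in [0, 1].
    Returns a 3-element array, one L value per color channel.
    """
    # Dark channel: per-pixel minimum over channels, then a local minimum filter.
    dark = cv2.erode(img.min(axis=2), np.ones((patch, patch), np.uint8))
    # Pick the brightest ~0.1% of dark-channel pixels...
    n = max(1, int(dark.size * percentile))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    # ...and average the corresponding colors of the original image.
    return img[idx].mean(axis=0)

Varying the resulting L, or weighting its channels as in the red-tinted example later in this post, is what lets the tool change the color of the simulated haze.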
The following video demonstrates the process: the left segment is the original clear video, the middle segment is the estimated depth, and the right segment is the hazed video.
Comparison of clean, depth, and foggy videos. [Video by Hawaii Drone Videos in DroneStock]
Video Hazing Tool
Our tool adds simulated fog, mist, and haze. In the real world, the size of the water droplets or dust particles, together with the properties of the light present in the scene, determines the resulting radiance. The tool focuses on simulating fog, and several parameters control visual properties that simulate the real physical phenomena involved in the image capturing process.
Initially, the depth map is estimated for each input video frame using either sc_depth_pl or gcvd. The output of this process is a depth map file for each frame of the video. We then use these depth maps and the corresponding original frames to simulate the fog or haze.
After fog is added to a frame, the resulting resolution may be too low, since sc_depth_pl outputs at 640x384 and gcvd at 384x288. To address this, we offer two options for enhancing the resolution: interpolation of the depth maps using OpenCV, or super-resolution applied to the final video by means of the BasicVSR_PlusPlus model [6].
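A minimal sketch of the first option, assuming a 1920x1080 target frame size and bicubic interpolation (both illustrative choices, not necessarily the tool's defaults), could look like this:

import cv2

def upscale_depth(depth, target_width, target_height):
    """Upscale a low-resolution depth map to the original frame size.

    Bicubic interpolation is a reasonable default for smooth depth maps;
    the haze model is then applied at the full frame resolution.
    """
    return cv2.resize(depth, (target_width, target_height),
                      interpolation=cv2.INTER_CUBIC)

# Example: bring a 640x384 sc_depth_pl depth map up to a 1920x1080 frame.
# depth_hr = upscale_depth(depth_lr, 1920, 1080)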
Figure 2 shows a flowchart of the steps the tool takes to simulate fog.
The tool allows you to easily specify the input video and additional configuration parameters as command line arguments. In the next example, we use the sc_depth_pl model to estimate the depth map and increase the resolution by means of interpolation, for all frames of the video "video_example.mp4".
~ $ python video_fog_adder.py --input_path video_example.mp4 --output_name test01 --model sc_depth_pl --increase_depth_resolution beta=0.05
Left: clean, and right: hazed video. [Video by Mamunur Rashid on pexels.com]
We can also customize the color and intensity of the simulated fog/haze. In the following example, we use a red color weight of 1.2.
~ $ python video_fog_adder.py --input_path video_example.mp4 --output_name test02 --model sc_depth_pl --increase_depth_resolution beta=0.05 r0=1.2
Left: no customized color, and right: red weight. [Video by Mamunur Rashid on pexels.com]
With these examples, we see that it is possible to get interesting variations from the same input video, simulating not only fog but also haze caused by non-natural sources such as fire or dust.
Final Remarks
Although this work is still in progress, we at RidgeRun.ai wanted to share a use case that may be of interest for diverse applications, and that showcases the use of deep learning not only for solving a problem directly but also for generating complex data that can later be used to solve domain-specific problems. Moreover, in the next release, we will share insights on haze removal after training a model on a simulated dataset generated by our tool. Stay tuned!
The RidgeRun.ai Team
References:
[1] Koschmieder, H.: Theorie der horizontalen Sichtweite. Beiträge zur Physik der freien Atmosphäre. 1924.
[2] Zhang, X., Dong, H., Pan, J., Zhu, C., Tai, Y., Wang, C., Li, J., Huang, F., & Wang, F. Learning To Restore Hazy Video: A New Real-World Dataset and a New Method. In CVPR (pp. 9239–9248). 2021.
[3] J.-W. Bian et al., “Unsupervised Scale-consistent Depth Learning from Video,” International Journal of Computer Vision (IJCV), 2021.
[4] Y.-C. Lee, K.-W. Tseng, G.-S. Chen, and C.-S. Chen, Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training. 2022.
[5] Yang, G., Evans, A.N. Improved single image dehazing methods for resource-constrained platforms. J Real-Time Image Proc 18, 2511–2525. 2021.
[6] K. C. K. Chan, S. Zhou, X. Xu, and C. C. Loy, “BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment,” arXiv preprint arXiv:2104.13371, 2021.