PREDICTING VIDEO SALIENCY USING CROWDSOURCED MOUSE-TRACKING DATA
Abstract:
This paper presents a new way of obtaining high-quality saliency maps for video that is cheaper than eye tracking. We designed a mouse-contingent video-viewing system that simulates a viewer's peripheral vision based on the position of the mouse cursor. The system enables mouse-tracking data recorded with an ordinary computer mouse to serve as an alternative to real gaze fixations recorded by a far more expensive eye tracker. We also developed a crowdsourcing system that collects such mouse-tracking data at large scale. Using the collected data, we showed that mouse tracking can approximate eye tracking. Moreover, to get more value out of the collected mouse-tracking data, we proposed a novel deep-neural-network algorithm that improves the quality of mouse-tracking saliency maps.
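The mouse-contingent idea can be pictured as a simple foveated-rendering loop: each video frame stays sharp in a small window around the cursor and is blurred elsewhere, so moving the mouse plays the role of moving the eyes. The Python sketch below is a minimal illustration of that idea under stated assumptions, not the paper's actual system; the function name foveate_frame and the parameters fovea_radius_px and blur_sigma are hypothetical choices made for illustration.

    # Minimal sketch of a mouse-contingent "foveated" view, assuming the system
    # blurs everything outside a small region around the mouse cursor to
    # simulate peripheral vision. Names and parameter values are illustrative.
    import cv2
    import numpy as np

    def foveate_frame(frame, cursor_xy, fovea_radius_px=80, blur_sigma=12):
        """Keep the frame sharp near the cursor, blurred in the 'periphery'."""
        h, w = frame.shape[:2]
        # A blurred copy stands in for low-acuity peripheral vision;
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        blurred = cv2.GaussianBlur(frame, (0, 0), sigmaX=blur_sigma)

        # Soft circular mask: 1.0 at the cursor, fading to 0.0 beyond the fovea.
        yy, xx = np.mgrid[0:h, 0:w]
        dist = np.hypot(xx - cursor_xy[0], yy - cursor_xy[1])
        mask = np.clip(1.0 - (dist - fovea_radius_px) / fovea_radius_px, 0.0, 1.0)
        mask = mask[..., None]  # broadcast over color channels

        # Composite the sharp and blurred copies; the cursor acts as the gaze point.
        out = mask * frame.astype(np.float32) + (1.0 - mask) * blurred.astype(np.float32)
        return out.astype(frame.dtype)

Cursor positions logged during such viewing can then be aggregated per frame and smoothed with a Gaussian kernel, the same way gaze fixations are conventionally turned into saliency maps.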

Keywords:
saliency, deep learning, visual attention, crowdsourcing, eye tracking, mouse tracking