Abstract: To address tracking loss caused by insufficient appearance-feature representation under complex backgrounds, a visual tracking algorithm based on adaptive fusion of temporal and spatial information is proposed, combining appearance features with motion features estimated by a deep optical flow network. Within a correlation-filter tracking framework, the Recurrent All-Pairs Field Transforms (RAFT) deep network estimates optical flow to obtain the target's temporal information, while CN and HOG features are extracted to obtain spatial information; the two are fused to strengthen the representation of the target's spatio-temporal characteristics. A mechanism for judging the reliability of the tracking result is then established, and the weight of the temporal information in the fusion is adjusted in real time, effectively improving the generalization ability of the algorithm in complex dynamic environments. To evaluate its effectiveness, the proposed algorithm is tested on the OTB100 and VOT2019 datasets. The experimental results show that, compared with mainstream visual tracking algorithms of recent years, the proposed algorithm improves tracking performance and has clear advantages on videos exhibiting attributes such as motion blur and fast motion.
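The adaptive fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reliability measure shown here (average peak-to-correlation energy, APCE, a criterion commonly used with correlation-filter response maps) and the threshold `tau` are assumptions, since the abstract does not specify the exact discrimination mechanism.

```python
import numpy as np

def apce(response):
    """Average Peak-to-Correlation Energy of a response map.
    Higher values indicate a sharper, more reliable peak.
    (Hypothetical stand-in for the paper's reliability criterion.)"""
    peak = response.max()
    trough = response.min()
    return (peak - trough) ** 2 / np.mean((response - trough) ** 2)

def fuse_responses(spatial_resp, temporal_resp, tau=20.0):
    """Adaptively fuse the spatial (HOG/CN) and temporal (optical-flow)
    response maps: the temporal weight shrinks when the temporal
    response looks unreliable, so spatial cues dominate by default."""
    w_t = min(1.0, apce(temporal_resp) / tau)  # reliability in [0, 1]
    w_t *= 0.5                                 # cap: temporal never dominates
    fused = (1.0 - w_t) * spatial_resp + w_t * temporal_resp
    return fused, w_t
```

In this sketch a sharply peaked optical-flow response (high APCE) contributes up to half of the fused map, while a flat, ambiguous response is down-weighted toward zero, mimicking the real-time weight adjustment described in the abstract.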