Simultaneous tracking of features acquired by multiple video cameras mounted on a rig opens new possibilities for ego-motion estimation and 3D scene modeling. In this paper we propose a novel approach for tracking three video streams at once. Color image features are detected with interest operators and described with SIFT descriptors. Standard tracking techniques detect outliers only from the relative orientation between temporal image pairs. They therefore miss outliers that the epipolar constraint cannot identify, such as mismatches that happen to lie on the epipolar line. We improve outlier detection by using temporal and spatial trifocal constraints. Furthermore, these spatio-temporal constraints allow the system to perform guided matching, which increases the number of tracked features.
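The abstract does not say how the trifocal constraint is evaluated. As a minimal sketch, assuming the standard point-point-point incidence relation [x2]_x (sum_i x1^i T_i) [x3]_x = 0, a triplet of correspondences (from three rig cameras, or from three consecutive frames) can be tested as follows. Unlike a pairwise epipolar test, this check also rejects a third-view match that lies on the correct epipolar line but at the wrong position. All function names, the threshold, and the toy cameras are hypothetical. For the demo the tensor is built from known cameras P1 = [I | 0], P2 = [A | a4], P3 = [B | b4]; a real tracker would estimate it from the data. The residual is an algebraic error, not the geometric reprojection error a production system would more likely threshold.

```python
import math

# Hypothetical sketch of a trifocal outlier test; not the paper's implementation.

def skew(v):
    """Cross-product matrix [v]_x such that [v]_x w equals v x w."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def project(P, X):
    """Project a homogeneous 3D point X (4-vector) with a 3x4 camera P."""
    return [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]


def trifocal_from_cameras(P2, P3):
    """T_i = a_i b4^T - a4 b_i^T, assuming P1 = [I | 0], P2 = [A | a4], P3 = [B | b4]."""
    a4 = [P2[r][3] for r in range(3)]
    b4 = [P3[r][3] for r in range(3)]
    T = []
    for i in range(3):
        a_i = [P2[r][i] for r in range(3)]
        b_i = [P3[r][i] for r in range(3)]
        T.append([[a_i[j] * b4[k] - a4[j] * b_i[k] for k in range(3)]
                  for j in range(3)])
    return T


def unit(v):
    """Scale a homogeneous vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def trifocal_residual(T, x1, x2, x3):
    """Algebraic error ||[x2]_x (sum_i x1^i T_i) [x3]_x||_F; zero for a consistent triplet."""
    x1, x2, x3 = unit(x1), unit(x2), unit(x3)
    # Divide by the tensor's Frobenius norm so the result ignores the scale of T.
    t_norm = math.sqrt(sum(t * t for Ti in T for row in Ti for t in row))
    M = [[sum(x1[i] * T[i][j][k] for i in range(3)) / t_norm for k in range(3)]
         for j in range(3)]
    E = matmul(matmul(skew(x2), M), skew(x3))
    return math.sqrt(sum(e * e for row in E for e in row))


def is_inlier(T, x1, x2, x3, threshold=1e-6):
    """Hypothetical threshold; noisy image data would need a much larger value."""
    return trifocal_residual(T, x1, x2, x3) < threshold


def guided_candidates(T, x1, x2, candidates, threshold=1e-6):
    """Guided matching: keep only third-view candidates consistent with (x1, x2)."""
    return [x3 for x3 in candidates if is_inlier(T, x1, x2, x3, threshold)]


# Usage: P1 = [I | 0], plus two toy views shifted along x and along y.
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
P3 = [[1, 0, 0, 0], [0, 1, 0, -1], [0, 0, 1, 0]]
T = trifocal_from_cameras(P2, P3)
X = [1, 2, 5, 1]
x1 = X[:3]  # P1 = [I | 0] simply drops the last coordinate
x2, x3 = project(P2, X), project(P3, X)
print(is_inlier(T, x1, x2, x3))               # True
print(is_inlier(T, x1, x2, [1.5, 1.0, 5.0]))  # False
```

In this sketch guided matching is reduced to filtering candidate matches in the third view. A fuller variant would transfer the point into the third view and search only a small window around the prediction, which is one way the constraint could increase the number of tracked features.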