Autonomous navigation of unmanned vehicles in GPS-denied environments is a challenging task. Because cameras are inexpensive, capture rich information, and sense the environment passively, vision-based simultaneous localization and mapping (VSLAM) has great potential to solve this problem. In this paper, we propose a novel VSLAM framework based on a stereo camera. The proposed approach combines the direct and indirect methods for real-time localization of an autonomous forklift in an unstructured warehouse. The hybrid method first aligns images by minimizing photometric errors, which provides data association and an initial pose estimate; it then extracts features from keyframes and matches them to refine the pose. By combining the efficiency of the direct method with the high accuracy of the indirect method, the approach runs faster than a state-of-the-art method while achieving comparable accuracy. Furthermore, a two-step dynamic-threshold feature extraction method significantly reduces the running time. In addition, a motion model of the forklift is proposed to provide a more reasonable initial pose for the direct image alignment based on photometric errors. The proposed algorithm is tested experimentally on a dataset collected in a large-scale warehouse with dynamic lighting and long corridors, and the results show that it localizes with high accuracy despite these conditions. Additionally, our method operates in real time using limited computing resources.
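As an illustration of the photometric error that direct image alignment minimizes, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes a pinhole camera with intrinsic matrix `K`, grayscale images, a per-pixel depth map for the reference frame (for example, obtained from stereo matching), and nearest-neighbor intensity lookup. The function names `photometric_residuals` and `photometric_cost` are hypothetical.

```python
import numpy as np


def photometric_residuals(ref_img, cur_img, ref_depth, K, T_cur_ref, pixels):
    """Intensity differences between reference pixels and their reprojections.

    pixels    : (N, 2) integer array of (u, v) coordinates in the reference image
    ref_depth : depth map of the reference image (same shape as ref_img)
    T_cur_ref : 4x4 rigid transform taking reference-frame points to the current frame
    """
    pixels = np.asarray(pixels, dtype=int)
    u, v = pixels[:, 0], pixels[:, 1]
    depth = ref_depth[v, u].astype(float)

    # Back-project the selected pixels to 3D points in the reference camera frame.
    homog = np.stack([u, v, np.ones_like(u)]).astype(float)  # 3 x N
    pts_ref = (np.linalg.inv(K) @ homog) * depth

    # Move the points into the current camera frame with the candidate pose.
    R, t = T_cur_ref[:3, :3], T_cur_ref[:3, 3:4]
    pts_cur = R @ pts_ref + t

    # Discard points that end up behind (or on) the camera plane.
    in_front = pts_cur[2] > 1e-9
    pts_cur, u, v = pts_cur[:, in_front], u[in_front], v[in_front]

    # Project into the current image and round to the nearest pixel.
    proj = K @ pts_cur
    u_cur = np.rint(proj[0] / proj[2]).astype(int)
    v_cur = np.rint(proj[1] / proj[2]).astype(int)

    # Keep only projections that land inside the current image.
    h, w = cur_img.shape
    inside = (u_cur >= 0) & (u_cur < w) & (v_cur >= 0) & (v_cur < h)
    return (cur_img[v_cur[inside], u_cur[inside]].astype(float)
            - ref_img[v[inside], u[inside]].astype(float))


def photometric_cost(residuals):
    """Sum of squared photometric residuals for one candidate pose."""
    return float(np.sum(residuals ** 2))
```

A direct method searches for the pose `T_cur_ref` that minimizes this cost, typically with an iterative nonlinear least-squares solver; the quality of the starting pose matters, which is where an initial guess from a motion model helps. Practical systems usually add sub-pixel interpolation, a robust loss, and a coarse-to-fine image pyramid, all of which are omitted here for brevity.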