The development of 3D modeling technology has driven the growth of the multimedia film and television industry. This article studies the design of a 3D-modeled facial image library for multimedia film and television. Its aims are to provide a more comprehensive facial image library for the industry, to use 3D technology to break free of the constraints of traditional film and television production, and to keep moving beyond traditional film and television media forms. The article examines the background of multimedia film and television development and the characteristics of new media development. Starting from 3D technology, it extracts the facial features of characters, transforms the image data through deep autoencoders, and applies the local binary pattern (LBP) operator to extract texture features from the original facial images. A number of experimental subjects were selected and photographed from multiple angles: left, front, and right. The internal (intrinsic) and external (extrinsic) parameters of the camera were adjusted according to the pinhole camera projection imaging model. To construct the 3D image, feature detection is first performed on the selected images; the corresponding vector information is then matched under geometric conditions to construct a 3D matrix, and the facial structure image is obtained by triangulation. After comparing the 3D production software on the market, this article selects the Maya platform as suitable for building the system. Global constraint information is obtained by training on a set of sample images. When searching a test image, appropriate feature point positions are found according to the structural matching degree of the local image. After each search is completed, the global information is applied as a constraint so that the output feature information remains reasonable.
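The texture step uses a local binary pattern (LBP) operator, but the abstract gives no parameters. The following is only a minimal sketch assuming the basic 3x3, 8-neighbour LBP with per-cell histograms, a common setup for face texture descriptors. The function names `lbp_image` and `lbp_histogram` and the default 4x4 grid are illustrative choices, not the authors' settings.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from the top-left pixel; offset i sets bit i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP codes for the interior pixels of a 2D grayscale image.

    Each neighbour that is >= the centre pixel contributes a 1-bit, giving
    a code in 0..255. The output is (H-2) x (W-2) because borders are skipped.
    """
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view of the image aligned with the centre pixels.
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)


def lbp_histogram(gray: np.ndarray, grid=(4, 4)) -> np.ndarray:
    """Concatenate normalised 256-bin LBP histograms over a grid of cells
    (hypothetical descriptor layout; each cell histogram sums to 1)."""
    codes = lbp_image(gray)
    features = []
    for row_block in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(row_block, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            features.append(hist / max(cell.size, 1))
    return np.concatenate(features)
```

Splitting the face into cells keeps coarse spatial layout (eyes, nose, mouth regions) in the descriptor, which a single global histogram would lose.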
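The multi-view steps (pinhole projection with intrinsic and extrinsic parameters, then triangulation of matched features from the left, front, and right views) can be illustrated with the standard linear (DLT) triangulation. This is a sketch under assumptions: the abstract does not say which triangulation method is used, and the camera matrices, the `rot_y` helper, and the test point below are made up for illustration.

```python
import numpy as np


def rot_y(theta: float) -> np.ndarray:
    """Rotation about the vertical (y) axis, used here to mimic left/right views."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def projection_matrix(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pinhole camera matrix P = K [R | t] (intrinsics K, extrinsics R, t)."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point X (3,) to pixel coordinates (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate(Ps, pixels) -> np.ndarray:
    """Linear (DLT) triangulation of one point observed in several views.

    Each view contributes two rows, u*p3 - p1 and v*p3 - p2, of a homogeneous
    system A X = 0; the solution is the right singular vector with the
    smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(Ps, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.vstack(rows))
    X = Vt[-1]
    return X[:3] / X[3]


if __name__ == "__main__":
    # Hypothetical intrinsics and three views (left, front, right).
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    Ps = [
        projection_matrix(K, rot_y(0.3), np.array([0.5, 0.0, 0.1])),
        projection_matrix(K, np.eye(3), np.zeros(3)),
        projection_matrix(K, rot_y(-0.3), np.array([-0.5, 0.0, 0.1])),
    ]
    X = np.array([0.05, -0.02, 2.0])
    print(triangulate(Ps, [project(P, X) for P in Ps]))
```

With noise-free pixels the DLT recovers the point exactly; with real detections it gives a least-squares estimate that is usually refined by minimising reprojection error.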
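The search procedure at the end of the paragraph (a model trained on sample images supplies global constraints, each feature point is moved by local matching, and the global model is reapplied after every search) reads like an Active Shape Model. That reading is our assumption; the abstract does not name the method. Under it, the global constraint step could be sketched as a PCA shape model whose parameters are clipped to a few standard deviations. The local image-matching step is omitted, and `train_shape_model`, `constrain_shape`, `var_keep`, and `n_std` are hypothetical names.

```python
import numpy as np


def train_shape_model(shapes: np.ndarray, var_keep: float = 0.95):
    """PCA shape model from aligned landmark vectors of shape (n_samples, 2 * n_points).

    Returns the mean shape, the principal modes (columns), and their variances,
    keeping just enough modes to explain `var_keep` of the total variance.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1]               # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, var_keep)) + 1, len(eigvals))
    return mean, eigvecs[:, :k], eigvals[:k]


def constrain_shape(shape, mean, modes, eigvals, n_std: float = 3.0):
    """Project a candidate shape onto the model and clip each mode parameter
    to +/- n_std standard deviations, so the result stays a plausible face."""
    b = modes.T @ (shape - mean)
    limit = n_std * np.sqrt(eigvals)
    b = np.clip(b, -limit, limit)
    return mean + modes @ b
```

In a full search loop, each landmark would first be moved to its best local image match and the resulting shape would then be passed through `constrain_shape`, repeating until the landmarks stop moving.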
The average residual of the facial images constructed in this paper ranges from 0.25 to 0.45, and the maximum residual does not exceed 4.0. The experimental method shows good stability and robustness. Using the COM transmission model frees experimenters from having to deal with the underlying details. The resulting face-animation-driven simulation scheme can produce more vivid facial expressions.
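The abstract does not define the residual or its unit. One common choice for a multi-view reconstruction is the reprojection residual: the pixel distance between each detected feature point and the reprojection of its reconstructed 3D point. The sketch below assumes that definition and computes per-point residuals with their mean and maximum, the two kinds of statistic quoted above; `reprojection_residuals` and `summarize` are hypothetical names, and no values from the paper are reproduced.

```python
import numpy as np


def reprojection_residuals(P: np.ndarray, points_3d: np.ndarray,
                           observed_px: np.ndarray) -> np.ndarray:
    """Euclidean distance between observed pixels and reprojected 3D points.

    P is a 3x4 pinhole camera matrix, points_3d is (N, 3), observed_px is (N, 2).
    """
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    projected = (P @ homogeneous.T).T
    projected_px = projected[:, :2] / projected[:, 2:3]
    return np.linalg.norm(projected_px - observed_px, axis=1)


def summarize(residuals: np.ndarray):
    """Mean and maximum residual."""
    return float(residuals.mean()), float(residuals.max())
```

Reporting both numbers is useful because a low mean can hide a few badly placed landmarks, which the maximum exposes.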