Legal judgments are based on the case description recorded in the legal document. However, retrieving and understanding large numbers of relevant legal documents is time-consuming for legal workers. Legal judgment prediction (LJP) applies artificial intelligence technology to provide decision support for legal workers. Prison term prediction (PTP) is an important LJP task that aims to predict the term of penalty with machine learning methods, thus supporting the judgment. Long Short-Term Memory (LSTM) networks are a special type of recurrent neural network (RNN) designed to handle long-term dependencies while mitigating the unstable-gradient problem of plain RNNs. Mainstream RNN models such as LSTM and GRU can capture long-distance correlations, but their training is time-consuming because it proceeds sequentially, while a traditional CNN can be trained in parallel but attends mainly to local information. Both have shortcomings when predicting from case descriptions. This paper proposes a prison term prediction model for legal documents. The model adds causal dilated convolution to the general TextCNN so that it is not limited to the most important keyword segments but also attends to the text near those segments and to the logical relationships within the passage, thereby improving prediction accuracy on the data set. The causal TextCNN in this paper is designed to capture the causal logical relationships in the text, especially the relationship between the legal text and the prison term. Because the model is built entirely from convolutions, it can be trained in parallel, which improves training speed compared with traditional sequence models such as GRU and LSTM, while still handling long-term dependencies. Causal convolution can therefore make up for the shortcomings of both TextCNN and RNN models. In summary, the causality-based PTP model is a good solution to this problem.
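As a rough illustration of the causal dilated convolution mentioned above, the following minimal sketch computes a one-dimensional causal convolution in plain Python. It is not the paper's implementation: the function name `causal_dilated_conv1d`, the toy weights, and the dilation value are assumptions chosen only to show that each output position depends on the current and earlier inputs, never on later ones.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal dilated 1-D convolution over a list of numbers.

    y[t] = sum_j weights[j] * x[t - j * dilation]
    Positions before the start of the sequence are treated as zero
    (equivalent to left padding), so y has the same length as x.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation  # only looks at the present and the past
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


# Hypothetical toy input standing in for one feature channel of a token sequence.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv1d(signal, weights=[0.5, 0.5], dilation=2)

# Causality check: changing a later input leaves earlier outputs unchanged.
changed = list(signal)
changed[4] = 100.0
y_changed = causal_dilated_conv1d(changed, weights=[0.5, 0.5], dilation=2)
print(y[:4] == y_changed[:4])
```

With a dilation of 2, each output mixes the current position with the one two steps earlier, which is how a small kernel can cover text beyond the immediately adjacent words.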
In addition, case descriptions are usually longer than ordinary natural language sentences, and the key information related to the prison term is not limited to local words. It is therefore crucial to capture a substantially longer memory in LJP domains, where a long history is required. In this paper, we propose a causality CNN-based prison term prediction model that operates on fact descriptions, in which the Causal TextCNN method builds a long effective history size (i.e., the ability of the network to look very far into the past to make a prediction) by combining very deep networks (augmented with residual layers) with dilated convolutions. Experimental results on a public data set show that the proposed model outperforms several CNN- and RNN-based baselines.
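To make the notion of effective history size concrete, the sketch below computes the receptive field of a stack of causal convolution layers. The abstract does not state the kernel size, depth, or dilation schedule, so the values here are illustrative assumptions, as is the doubling schedule (1, 2, 4, ...) and the simplification of one convolution per layer. The helper names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions visible to one output position.

    Each causal layer with kernel size k and dilation d extends the
    visible history by (k - 1) * d positions.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


def doubling_dilations(num_layers):
    """Assumed schedule: dilation doubles at every layer (1, 2, 4, ...)."""
    return [2 ** i for i in range(num_layers)]


kernel_size = 3  # assumed value, not taken from the paper
for num_layers in (4, 8):
    dilated = receptive_field(kernel_size, doubling_dilations(num_layers))
    plain = receptive_field(kernel_size, [1] * num_layers)
    print(f"{num_layers} layers: dilated={dilated} tokens, undilated={plain} tokens")
```

Under these assumptions the history grows exponentially with depth when dilations double, but only linearly without dilation, which is why dilation suits long case descriptions.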