Abstract
Aim and Objective: Protein tyrosine phosphatases (PTPs) are responsible for protein dephosphorylation. Because the level of protein phosphorylation is correlated with tumor transformation, PTPs have been considered candidate suppressors of transformation. In this study, we developed DephosSitePred, a novel model for predicting PTP substrate sites based on bi-profile sequence features.
Materials and Method: We constructed a dataset containing 63, 50, and 51 positive samples and 868, 856, and 731 negative samples, with less than 70% sequence identity, for three phosphatases: protein tyrosine phosphatase 1B (PTP1B), Src homology region 2 domain-containing phosphatase 1 (SHP-1), and SHP-2. Based on this dataset, we built the predictor DephosSitePred by applying the sequence-based bi-profile Bayes feature extraction technique to identify substrate sites of the three phosphatases. To address the class imbalance in the datasets, the weight parameters (W1 and W-1) of the support vector machine (SVM) were selected by jackknife cross-validation.
Results: DephosSitePred yielded Matthews correlation coefficients of 0.686 for PTP1B, 0.668 for SHP-1, and 0.748 for SHP-2 substrate sites, significantly outperforming other existing predictors. Moreover, 30 repetitions of 5-fold cross-validation showed that DephosSitePred achieved average area under the curve values of 0.968, 0.968, and 0.982 for PTP1B, SHP-1, and SHP-2, respectively, which were 0.115, 0.105, and 0.105 higher than those of the second-best model, MGPS-DEPHOS.
Conclusion: DephosSitePred is an effective auxiliary tool for in silico identification of dephosphorylation sites and may help to reveal the physiological and pathological roles of protein dephosphorylation.
Keywords: Protein tyrosine phosphatase, PTP, bi-profile, prediction, SVM, weight parameter.
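For illustration only, the following minimal Python sketch shows one common way to compute bi-profile Bayes features and the Matthews correlation coefficient mentioned in the abstract. It is not the authors' implementation. The function names, the toy peptides, and the `pseudocount` parameter are assumptions introduced here; the sketch assumes that each peptide is an equal-length window around a candidate tyrosine and that a feature vector concatenates the per-position residue frequencies estimated from the positive and the negative training sets.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def position_profile(peptides, pseudocount=1.0):
    """Estimate per-position residue frequencies from equal-length peptides.

    `pseudocount` is a hypothetical smoothing parameter (not from the paper)
    that keeps unseen residues from receiving a probability of exactly zero.
    """
    length = len(peptides[0])
    profile = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            if pep[pos] in counts:
                counts[pep[pos]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def bi_profile_features(peptide, pos_profile, neg_profile):
    """Concatenate the peptide's residue frequencies under both profiles.

    The first half of the vector comes from the positive-set profile and the
    second half from the negative-set profile, giving 2 * len(peptide) values.
    """
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when the coefficient is undefined
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy three-residue windows, invented purely for demonstration.
    positives = ["AYC", "AYD"]
    negatives = ["GYC", "GYE"]
    pos_prof = position_profile(positives)
    neg_prof = position_profile(negatives)
    print(bi_profile_features("AYC", pos_prof, neg_prof))
    print(matthews_corrcoef(tp=40, tn=800, fp=20, fn=10))
```

Vectors of this kind could then be passed to an SVM with separate penalty weights for the positive and negative classes (W1 and W-1 in the abstract), which is a standard way to compensate for the much larger number of negative samples.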