TBpred
From DrugPedia: A Wikipedia for Drug discovery
TBpred is a prediction server that predicts four subcellular localizations (cytoplasmic, integral membrane, secretory and membrane-attached by a lipid anchor) of mycobacterial proteins. It is an SVM-based method that exploits different features of a protein, such as amino acid composition, dipeptide composition and the position-specific scoring matrix (PSSM). The overall prediction accuracies of the corresponding SVM modules are 82.51%, 80.39% and 86.62%, respectively. Besides SVM, other techniques such as profile HMMs and MEME/MAST motif-based analyses were also applied. In addition, a hybrid approach combining the PSSM-based SVM model with the MEME/MAST model has been incorporated.
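As an illustration of the two composition features mentioned above, the following Python sketch computes a 20-dimensional amino acid composition vector and a 400-dimensional dipeptide composition vector, each expressed as percentages. It assumes the 20 standard residues in alphabetical one-letter order and simply skips non-standard letters such as X; the exact normalization used by TBpred is not given on this page, so treat these choices as assumptions.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """20-dim vector: percentage of each standard residue in seq."""
    counted = [aa for aa in seq.upper() if aa in AMINO_ACIDS]  # drop X, B, Z, ...
    total = len(counted)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counted.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dim vector: percentage of each ordered pair among all valid adjacent pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]
```

Either vector can be passed directly to an SVM as the input pattern for one protein.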
Importance of this webserver:
Availability of TBpred Webserver: This server is available at TBpred
About Dataset:
The current dataset of mycobacterial proteins and their subcellular localizations was derived from SWISS-PROT. Of the 1365 proteins retrieved, entries carrying the non-experimental qualifier "by similarity" were excluded, leaving 882 proteins. These proteins are distributed over 13 different subcellular compartments; the 4 major compartments containing a reasonable number of samples were selected, giving 852 proteins in total.
Subcellular Localization | Sample Number |
1. Cytoplasmic | 340 |
2. Integral Membrane | 402 |
3. Secreted | 50 |
4. Attached to the membrane by lipid anchor | 60 |
Support Vector Machine (SVM):
SVMlight was used in classification mode. Several of its parameters can be tuned to obtain optimal results. Of its built-in kernels, three were used: linear, polynomial and radial basis function (RBF). Subcellular localization prediction is a multi-class problem, so for each protein feature four SVM modules were developed, one per subcellular localization. The nth SVM module is trained with samples of the nth class as positives and all remaining samples as negatives. An unknown sample is assigned to the localization whose module gives the highest of the four scores.
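The one-vs-rest scheme described above can be sketched as follows. The class names, helper functions and toy feature values are hypothetical; only the SVMlight input format (`<target> <index>:<value> ...`, with feature indices starting at 1 and increasing) comes from the SVMlight documentation. Each of the four files would be trained separately with SVMlight's `svm_learn`, and the four decision values that `svm_classify` produces for an unknown protein are then compared.

```python
# Hypothetical identifiers for TBpred's four compartments.
CLASSES = ["cytoplasmic", "integral_membrane", "secreted", "lipid_anchor"]


def svmlight_line(label: int, features: list[float]) -> str:
    """Format one example in SVMlight's sparse text format."""
    # Indices start at 1 and increase; zero-valued features may be omitted.
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0)
    return f"{label:+d} {pairs}".rstrip()


def one_vs_rest_files(examples):
    """Build one training-file body per class: that class is +1, every other class -1.

    examples: list of (class_name, feature_vector) pairs.
    """
    files = {}
    for target in CLASSES:
        lines = [svmlight_line(1 if cls == target else -1, x) for cls, x in examples]
        files[target] = "\n".join(lines) + "\n"
    return files


def predict(scores: dict[str, float]) -> str:
    """Assign the compartment whose model returns the largest decision value."""
    return max(scores, key=scores.get)
```

Taking the maximum raw decision value assumes the four models produce scores on comparable scales, which is the premise of the max-score rule in the text.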
Evaluation of prediction performance of TBpred:
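The performance of this method was evaluated by 5-fold cross-validation. The whole dataset was partitioned into 5 sets such that no two proteins from different sets share sequence similarity greater than 36%. Training was done on four sets and the remaining set was used for testing. To test every protein, this process was repeated 5 times, each time using a different set for testing. The performance of the different SVM modules was evaluated by calculating the accuracy and the Matthews correlation coefficient (MCC) with the following equations:

$$\mathrm{Accuracy}(x) = \frac{p(x)}{\mathrm{exp}(x)} \times 100$$

$$\mathrm{MCC}(x) = \frac{p(x)\,n(x) - u(x)\,o(x)}{\sqrt{\big(p(x)+u(x)\big)\big(p(x)+o(x)\big)\big(n(x)+u(x)\big)\big(n(x)+o(x)\big)}}$$

and the overall accuracy is the percentage of all N sequences that are predicted correctly:

$$\mathrm{Accuracy}_{\mathrm{overall}} = \frac{\sum_x p(x)}{N} \times 100$$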
where x can be any subcellular location (cytoplasmic, integral membrane, secreted, or attached to the membrane by a lipid anchor), exp(x) is the number of sequences observed in location x, p(x) is the number of correctly predicted sequences of location x, n(x) is the number of correctly predicted sequences not of location x, u(x) is the number of under-predicted sequences (sequences of location x predicted as another location) and o(x) is the number of over-predicted sequences (sequences of other locations predicted as x).
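The page does not say how the 36% similarity constraint was enforced. One way to guarantee it, sketched below under that caveat, is to link every pair of proteins above the threshold with union-find, so that such pairs always fall in the same cluster, and then distribute whole clusters over the five folds. The pairwise percent-identity matrix is a hypothetical input (for example from an all-against-all alignment), and the greedy balancing is an assumed choice.

```python
def similarity_folds(identity, k=5, threshold=36.0):
    """Split proteins into k folds so no pair in different folds exceeds threshold % identity.

    identity: square list-of-lists of pairwise percent identities (hypothetical input).
    Returns k lists of protein indices.
    """
    n = len(identity)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Any pair above the threshold is merged, so it can never be split across folds.
    for i in range(n):
        for j in range(i + 1, n):
            if identity[i][j] > threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Greedy balancing: largest clusters first, each into the currently smallest fold.
    folds = [[] for _ in range(k)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


def cross_validation(folds):
    """Yield (train, test) index lists, using each fold as the test set exactly once."""
    for t, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != t for i in fold]
        yield train, test
```

Because similar proteins are grouped before folds are filled, the fold sizes can become uneven when a few large clusters dominate.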
Various Prediction Approaches:
Three main approaches were studied in this work, each based on a different protein feature.
Amino acid composition based SVM module (20-dimensional input vector; overall accuracy 82.51%):
Subcellular Localization | Accuracy(%) | MCC |
Cytoplasmic | 88.82 | 0.77 |
Integral Membrane | 86.07 | 0.71 |
Secreted | 44.00 | 0.57 |
Attached to membrane by a lipid anchor | 55.00 | 0.58 |
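The accuracy formulas from the evaluation section can be checked against the table above. In this sketch the numbers of correct predictions are back-calculated as accuracy × sample number (rounded to whole proteins), so they are derived values, not reported ones. The MCC function is included for completeness but cannot be checked here, because n(x), u(x) and o(x) are not reported.

```python
import math


def location_accuracy(p, exp):
    """Accuracy(x) = p(x) / exp(x), as a percentage."""
    return 100.0 * p / exp


def location_mcc(p, n, u, o):
    """MCC(x) from correct positives p, correct negatives n, under-predictions u, over-predictions o."""
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return (p * n - u * o) / denom if denom else 0.0  # 0 by convention if undefined


def overall_accuracy(correct, observed):
    """Percentage of all N sequences assigned to their observed location."""
    return 100.0 * sum(correct.values()) / sum(observed.values())


# exp(x): sample numbers from the dataset table.
observed = {"cytoplasmic": 340, "integral_membrane": 402, "secreted": 50, "lipid_anchor": 60}
# p(x): back-calculated from the table above as accuracy x sample number (derived, not reported).
correct = {"cytoplasmic": 302, "integral_membrane": 346, "secreted": 22, "lipid_anchor": 33}

for loc in observed:
    print(f"{loc}: {location_accuracy(correct[loc], observed[loc]):.2f}%")
print(f"overall: {overall_accuracy(correct, observed):.2f}%")
```

Running it prints 88.82%, 86.07%, 44.00% and 55.00% for the four locations and an overall accuracy of 82.51%, matching the value reported for the amino acid composition module.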
Dipeptide composition based SVM module (400-dimensional input vector; overall accuracy 80.39%):
Subcellular Localization | Accuracy(%) | MCC |
Cytoplasmic | 89.41 | 0.72 |
Integral Membrane | 81.09 | 0.67 |
Secreted | 50.00 | 0.60 |
Attached to membrane by a lipid anchor | 50.00 | 0.57 |
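A PSSM was obtained for each protein sequence and converted into an SVM input pattern with 400 dimensions. The overall accuracy achieved by this SVM module was 86.62% (RBF kernel, g = 2, c = 50, j = 1, where g is the RBF kernel parameter gamma, c the trade-off between training error and margin, and j the cost factor by which errors on positive examples outweigh errors on negative examples).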
Subcellular Localization | Accuracy(%) | MCC |
Cytoplasmic | 94.71 | 0.85 |
Integral Membrane | 87.81 | 0.80 |
Secreted | 44.00 | 0.48 |
Attached to membrane by a lipid anchor | 68.33 | 0.69 |
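The page states only that the PSSM-based input vector has 400 dimensions. A common way to obtain a fixed 400-dimensional vector from an L × 20 PSSM, assumed here rather than taken from TBpred, is to squash each score with the logistic function and sum the rows belonging to each of the 20 residue types, giving a 20 × 20 matrix that is then flattened. The column order and the normalization below are assumptions.

```python
import math

# Assumed column order of the PSSM; PSI-BLAST's own output uses a different order
# (A R N D C Q E G H I L K M F P S T W Y V) and would need reindexing first.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_400(seq, pssm):
    """Collapse an L x 20 PSSM into a fixed 400-dim vector.

    seq: protein sequence of length L.
    pssm: L rows of 20 raw scores, one row per residue, columns in AMINO_ACIDS order.
    """
    if len(seq) != len(pssm):
        raise ValueError("PSSM must have one row per residue")
    sums = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq.upper(), pssm):
        if residue not in AMINO_ACIDS:
            continue  # skip non-standard residues such as X
        r = AMINO_ACIDS.index(residue)
        for c, score in enumerate(row):
            # Logistic squashing maps raw log-odds scores into (0, 1).
            sums[r][c] += 1.0 / (1.0 + math.exp(-score))
    # Row-major flattening: features 0-19 summarize all positions occupied by A, and so on.
    return [value for row in sums for value in row]
```

Summing by residue type removes the dependence on sequence length, which is what lets proteins of different lengths share one fixed-size SVM input.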
--Mamoon 04:57, 20 August 2008 (UTC)