Monday, February 15, 2010

Protein Structure Modeling

* What is protein structure prediction and why is it useful?

* An overview of protein structure prediction

* Application: Structure-based drug design

An overview of protein structure prediction:
Protein structure determination methods:
X-ray crystallography
Accurate
Must have 20 mg material
Must be able to crystallize protein
 
NMR
Limited to about 120 residues
Protein must be soluble, about 30 mg/ml
 
Protein structure prediction
Does not need material
Complementary to Crystallography/NMR
More information = higher reliability

Prediction categories:

* Secondary structure prediction: 1D only
* Protein threading or fold family recognition: 3D fold information
* Homology modeling: up to X-ray accuracy
* Ab-initio structure prediction: 3-6 Å

Secondary Structure Prediction:

* Given an amino acid sequence, predict a secondary structure state (alpha helix, beta strand, coil) for each residue in the sequence
* Best prediction method: PSIPRED (Jones)
* Provides more information for threading and other 3D prediction approaches
* Assists in structure determination
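
To make the per-residue output concrete, here is a minimal sketch of how a three-state prediction could be represented and summarized. It does not run PSIPRED or any real predictor: the sequence, the predicted state string, and the function name `segments` are all hypothetical and chosen only for illustration.

```python
from itertools import groupby

# Three-state alphabet from the notes: H = alpha helix, E = beta strand, C = coil.
STATES = {"H": "alpha helix", "E": "beta strand", "C": "coil"}


def segments(sequence, states):
    """Group a per-residue state string into (state, start, end, residues) runs.

    start and end are 1-based residue positions, inclusive.
    """
    if len(sequence) != len(states):
        raise ValueError("one state is required per residue")
    runs = []
    position = 1
    for state, group in groupby(states):
        length = len(list(group))
        residues = sequence[position - 1:position - 1 + length]
        runs.append((state, position, position + length - 1, residues))
        position += length
    return runs


# Hypothetical sequence and hypothetical prediction, for illustration only.
seq = "MKTAYIAKQRQISFVK"
pred = "CCHHHHHHHCCEEEEC"
for state, start, end, residues in segments(seq, pred):
    print(f"{start:>3}-{end:<3} {STATES[state]:<12} {residues}")
```

A segment list like this is the kind of 1D information that threading and other 3D approaches can take as extra input.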

Threading:

* Given:
Sequence of protein P with unknown structure
Database of known folds
* Find:
Most plausible fold for P
Good alignment gives approximate backbone structure
* Places the residues of unknown P along the backbone of a known structure and determines stability of side chains in that arrangement
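
The idea of placing residues along known backbones and scoring the fit can be sketched with a toy example. Everything here is an assumption made for illustration: each fold is reduced to a string of buried (B) or exposed (E) positions, the score simply rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the placement is ungapped. Real threading programs use gapped alignments and far richer energy functions.

```python
# Simplifying assumption: these residues count as hydrophobic. Not a real potential.
HYDROPHOBIC = set("AVILMFW")


def position_score(residue, environment):
    """Return +1 when the residue type suits the environment, -1 otherwise."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(sequence, environments):
    """Find the best ungapped placement of sequence on one template.

    Returns (score, offset), where offset is the 0-based template position
    of the first residue, or None if the template is shorter than the sequence.
    """
    best = None
    for offset in range(len(environments) - len(sequence) + 1):
        score = sum(position_score(residue, environments[offset + i])
                    for i, residue in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def best_fold(sequence, fold_library):
    """Return (score, fold name, offset) for the most plausible fold, or None."""
    results = []
    for name, environments in fold_library.items():
        placement = thread(sequence, environments)
        if placement is not None:
            results.append((placement[0], name, placement[1]))
    return max(results) if results else None


# Hypothetical fold library and query sequence.
fold_library = {
    "fold_A": "EBBEEBBEEBEE",
    "fold_B": "BEBEBEBEBEBE",
}
query = "KLVDEIAK"
print(best_fold(query, fold_library))
```

The winning fold and offset give the approximate backbone positions for each residue of P, which is the alignment the notes refer to.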

Homology Modeling:

* Simplest, reliable approach
* Basis: proteins with similar sequences tend to fold into similar structures
* It has been observed that even proteins with 25% sequence identity fold into similar structures
* Does not work for remote homologs (< 25% pairwise identity)
* Given:
A query sequence Q
A database of known protein structures
* Find:
Protein P such that P has high sequence similarity to Q
* Return P’s structure as an approximation to Q’s structure
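
The template search step can be sketched as follows. This is a minimal illustration, not how any of the modeling programs work internally: it assumes the query has already been aligned to each candidate (gaps written as "-"), defines identity over columns where neither sequence has a gap, and uses the 25% cutoff from the notes. The template names and sequences are hypothetical.

```python
def percent_identity(aligned_q, aligned_p):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_q) != len(aligned_p):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_q, aligned_p)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def pick_template(alignments, min_identity=25.0):
    """Return (template id, identity) for the best template, or None.

    alignments maps a template id to (aligned query, aligned template).
    None is returned when no template reaches min_identity.
    """
    best_id, best_identity = None, -1.0
    for template_id, (aligned_q, aligned_p) in alignments.items():
        identity = percent_identity(aligned_q, aligned_p)
        if identity > best_identity:
            best_id, best_identity = template_id, identity
    if best_id is None or best_identity < min_identity:
        return None
    return best_id, best_identity


# Hypothetical pre-computed alignments of query Q against two known structures.
alignments = {
    "tmplA": ("MKTAYIAK", "MKSAYLAK"),
    "tmplB": ("MKTAYIAK", "M-QGYWPR"),
}
print(pick_template(alignments))
```

The structure of the returned template would then serve as the approximation to Q's structure.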

How good can homology modeling be?

Sequence identity | Accuracy
60-100% | Molecular replacement in crystallography; supports site-directed mutagenesis through visualization
30-60% | Comparable to medium-resolution NMR; substrate specificity
<30% | Serious errors
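
As a small sketch, the identity ranges above can be turned into a lookup. The function name `expected_use` is hypothetical, and assigning the shared boundary values (exactly 60% and exactly 30%) to the higher row is an assumption, since the notes do not say which row they belong to.

```python
def expected_use(identity_percent):
    """Map target-template sequence identity (%) to the accuracy row in the notes."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("sequence identity must be between 0 and 100")
    if identity_percent >= 60:
        return "molecular replacement; site-directed mutagenesis support"
    if identity_percent >= 30:
        return "comparable to medium-resolution NMR; substrate specificity"
    return "serious errors likely"


for identity in (85, 45, 20):
    print(identity, "->", expected_use(identity))
```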

Successful homology modeling programs:
COMPOSER – felix.bioccam.ac.uksoft-base.html

MODELLER – guitar.rockefeller.edu/modeller/modeller.html

WHAT IF – www.sander.embl-heidelberg.de/whatif/

SWISS-MODEL – www.expasy.ch/SWISS-MODEL.html
 
For more information:

Comparative protein structure modeling with Modeller: A practical approach by Andras Fiser and Andrej Sali

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3D model for a protein (target) that is related to at least one known protein structure (template) [1,2,3,4,5,6,7].

Despite progress in ab initio protein structure prediction [8], comparative modeling remains the only method that can reliably predict the 3D structure of a protein with an accuracy comparable to a low-resolution experimentally determined structure [6]. Even models with errors may be useful, because some aspects of function can be predicted from only coarse structural features of a model. Typical uses of comparative models are listed in Table 1 [4,6].

3D structure of proteins from the same family is more conserved than their primary sequences [9]. Therefore, if similarity between two proteins is detectable at the sequence level, structural similarity can usually be assumed. Moreover, proteins that share low or even non-detectable sequence similarity many times also have similar structures. Currently, the probability to find related proteins of known structure for a sequence picked randomly from a genome ranges approximately from 20% to 65%, depending on the genome [10,11]. Approximately one half of all known sequences have at least one domain that is detectably related to at least one protein of known structure [10]. Since the number of known protein sequences is approximately 600,000 [12,13], comparative modeling can be applied to domains in approximately 300,000 proteins. This number is an order of magnitude larger than the number of experimentally determined protein structures deposited in the Protein Data Bank (PDB) (15,000) [14]. Furthermore, the usefulness of comparative modeling is steadily increasing because the number of different structural folds that proteins adopt is limited [15,16,17,18] and because the number of experimentally determined new structures is increasing exponentially [19]. This trend is accentuated by the recently initiated structural genomics project that aims to determine at least one structure for most protein families [20,21]. It is conceivable that this aim will be substantially achieved in less than 10 years, making comparative modeling applicable to most protein sequences.

[Figure: Structure modeling flowchart]

Ab-initio structure prediction:

* Can be done with any protein; no other members of the family need be known
* Based on the assumption that the native state is at the global free energy minimum
* Provides a way to observe the motion of large molecules such as proteins at the atomic level (dynamic simulation)
* Newton’s second law is applied to molecules
* Given a potential energy function, the force on every atom can be calculated
* The trajectory of motion of the molecule can then be determined
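
The last three points can be illustrated with a very small dynamics sketch: two atoms in one dimension joined by a harmonic bond. The potential, the parameter values (reduced units), and the choice of the velocity Verlet integrator are all assumptions made for illustration; the notes do not name a force field or an integration scheme, and a real protein simulation involves many more energy terms.

```python
def potential(x1, x2, k=1.0, r0=1.0):
    """Harmonic bond energy U = 0.5 * k * (distance - r0)^2 (hypothetical parameters)."""
    return 0.5 * k * ((x2 - x1) - r0) ** 2


def forces(x1, x2, k=1.0, r0=1.0):
    """Force on each atom, F = -dU/dx, derived from the potential above."""
    stretch = (x2 - x1) - r0
    return k * stretch, -k * stretch


def simulate(x, v, m, dt=0.01, steps=1000):
    """Integrate Newton's second law (a = F / m) with velocity Verlet.

    Returns the list of positions at every step and the final velocities.
    """
    x, v = list(x), list(v)
    f = forces(*x)
    trajectory = [tuple(x)]
    for _ in range(steps):
        # Half-step velocity update, then full-step position update.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions and finish the velocity update.
        f = forces(*x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        trajectory.append(tuple(x))
    return trajectory, v


def total_energy(x, v, m):
    """Kinetic plus potential energy, used here as a sanity check."""
    kinetic = sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))
    return kinetic + potential(*x)


masses = (1.0, 1.0)
start_x, start_v = (0.0, 1.5), (0.0, 0.0)  # bond starts stretched by 0.5
trajectory, final_v = simulate(start_x, start_v, masses)
drift = total_energy(trajectory[-1], final_v, masses) - total_energy(start_x, start_v, masses)
print("first positions:", trajectory[0])
print("last positions: ", trajectory[-1])
print("energy drift:   ", drift)
```

The list of positions is the trajectory. A small energy drift is a common sanity check that the time step is short enough for the integration to be stable.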