Crystallization has proven to be the most significant bottleneck to high-throughput protein structure determination using diffraction methods. We have used the large-scale, systematically generated experimental results of the Northeast Structural Genomics Consortium to characterize the biophysical properties that control protein crystallization. Datamining of crystallization results combined with explicit folding studies lead to the conclusion that crystallization propensity is controlled primarily by the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. These analyses identify specific sequence features correlating with crystallization propensity that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the bulk amino acid sequence properties of human versus eubacterial proteins that reflect likely differences in their biophysical properties including crystallization propensity. Finally, our thermodynamic measurements enable critical evaluation of previous claims regarding correlations between protein stability and bulk sequence properties, which generally are not supported by our dataset.
"Understanding the Physical Properties that Control Protein Crystallization by Analysis of Large-Sale Experimental Data"