Abstract
Software tools have been developed to perform rapid, large-scale comparisons of proteins against databases of amino acid sequences on a data-parallel computer architecture. With this software, a protein can be compared against a database of several thousand proteins in the time a conventional computer needs for a single protein-protein comparison. Biologists can therefore find relevant similarities much more quickly and evaluate many different comparison metrics in a reasonable period of time. We have used this software to analyze how effectively various scoring metrics determine sequence similarity, and to generate statistical information about how these scoring systems behave as certain parameters are varied.
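The sketch below makes the basic operation concrete: one query protein scored against every entry of a small database. It is written in Python and is a minimal sketch that assumes Smith-Waterman local alignment with a linear gap penalty and a toy match/mismatch score. The function names `smith_waterman_score` and `search_database` and the default parameters (`match=2`, `mismatch=-1`, `gap=-2`) are hypothetical. They stand in for the scoring metrics actually studied, which would typically use an amino acid substitution matrix. The serial loop over database entries is the step a data-parallel machine could spread across its processors. That mapping is not modeled here.

```python
from typing import List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Smith-Waterman dynamic programming with a linear gap penalty.
    The default scores are hypothetical placeholders, not the paper's metrics.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Score the query against every (name, sequence) entry, highest score first."""
    # Each comparison is independent, so this loop is what a data-parallel
    # machine could run simultaneously; here it simply runs serially.
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in database]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    db = [("p1", "MKTAYIAK"), ("p2", "GGGGGG"), ("p3", "MKTAY")]
    print(search_database("MKTAY", db))
```

With the toy database above, both `p1` and `p3` contain the full query and score 10, while `p2` shares no residues with it and scores 0. Because the comparisons are independent, swapping in a different scoring metric changes only `smith_waterman_score`, which is the kind of variation the abstract describes evaluating.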
Additional information
Eric Lander was supported in part by National Science Foundation grant #NSF-DCB-8611317 and System Development Foundation grant #SDF612.
Cite this article
Lander, E., Mesirov, J.P. & Taylor, W. Study of protein sequence comparison metrics on the connection machine CM-2. J Supercomput 3, 255–269 (1989). https://doi.org/10.1007/BF00128166