Study of protein sequence comparison metrics on the Connection Machine CM-2

The Journal of Supercomputing

Abstract

We have developed software tools for rapid, large-scale protein sequence comparisons against databases of amino acid sequences, using a data parallel computer architecture. The software compares a protein against a database of several thousand proteins in the time a conventional computer requires for a single protein-protein comparison. This lets biologists find relevant similarities much more quickly and evaluate many different comparison metrics in a reasonable period of time. We have used this software to analyze the effectiveness of various scoring metrics in determining sequence similarity, and to generate statistical information about how these scoring systems behave as certain parameters are varied.
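To make the idea of scoring one protein against a database concrete, the following is a minimal sequential sketch, not the software described in the abstract. It assumes a Smith-Waterman style local alignment with a linear gap penalty; the function names (`local_alignment_score`, `rank_database`), the `match`, `mismatch`, and `gap` values, and the toy sequences are all hypothetical choices for illustration. A data parallel machine would evaluate many database entries at once, whereas this sketch simply loops over them.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (Smith-Waterman style).

    Uses a simple match/mismatch metric and a linear gap penalty; these
    parameter values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_database(query, database, **metric):
    """Score the query against every database entry, best match first."""
    scores = {
        name: local_alignment_score(query, seq, **metric)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real entries would be full protein sequences.
    toy_db = {"p1": "MKTAYIAK", "p2": "GGGG"}
    for name, score in rank_database("KTAY", toy_db):
        print(name, score)
```

Because the scoring parameters are passed through as keyword arguments, the same database can be re-ranked under a different metric, for example `rank_database("KTAY", toy_db, gap=-4)`, which is the kind of parameter variation the abstract refers to.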

Additional information

Eric Lander was supported in part by National Science Foundation grant #NSF-DCB-8611317 and System Development Foundation grant #SDF612.

About this article

Cite this article

Lander, E., Mesirov, J.P. & Taylor, W. Study of protein sequence comparison metrics on the connection machine CM-2. J Supercomput 3, 255–269 (1989). https://doi.org/10.1007/BF00128166
