



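  • Randomly selecting an index into a list or string: the basic way to pick a random position in DNA (or any other data).
  • Modeling mutation with random numbers: learning how to select a nucleotide in DNA at random and then mutate it into another randomly chosen nucleotide.
  • Using random numbers to generate DNA sequence data sets, which can be used to study how random real genomes are.
  • Repeatedly mutating DNA to study the effect of mutations that accumulate over time during evolution.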

1. Random Number Generators



2. A Program Using Randomization



Example 7-1. A children's game using random numbers
#!/usr/bin/env python
# Children's game, demonstrating primitive artificial intelligence,
#  using a random number generator to randomly select parts of sentences.

import random
import time

# Here are the lists of parts of sentences:
nouns = [
    'Dad',
    'TV',
    'Mom',
    'Groucho',
    'Rebecca',
    'Harpo',
    'Robin Hood',
    'Joe and Moe',
]

verbs = [
    'ran to',
    'giggled with',
    'put hot sauce into the orange juice of',
    'exploded',
    'sang stupid songs with',
    'jumped with',
]

prepositions = [
    'at the store',
    'over the rainbow',
    'just for the fun of it',
    'at the beach',
    'before dinner',
    'in New York City',
    'in a dream',
    'around the world',
]

# This loop composes six-sentence "stories"
#  until the user types "quit".
while True:
    # (Re)set story to the empty string each time through the loop
    story = ''

    # Make 6 sentences per story.
    for count in range(6):

        #  Notes on the following statement:
        #  1) random.choice(seq) returns one element of the list seq,
        #     picked at random.
        #  2) + joins two strings together.
        sentence = (random.choice(nouns)
                    + " "
                    + random.choice(verbs)
                    + " "
                    + random.choice(nouns)
                    + " "
                    + random.choice(prepositions)
                    + '. ')

        story += sentence

    # Print the story.
    print("\n%s\n" % story)

    # Get user input.
    input_value = input('\nType "quit" to quit, or press Enter to continue: ').rstrip()

    # Exit loop at user's request
    if input_value == 'quit':
        break



Joe and Moe jumped with Rebecca in New York City. Rebecca exploded Groucho
in a dream. Mom ran to Harpo over the rainbow. TV giggled with Joe and Moe
over the rainbow. Harpo exploded Joe and Moe at the beach. Robin Hood giggled
with Harpo at the beach. 

Type "quit" to quit, or press Enter to continue: 

Harpo put hot sauce into the orange juice of TV before dinner. Dad ran to
Groucho in a dream. Joe and Moe put hot sauce into the orange juice of TV
in New York City. Joe and Moe giggled with Joe and Moe over the rainbow. TV
put hot sauce into the orange juice of Mom just for the fun of it. Robin Hood
ran to Robin Hood at the beach. 

Type "quit" to quit, or press Enter to continue: quit

3. A Program to Simulate DNA Mutation



3.1 Pseudocode





3.1.1 Selecting a random position in a DNA string


# randomposition
# A subroutine to randomly select a position in a string.
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

import random

def randomposition(dna):
    # randrange(n) returns a random integer from 0 to n - 1
    return random.randrange(len(dna))
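
For repeatable test runs, the generator can be seeded explicitly before calling randomposition. The following is only a minimal sketch: the seed value 42 and the sample string are arbitrary choices made for illustration, and random.randrange is used to draw an index from 0 to len(dna) - 1.

```python
import random

def randomposition(dna):
    # Return a random valid index into dna (0 .. len(dna) - 1).
    return random.randrange(len(dna))

dna = 'ACGTACGTAC'  # arbitrary sample sequence (assumption)

random.seed(42)  # arbitrary seed, chosen only for illustration
first_run = [randomposition(dna) for _ in range(5)]

random.seed(42)  # reseeding with the same value replays the same draws
second_run = [randomposition(dna) for _ in range(5)]

print(first_run == second_run)  # True: same seed, same positions
```

Without the calls to random.seed, each run of the program would normally pick different positions.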


3.1.2 Selecting a random nucleotide


# randomnucleotide
# A subroutine to randomly select a nucleotide
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomnucleotide(nucs):
    return random.choice(nucs)


3.1.3 Placing a random nucleotide at a random position

Now write the third and final function, the one that actually performs the mutation. Here is the code:

# mutate
# A subroutine to perform a mutation in a string of DNA

def mutate(dna):
    nucleotides = ['A', 'C', 'G', 'T']

    # Pick a random position in the DNA
    position = randomposition(dna)

    # Pick a random nucleotide
    newbase = randomnucleotide(nucleotides)

    # Insert the random nucleotide into the random position in the DNA.
    dna = dna[ : position] + newbase + dna[position+1 :]
    return dna

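Here again is a short function. As you study it, notice that it is relatively easy to read and understand. You mutate by picking a random position, then picking a random nucleotide, and substituting that nucleotide at that position in the string. (If you have forgotten how string indexing and slicing work, consult the Python documentation.)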
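
Because Python strings are immutable, the substitution is done by building a new string from three pieces. A small worked sketch, with the position and the new base fixed by hand (both are illustrative values, where mutate would pick them at random):

```python
dna = 'AAAAAAAAAA'
position = 3    # fixed here for illustration; mutate() picks it at random
newbase = 'G'   # fixed here for illustration; mutate() picks it at random

# Everything before position, then the new base, then everything after it.
mutant = dna[:position] + newbase + dna[position + 1:]

print(mutant)  # AAAGAAAAAA
```

The result has the same length as the original, and only the character at index 3 differs.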


3.2 Combining the Functions to Simulate Mutation


Example 7-2. Mutating DNA
#!/usr/bin/env python

import random
# Subroutines for Example 7-2

#  Notice, now that we have a fair number of subroutines, we
#  list them alphabetically

# mutate
# A subroutine to perform a mutation in a string of DNA
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def mutate(dna):

    # Pick a random position in the DNA
    position = randomposition(dna)

    # Pick a random nucleotide
    newbase = randomnucleotide()

    # Replace the base at the random position with the random nucleotide.
    # Strings are immutable, so build a new string from the slice before
    #  position, the new base, and the slice after position.
    dna = dna[:position] + newbase + dna[position + 1:]

    return dna

# randomelement
# A subroutine to randomly select an element from a list
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomelement(array):

    return random.choice(array)

# randomnucleotide
# A subroutine to select at random one of the four nucleotides

def randomnucleotide():

    nucleotides = ['A', 'C', 'G', 'T']

    return randomelement(nucleotides)

# randomposition
# A subroutine to randomly select a position in a string.

def randomposition(dna):

    # randrange(n) returns a random integer from 0 to n - 1
    return random.randrange(len(dna))

# Mutate DNA
#  using a random number generator to randomly select bases to mutate

# Declare the variables

# The DNA is chosen to make it easy to see mutations.
# (Assumed value for illustration: a run of 30 A's.)
DNA = 'A' * 30

# Let's test it, shall we?
mutant = mutate(DNA)

print("\nMutate DNA\n\n")

print("\nHere is the original DNA:\n\n")
print("%s\n" % DNA)

print("\nHere is the mutant DNA:\n\n")
print("%s\n" % mutant)

# Let's put it in a loop and watch that bad boy accumulate mutations:
print("\nHere are 10 more successive mutations:\n\n")

for i in range(10):
    mutant = mutate(mutant)
    print("%s\n" % mutant)



Mutate DNA

Here is the original DNA:


Here is the mutant DNA:


Here are 10 more successive mutations:


4. Generating Random Sequences



4.1 Bottom-up Versus Top-down



4.2 Functions for Generating a Set of Random DNA


random_DNA = make_random_DNA_set(minimum_length, maximum_length, size_of_set )


repeat size_of_set times:

    length = random number between minimum and maximum length

    dna = make_random_DNA ( length )

    add dna to set
  return set


from 1 to size

    base = randomnucleotide

    append base to dna

return dna


4.3 Turning the Pseudocode into Functions

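  Now that we have a top-down design, how do we code it? Because Python executes a file from top to bottom, a function must be defined before the main program calls it. Example 7-3 therefore begins with the function definitions, written in the order of the top-down design from the pseudocode, and ends with the main program.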

Example 7-3. Generating random DNA
#!/usr/bin/env python

import random
# Subroutines

# make_random_DNA_set
# Make a set of random DNA
#   Accept parameters setting the maximum and minimum length of
#     each string of DNA, and the number of DNA strings to make
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def make_random_DNA_set(minimum_length, maximum_length, size_of_set):

    # set of DNA fragments
    dna_set = []

    # Create set of random DNA
    for i in range(size_of_set):

        # find a random length between min and max
        length = randomlength (minimum_length, maximum_length)

        # make a random DNA fragment
        dna = make_random_DNA ( length )

        # add dna fragment to dna_set
        dna_set.append(dna)

    return dna_set

# Notice that we've just discovered a new subroutine that's
# needed: randomlength, which will return a random
# number between (or including) the min and max values.
# Let's write that first, then do make_random_DNA

# randomlength
# A subroutine that will pick a random number from
# minlength to maxlength, inclusive.
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomlength(minlength, maxlength):

    # Calculate and return a random number within the
    #  desired interval.
    # Notice how we need to add one to make the endpoints inclusive,
    #  and how we first subtract, then add back, minlength to
    #  get the random number in the correct interval.
    return random.choice(range(maxlength - minlength + 1)) + minlength

# make_random_DNA
# Make a string of random DNA of specified length.
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def make_random_DNA(length):

    # Start from an empty string and append one random base at a time
    dna = ''

    for i in range(length):

        dna += randomnucleotide()

    return dna

# We also need to include the previous subroutine
# randomnucleotide.
# Here it is again for completeness.

# randomnucleotide
# Select at random one of the four nucleotides
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomnucleotide():

    nucleotides = ['A', 'C', 'G', 'T']

    return randomelement(nucleotides)

# randomelement
# randomly select an element from an array
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomelement(array):

    return random.choice(array)

# Generate random DNA
#  using a random number generator to randomly select bases

# Declare and initialize the variables
size_of_set = 12
maximum_length = 30
minimum_length = 15

# And here's the subroutine call to do the real work
random_DNA = make_random_DNA_set(minimum_length, maximum_length, size_of_set )

# Print the results, one per line
print("Here is an array of %s randomly generated DNA sequences\n" % size_of_set)
print("  with lengths between %s and %s:\n\n" % (minimum_length, maximum_length))

for dna in random_DNA:

    print("%s\n" % dna)
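
The same design can be written more compactly with two other functions from the standard random module. This is only an alternative sketch, not the listing above: the lowercase names make_random_dna and make_random_dna_set are illustrative, and random.choices requires Python 3.6 or later.

```python
import random

def make_random_dna(length):
    # random.choices draws `length` bases with replacement.
    return ''.join(random.choices('ACGT', k=length))

def make_random_dna_set(minimum_length, maximum_length, size_of_set):
    # random.randint includes both endpoints, so no "+ 1" adjustment is needed.
    return [make_random_dna(random.randint(minimum_length, maximum_length))
            for _ in range(size_of_set)]

# Same parameters as the main program above: 12 sequences, 15 to 30 bases each.
dna_set = make_random_dna_set(15, 30, 12)

for dna in dna_set:
    print(dna)
```

The step-by-step version is easier to follow against the pseudocode; the compact version trades that for fewer helper functions.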





5. Analyzing DNA



Generate a set of random DNA sequences, all the same length

For each pair of DNA sequences

    How many positions in the two sequences are identical as a fraction?



assuming DNA1 is the same length as DNA2,

for each position from 1 to length(DNA1)

    if the character at that position is the same in DNA1 and DNA2

        increment count

return count/length


Example 7-4. Calculating the average % identity between pairs of random DNA sequences
#!/usr/bin/env python
import random

# Subroutines

# matching_percentage
# Subroutine to calculate the percentage of identical bases in two
# equal length DNA sequences

def matching_percentage(string1, string2):

    # we assume that the strings have the same length
    length = len(string1)

    count = 0

    for position in range(length):
        if string1[position] == string2[position]:
            count += 1

    return count / length

# make_random_DNA_set
# Subroutine to make a set of random DNA
#   Accept parameters setting the maximum and minimum length of
#     each string of DNA, and the number of DNA strings to make
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def make_random_DNA_set(minimum_length, maximum_length, size_of_set):

    # set of DNA fragments
    dna_set = []

    # Create set of random DNA
    for i in range(size_of_set):

        # find a random length between min and max
        length = randomlength (minimum_length, maximum_length)

        # make a random DNA fragment
        dna = make_random_DNA ( length )

        # add dna fragment to dna_set
        dna_set.append(dna)

    return dna_set

# randomlength
# A subroutine that will pick a random number from
# minlength to maxlength, inclusive.
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomlength(minlength, maxlength):

    # Calculate and return a random number within the
    #  desired interval.
    # Notice how we need to add one to make the endpoints inclusive,
    #  and how we first subtract, then add back, minlength to
    #  get the random number in the correct interval.
    return random.choice(range(maxlength - minlength + 1)) + minlength 

# make_random_DNA
# Make a string of random DNA of specified length.
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def make_random_DNA(length):

    # Start from an empty string and append one random base at a time
    dna = ''

    for i in range(length):
        dna += randomnucleotide()

    return dna

# randomnucleotide
# Select at random one of the four nucleotides
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomnucleotide():

    nucleotides = ['A', 'C', 'G', 'T']

    return randomelement(nucleotides)

# randomelement
# randomly select an element from an array
# NOTE: Python seeds the random module automatically; call
#  random.seed() first only if you need reproducible results.

def randomelement(array):

    return random.choice(array)

# Calculate the average percentage of positions that are the same
# between two random DNA sequences, in a set of 10 sequences.

percentages = []

#  Generate the data set of 10 DNA sequences.
random_DNA = make_random_DNA_set(10, 10, 10)

# Iterate through all pairs of sequences
for k in range(len(random_DNA) -1):
    for i in range(k+1, len(random_DNA)):

        # Calculate and save the matching percentage
        percent = matching_percentage(random_DNA[k], random_DNA[i])
        percentages.append(percent )

# Finally, the average result:
result = 0

for percent in percentages:
    result += percent

result = result / len(percentages)
# Turn result into a true percentage
result = int(result * 100)

print("In this run of the experiment, the average percentage of \n")
print("matching positions is %s%%\n\n" % result)



In this run of the experiment, the average percentage of
matching positions is 24%
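
A value near one quarter is what probability predicts. If each base is drawn independently and uniformly from the four nucleotides, two sequences agree at a given position with probability 4 × (1/4)² = 1/4, so the average identity should approach 25% as more pairs are compared. A minimal sketch to check this, assuming uniform random bases; the helper names, the seed, and the sample sizes are illustrative choices, not part of Example 7-4:

```python
import random

def random_dna(length):
    # Build a string of `length` bases, each chosen uniformly at random.
    return ''.join(random.choice('ACGT') for _ in range(length))

def matching_fraction(s1, s2):
    # Fraction of positions at which two equal-length strings agree.
    matches = sum(1 for a, b in zip(s1, s2) if a == b)
    return matches / len(s1)

random.seed(1)  # arbitrary seed so the run is repeatable
pairs = 2000    # number of sequence pairs to compare (illustrative)

total = sum(matching_fraction(random_dna(100), random_dna(100))
            for _ in range(pairs))
average = total / pairs

print("average identity: %.3f" % average)
```

With only 10 sequences of length 10, as in Example 7-4, individual runs can stray noticeably from 25%; larger samples settle much closer to it.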

6. Exercises

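1. Write a program that asks you to pick an amino acid and then (randomly) guesses which amino acid you picked.

2. Write a program that picks one of the four nucleotides and then keeps prompting until you correctly guess the nucleotide it picked.

3. Write a subroutine to randomly shuffle the elements of an array. The subroutine should take an array as an argument and return an array with the same elements, shuffled into random order. Each element of the original array should appear exactly once in the output array, just like shuffling a deck of cards.

4. Write a program to mutate protein sequences, similar to the code in Example 7-2 that mutates DNA.

5. Write a subroutine that, given a codon (a fragment of DNA of length 3), returns a random mutation in the codon.

6. Sometimes not all choices are equally likely. Write a subroutine that randomly returns a nucleotide, where the probability of each nucleotide can be specified. Pass the subroutine four numbers as arguments, representing the probability of each nucleotide; if each probability is 0.25, the subroutine is equally likely to pick each nucleotide. As error checking, have the subroutine make sure that the four probabilities sum to 1.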