New model for estimating the mutation rate of the human genome

  An international team of scientists led by the 1000 Genomes Project Consortium has built the world's largest catalog of human genome variants, which gives researchers valuable clues about why some people are susceptible to particular diseases. Earlier, a Canadian research team led by Professor Brendan Frey developed the first method to rank genetic mutations according to how living cells "read" DNA, thereby revealing how likely any particular mutation is to cause disease. Using this method, they discovered some unexpected genetic determinants of autism, hereditary cancer and spinal muscular atrophy, a result that has been described as finding the golden key to deciphering the "unreadable book" of the human genome.

  Recently, scientists at the Perelman School of Medicine at the University of Pennsylvania found that the type, frequency, and location of new mutations in the human genome depend on the neighboring DNA building blocks. The results were published this week in the journal Nature Genetics. Dr. Benjamin F. Voight, senior author of the paper and assistant professor in the Department of Systems Pharmacology and Translational Therapeutics and the Department of Genetics, said: "We have developed a mathematical model that estimates mutation rates based on the nearby sequence of DNA 'letters' in the human genome, called nucleotides. This new model not only provides clues to the mutation process, but also helps to discover possible genetic risk factors that affect complex human diseases such as autism spectrum disorders."

  The study examines the probability that any given nucleotide in the human genome, one of the four letters of the DNA alphabet (A, C, G, and T, for adenine, cytosine, guanine, and thymine), is changed. Voight focused on the simplest type of mutation, a "point" mutation, in which a single letter in a given sequence is changed. Most of these changes, commonly referred to as single nucleotide polymorphisms (SNPs, pronounced "snips"), are harmless to human function. Voight, however, investigated why some sequences are more susceptible to mutation than others. Voight said: "The key to this paper is the dependence of the mutation rate on the nucleotides one, two or three bases to either side of the SNP. We already know that DNA sequences in the genome where a methyl group is attached to a cytosine nucleotide, known as CpG sites, are hot spots for mutation. But beyond that, are there other types of local sequences like this?"

  To answer this question, Voight and graduate student Varun Aggarwala designed a mathematical model that can be applied to SNP data found in humans. Their method uses publicly available data from the 1000 Genomes Project, which covers thousands of human subjects around the world. These individuals were sequenced as part of an international initiative to characterize the genetic variation that occurs naturally in human populations. The finding was surprising: knowing the three nucleotides flanking a given SNP on each side, seven nucleotides in total, can explain up to 93% of the variability in the probability of finding a SNP in a given sequence among individuals whose genomes are in the project database. In addition, their model identified several distinctive local nucleotide sequences that had previously been thought to be less susceptible to mutation. Voight said: "It turns out that there are indeed DNA sequences other than CpG sites that are prone to mutation. The reason is still unclear. These rates and our model need to be studied in more depth to decipher the basic mechanisms that induce mutations in the human genome."
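  The idea of a seven-nucleotide context can be illustrated with a short Python sketch. This is not the authors' model, only a toy version under stated assumptions: the function names `heptamer` and `context_rates` and the example sequence are hypothetical, and the "rate" is simply the fraction of sites in each context that carry a SNP. A real analysis would also distinguish the type of substitution and treat a sequence and its reverse complement together.

```python
from collections import Counter


def heptamer(seq, pos):
    """Return the 7-nucleotide window centred on pos, or None near the ends."""
    if pos < 3 or pos + 3 >= len(seq):
        return None
    return seq[pos - 3 : pos + 4]


def context_rates(seq, snp_positions):
    """Fraction of sites in each 7-mer context that carry a SNP.

    Toy estimator (an assumption for illustration, not the published model).
    """
    snps = set(snp_positions)
    seen = Counter()  # how often each context occurs in the sequence
    hit = Counter()   # how often a site in that context is a SNP
    for pos in range(3, len(seq) - 3):
        ctx = seq[pos - 3 : pos + 4]
        seen[ctx] += 1
        if pos in snps:
            hit[ctx] += 1
    return {ctx: hit[ctx] / n for ctx, n in seen.items()}


# Made-up reference sequence and SNP positions (0-based).
toy_seq = "ACGTACGTACGTACG"
toy_rates = context_rates(toy_seq, [3, 4, 7])
for ctx, rate in sorted(toy_rates.items()):
    print(ctx, round(rate, 3))
```

  With genome-scale data, the same counting would be done over every position in the reference and every SNP in the catalog, and contexts with unusually high fractions would stand out as candidate hot spots.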

  Another finding called into question the assumption that methylated CpG sites always have the same mutation rate. Voight said: "I think it is generally assumed that all CpG sequences mutate at the same rate, but our results show that there is more variation than we expected." Using another public database, in which the methylation status of CpG sites was measured in a small number of people, Voight and Aggarwala found that how often different sequences are methylated cannot fully explain the differences in mutation rate at these sites. Voight said: "This certainly suggests that additional factors at CpG hot spots may change how prone these sites are to mutation, for example, how well DNA repair mechanisms can correct new mutations that occur."

  Beyond finding clues about the different ways in which mutations arise, Voight and Aggarwala also tested their model on human disease, to gain insight into which newly discovered mutations in clinical studies are most likely to cause disease. Computational predictions of this kind can help prioritize rare or new genetic variants for follow-up investigation. Voight and Aggarwala focused on a set of autism sequencing studies that look for genes with an excess of new mutations in children with autism. When they applied the model to these data, they found that it improved on existing methods for predicting which rare or new mutations are associated with human disease. Voight said: "We can focus on some likely pathogenic variants in follow-up work, although more work is needed to pinpoint the correct variants and genes for autism or Alzheimer's disease, for which sequencing data are readily available."
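  To make "genes with an excess of new mutations" concrete, here is a minimal Python sketch of one common way such an excess can be scored. It assumes that new mutations in a gene follow a Poisson distribution whose mean comes from per-site mutation rates; this is an illustrative assumption, not a description of the test used in the study, and the names `poisson_sf` and `expected_de_novo` and all numbers are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via one minus the sum of lower terms."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def expected_de_novo(site_rates, n_children):
    """Expected new mutations in a gene across a cohort.

    Per-site rates are summed over the gene, then multiplied by two
    chromosome copies per child and by the number of children.
    """
    return 2 * n_children * sum(site_rates)


# Made-up example: a 1,000-site gene with a uniform per-site rate.
site_rates = [1e-8] * 1000
lam = expected_de_novo(site_rates, n_children=1000)
p_value = poisson_sf(3, lam)  # chance of seeing 3 or more new mutations
print(lam, p_value)
```

  In this framing, a better mutation rate model changes the expected count for each gene, which in turn changes which genes appear to carry more new mutations than chance would predict.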

  Voight credits not only the large amount of public data but also careful, dedicated research over a long period as the main factors that make it possible to evaluate and improve the mathematical model they proposed. "The exciting part of this work is not only the results we have found, but also the range of new questions that we will systematically address over the next few years. Although it takes time to build a solid foundation, the scientific 'skyscraper' built on it will stand longer and therefore reach a greater height."