On Insertions and Deletions

I’ve been writing about alignment quite a bit lately, because my method of genomics uses a very simple alignment: the default NIH alignment, which you can see by looking at the opening bases of basically any genome. This keeps things simple and lets you quickly compare entire genomes. However, I noted that for my method of ancestry analysis, you should at least consider the possibility of local alignments, even though it doesn’t seem to matter very much.

I’m now convinced you should not consider local alignments unless you’re looking for genes, insertions, or deletions, because, as I suspected, insertions and deletions appear to define maternal lines. Moreover, insertions and deletions are associated with drastic changes in behavior and morphology, e.g., Down Syndrome and Williams Syndrome. Single-base mutations can cause diseases as well, but plenty of people differ by many bases even over ideal alignments, so single-base mutations are plainly not as important as indels.

Specifically, I wrote an algorithm that iterates over every possible global alignment between two genomes. For the Iberian Roma population (a nearly perfectly homogeneous population), the alignment that maximizes the number of matching bases between two genomes from that population is the default NIH alignment. The Iberian Roma are very closely related to the people of Papua New Guinea, and the same is true of genomes from Papua New Guinea. However, this is not the case for the Kazakh and Italian populations, where many genomes require some change to the alignment, implying insertions and deletions. These insertions and deletions in turn plainly define different maternal lines, both within a given population and among populations. As a consequence, again, I think the right method is to fix the global alignment using the default NIH alignment, and then compare entire genomes.
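To make the idea of running through every global alignment concrete, here is a minimal sketch in Python. It is not the linked Octave code, which I have not reproduced here. It assumes that a "global alignment" means sliding one whole genome against the other by a fixed offset and counting matching bases over the overlapping region. The function names `count_matching_bases` and `best_global_shift` are hypothetical, and the sequences in the usage note are toy strings, not real mtDNA.

```python
def count_matching_bases(a: str, b: str, shift: int) -> int:
    """Count positions i where a[i] == b[i + shift], over the overlap only.

    A shift of 0 is the default alignment (both genomes start at base 0).
    """
    if shift >= 0:
        pairs = zip(a, b[shift:])
    else:
        pairs = zip(a[-shift:], b)
    return sum(x == y for x, y in pairs)


def best_global_shift(a: str, b: str, max_shift=None):
    """Try every offset in [-max_shift, max_shift]; return (shift, matches).

    Ties are resolved in favor of the default alignment (shift 0), since a
    candidate only replaces the current best when it is strictly better.
    """
    if max_shift is None:
        max_shift = max(len(a), len(b)) - 1
    best_shift, best_count = 0, count_matching_bases(a, b, 0)
    for shift in range(-max_shift, max_shift + 1):
        count = count_matching_bases(a, b, shift)
        if count > best_count:
            best_shift, best_count = shift, count
    return best_shift, best_count


if __name__ == "__main__":
    genome = "ACGTACGTTTGA"              # toy sequence, not real data
    with_insertion = "GG" + genome       # two extra bases at the start
    print(best_global_shift(genome, genome))
    print(best_global_shift(genome, with_insertion))
```

Under these assumptions, two genomes with no indels between them should come back with a best shift of 0, i.e., the default alignment, while a nonzero best shift suggests an insertion or deletion near the start of one of them. A single offset cannot capture an indel in the middle of a genome, so this is only a rough stand-in for the full comparison. It is also quadratic in genome length if you try every offset, which is fine for mtDNA-sized sequences.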

Attached are the dataset and some code that runs through every possible global alignment.

https://www.dropbox.com/s/ht5g2rqg090himo/mtDNA.zip?dl=0

https://www.dropbox.com/s/4h1myonndzkgfts/Count_Matching_Bases.m?dl=0

https://www.dropbox.com/s/ojmo0kw8a26g3n5/find_sequence_in_genome.m?dl=0

https://www.dropbox.com/s/p22as65hh9brpcv/Find_Seq_CMNDLINE.m?dl=0

