DNA Testing & Genealogy

DNA Testing: Overview

At root, DNA testing today means reading out portions of the unique genetic code that each of us inherits from our parents. Reading out the whole is usually called, misleadingly, “mapping the genome”; misleading because the end product is nothing but a set of meaningless letters. The meaning comes in a bit at a time from painstakingly correlating tiny patches of a small portion of this code (the 5% or so that constitute the genes) with the development and manifestation of interesting traits. Thus, in time, the genetic part of the genome might truly be mapped in a general way onto the observed characteristics of species, or even at the detailed level of particular organisms. Those are the goals of classical DNA testing.

In parallel with the grand goal of mapping the genome, other more limited, but also more focused, kinds of DNA testing have arisen. Where the testing is thorough enough to identify an individual uniquely, or at least to identify him and his closest blood relatives in a way which distinguishes them from all others on the planet, it can have important forensic applications, such as conclusively establishing paternity. And a set of tests performed on an individual DNA sample can, if extensive enough, establish a unique genetic fingerprint, and be used as such to place a suspected perpetrator at the scene of a crime, as with the O.J. Simpson evidence.

A different, though overlapping, kind of DNA testing aims to determine whether a set of tested males probably have a common ancestor within the purview of genealogical research.

Genealogy is concerned with working out the ancestries of people alive today, or, more broadly, with reconstructing the tree of descent of a set of lineage cousins from a common ancestor. The tree metaphor might, if we please, be extended to the whole human race, because it can be shown that all males descend from a single Adam, though the existing evidence instructs us that this forefather of us all was very far from being the first male of the species Homo sapiens. Or, since man is a social animal, distinguished from the other animals perhaps more than in any other way by the elaborate transmissible body of knowledge and values he shares with his tribe (in a word, “culture”), the descent of the human race may also be conceived in ethnographic terms. But ethnographic family trees are of a scale that far exceeds the scope of any genealogical project.

The kind of DNA testing we are primarily concerned with here as would-be “genetic genealogists” has a far narrower focus: estimating the time back to a common paternal ancestor of two males. And its practical scope is confined to a period that one might call “genealogical time”: the period in each culture since written records began to be kept that document the lives of ordinary men, by name. Thus, genealogical time is roughly coincident, at least in Western cultures, with the time period since hereditary surnames came into general use, usually thought to compass the period 1300-1500 in England, for example. To understand how and why DNA testing can be used to predict that two men have a common ancestor who lived not too many centuries ago, we need to review some of the basics of human DNA.

The Basics of Human DNA

In order to explain DNA testing for genealogical purposes, it is first necessary to review some of the basics of human DNA, and its replication to produce offspring. Some of the DNA-related terms found in the following sections, and throughout this website, are defined in the glossary in the left column of this page.

Each of us has developed from a single cell containing our unique DNA blueprint. This DNA is organized into 23 paired “chromosomes”, one chromosome of each pair coming directly from the father, and one from the mother. Every cell in our body contains an exact copy of this complete genetic blueprint, except that when we produce germ cells for replication (sperm for men, eggs for women), our separate parental parts mix and recombine in a new and unique way for each sex cell we produce.

22 of these 23 chromosomal pairs are called “autosomal”, and each consists of matching paternal and maternal parts, perfectly aligned. DNA is a blueprint for producing the proteins of life, and two matched, but differing, versions of each gene set up a genetic competition for determining the offspring’s characteristics, which results in some wins for the father, some for the mother, and a large proportion of compromises.

The remaining chromosomal pair, called the “sex chromosomes”, works quite differently. Instead of a matched pair, we find, at least in males, an odd couple: an X chromosome inherited from the mother, and a runty Y chromosome from the father (I prefer to style these “xChromosome” and “yChromosome”, just as I refer to yDNA, rather than the more conventional “Y-DNA” or “Y DNA”). The yChromosome contains fewer than 100 genes, only 9 of which match those of the female xChromosome. A large proportion of the remaining yChromosome genes code for the specific attributes of the more specialized male (the female is the default type of the species).

The XX female, like the XY male, inherits one xChromosome from her mother, and the other from her father’s mother, so there is plenty of genetic competition between her two sex chromosome genes. However, since the male yChromosome fails to match up to most of the xChromosome, the male inherits most of his mother’s xChromosome genes as is. This can cause problems where the mother transmits a recessive genetic abnormality from one of her xChromosomes to a son; such an abnormality is hemophilia, which rarely occurs in women, but for those who are carriers, crops up in half of their sons. And of course those other genes on the yChromosome that have no xChromosome counterpart also operate to make men more exceptional. Interestingly, even with the autosomal chromosomes, women’s DNA recombines in a much more homogenized way than for men’s, keeping females much closer to the norms of the species, while males, more prone to extreme differentiation, may be considered nature’s experimental sex.

Why Genetic Genealogists Prefer to Test Male DNA

What matters most for our present purposes about DNA transmission from one generation to the next is that the yChromosome replicates virtually unchanged down the male paternal line. Current models of population genetics hypothesize that all men descend either from a single Adam, or at least a very small set of original progenitors, and women too have their Eves. But if all men descended from the same man, and if the yChromosome never changed at all, then all men would have identical yChromosomes and there would be nothing to be learned from testing. Fortunately for genetic genealogists, mutations creep into the germ cells, or occur during the replication process, and it is these mutations that produce the variations that yDNA testing measures.

I speak here exclusively of yDNA (the DNA of the male yChromosome) only because it is the testing of certain areas of the male yChromosome that has the highest payoff for genealogists. However, there are other kinds of DNA testing, all of which have their interest, and I have more to say about these below.

As it happens, in Western societies, and in many other cultures as well, surname runs with the paternal line, and since tracking surnames is the main preoccupation of genealogists, yDNA testing fits perfectly into their epistemological paradigms. If we test the yDNA of two males with the same surname and find that they are very closely matched, we have strong positive confirmation that they descend from a common ancestor of their patrilineage, while otherwise we may say that although they share a surname, they are probably no more likely to have a common ancestor than if one of them was surnamed Jones, and the other, Smith.

But yDNA can tell us more than just that two males do, or do not, have a common ancestor within the genealogical research horizon (the period since records of individuals began to be kept). Starting from the premises that yDNA is highly stable from generation to generation, but subject to change over very long stretches of time, and at a statistically predictable rate, the differential number of mutations that have accumulated between two tested male yChromosomes (the genetic distance) can serve as a kind of generational clock, measuring the time (in generations) back to their most recent common male ancestor, quite analogous to the archaeologist’s tool, radiocarbon dating. This estimate is called “TMRCA” (Time to Most Recent Common Ancestor).
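As a rough illustration of the generational clock, the Python sketch below counts the genetic distance between two haplotypes and converts it to a point estimate of TMRCA. Everything specific in it is an assumption made for illustration: the repeat values for the two men are invented, the single average mutation rate of 0.004 per marker per generation is a placeholder, and the simple formula ignores back-mutations and differences in rate between markers. The function names are hypothetical and do not come from any testing company's tools.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-count differences over markers tested in both men."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)


def estimate_tmrca_generations(distance, markers_compared, rate_per_marker):
    """Point estimate of generations back to the common ancestor.

    Mutations accumulate independently down both lines of descent, so the
    expected distance after t generations is about 2 * t * markers * rate.
    Solving for t gives the estimate returned here.
    """
    return distance / (2 * markers_compared * rate_per_marker)


# Invented repeat counts for four ySTR markers (illustrative values only).
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

ASSUMED_RATE = 0.004  # assumed average mutations per marker per generation

distance = genetic_distance(man_a, man_b)
generations = estimate_tmrca_generations(distance, len(man_a), ASSUMED_RATE)
print(f"genetic distance = {distance}, estimated TMRCA = {generations:.1f} generations")
```

With only four markers the estimate is far too coarse to be useful, which is one way of seeing why many markers need to be tested.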

The sensitivity of the clock depends on the average mutation rate at tested marker sites, and across the genome, these occur at a rate that ranges from 1 per billions of generations, to 1 per several hundred generations. The fastest mutations occur in stretches of DNA called microsatellites, and the most rapidly mutating microsatellites are those on the yChromosome, which are known as ySTRs (yChromosome Short Tandem Repeats). However, even with ySTRs that mutate at a rate of once every several hundred generations, it is still obviously necessary to test many of them in order to generate standard sets of markers that change within the narrow time span of genealogical time (roughly the last 400-1000 years). These standard sets of markers are called haplotypes.

It follows from this that the more markers tested, the more mutations are likely to show up, and thus the more finely calibrated the resulting TMRCA-measuring generational clock would be. However, it’s a bit more complicated than that because not all ySTR markers are created equal. In recent years, widely varying mutation rates have been observed across these markers, with some of them running at a mutational rate 10 times that of the stodgiest ones. Thus, the average mutation rate across the various ySTR marker panels offered by the half-dozen or so yDNA testing companies is at least as important as the number of markers tested. At present, the best “bang for the buck” test panel, and the one most useful for identifying patrilineage relationships, is the FTDNA 37-marker panel.
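The trade-off between the number of markers and their average mutation rate can be made concrete with a small Python sketch. The two panels compared here are hypothetical, as are their rates; the point is only that, under the simple assumption that every marker mutates independently at the panel's average rate, a small panel of fast markers can be expected to show as many mutations as a larger panel of slow ones.

```python
import math


def expected_mutations(markers, avg_rate, generations):
    """Expected mutations separating two men whose common ancestor lived
    `generations` ago (both lines of descent are counted, hence the factor 2)."""
    return 2 * generations * markers * avg_rate


def prob_at_least_one(markers, avg_rate, generations):
    """Chance that at least one marker has mutated somewhere along either line."""
    transmissions = 2 * generations
    return 1 - (1 - avg_rate) ** (markers * transmissions)


# Hypothetical panels: (number of markers, assumed average rate per generation).
fast_small_panel = (12, 0.008)
slow_large_panel = (24, 0.004)
GENERATIONS = 10  # roughly 250-300 years, assuming 25-30 years per generation

for name, (markers, rate) in [("12 fast markers", fast_small_panel),
                              ("24 slow markers", slow_large_panel)]:
    exp = expected_mutations(markers, rate, GENERATIONS)
    p = prob_at_least_one(markers, rate, GENERATIONS)
    print(f"{name}: expected mutations = {exp:.2f}, P(at least one) = {p:.2f}")
```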

Incidentally, the marker sites sampled for the yDNA tests do not involve genes, per se. If they did, they might be subject to natural selection bias that would reduce the predictability of their mutation rates. Only about 5% of the genome actually codes for the genes that define our unique traits. The purpose and function of the rest of the genome, often called "junk DNA", is largely unknown at present.

The Payoffs of yDNA Testing

What kinds of things can genealogists infer from sets of tested marker haplotypes for men bearing the same surname?

Identifying/Disconfirming Your Patrilineage

By patrilineage, I mean the exclusively male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time, usually the time since this lineage first came to be identified in the records by a particular hereditary surname, or family name. More loosely, the term may be understood to include all the cousins of these male descendants (all the descendants of the patriarch), male or female.

The concept that I have labelled “patrilineage” is fundamental to genealogy, and I have deliberately given it a restrictive meaning in order to capture that concept. The unqualified term “patrilineage” might more usually be defined as any male-only tree of descent. It might be a tree as foreshortened as a father and his sons, or as deep as the tree of all living males (all of whom, it can be shown, descend from a common male ancestor). However, neither of these extremes, nor indeed the broad sweep of the unqualified term, are of much use to genealogists or family historians.

There is, though, a sense of “patrilineage” from which my usage needs to be distinguished. The MRCA (Most Recent Common Ancestor) of a set of yDNA-tested male cousins is very much the focal point of any current yDNA project, and rightly so. But as new, more remote cousins join the project, the MRCA gets pushed further and further back until it begins to approach the original patriarch of the patrilineage, and it is this wider scope that I think needs to be kept firmly in mind as we proceed with our investigations, be they in the documentary archives, or in the testing lab.

What we can say about yDNA testing is that subsets of reasonably closely matched haplotypes can be grouped into DNA surname patrilineages, comprising a group of men who almost certainly descend not only from a common ancestor, but from an original patriarch of the patrilineage (perhaps even the first bearer of the common hereditary surname) who lived anywhere from several hundred to as many as 1000 years ago. At the same time, and as a corollary of this, a pair of men thought to be closely related on the basis of genealogical research, but whose yDNA is widely disparate, may be said conclusively to be unrelated, at least within the time frame of the genealogical researcher, and thus are of different patrilineages.

yDNA testing is therefore likely to occasionally disconfirm membership in an expected patrilineage group. Although a disruption of one’s established assumptions or conclusions may seem to be a negative outcome of yDNA testing (and it is certainly likely to be at least mildly disturbing), the possibility of such unexpected results actually represents one of the chief benefits of yDNA testing. Where a fair amount of research has been done on a line, and testing is merely confirmatory of membership in an expected patrilineage, it may be reassuring, but it adds only a little to the strength of one’s much more focused genealogical theses. On the other hand, if an unexpected break in the lineage can be demonstrated, it is likely to set one off on new and productive research tracks.

What is the best way to interpret and deal with a failure of one’s yDNA haplotype to fall into an established and expected lineage? The first thing to suspect is that somewhere one has drawn a plausible, but invalid inference from one’s data; consequently, one needs to probe each link of the paternal ancestral chain for weaknesses. And where the lineage one expected to match is itself not thoroughly established, with a number of matching haplotypes backed by high quality research, the research behind the established lineage needs to be carefully scrutinized as well.

Second, if one is, after all, able to make a solid case from the evidence for each link of the paternal ancestral chain, it may be the evidence itself that is erroneous or incomplete. Record keepers made errors, then as now, which is why one tries to assemble several independent pieces of evidence, rather than relying on just one.

However, the greatest challenge facing all genealogists most of the time is the problem of incomplete records. That is why the BCG Genealogical Proof Standard calls for a “reasonably exhaustive search” of all the possibly pertinent records. Assuming that this standard has been met, the most likely remaining explanation for the failed match is that an NPE has occurred. An NPE may be due to an adoption, an out-of-wedlock birth, or perhaps just an elective name change, none of which is likely to be reflected in the records. Such omissions may be due to inadvertence, or simply to a disinclination to publicize an unfortunate family or personal event.

Incidentally, while an extensive haplotype mismatch (and a careful re-examination of one’s evidence) can raise the possibility that an NPE has occurred, determining whether this is in fact the explanation for the divergence remains a conventional research task. Nor can even a perfect match between two haplotypes conclusively rule out the possibility that an NPE might have occurred—for example, a child might be fathered by a man everyone thought was his uncle, who would almost certainly have yDNA identical to the man everyone thought (mistakenly) was his real father. I’ve recently heard of such a case that turned up in one of the current DNA surname projects.

Estimating TMRCA

Besides disconfirmation, the positive value of yDNA testing is a function of two factors: (1) the number of people of the same patrilineage who have been tested; and (2) the quality of the genealogical research into their ancestral lines.

Independent of any genealogical research, the most that may be said for sure about two men whom yDNA testing shows to be of a common patrilineage is that they have a male ancestor in common within the last, say, 1000 years (at the outside), and the exact degree of their genetic distance (the number of divergent marker mutations between their haplotypes) can yield a rough estimate of the TMRCA, accurate only to within a century or two either way.
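To give a feel for how wide the range around a TMRCA estimate can be, here is a minimal Python sketch. It rests on assumptions that are mine and not those of any testing company: the number of observed differences is treated as Poisson-distributed with mean 2 × generations × markers × rate, every number of generations up to a cutoff is considered equally likely beforehand, and a single average mutation rate (again a placeholder of 0.004) applies to all markers. The function name is hypothetical.

```python
import math


def tmrca_interval(distance, markers, rate, max_generations=200, mass=0.95):
    """Central interval (in generations) for the time to the common ancestor.

    Each candidate number of generations t is weighted by the Poisson
    probability of seeing `distance` mutations when 2 * t * markers * rate
    are expected; the weights are normalized and the central `mass` is kept.
    """
    weights = []
    for t in range(1, max_generations + 1):
        lam = 2 * t * markers * rate
        weights.append(math.exp(-lam) * lam ** distance / math.factorial(distance))
    total = sum(weights)
    tail = (1 - mass) / 2

    cumulative = 0.0
    low = high = None
    for t, weight in enumerate(weights, start=1):
        cumulative += weight / total
        if low is None and cumulative >= tail:
            low = t
        if cumulative >= 1 - tail:
            high = t
            break
    return low, high


# Example: 2 differences observed over 37 markers, assumed rate 0.004.
low, high = tmrca_interval(distance=2, markers=37, rate=0.004)
print(f"TMRCA plausibly between {low} and {high} generations")
```

Multiplying the two bounds by an assumed generation length of 25 to 30 years shows how a small genetic distance still leaves a span of several centuries.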

A piece of knowledge this imprecise, considered in isolation, doesn’t advance the genealogical enterprise much. However, where there are a number of tested members of a patrilineage, preferably bringing into play several well-researched sub-lineages, one can begin to correlate particular patterns of mutation with particular sub-lineages. And at that point, each addition to the tested pool adds significantly to the accumulated knowledge of the genetic tree, the research tree, or both. That is the ultimate desideratum of these surname DNA projects.

Which is why I’ve defined the prime goal of these DNA surname projects as the bringing together of DNA patrilineage cousins for the purpose of sharing research, and why I try to promote the recruiting of additional male bearers of the common surname to the project, with an emphasis on distant cousins within existing project patrilineages.

Hopefully, we may look forward soon to the day when the costs of testing come down further so that many more bearers of the same surnames may be tested, and on more and/or faster markers than at present. It would also help a great deal if the current mutation rate estimates for individual markers could be more precisely quantified so that the corresponding estimates of TMRCA could be narrowed in range.

In the meantime, these DNA surname projects are already acting as powerful catalysts, not only for the sharing of chunks of heretofore disconnected research, but in promoting new and broader research activities on the part of those interested in particular patrilineages. One reason most genealogists run into “dead ends” sooner than need be is that they don’t cast their nets wide enough. Accomplished (or professional) genealogists know that where the records are scant, it is usually necessary to research everyone of the same surname for a particular time and place. Surname DNA projects have a way of teaching that lesson and widening everyone’s research horizons.

Other kinds of DNA Testing: Autosomal Genealogical

This is a brand new kind of DNA testing for genealogical purposes, which opens up the possibility of identifying all of one’s reasonably close cousins and relatives, not just those of the patrilineal and matrilineal lines. These tests scan the autosomal chromosomes, noting the values of hundreds of thousands, or even millions, of particular SNPs, assumed to be representative of particular chromosomes. Then, a comparison is made with the set of SNP values of others in the test database, looking for long stretches of SNPs on the same chromosome that are at least half-identical, and thus indicative of a possible close cousin, or relative, relationship.

Because the expected share of DNA inherited from any one ancestor is halved with each generation, these shared segments decrease rapidly in size and number with each generation, and in practice, identification of cousins beyond the 3rd cousin level becomes increasingly problematical. The blocks of half-identical DNA, often called HIRs (or Half-Identical Regions), are chopped into ever smaller fragments due to a phenomenon called crossover, which can affect a particular chromosome each time it is copied for replication to the next generation. The length of HIRs is measured in cMs (centimorgans), or sometimes in the number of SNPs sampled across the HIR, and both the length of the largest HIR shared between two people, and the number of HIRs they share, are relevant to assessing the probability of descent from a common ancestor, and in estimating the genetic distance back to that common ancestor.
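The rapid falloff in shared autosomal DNA can be put into numbers with a short Python sketch. It assumes only the textbook expectation that a child receives half of each parent's autosomal DNA, so that the expected contribution of an ancestor halves with every generation and full cousins of degree n share on average (1/2)^(2n+1) of their DNA. These are averages; actual sharing varies around them, and the function names are hypothetical.

```python
def ancestor_share(generations_back):
    """Expected fraction of one's autosomal DNA from a single ancestor
    `generations_back` generations ago (parent = 1, grandparent = 2, ...)."""
    return 0.5 ** generations_back


def cousin_share(degree):
    """Expected fraction of autosomal DNA shared by full cousins of the given
    degree (1 = first cousins, who share a pair of grandparents)."""
    return 0.5 ** (2 * degree + 1)


for degree in range(1, 6):
    print(f"cousins of degree {degree}: about {cousin_share(degree):.3%} shared on average")
```

By the 4th and 5th cousin level the expected shared fraction is a small fraction of one percent, which is consistent with the difficulty of identifying cousins much beyond the 3rd degree.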

The two principal companies offering autosomal SNP testing for genealogical purposes, 23andMe and FTDNA, set the threshold of possible significance at 7 cM and 5 cM, respectively. Across those HIR ranges, both companies test over 500,000 SNPs, all of which have to be half-identical for the segment to qualify as an HIR (although both companies do make a very limited allowance for the occasional testing glitch). Both companies provide their customers with software tools for identifying possible cousin or relative matches in their databases, and for estimating the closeness of the relations between them.
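The idea of a half-identical region can be sketched in Python as follows. This is not the matching algorithm of either company, whose details are not described here; it is a bare illustration in which two genotypes at a SNP count as half-identical when they share at least one allele, a run of consecutive half-identical SNPs forms a candidate segment, and the segment is kept only if it spans a minimum length in cM. The genotypes, the cM positions, and the function names are all invented, and no allowance is made for testing glitches.

```python
def half_identical(genotype_a, genotype_b):
    """True when two unordered genotypes (e.g. "AG" and "GG") share an allele."""
    return bool(set(genotype_a) & set(genotype_b))


def find_hirs(snps_a, snps_b, positions_cm, min_length_cm):
    """Return (start_cm, end_cm) for each unbroken run of half-identical SNPs
    whose span is at least `min_length_cm`."""
    segments = []
    run_start = None
    for i, (ga, gb) in enumerate(zip(snps_a, snps_b)):
        if half_identical(ga, gb):
            if run_start is None:
                run_start = i
        elif run_start is not None:
            segments.append((positions_cm[run_start], positions_cm[i - 1]))
            run_start = None
    if run_start is not None:
        # Close a run that extends to the last SNP tested.
        segments.append((positions_cm[run_start], positions_cm[len(snps_a) - 1]))
    return [(s, e) for s, e in segments if e - s >= min_length_cm]


# Invented data: seven SNPs spaced 2 cM apart on one chromosome.
person_a = ["AA", "AG", "GG", "AG", "AA", "GG", "AG"]
person_b = ["AG", "GG", "AG", "AA", "GG", "AG", "AA"]
positions = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

print(find_hirs(person_a, person_b, positions, min_length_cm=5.0))
```

In this toy data the fifth SNP (AA against GG) breaks the run, so only the first stretch is long enough to pass a 5 cM threshold.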

While this approach to genetic genealogy has great potential, its value at present is limited by the relatively small sizes of the databases, and by the fact that few pairs of possible tested cousins have both worked out their descendancies sufficiently to identify the ancestor, or ancestors, they have in common. I say “or ancestors”, because in cases where there has been much intermarriage within a small set of families, the estimates of relationship closeness are likely to be skewed as well by the fact that there are probably many different shared ancestors of various degrees.

In the end, the value of autosomal testing, as with all DNA testing, is largely dependent on the quantity and quality of the genealogy that has been done. Nonetheless, autosomal testing provides in principle a way for male surname lines that have “daughtered out” to be validated, by the targeted testing of one of the surviving females of such a line, and the corresponding autosomal testing of a proven male same-surname descendant of the same line. And as the databases grow in size, and as the genealogical enterprise advances, the value of this kind of testing should increase in proportion. However, the pricing needs to come down from the $300 range where it is currently situated.

Other kinds of DNA Testing: Haplogroups & Clades

Another important kind of yDNA testing is the determination of a man’s patrilineal haplogroup. Although of little use for genealogical purposes, the haplogroup can give one an idea of where a man’s remote male ancestors originally came from going back thousands and tens of thousands of years.

Just as haplotype is determined by testing ySTR microsatellites on the yChromosome, so haplogroup is determined by testing SNP (Single Nucleotide Polymorphism) sites on the yChromosome. Compared to ySTRs, SNPs mutate very rarely, so rarely that when a ySNP mutation happens to occur in a particular father-son transmission event, it is considered practically unique, and is therefore sometimes called a UEP (Unique Event Polymorphism), although the chances are that many of these mutations aren't really unique, just so rare that it's unlikely a second occurrence of one will ever be found.

A Terminological Digression: “Haplogroup”, “Clade”, and “Patrilineage”

As I have explained elsewhere, the collection of ySTR values that constitute a man’s haplotype place him fairly reliably within a particular patrilineage, which I have defined narrowly to mean all the male descendants of a man who lived within genealogical time, or roughly the timespan since a particular hereditary surname came into use for that patrilineage. However, I must confess here to having somewhat hijacked the term “patrilineage” to represent this vitally important genealogical concept. In reality, “patrilineage” as it is generally used, has a wider application, meaning all the male descendants of any arbitrarily chosen male. In fact, since it can be shown that all living males descend from a single male yAdam who lived perhaps 40-60,000 years ago, all living males are ipso facto members of the same patrilineage, but at this point the term ceases to have much value.

This broader sense of patrilineage does help one in understanding haplogroups, though, because the first male bearer of a unique SNP mutation on the yChromosome becomes thereby the patriarch of his own patrilineage, and the founder of a new sub-haplogroup. Except that the terms “sub-haplogroup” and “patrilineage” aren’t much used in this context, but another term is: “clade”, or more often “subclade”. The term “clade” also has a broader meaning, but it is used (and understood, in this deep ancestry context) without qualification as a synonym for branching haplogroups, just as I have used patrilineage without qualification to represent the small recent portion of an ancestry that falls within the scope of genealogical research.

But why, exactly, has it been deemed necessary to bring in the alternate term “clade”, when “haplogroup” is meant (both terms being defined in a special restrictive sense)? I think it’s because what we really need to talk about is the way haplogroups are constantly branching off into subhaplogroups, and “subclades” sounds a little less awkward.

And, for that matter, why do we need the terms “haplogroup” or “clade”, when “patrilineage” (defined with a different scope from my “(genealogical) patrilineage” usage) would do as well? I suppose that it’s because as it is, when we see the words “haplogroup” or “clade”, as used by genetic genealogists, they invoke the deep ancestral context defined by SNP testing, just as “patrilineage”, in my usage, is meant to invoke its specifically genealogical meaning.

Back to the Concept of a Haplogroup or Clade

What’s important is to understand the underlying concepts: there is a single tree of descent from one ancient patriarch to all living men, which branches each time a ySNP occurs—a ySNP that we know about. Each such branch point defines a new subclade (or subhaplogroup—take your pick); thus every living man belongs to a set of nested subclades of an original haplogroup. Haplogrouping is a way of classifying a man’s kinship group from the top down.
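The nesting of subclades can be pictured with a toy Python sketch. The tree below is entirely hypothetical (the names SNP_A through SNP_D are placeholders, not real markers), and the function names are my own; the sketch only shows how a man's most recent known SNP implies a chain of nested clades back to the root, and how the point at which two men's chains part company marks their most recent shared clade.

```python
# Hypothetical SNP tree: each SNP points to the SNP it branched from.
# None marks the root of the tree (the ancient patriarch).
PARENT = {
    "SNP_A": None,
    "SNP_B": "SNP_A",
    "SNP_C": "SNP_B",
    "SNP_D": "SNP_A",
}


def subclade_path(terminal_snp, parent=PARENT):
    """List the nested clades from the root down to `terminal_snp`."""
    path = []
    snp = terminal_snp
    while snp is not None:
        path.append(snp)
        snp = parent[snp]
    return list(reversed(path))


def shared_clade(terminal_a, terminal_b, parent=PARENT):
    """Most recent clade to which both men belong."""
    common = None
    for a, b in zip(subclade_path(terminal_a, parent), subclade_path(terminal_b, parent)):
        if a != b:
            break
        common = a
    return common


print(subclade_path("SNP_C"))          # top-down classification of one man
print(shared_clade("SNP_C", "SNP_D"))  # where two men's branches diverge
```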

Meanwhile, classifying a man into a patrilineage on the basis of a set of ySTR marker values called a haplotype represents the bottom-up approach. Eventually, as more and more ySNPs are found, these approaches may converge in many cases, but in the meantime it is useful to distinguish them, and this can most economically be done by differential terminology. Thus I reserve the term “patrilineage” for genealogical purposes (implying a reference to ySTR testing and haplotypes), and otherwise refer by preference to “clades and subclades” (implying thereby a reference to ySNP testing and haplogroups).

Although the ascertainment of one’s lowest order subclade requires SNP testing in most cases, membership in a more general clade, or haplogroup, can usually be inferred with a high degree of confidence from one’s haplotype. Thus, the general clade, or haplogroup, for the DENNISON DNA Surname Project Patrilineage 1 group is R1b1a2, perhaps the single most common subclade of the R1b haplogroup, shared by about 65-85% of all men who have British ancestry, depending on where in Britain they live. A haplogroup predictor program for inferring broad haplogroup from haplotype is available online, and besides that, the FTDNA testing company is committed to performing free SNP testing for any of its haplotype customers whose broad haplogroup cannot be inferred with confidence from their haplotype; beyond that, FTDNA and other companies offer detailed SNP testing for a more fully resolved subclade determination.

Progress in this haplogroup classification field has been so rapid that new, more recent SNPs (further articulating the tree) are being added constantly. The best way to keep abreast of new developments is to check the ISOGG Haplogroup Tree from time to time. Even the nomenclature has been changing so frequently in recent years that the old “Henry System” style of nomenclature, in which one of the more articulated branches on R1b has now become R1b1a2a1a1a4a1a1, is giving way to the more compact (and stable) terminology, R-L237, where the “R” refers to the master haplogroup clade, and the “L237” to the most recent mutation in one of its particular branches.

Thus R1b itself has become R-269, and the DENNISON Patrilineage 1 group mentioned above is now best designated, not as R1b1a2a1a1a, but as U106*, with the SNP mutation at U106 being the defining mutation of this haplogroup subclade, and the “*” meaning that all the currently known SNPs downstream of U106 have been tested and come up negative. If a new SNP more recent than U106 were to be discovered and added to the tree, the U106* designation would have to be changed to U106+, to indicate that at least one additional SNP test remains to be performed.

One’s haplogroup can harbor surprises. One DNA Surname project administrator I know who thought that his surname was of German origin, instead came up with a Norse (Viking) haplogroup, while my Robb genealogical patrilineage, which is clearly Scotch-Irish, turned out to come originally from northern Germany, probably part of the wave of Anglo-Saxon settlement that swept southern England in the wake of the Romans in the 4th Century—although my particular ancestors could have come over many hundreds of years either before or after that period.

All this is quite interesting in its own right, though it takes us far afield from genealogy, per se. However, one thing we may infer from the fact that two people with a common surname have different haplogroups, is that they have no common ancestor for at least thousands of years, and thus can hardly be of the same patrilineage.

Other kinds of DNA Testing: Mitochondrial DNA (mtDNA)

Besides the diploid (from two parents) DNA that lies coiled in the cell nucleus in a doubled helical spiral, there is the mitochondrial DNA in the cell’s cytoplasm. Every cell has mitochondria that both liaise with the nuclear DNA and act as the high-volume factories of protein production, using copies of the nuclear DNA for much of their factory plan. Mitochondria, which are thus crucial to the life cycle, also have their own DNA blueprints that are independent of the diploid nuclear DNA, and these are inherited directly from the mother, via the egg cell that plays host to the fertilizing sperm.

Thus, analogous to the patrilineal yChromosome, the same mitochondrial DNA that your mother got from her mother, and so forth, is also subject to mutations, which allows one’s maternal line ancestors to be classified into one of a handful of deep matrilineages descended from a small number of Eves who lived several tens of thousands of years ago. Mutations in two tested “hypervariable control regions” have been used to define mtDNA haplogroups, which in turn have been mapped onto the human population dispersion out of Africa. Although the mutation rates of mtDNA are not high enough to be genealogically useful, it could be interesting to learn that one’s mother’s mother’s ... mother, although known to have descended from a family rooted in Britain since the time of the Norman Conquest, nonetheless inherited her mitochondrial DNA from a woman who lived 30,000 years ago in what are today the Russian steppes.

Other kinds of DNA Testing: Ethnographic Testing

Still another kind of DNA testing with a wide time horizon is ethnographic testing, which doesn’t bother trying to construct a mutational tree of descent, but rather simply samples DNA from all over the genome looking for characteristic markers associated with various ethnic populations. This kind of testing, which takes into account all of one’s ancestors, and not just those at either edge of the tree (the purely patrilineal and matrilineal lines), has little to offer genealogy, but it does provide an estimate of the percentage contribution of various ethnic groups to one’s overall ancestry. It is thus one way to explore the popular tradition in many American families of Native American ancestry.

However, there are some caveats that come with this kind of testing. First, unless the Native American ancestor, for example, is rather recent, there is a significant chance that the sampling will miss his or her DNA altogether. Second, many Europeans whose ancestors have never left the continent also show Native American ancestry in their makeup. How can that be? Because, besides the original east Asians who crossed the Bering Strait during a period when the land bridge to America was open, and thus became “Indians”, many of their deep ancestral kin, out of Africa, went north and west to the Middle East and Europe instead of east into Asia, and so influenced the DNA on the opposite side of the world.

Thus, like all the other DNA tests, these ethnographic tests too need to be interpreted in the light of other, more conventional sorts of evidence: in this case, the evidence provided by archaeology.

Paternity and Forensic DNA Testing

This type of testing alone depends on no wider context than that of father and son. And because it aims for the maximum degree of certainty, testing both ySTR sites and SNPs, it is conclusive beyond any sane person’s definition of reasonable doubt. Much of the mutation rate literature relied on by genetic genealogists is predicated on paternity test databases, which thus have the advantage of completely eliminating the NPE factor. Paternity testing is itself the father of all the other kinds of DNA testing, with their varied purposes, and it is still the “gold standard” for measuring mutation rates. The only problem is that, like gold, paternity test results are relatively scarce, so they need to be fleshed out by data derived more problematically from genealogical DNA databases, applying sophisticated statistics and the mathematics of probability to try to compensate for the many unknowns in the equations.

A Brief DNA Glossary

Some Key Terms: GD (Genetic Distance), haplotype, haplogroup, MRCA (Most Recent Common Ancestor), MHT (Mutation History Tree), NPE (Non-Paternity Event), patrilineage, RPH, and TMRCA.

For a more extensive glossary, see the ISOGG Wiki Glossary

autosomal
pertaining to the numbered human chromosomes, 1-22; all the human chromosomes except the “sex chromosomes”, the yChromosome and the xChromosome

chromosome
one of 46 strands of the complete human DNA that constitute the genetic blueprint for each individual, organized into pairs, with one member of each pair inherited from the father, the other from the mother. 22 of these 23 chromosomal pairs are called autosomal chromosomes, while the remaining pair, made up of the xChromosome and the yChromosome, are called the sex chromosomes. Other species have variant numbers of chromosomes. The chromosomes of an organism taken as a whole are called the “genome”.

clade
a (once) living organism and all of its descendants; in the context of genetic testing of the male yChromosome, a common patriarch and all his male descendants.

deep clade testing
the testing for particular ySNP values to determine a man’s most specific (closest to the present) haplogroup, also called a clade or subclade.

crossover
a process that occurs during the replication of one of a parent’s two chromosomal strands to pass on to the next generation, in which part of the genetic material is taken from the other chromosomal strand instead; since crossover is likely to occur at some point on most chromosomes each generation, over time the segments of DNA passed on from ancestors get smaller and smaller, and eventually frustrate attempts to demonstrate relationship through autosomal DNA testing.

genealogical time
the time period within which genealogical research is possible and practical—roughly coincident with the time since written records began to be kept identifying individuals by name, and especially by hereditary surname.

genetic distance (GD) (in the context of yDNA surname projects)
the number of mutation events that have occurred to a panel of tested ySTR markers in the descent of two male line cousins from their common male ancestor.
     Each generational passing of the male yChromosome from father to son represents a transmission event, that is, an opportunity for one or more mutation events to occur amongst the set of tested ySTR markers on that chromosome, and the GD is a count of the number of mutation events that have occurred down the generations in both male descendants. So, given that the tested markers mutate at widely varying but roughly predictable rates, GD provides an estimate of the closeness of the genetic relationship between two male patrilineal cousins.
     Usually, the genetic distance between the ySTR haplotypes of two men is simply the sum of the absolute differences between their marker values (the stepwise mutation model), but a simpler way of measuring GD is to count the number of markers that differ (the infinite alleles model), which usually provides a close approximation to the number of mutation events. Markers only occasionally differ by more than one repeat, and when they do, the current scientific evidence says that this is usually due to multiple mutations to the same marker, though multistep mutations seem to occur about once in 50 mutations. One rare kind of mutation that simultaneously affects the values of several markers is the recLOH mutation event.
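     The two ways of counting GD described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, haplotypes are represented as simple dictionaries of marker name to repeat count, and the marker values shown are made up for the example rather than taken from any real test result.

```python
def genetic_distance(hap_a, hap_b, model="stepwise"):
    """Genetic distance between two ySTR haplotypes.

    hap_a, hap_b: dicts mapping marker name -> repeat count (assumed format).
    Only markers tested in both haplotypes are compared.
    """
    shared = hap_a.keys() & hap_b.keys()
    if model == "stepwise":
        # Each unit of difference counts as one mutation event.
        return sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    if model == "infinite_alleles":
        # Each differing marker counts as one mutation event.
        return sum(1 for m in shared if hap_a[m] != hap_b[m])
    raise ValueError("model must be 'stepwise' or 'infinite_alleles'")


# Illustrative (made-up) values for two patrilineal cousins.
cousin_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
cousin_2 = {"DYS393": 13, "DYS390": 26, "DYS19": 15, "DYS391": 11}

print(genetic_distance(cousin_1, cousin_2, "stepwise"))          # 3
print(genetic_distance(cousin_1, cousin_2, "infinite_alleles"))  # 2
```

     The two models disagree here only because one marker differs by two repeats; when every differing marker differs by a single repeat, both counts are the same.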

half-identical
said of two humans who share at least one allele value at a particular SNP. Long consecutive stretches of half-identical sampled SNPs, measured in cM (centimorgans, which adjust for the variant rates of crossover in different chromosomes), are indicative of a shared descent from a common ancestor. The term HIR is sometimes used to mean half-identical region, whose length may be quantified either in cM or in the number of SNPs. The principal testing companies at present, 23andMe and FTDNA, consider anywhere from 5-7 cM (or about 500-700 SNPs) to be the minimum length possibly indicative of a reasonably close cousin relationship.
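As a rough illustration of the definition above, the following Python sketch checks whether two genotypes are half-identical at a SNP and then finds consecutive half-identical stretches. The function names and the two-letter genotype strings are assumptions made for the example, and stretch length is counted in SNPs only; converting to cM would need a genetic map, which is not modeled here.

```python
def half_identical(geno_a, geno_b):
    """True if two genotypes (e.g. "AG" and "GG") share at least one allele."""
    return bool(set(geno_a) & set(geno_b))


def half_identical_runs(genos_a, genos_b, min_snps=1):
    """Return (start, end) index pairs, end exclusive, of consecutive
    half-identical SNPs that are at least min_snps long."""
    runs = []
    start = None
    n = min(len(genos_a), len(genos_b))
    for i in range(n):
        if half_identical(genos_a[i], genos_b[i]):
            if start is None:
                start = i  # a new stretch begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a stretch that runs to the end of the sampled SNPs.
    if start is not None and n - start >= min_snps:
        runs.append((start, n))
    return runs


# Made-up genotypes at six consecutive sampled SNPs.
person_a = ["AA", "AG", "CC", "TT", "GG", "AC"]
person_b = ["AG", "GG", "CT", "CC", "AG", "CC"]

print(half_identical_runs(person_a, person_b))              # [(0, 3), (4, 6)]
print(half_identical_runs(person_a, person_b, min_snps=3))  # [(0, 3)]
```

In real data the minimum length would be set in the hundreds of SNPs, in line with the thresholds quoted above, not the toy value of 3 used here.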

haplogroup
the deep ancestry of a particular individual
The common male ancestor of the members of a yDNA haplogroup usually goes back many thousands, or even tens of thousands of years. Haplogroups have a branching tree structure, dividing meta-groups like R, called “clades”, into “subclades” like R1b, or R1b1b2, with each subclade branch defined by the particular SNP mutation that occurred in the common male ancestor of members of that subclade. Thus, a subclade like R1b1b2 is defined by the chain of accumulated SNP mutations: M173, M343, P25, P297, M269. In the new terminology, haplogroups are denominated by their most recent SNP; thus R1b1b2 becomes simply R-M269.
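The naming scheme just described can be sketched as follows. This is a hypothetical helper, not part of any testing company's software: it takes the chain of SNPs quoted above, in order from oldest to most recent, together with the set of SNPs for which a man has tested positive, and reports the most specific haplogroup in the newer SNP-based notation.

```python
def most_specific_haplogroup(top_level_clade, snp_chain, positive_snps):
    """Walk down the SNP chain (oldest first) and name the haplogroup
    by the last SNP in the chain for which the man tested positive."""
    deepest = None
    for snp in snp_chain:
        if snp not in positive_snps:
            break  # the chain of accumulated mutations stops here
        deepest = snp
    if deepest is None:
        return top_level_clade
    return f"{top_level_clade}-{deepest}"


# The chain of SNP mutations given above for R1b1b2.
R1B1B2_CHAIN = ["M173", "M343", "P25", "P297", "M269"]

print(most_specific_haplogroup("R", R1B1B2_CHAIN, set(R1B1B2_CHAIN)))
# R-M269
print(most_specific_haplogroup("R", R1B1B2_CHAIN, {"M173", "M343", "P25"}))
# R-P25
```

A real haplogroup tree branches, so a full implementation would need a tree rather than a single chain; the sketch follows only the one line of descent named in the text.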

haplotype
a set of yDNA/mtDNA marker values associated with a particular individual (haplotypes are only rarely unique)
yDNA marker values (also called alleles) are determined by testing a subset of highly mutable microsatellite sites on the yChromosome called ySTRs.

IBD (Identical By Descent)
obfuscatory jargon for “inherited”, typically used to characterize a particular stretch of DNA that is known to have been inherited from some relatively recent ancestor (and perhaps shared with another descendant), as opposed to the same stretch of DNA that is IBS (Identical By State), meaning simply “identical” between two individuals and not known to have been inherited from a common ancestor.

infinite alleles mutation model
The assumption that each difference between ySTR marker values in a panel of tested ySTR marker values is due to a single mutation, even when there may have been a gain or loss of several repeats. This model of the way mutations work is a considerable simplification of the complex reality of the mutation process, but it provides a reasonable quantitative approximation to it over the period of genealogical time.

microsatellite
a stretch of DNA characterized by multiple repeats of the same sequence of 2-6 nucleotide bases, the letters in which the genetic code is written. Microsatellites occur throughout the genome, but the ones most useful for genealogical testing purposes are located on the yChromosome.

marker (in the context of DNA testing)
a stretch of DNA whose allele values are sampled as a means of identifying individuals or placing individuals within (deep) patrilineages

MRCA (Most Recent Common Ancestor)
the MRCA is relative to a particular set of yDNA-tested subjects, and is not, therefore, necessarily the same as the ultimate patriarch of a patrilineage, as I have defined it here. As a DNA surname project grows in scope with the addition of more distant patrilineal cousins, the MRCA moves backwards in time and may eventually become identical with the patrilineage patriarch, but even if this does not happen, the patriarch is the ultimate genealogical focus of the project. For that reason projects are best subdivided into patrilineages, rather than clusters of descendants of a more recent common ancestor. Nonetheless, it should be kept in mind that the TMRCA estimates for the current set of patrilineage members all point back only as far as they need to, to coalesce in a common ancestor.
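The way the MRCA depends on who has been tested can be shown with a small Python sketch. The family tree, the names in it, and both function names are hypothetical, and the tree is assumed to be fully known, which in practice it rarely is.

```python
def male_line(person, father_of):
    """Return the male line from person back to the earliest known
    ancestor, starting with the person himself."""
    line = [person]
    while line[-1] in father_of:
        line.append(father_of[line[-1]])
    return line


def mrca(tested, father_of):
    """Most recent common ancestor of a non-empty list of tested men,
    or None if their known male lines never meet."""
    lines = [male_line(p, father_of) for p in tested]
    common = set(lines[0]).intersection(*lines[1:])
    # lines[0] runs from most recent to earliest, so the first
    # shared name encountered is the most recent common ancestor.
    for candidate in lines[0]:
        if candidate in common:
            return candidate
    return None


# Hypothetical patrilineage: son -> father.
father_of = {
    "son_A": "patriarch",
    "son_B": "patriarch",
    "grandson_A1": "son_A",
    "grandson_A2": "son_A",
    "grandson_B1": "son_B",
}

print(mrca(["grandson_A1", "grandson_A2"], father_of))                 # son_A
print(mrca(["grandson_A1", "grandson_A2", "grandson_B1"], father_of))  # patriarch
```

Adding the more distant cousin from the other branch moves the MRCA back from son_A to the patriarch, which is the effect described above.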

Mutation History Tree
is a schematic tree of descent constructed for a set of descendant haplotypes of the same patrilineage that shows when particular mutations within the patrilineage tree of descent occurred, and thus how the tested members of the set are related. Here is a sample mutation history tree.

NPE (Non-Paternity Event)
in Western cultures, an unexpected disjunction somewhere in the paternal ancestral chain between the inherited surname and the inherited yDNA, due to a replacement of a son’s biological father (with his inherited surname) by a surrogate father with (usually) a different surname. The most frequent cause of NPEs historically was probably adoption, but there are many other possible causes, including out-of-wedlock births. See the final paragraphs of Identifying/Disconfirming Your Patrilineage for more on NPEs.

nucleotide
There are four of these DNA bases, denominated “A”, “G”, “C”, and “T”, and they constitute the alphabet of the genetic code.

(genealogical) patrilineage
the male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time.
     The patriarch of a patrilineage, thus defined, is typically the first of his male line to adopt a particular surname and pass it on to his children. The most recent common ancestor (MRCA) of any particular set of yDNA tested descendants is likely to be well downstream of the original patriarch. The methods (and pitfalls) of sorting people into genealogical patrilineages are discussed at length under Identifying/Disconfirming Your Patrilineage

patrilineage cousins
a set of tested or testable (male) paternal line cousins who are members of a patrilineage as defined above; more loosely (for genealogical purposes), any individuals with ancestors belonging to this patrilineage.

recLOH mutation
A rare kind of mutation to a portion of the yChromosome that can affect more than one of a set of ySTR markers that usually mutate separately and independently. Read this article, and this one, to learn more.

repeat
one iteration of a sequence of nucleotide letters that is repeated a number of times to make up a ySTR marker; when the marker mutates, it usually gains or loses a single repeat.

RPH (Root Prototype Haplotype)
the hypothetical haplotype of the ancestor of a common patrilineage
RPH may also be defined as the haplotype of that member of a set of tested patrilineage cousins who is most closely related to all of the others, collectively. For a fuller discussion of RPH (a term, and concept, developed by yours truly), see this paper.

SNP (Single Nucleotide Polymorphism)
an observed difference in allele values between single nucleotides on the chromosomal strands of two individuals of the same species. The term is also used to refer to the paired nucleotides, or "base pair" of the nuclear DNA of an individual of a diploid species, like we humans, who inherit a copy of each chromosome from each of our parents.
     In autosomal testing for genealogical purposes, large numbers of SNPs (base pairs) are sampled across whole chromosomes in two individuals, with the aim of identifying long half-identical stretches that are likely indicative of shared DNA from a common ancestor.

stepwise mutation model
The assumption that each unit of difference between measured ySTR marker values is due to the gain or loss of a single repeat. This model of the way mutations work provides a close approximation to the complex reality of the mutation process.

TMRCA (Time to the Most Recent Common Ancestor)
TMRCA, like genetic distance, is a measure of the closeness of relationship between two haplotypes. TMRCA may be measured in generations, or in years, where the number of years/generation is defined. TMRCA is calculated as a probabilistic function of the number of marker variations between the two haplotypes, and the calculation depends crucially on the estimated mutation rates for the particular markers that constitute the haplotype. Simple TMRCA calculators apply an average mutation rate across the marker panel, while more sophisticated calculators take account of which particular markers have mutated; if all the variant markers are fast ones, a closer relationship is indicated than if some of them are slow mutators. Another factor that may be taken into consideration is to adjust for the positive knowledge that there is no common ancestor back a certain number of generations from the present; this factor has the effect of pushing TMRCA farther back into the past. See my paper Deconstructing TMRCA & Genetic Distance for an extended discussion of TMRCA and GD (Genetic Distance).
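The simplest kind of calculator mentioned above, one that applies a single average mutation rate across the whole marker panel, can be sketched as below. This is a crude point estimate and not the full probabilistic calculation: it assumes that mutations accumulate independently along both lines of descent, so that the expected genetic distance after t generations is about 2 × (number of markers) × (average rate) × t, and it simply solves that for t. The function name and the mutation rate in the example are illustrative assumptions, not measured values.

```python
def estimate_tmrca(genetic_distance, n_markers, avg_mutation_rate,
                   years_per_generation=None):
    """Rough TMRCA point estimate from an average per-marker,
    per-generation mutation rate.

    Returns generations, or years if years_per_generation is given.
    """
    if n_markers <= 0 or avg_mutation_rate <= 0:
        raise ValueError("n_markers and avg_mutation_rate must be positive")
    # Mutations can occur on either of the two lines of descent,
    # hence the factor of 2 in the expected number of mutation events.
    generations = genetic_distance / (2 * n_markers * avg_mutation_rate)
    if years_per_generation is None:
        return generations
    return generations * years_per_generation


# GD of 2 on a 25-marker panel, with an assumed (hypothetical)
# average rate of 0.004 mutations per marker per generation.
print(estimate_tmrca(2, 25, 0.004))                           # about 10 generations
print(estimate_tmrca(2, 25, 0.004, years_per_generation=30))  # about 300 years
```

A single number like this hides a wide range of uncertainty, which is why the more sophisticated calculators report probabilities across a span of generations instead.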

transmission event
the event of male parentage in which the yChromosome of the father is replicated, with the possibility of mutations, and passed on to a son.

yChromosome (or “Y Chromosome”)
the yChromosome is the member of the 23rd human chromosome pair (the sex chromosomes) that is possessed only by the male, and which is handed down virtually unchanged to each of his sons.

yDNA (or “Y-DNA”)
the DNA of the male yChromosome, which is said to be “non-recombinant” because (except for a tiny “pseudoautosomal” region containing 9 genes) it cannot combine with its odd couple partner, a female xChromosome.

ySNP (Single Nucleotide Polymorphism)
a single nucleotide on the male yChromosome for which a mutation has been found to occur; because such ySNP mutations occur so infrequently, they are used to mark branch points in the male descendancy from the original yAdam.

ySTR (Short Tandem Repeat)
a type of yDNA sequence composed of multiple repeats of the same multi-nucleotide sequence; these areas are also called microsatellites. Sets of these ySTRs are preferred for constructing yDNA haplotypes because they mutate much faster than single point (SNP) loci. Several hundred of these ySTR sites have been identified but only 100 or so are currently being tested, and unfortunately, reliable mutation rate data exist for only a minority of these.


Last updated 19Nov2011
© John Barrett Robb