DNA Testing & Genealogy

DNA Testing: Overview

The root meaning of DNA testing today is reading out portions of the unique genetic code which each of us inherits from our parents. Reading out the whole is usually called, misleadingly, “mapping the genome”: misleading because the end product is nothing but a set of meaningless letters. The meaning comes in a bit at a time, from painstakingly correlating tiny patches of a small portion of this code (the 5% or so which constitute the genes) with the development and manifestation of interesting traits. Thus, in time, the genetic part of the genome might truly be mapped onto the observed characteristics of species in a general way, or even onto those of particular organisms in detail. Those are the goals of classical DNA testing.

In parallel with the grand goal of mapping the genome, other more limited, but also more focused, kinds of DNA testing have arisen. Where testing is thorough enough to identify an individual uniquely, or at least to distinguish him and his closest blood relatives from everyone else on the planet, it has important forensic applications, for example in conclusively establishing paternity. Likewise, a set of tests performed on an individual DNA sample can, if extensive enough, establish a unique genetic fingerprint, which can then be used to place a suspected perpetrator at the scene of a crime, as with the O.J. Simpson evidence.

A different, though overlapping, kind of DNA testing aims to determine whether a set of tested males probably have a common ancestor within the purview of genealogical research.

Genealogy is concerned with working out the ancestries of people alive today, or, more broadly, with reconstructing the tree of descent from a common ancestor of a set of lineage cousins. The tree metaphor might, if we please, be extended to the whole human race, because it can be shown that all males descend from a single Adam, though the existing evidence tells us that this forefather of us all was very far from being the first male of the Homo sapiens species. Or, since man is a social animal, distinguished from the other animals perhaps more than in any other way by the elaborate transmissible body of knowledge and values he shares with his tribe (in a word, “culture”), the descent of the human race may also be conceived in ethnographic terms. But ethnographic family trees are of a scale which far exceeds the scope of any genealogical project.

The kind of DNA testing we are primarily concerned with here as would-be “genetic genealogists” has a far narrower focus: estimating the time back to a common paternal ancestor of two males. And its practical scope is confined to a period which one might call “genealogical time”: the period in each culture since written records began to document the lives of ordinary men, by name. Thus genealogical time roughly coincides, at least in Western cultures, with the period since hereditary surnames came into general use, a process usually dated to about 1300-1500 in England, for example. To understand how and why DNA testing can be used to predict that two men have a common ancestor who lived not too many centuries ago, we need to review some of the basics of human DNA.

The Basics of Human DNA

In order to explain DNA testing for genealogical purposes, it is first necessary to review some of the basics of human DNA and its replication to produce offspring. Some of the DNA-related terms found in the following sections, and throughout this website, are defined in the brief DNA glossary on this page.

Each of us has developed from a single cell containing our unique DNA blueprint. This DNA is organized into 23 paired “chromosomes”, one chromosome of each pair coming directly from the father, and one from the mother. Every cell in our body contains an exact copy of this complete genetic blueprint, except that when we produce germ cells for reproduction (sperm for men, eggs for women), our separate parental parts mix and recombine in a new and unique way for each sex cell we produce, and each sex cell receives only one chromosome of each pair.

Twenty-two of these 23 chromosomal pairs are called “autosomal”, and each consists of matching paternal and maternal parts, perfectly aligned. DNA is a blueprint for producing the proteins of life, and two matched but differing versions of each gene set up a genetic competition for determining the offspring’s characteristics, which results in some wins for the father, some for the mother, and a large proportion of compromises.

The remaining chromosomal pair, called the “sex chromosomes”, works quite differently. Instead of a matched pair, we find, at least in males, an odd couple: an X chromosome inherited from the mother, and a runty Y chromosome from the father (I prefer to style these “xChromosome” and “yChromosome”, just as I refer to yDNA, rather than the more conventional “Y-DNA” or “Y DNA”). The yChromosome contains fewer than 100 genes, only 9 of which match those of the female xChromosome. A large proportion of the remaining yChromosome genes code for the specific attributes of the more specialized male (the female is the default type of the species).

The XX female, like the XY male, inherits one xChromosome from her mother; she inherits the other from her father, who passes on the xChromosome he got from his own mother. So there is plenty of genetic competition between her two sets of sex chromosome genes. However, since the male yChromosome fails to match up to most of the xChromosome, the male inherits most of his mother’s xChromosome genes as is, with no competing copy. This can cause problems where the mother transmits a recessive genetic abnormality from one of her xChromosomes to a son; one such abnormality is hemophilia, which rarely occurs in women, but which, for women who are carriers, crops up on average in half of their sons. And of course those other genes on the yChromosome which have no xChromosome counterpart also operate to make men more exceptional. Interestingly, even with the autosomal chromosomes, women’s DNA recombines in a much more homogenized way than men’s, keeping females much closer to the norms of the species, while males, more prone to extreme differentiation, may be considered nature’s experimental sex.

Why Genetic Genealogists Prefer to Test Male DNA

What matters most for our present purposes about DNA transmission from one generation to the next is that the yChromosome replicates virtually unchanged down the paternal line. Current models of population genetics hypothesize that all men descend either from a single Adam, or at least a very small set of original progenitors, and women too have their Eves. But if all men descended from the same man, and if the yChromosome never changed at all, then all men would have identical yChromosomes and there would be nothing to be learned from testing. Fortunately for genetic genealogists, mutations creep into the germ cells, or occur during the replication process, and it is these mutations which produce the variations which yDNA testing measures.

I speak here exclusively of yDNA (the DNA of the male yChromosome) only because it is the testing of certain areas of the male yChromosome which has the highest payoff for genealogists. However, there are other kinds of DNA testing, all of which have their interest, and I have more to say about these below.

As it happens, in Western societies, and in many other cultures as well, surname runs with the paternal line, and since tracking surnames is the main preoccupation of genealogists, yDNA testing fits perfectly into their epistemological paradigms. If we test the yDNA of two males with the same surname and find that they are very closely matched, we have strong positive confirmation that they descend from a common ancestor of their patrilineage, while otherwise we may say that although they share a surname, they are probably no more likely to have a common ancestor than if one of them were surnamed Jones, and the other, Smith.

But yDNA can tell us more than just that two males do, or do not, have a common ancestor within the genealogical research horizon (the period since records of individuals began to be kept). Starting from the premises that yDNA is highly stable from generation to generation, but subject to change over very long stretches of time, and at a statistically predictable rate, the differential number of mutations which have accumulated between two tested male yChromosomes (the genetic distance) can serve as a kind of generational clock, measuring the time (in generations) back to their most recent common male ancestor, quite analogous to the archaeologist’s tool, radiocarbon dating. This estimate is called the “TMRCA” (Time to the Most Recent Common Ancestor).
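The clock arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method any testing company uses: genetic distance is taken as the sum of absolute repeat-count differences over markers both men were tested on (a stepwise mutation model), every marker is assumed to mutate at the same hypothetical rate of 0.002 per generation, the repeat counts are made up, and the function names are invented for the example.

```python
def genetic_distance(hap_a, hap_b):
    """Sum of absolute repeat-count differences, marker by marker.

    Assumes both haplotypes list the same markers in the same order
    (stepwise model: a difference of 2 repeats counts as 2 mutations).
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


def naive_tmrca(distance, n_markers, rate_per_marker=0.002):
    """Point estimate of generations back to the most recent common ancestor.

    Both lines accumulate mutations independently after the MRCA, so after
    t generations the expected distance is 2 * t * n_markers * rate.
    Solving for t gives the estimate. The default rate is an assumption.
    """
    return distance / (2 * n_markers * rate_per_marker)


# Made-up 12-marker repeat counts for two men of the same surname.
man_a = [13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29]
man_b = [13, 23, 14, 11, 11, 14, 12, 12, 12, 13, 13, 29]

gd = genetic_distance(man_a, man_b)
generations = naive_tmrca(gd, len(man_a))
print(gd, round(generations, 1))  # 1 20.8
```

At an assumed 25 to 30 years per generation, 20.8 generations is roughly 520 to 625 years.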

The sensitivity of the clock depends on the average mutation rate at the tested marker sites; across the genome, mutation rates range from 1 per billions of generations to 1 per several hundred generations. The fastest mutations occur in stretches of DNA called microsatellites, and the most rapidly mutating microsatellites are those on the yChromosome, which are known as ySTRs (yChromosome Short Tandem Repeats). However, even with ySTRs which mutate at a rate of once every several hundred generations, it is still obviously necessary to test many of them in order to generate standard sets of markers which will show some change within the narrow span of genealogical time (roughly the last 400-1000 years). The set of values a man shows on such a standard marker panel is called his haplotype.

It follows from this that the more markers are tested, the more mutations are likely to show up, and thus the more finely calibrated the resulting TMRCA-measuring generational clock will be. However, it’s a bit more complicated than that, because not all ySTR markers are created equal. In recent years, widely varying mutation rates have been observed across these markers, with some of them mutating 10 times as fast as the stodgiest ones. Thus, the average mutation rate across each of the various ySTR marker panels offered by the half-dozen or so yDNA testing companies is at least as important as the number of markers tested. At present, the best “bang for the buck” test panel, and the one most useful for identifying patrilineage relationships, is the FTDNA 37-marker panel.
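A short sketch shows why rate matters as much as count. Under a simple model in which each of two lines descending from a common ancestor t generations back accumulates, on average, t times the summed per-marker mutation rate, the expected number of differences between them is 2t times that sum. The two panels and their rates below are hypothetical, chosen only to mirror the tenfold spread mentioned above; they are not the rates of any real panel, and the function name is invented.

```python
def expected_differences(rates, generations):
    """Expected mutations separating two men whose MRCA lived `generations` ago.

    Each line contributes generations * sum(rates) on average, and there are
    two independent lines, hence the factor of 2.
    """
    return 2 * generations * sum(rates)


# Hypothetical panels: per-marker mutation rates per generation.
slow_37 = [0.001] * 37   # 37 "stodgy" markers
fast_12 = [0.01] * 12    # 12 markers mutating ten times as fast

for name, panel in [("37 slow markers", slow_37), ("12 fast markers", fast_12)]:
    print(name, round(expected_differences(panel, 20), 2))
# 37 slow markers 1.48
# 12 fast markers 4.8
```

In this made-up case the smaller but faster panel would be expected to show more than three times as many differences over 20 generations, so it would resolve recent relationships more finely despite testing fewer markers.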

Incidentally, the marker sites sampled for the yDNA tests do not involve genes, per se. If they did, they might be subject to natural selection bias which would reduce the predictability of their mutation rates. Only about 5% of the genome actually codes for the genes which define our unique traits. The purpose and function of the rest of the genome, often called “junk DNA”, is largely unknown at present.

The Payoffs of yDNA Testing

What kinds of things can genealogists infer from sets of tested marker haplotypes for men bearing the same surname?

Identifying/Disconfirming Your Patrilineage

By patrilineage, I mean the exclusively male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time: usually the time since this lineage first came to be identified in the records by a particular hereditary surname, or family name. More loosely, the term may be understood to include all the cousins of these male descendants (all the descendants of the patriarch), male or female.

The concept which I have labelled “patrilineage” is fundamental to genealogy, and I have deliberately given it a restrictive meaning in order to capture that concept. The unqualified term “patrilineage” might more usually be defined as any male-only tree of descent. It might be a tree as foreshortened as a father and his sons, or as deep as the tree of all living males (all of whom, it can be shown, descend from a common male ancestor). However, neither of these extremes, nor indeed the broad sweep of the unqualified term, is of much use to genealogists or family historians.

There is, though, a sense of “patrilineage” from which my usage needs to be distinguished. The MRCA (Most Recent Common Ancestor) of a set of yDNA-tested male cousins is very much the focal point of any current yDNA project, and rightly so. But as new, more remote cousins join the project, the MRCA gets pushed further and further back until it begins to approach the original patriarch of the patrilineage, and it is this wider scope which I think needs to be kept firmly in mind as we proceed with our investigations, be they in the documentary archives, or in the testing lab.

What we can say about yDNA testing is that subsets of reasonably closely matched haplotypes can be grouped into DNA surname patrilineages, comprising a group of men who almost certainly descend not only from a common ancestor, but from an original patriarch of the patrilineage (perhaps even the first bearer of the common hereditary surname) who lived anywhere from several hundred to as many as 1,000 years ago. At the same time, and as a corollary of this, a pair of men thought to be closely related on the basis of genealogical research, but whose yDNA is widely disparate, may be said conclusively to be unrelated, at least within the time frame of the genealogical researcher, and thus to be of different patrilineages.

yDNA testing is therefore likely to occasionally disconfirm membership in an expected patrilineage group. Although a disruption of one’s established assumptions or conclusions may seem to be a negative outcome of yDNA testing (and it is certainly likely to be at least mildly disturbing), the possibility of such unexpected results actually represents one of the chief benefits of yDNA testing. Where a fair amount of research has been done on a line, and testing is merely confirmatory of membership in an expected patrilineage, it may be reassuring, but it adds only a little to the strength of one’s much more focused genealogical theses. On the other hand, if an unexpected break in the lineage can be demonstrated, it is likely to set one off on new and productive research tracks.

What is the best way to interpret and deal with a failure of one’s yDNA haplotype to fall into an established and expected lineage? The first thing to suspect is that somewhere one has drawn a plausible but invalid inference from one’s data; consequently, one needs to probe each link of the paternal ancestral chain for weaknesses. And where the lineage one expected to match is not itself thoroughly established, with a number of matching haplotypes backed by high quality research, the research behind the established lineage needs to be carefully scrutinized as well.

Second, if one is, after all, able to make a solid case from the evidence for each link of the paternal ancestral chain, it may be the evidence itself which is erroneous or incomplete. Record keepers made errors, then as now, which is why one tries to assemble several independent pieces of evidence, rather than relying on just one.

However, the greatest challenge facing all genealogists most of the time is the problem of incomplete records. That is why the BCG Genealogical Proof Standard calls for a “reasonably exhaustive search” of all the possibly pertinent records. Assuming that this standard has been met, the most likely remaining explanation for the failed match is that an NPE (Non-Paternity Event) has occurred. An NPE may be due to an adoption, an out-of-wedlock birth, or perhaps just an elective name change, none of which is likely to be reflected in the records. The omission may be due to inadvertence, or simply to a disinclination to publicize an unfortunate family or personal event.

Incidentally, while an extensive haplotype mismatch (and a careful re-examination of one’s evidence) can raise the possibility that an NPE has occurred, determining whether this is in fact the explanation for the divergence remains a conventional research task. Nor can even a perfect match between two haplotypes conclusively rule out the possibility that an NPE might have occurred—for example, a child might be fathered by a man everyone thought was his uncle, who would almost certainly have yDNA identical to the man everyone thought (mistakenly) was his real father. I’ve recently heard of such a case which turned up in one of the current DNA surname projects.

Estimating TMRCA

Besides disconfirmation, the positive value of yDNA testing is a function of two factors: (1) the number of people of the same patrilineage who have been tested; and (2) the quality of the genealogical research into their ancestral lines.

Independent of any genealogical research, the most that may be said for sure about two men whom yDNA testing shows to be of a common patrilineage is that they have a male ancestor in common within the last, say, 1000 years (at the outside). Their exact genetic distance (the number of divergent marker mutations between their haplotypes) can then yield a rough estimate of the TMRCA, accurate only to within a century or two either way.
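Why the estimate is so rough can be seen by asking how probable an observed genetic distance is under each possible TMRCA. The sketch below is one simple way to do this, not the calculation any testing company performs: it assumes mutations arrive as a Poisson process at a single hypothetical rate of 0.002 per marker per generation, gives every generation from 1 to 150 equal prior weight, and uses invented function names.

```python
import math


def tmrca_posterior(distance, n_markers, rate=0.002, max_gen=150):
    """Probability of each TMRCA (in generations) given an observed distance.

    Assumes a flat prior over 1..max_gen generations and a Poisson number of
    mutations with mean 2 * t * n_markers * rate (two independent lines).
    """
    gens = range(1, max_gen + 1)
    weights = []
    for t in gens:
        lam = 2 * t * n_markers * rate
        weights.append(math.exp(-lam) * lam ** distance / math.factorial(distance))
    total = sum(weights)
    return [(t, w / total) for t, w in zip(gens, weights)]


def central_interval(posterior, mass=0.90):
    """Generations bounding the central `mass` of the posterior probability."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    low = high = None
    for t, p in posterior:
        cumulative += p
        if low is None and cumulative >= tail:
            low = t
        if high is None and cumulative >= 1 - tail:
            high = t
    return low, high


# Two men differing by 2 mutations on a 37-marker panel.
post = tmrca_posterior(distance=2, n_markers=37)
print(central_interval(post))
```

Under these assumptions the central 90% interval spans more than thirty generations, which is why a single genetic distance, taken in isolation, pins down the TMRCA only loosely.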

A piece of knowledge this imprecise, considered in isolation, doesn’t advance the genealogical enterprise much. However, where there are a number of tested members of a patrilineage, preferably bringing into play several well-researched sub-lineages, one can begin to correlate particular patterns of mutation with particular sub-lineages. And at that point, each addition to the tested pool adds significantly to the accumulated knowledge of the genetic tree, the research tree, or both. That is the ultimate desideratum of these surname DNA projects.

That is why I’ve defined the prime goal of these DNA surname projects as the bringing together of DNA patrilineage cousins for the purpose of sharing research, and why I try to promote the recruiting of additional male bearers of the common surname to the project, with an emphasis on distant cousins within existing project patrilineages.

Hopefully, we may look forward soon to the day when the costs of testing come down further so that many more bearers of the same surnames may be tested, and on more and/or faster markers than at present. It would also help a great deal if the current mutation rate estimates for individual markers could be more precisely quantified so that the corresponding estimates of TMRCA could be narrowed in range.

In the meantime, these DNA surname projects are already acting as powerful catalysts, not only for the sharing of chunks of heretofore disconnected research, but in promoting new and broader research activities on the part of those interested in particular patrilineages. One reason most genealogists run into “dead ends” sooner than need be is that they don’t cast their nets wide enough. Accomplished (or professional) genealogists know that where the records are scant, it is usually necessary to research everyone of the same surname for a particular time and place. Surname DNA projects have a way of teaching that lesson and widening everyone’s research horizons.

Other kinds of DNA Testing: Autosomal Genealogical

This is a brand new kind of DNA testing for genealogical purposes, which opens up the possibility of identifying all of one’s cousins and close relatives, not just those of the patrilineal and matrilineal lines. These tests sample the whole genome, except for the yChromosome (and in some cases the xChromosome), noting the values of hundreds of thousands, or even millions, of particular SNPs, which are assumed to be representative of the chromosomes on which they lie. Then a comparison is made with the SNP values of others in the test database, looking for long stretches of SNPs on the same chromosome which are either identical (as one would expect in places across a set of siblings) or “half-identical”, meaning that at every SNP in the stretch the two people share at least one of their two values, as they would if the stretch had been inherited from a common ancestor by cousins who may go back any number of generations. Although matching stretches of half-identical SNPs only indicate the possibility that two people share those values by inheritance from a common ancestor (identical by descent, or IBD), rather than by chance (identical by state, or IBS), if the matched stretches are long enough, the probability that they could match by chance becomes vanishingly small.
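The core of the half-identical comparison can be sketched in a few lines. This is a simplified illustration, assuming each person’s result at each SNP is coded as 0, 1, or 2 copies of a reference value; two people are then half-identical at a SNP unless one has 0 copies and the other 2 (opposite homozygotes, sharing nothing). Real services measure segments in centimorgans, tolerate occasional read errors, and use far larger thresholds; the genotypes, the tiny threshold, and the function names here are all made up.

```python
def half_identical(g1, g2):
    """True unless the two genotypes are opposite homozygotes (0 vs 2)."""
    return abs(g1 - g2) < 2


def half_identical_runs(geno_a, geno_b, min_snps):
    """Return (start, end) index pairs of runs of at least `min_snps`
    consecutive half-identical SNPs; `end` is inclusive."""
    if len(geno_a) != len(geno_b):
        raise ValueError("both people must be typed on the same SNPs")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(geno_a, geno_b)):
        if half_identical(a, b):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that reaches the last SNP.
    if start is not None and len(geno_a) - start >= min_snps:
        runs.append((start, len(geno_a) - 1))
    return runs


person_a = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0, 1, 1]
person_b = [1, 1, 2, 0, 1, 2, 2, 1, 1, 0, 0, 1]
print(half_identical_runs(person_a, person_b, min_snps=5))  # [(0, 4), (6, 11)]
```

Raising `min_snps` discards shorter runs, which are the ones most likely to have arisen by chance (IBS) rather than by descent (IBD).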

Not surprisingly, the length of half-identical matches increases with the closeness of the genetic relationship. Nonetheless, some people have managed to confirm relationships with others as distantly related as 6th or 7th cousins. The catch, of course, is that the genealogical value of these tests, like that of the yChromosome tests, depends entirely on the quantity and quality of the relevant genealogical research by both the tested and the matching parties.

In order to confirm a match to an autosomal cousin, both parties must have researched the same particular ancestors, and additional testing of collateral relatives is often needed to determine whether the match is through one’s father’s ancestry or through one’s mother’s. As one’s ancestral tree goes farther and farther back, one encounters an increasing proportion of cousin marriages, which reinforce the likelihood of shared autosomal segments between descendants, but also increase the difficulty of determining exactly which ancestors are shared. In fact, in endogamous populations, like the various German-American religious groups, determining relationships through specific common ancestors can become dauntingly challenging, and can require many additional tests.

Unfortunately, at present, these tests, which are offered by several companies (23andMe, deCODEme, and FTDNA), are not cheap. If and when prices come down dramatically, at least to the level of ySTR haplotype testing (the $100-200 range), and as the number of testees, and more important, the quantity and quality of their research, grows, autosomal testing, as an adjunct to yChromosome testing, can be expected to make a major contribution to genealogical research. However, since the primary records on which sound genealogical research is based reflect the patrilineal surname structure of the underlying society, I believe the principal benefits of genealogical testing are going to continue to be reaped from ySTR testing of the male yChromosome.

Other kinds of DNA Testing: Haplogroups & Clades

Another important kind of yDNA testing is the determination of a man’s patrilineal haplogroup. Although not very useful for genealogical purposes, the haplogroup can give one an idea of where a man’s remote male ancestors originally came from, going back thousands and tens of thousands of years.

Just as a haplotype is determined by testing ySTR microsatellites on the yChromosome, so a haplogroup is determined by testing SNP (Single Nucleotide Polymorphism) sites on the yChromosome. Compared to ySTRs, SNPs mutate very rarely: so rarely that when a ySNP mutation happens to occur in a particular father-son transmission event, it is considered practically unique, and is therefore sometimes called a UEP (Unique Event Polymorphism). The chances are, though, that many of these mutations aren’t really unique, just so rare that it is unlikely a second occurrence of any one of them will ever be found.

A Terminological Digression: “Haplogroup”, “Clade”, and “Patrilineage”

As I have explained elsewhere, the collection of ySTR values which constitutes a man’s haplotype places him fairly reliably within a particular patrilineage, which I have defined narrowly to mean all the male descendants of a man who lived within genealogical time, or roughly the timespan since a particular hereditary surname came into use for that patrilineage. However, I must confess here to having somewhat hijacked the term “patrilineage” to represent this vitally important genealogical concept. In reality, “patrilineage” as it is generally used has a wider application, meaning all the male descendants of any arbitrarily chosen male. In fact, since it can be shown that all living males descend from a single male who lived perhaps 40,000-60,000 years ago, all living males are ipso facto members of the same patrilineage, but at this point the term ceases to have much value.

This broader sense of patrilineage does help one in understanding haplogroups, though, because the first male bearer of a unique SNP mutation on the yChromosome becomes thereby the patriarch of his own patrilineage, and the founder of a new sub-haplogroup. Fortunately, however, the term “patrilineage” isn't often used in this context (as a quasi-synonym for haplogroup), but another term is: “clade”, or more often “subclade”. The term “clade” also has a broader meaning, but it is used (and understood, in this deep ancestry context) without qualification as a synonym for haplogroup, just as I have used patrilineage without qualification to represent that portion of an ancestry which is within the scope of genealogical research.

But why, exactly, has it been deemed necessary to bring in the alternate term “clade”, when “haplogroup” is meant (both terms being defined in a special restrictive sense)? I think it’s because what we really need to talk about is the way haplogroups are constantly branching off into subhaplogroups, and “subclades” sounds a little less awkward.

And, for that matter, why do we need the terms “haplogroup” or “clade”, when “patrilineage” (defined with a different scope from my “(genealogical) patrilineage” usage) would do as well? I suppose it’s because, as things stand, the words “haplogroup” and “clade”, as used by genetic genealogists, invoke the deep ancestral context defined by SNP testing, just as “patrilineage”, in my usage, is meant to invoke its specifically genealogical meaning.

Back to the Concept of a Haplogroup or Clade

What’s important is to understand the underlying concepts: there is a single tree of descent from one ancient patriarch to all living men, which has branch points each time a ySNP has occurred (that we know about). Each such branch point defines a new subclade (or subhaplogroup—take your pick); thus every living man belongs to a telescoping set of haplogroups, each defined by an additional ySNP mutation as we more closely approach the present. Classifying a man’s relationship cohort in this way might be called the top-down approach.
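The top-down classification can be pictured as a walk down a nested tree of SNPs. The sketch below assumes a toy tree whose SNP labels (S1, S2, and so on) are entirely hypothetical and stand for no real haplogroup; a man’s test results are represented simply as the set of SNPs at which he tested positive, and the function name is invented.

```python
# Toy tree: each key is a defining SNP (hypothetical labels), each value the
# subtree of clades branching off below it.
TOY_TREE = {
    "S1": {
        "S2": {
            "S4": {},
            "S5": {},
        },
        "S3": {},
    },
}


def telescoping_clades(tree, positive_snps):
    """Walk down from the root, following the branch whose defining SNP the man
    tested positive for; returns the chain of nested clades, broadest first."""
    path = []
    level = tree
    while True:
        matches = [snp for snp in level if snp in positive_snps]
        if not matches:
            return path
        snp = matches[0]  # sibling branches are mutually exclusive lineages
        path.append(snp)
        level = level[snp]


print(telescoping_clades(TOY_TREE, {"S1", "S2", "S5"}))  # ['S1', 'S2', 'S5']
```

Each entry in the returned list names one of the nested haplogroups the man belongs to, broadest first; the last is the lowest-order subclade his tested SNPs can resolve.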

Meanwhile, classifying a man into a patrilineage on the basis of a set of ySTR marker values called a haplotype represents the bottom-up approach. Eventually, as more and more ySNPs are found, these approaches may converge in many cases, but in the meantime it is useful to distinguish them, and this can most economically be done by means of differential terminology. Thus I reserve “patrilineage” for genealogical purposes (implying a reference to ySTR testing and haplotypes), and otherwise refer by preference to “clades and subclades” (implying thereby a reference to ySNP testing and haplogroups).

Although the ascertainment of one’s lowest order subclade requires SNP testing in most cases, membership in a more general clade, or haplogroup, can usually be inferred with a high degree of confidence from one’s haplotype. Thus, the general clade, or haplogroup, for the DENNISON DNA Surname Project Patrilineage 1 group is R1b1b2, perhaps the single most common subclade of the R1b haplogroup, shared by about 65-85% of all men who have British ancestry, depending on where in Britain they live. A haplogroup predictor program for inferring broad haplogroup from haplotype is available online, and besides that, the FTDNA testing company is committed to performing free SNP testing for any of its haplotype customers whose broad haplogroup cannot be inferred with confidence from their haplotype; beyond that, FTDNA and other companies offer detailed SNP testing for a more fully resolved subclade determination.

Progress in this field has been so rapid that the terminology itself seems to change every year or two. In the latest twist, the old haplogroup designations have been replaced by the first letter of the major haplogroup family, suffixed by the lowest-order (most recent, and most recently discovered) SNP mutation of an individual’s particular line. Thus, the broad DENNISON Patrilineage 1 haplogroup, R1b1b2, has now become R-M269, with the SNP mutation at M269 being the defining mutation of this haplogroup. No deep clade testing has yet been done on this line, so it’s likely that there are more recent SNPs than M269, and for that reason the more accurate designator is R-M269+ (if we knew that M269 was the most recent mutation as a result of deep clade testing, the designator would be R-M269*).

One’s haplogroup can harbor surprises. One DNA surname project administrator I know, who thought that his surname was of German origin, came up instead with a Norse (Viking) haplogroup, while my Robb patrilineage, which is clearly Scotch-Irish, turned out to come originally from northern Germany, probably as part of the wave of Anglo-Saxon settlement which swept southern England in the wake of the Romans’ departure in the early 5th century, although my particular ancestors could have come over many hundreds of years either before or after that period.

All this is quite interesting in its own right, though it takes us far afield from genealogy, per se. However, one thing we may infer from the fact that two people with a common surname have different haplogroups is that they have had no common ancestor for at least several thousand years, and thus cannot be of the same patrilineage.

Other kinds of DNA Testing: Mitochondrial DNA (mtDNA)

Besides the diploid (from two parents) DNA which lies coiled in the cell nucleus as a double helix, there is the mitochondrial DNA in the cell’s cytoplasm. Nearly every cell contains mitochondria, which act as the cell’s energy factories and which depend on the nuclear DNA for most of the proteins they need. Mitochondria, which are thus crucial to the life cycle, also have their own DNA blueprints, independent of the diploid nuclear DNA, and these are inherited directly from the mother, via the egg cell which plays host to the fertilizing sperm.

Thus, analogous to the patrilineal yChromosome, the mitochondrial DNA which your mother got from her mother, and so forth, is also subject to mutations, which allows one’s maternal line ancestors to be classified into one of a handful of deep matrilineages descended from a small number of Eves who lived several tens of thousands of years ago. Mutations in two tested “hypervariable control regions” have been used to define mtDNA haplogroups, which in turn have been mapped onto the dispersal of the human population out of Africa. Although the mutation rates of mtDNA are not high enough to be genealogically useful, it could be interesting to learn that one’s mother’s mother’s … mother, though known to have descended from a family rooted in Britain since the time of the Norman Conquest, inherited her mitochondrial DNA from a woman who lived 30,000 years ago in what is today the Russian steppes.

Other kinds of DNA Testing: Ethnographic Testing

Still another kind of DNA testing with a wide time horizon is ethnographic testing, which doesn’t bother trying to construct a mutational tree of descent, but rather simply samples DNA from all over the genome, looking for characteristic markers associated with various ethnic populations. This kind of testing, which takes into account all of one’s ancestors, and not just those at either edge of the tree (the purely patrilineal and matrilineal lines), has little to offer genealogy, but it does provide an estimate of the percentage contribution of various ethnic groups to one’s overall ancestry. It is thus one way to explore the tradition, popular in many American families, of Native American ancestry.

However, there are some caveats which come with this kind of testing. First, unless the Native American ancestor, for example, is rather recent, there is a significant chance that his or her DNA will be missed altogether in the sampling. Second, many Europeans whose ancestors never left the continent also have Native American ancestry in their makeup. How can that be? Because, besides the original east Asians who crossed the Bering Strait during a period when the land bridge to America was open, and thus became “Indians”, many close kin of their own deep ancestors, out of Africa, went north and west to the Middle East and Europe instead of east into Asia, and so influenced the DNA on the opposite side of the world.

Thus, like all the other DNA tests, these ethnographic tests need to be interpreted in the light of other, more conventional sorts of evidence: in this case, the evidence provided by archaeology.

Paternity and Forensic DNA Testing

Alone among the kinds of testing described here, this one depends on no wider context than that of father and son. And because it aims for the maximum degree of certainty, testing both ySTR sites and SNPs, it is conclusive beyond any sane person’s definition of reasonable doubt. Much of the mutation rate literature relied on by genetic genealogists is based on paternity test databases, which have the advantage of eliminating the NPE factor completely. Paternity testing is itself the father of all the other kinds of DNA testing, with their varied purposes, and it is still the “gold standard” for measuring mutation rates; the only problem is that, like gold, paternity test results are relatively scarce, so they need to be fleshed out with data derived, more problematically, from genealogical DNA databases, applying sophisticated statistics and the mathematics of probability to try to compensate for the many unknowns in the equations.

A Brief DNA Glossary

For a more extensive glossary, see Kerchner’s Glossary of DNA Terms

chromosome
one of the 46 strands of DNA which together constitute the genetic blueprint of each human individual, organized into 23 pairs, with one member of each pair inherited from the father and the other from the mother. 22 of these 23 pairs are called autosomal chromosomes, while the remaining pair, made up of two xChromosomes in a female, or an xChromosome and a yChromosome in a male, are called the sex chromosomes. Other species have differing numbers of chromosomes. The chromosomes of an organism, taken as a whole, are called its “genome”.

clade
a (once) living organism and all of its descendants; in the context of genetic testing of the male yChromosome, a common patriarch and all his male descendants.

deep clade testing
the testing for particular ySNP values to determine a man’s most specific (closest to the present) haplogroup, also called a clade.

genealogical time
the time period within which genealogical research is possible and practical—roughly coincident with the time since written records began to be kept identifying individuals by name, and especially by hereditary surname.

genetic distance (GD) (in the context of yDNA surname projects)
the number of single-step marker value differences between two haplotypes. An alternate way of measuring GD is simply to count the number of markers which differ, without also factoring in the number of steps by which they differ (this way of measuring presupposes the “infinite alleles” model). Markers only occasionally differ by more than one repeat, and when they do, the jury is still out as to whether this is due to a single two-step mutation or to more than one single-step mutation. For now, I think the latter is more likely.
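To make the two ways of counting concrete, here is a minimal Python sketch. It assumes a haplotype is stored as a dict from marker name to repeat count; the marker names are real ySTR loci, but the allele values are made up for illustration, and the function name `genetic_distance` is hypothetical.

```python
def genetic_distance(h1, h2, model="stepwise"):
    """Genetic distance between two haplotypes over the markers both have tested.

    model="stepwise":         sum of the repeat-count differences (single steps)
    model="infinite_alleles": number of markers whose values differ at all
    """
    shared = h1.keys() & h2.keys()
    if model == "stepwise":
        return sum(abs(h1[m] - h2[m]) for m in shared)
    if model == "infinite_alleles":
        return sum(1 for m in shared if h1[m] != h2[m])
    raise ValueError(f"unknown model: {model}")


# Hypothetical allele values for two testees
a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
b = {"DYS393": 13, "DYS390": 23, "DYS19": 16, "DYS391": 11}

print(genetic_distance(a, b))                      # 3: one step at DYS390, two at DYS19
print(genetic_distance(a, b, "infinite_alleles"))  # 2: two markers differ
```

The two-step difference at DYS19 is exactly the ambiguous case: the stepwise count treats it as two single-step mutations, while the infinite-alleles count treats it as one difference.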

haplogroup
the deep ancestral group to which a particular individual belongs
The common male ancestor of the members of a yDNA haplogroup usually goes back many thousands, or even tens of thousands of years. Haplogroups have a branching tree structure, dividing meta-groups like R, called “clades”, into “subclades” like R1b, or R1b1b2, with each subclade branch defined by the particular SNP mutation which occurred in the common male ancestor of members of that subclade. Thus, a subclade like R1b1b2 is defined by the chain of accumulated SNP mutations: M173, M343, P25, P297, M269. In the new terminology, haplogroups are denominated by their most recent SNP; thus R1b1b2 becomes simply R-M269.
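The branching structure can be sketched as a parent-pointer tree keyed by SNP. This minimal Python sketch uses only the R1b1b2 chain quoted above; the names `PARENT`, `snp_chain`, `short_name` and `is_within` are hypothetical, and a real haplogroup tree has many sibling branches at every level.

```python
# Parent of each SNP-defined branch, taken from the chain M173 > M343 > P25 > P297 > M269.
# The tree is cut off at M173, because its own parent is not listed in the text.
PARENT = {
    "M173": None,
    "M343": "M173",
    "P25": "M343",
    "P297": "P25",
    "M269": "P297",
}


def snp_chain(terminal_snp):
    """SNPs from the oldest listed branch down to terminal_snp."""
    chain = []
    snp = terminal_snp
    while snp is not None:
        chain.append(snp)
        snp = PARENT[snp]
    return chain[::-1]


def short_name(major_group, terminal_snp):
    """New-style haplogroup name, e.g. R-M269."""
    return f"{major_group}-{terminal_snp}"


def is_within(snp, branch_snp):
    """True if the subclade defined by snp lies on or below the branch defined by branch_snp."""
    return branch_snp in snp_chain(snp)


print(snp_chain("M269"))         # ['M173', 'M343', 'P25', 'P297', 'M269']
print(short_name("R", "M269"))   # R-M269
print(is_within("M269", "P25"))  # True
```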

haplotype
a set of yDNA/mtDNA marker values associated with a particular individual (haplotypes are only rarely unique)
yDNA marker values (also called alleles) are determined by testing a subset of highly mutable microsatellite sites on the yChromosome called ySTRs.

IBD (Identical By Descent)
obfuscatory jargon for “inherited”, typically used to characterize a particular stretch of DNA which is known to have been inherited from some relatively recent ancestor (and perhaps shared with another descendant), as opposed to the same stretch of DNA which is IBS (Identical By State), meaning simply “identical” between two individuals and not known to have been inherited from a common ancestor.

microsatellite
a stretch of DNA characterized by multiple tandem repeats of the same short sequence of 2 to 6 nucleotides, the “letters” in which the genetic code is written. Microsatellites occur throughout the genome, but the ones most useful for genealogical testing purposes are located on the yChromosome.
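The allele value of a microsatellite marker is, in essence, a count of how many times its short sequence repeats. A minimal Python sketch of that idea follows, assuming a plain string of bases and a single simple motif; the sequence is invented, the function name is hypothetical, and real laboratories call alleles by conventions that differ from marker to marker (some markers are compound repeats), so this is only an illustration.

```python
def repeat_count(sequence, motif):
    """Longest uninterrupted run of motif repeated back to back, in motif units."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        # Step forward one motif length at a time while the motif keeps repeating.
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Made-up sequence: 11 tandem copies of TCTA flanked by other bases
seq = "GGAC" + "TCTA" * 11 + "TCTGCA"
print(repeat_count(seq, "TCTA"))  # 11
```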

marker (in the context of DNA testing)
a stretch of DNA whose allele values are sampled as a means of identifying individuals or placing individuals within (deep) patrilineages

MRCA (Most Recent Common Ancestor)
the MRCA is relative to a particular set of yDNA-tested subjects, and is not, therefore, necessarily the same as the ultimate patriarch of a patrilineage, as I have defined it here. As a DNA surname project grows in scope with the addition of more distant patrilineal cousins, the MRCA moves backwards in time and may eventually become identical with the patrilineage patriarch, but even if this does not happen, the patriarch is the ultimate genealogical focus of the project. For that reason projects are best subdivided into patrilineages, rather than clusters of descendants of a more recent common ancestor. Nonetheless, it should be kept in mind that TMRCA estimates for the current set of patrilineage members point back only as far as needed for their lines to coalesce in a common ancestor.
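In a fully documented pedigree the MRCA can simply be read off, which shows how adding a more distant cousin pushes it back toward the patriarch. Here is a minimal Python sketch on a made-up pedigree; all names are hypothetical, and DNA-based TMRCA estimates are only trying to approximate this answer when the paper trail is missing.

```python
# Hypothetical male-line pedigree: each man mapped to his father (None = earliest known).
FATHER = {
    "Patriarch": None,
    "SonA": "Patriarch",
    "SonB": "Patriarch",
    "GrandsonA1": "SonA",
    "GrandsonA2": "SonA",
    "GrandsonB1": "SonB",
}


def paternal_line(man):
    """The man himself, then his father, grandfather, and so on."""
    line = []
    while man is not None:
        line.append(man)
        man = FATHER[man]
    return line


def mrca(testees):
    """Most recent man who is on the paternal line of every testee."""
    lines = [paternal_line(t) for t in testees]
    shared = set(lines[0]).intersection(*lines[1:])
    for ancestor in lines[0]:  # walk back from the present; the first shared man is the MRCA
        if ancestor in shared:
            return ancestor
    return None


print(mrca(["GrandsonA1", "GrandsonA2"]))                # SonA
print(mrca(["GrandsonA1", "GrandsonA2", "GrandsonB1"]))  # Patriarch
```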

NPE (Non-Paternity Event)
in Western cultures, an unexpected disjunction somewhere in the paternal ancestral chain between the inherited surname and the inherited yDNA, due to a replacement of a son’s biological father (with his inherited surname) by a surrogate father with (usually) a different surname. The most frequent cause of NPEs historically was probably adoption, but there are many other possible causes, including out-of-wedlock births. See the final paragraphs of Identifying/Disconfirming Your Patrilineage for more on NPEs.

nucleotide
There are four of these DNA building blocks, named for their bases “A” (adenine), “G” (guanine), “C” (cytosine), and “T” (thymine), and they constitute the alphabet of the genetic code.

(genealogical) patrilineage
the male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time.
The patriarch of a patrilineage, thus defined, is typically the first of his male line to adopt a particular surname and pass it on to his children. The most recent common ancestor (MRCA) of any particular set of yDNA tested descendants is likely to be well downstream of the original patriarch. The methods (and pitfalls) of sorting people into genealogical patrilineages are discussed at length under Identifying/Disconfirming Your Patrilineage.

patrilineage cousins
a set of tested or testable (male) paternal line cousins who are members of a patrilineage as defined above; more loosely (for genealogical purposes), any individuals with ancestors belonging to this patrilineage.

RPH (Root Prototype Haplotype)
the hypothetical haplotype of the ancestor of a common patrilineage
RPH may also be defined as the haplotype of that member of a set of tested patrilineage cousins who is most closely related to all of the others, collectively. For a fuller discussion of RPH (a term, and concept, developed by yours truly), see this paper.
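One simple way to put the second definition into practice, which is my own assumption and not necessarily the method of the paper cited, is to pick the tested cousin whose summed stepwise genetic distance to all the others is smallest. A minimal Python sketch, with hypothetical function names and invented allele values:

```python
def step_distance(h1, h2):
    """Stepwise genetic distance over the markers both haplotypes share."""
    return sum(abs(h1[m] - h2[m]) for m in h1.keys() & h2.keys())


def most_central_member(haplotypes):
    """Tested member whose haplotype has the smallest total distance to all the others.
    Ties go to whichever tied member comes first in the dict."""
    def total_distance(name):
        return sum(
            step_distance(haplotypes[name], other)
            for other_name, other in haplotypes.items()
            if other_name != name
        )
    return min(haplotypes, key=total_distance)


# Hypothetical three-marker haplotypes for four patrilineage cousins
cousins = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    "B": {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    "C": {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    "D": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
}
print(most_central_member(cousins))  # A
```

Here A is one step from each of the others (total 3), while every other cousin has a total of 5.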

SNP (Single Nucleotide Polymorphism)
a single nucleotide position anywhere in the human chromosomes (including the autosomal chromosomes) at which a mutation has been found to occur; because such SNP mutations occur so infrequently, they are used to mark branch points in the lines of descent from the most ancient male or female common ancestors.

TMRCA (Time to the Most Recent Common Ancestor)
TMRCA, like genetic distance, is a measure of the closeness of relationship between two haplotypes. TMRCA may be measured in generations, or in years, where the number of years/generation is defined. TMRCA is calculated as a probabilistic function of the number of marker variations between the two haplotypes, and the calculation depends crucially on the estimated mutation rates for the particular markers which constitute the haplotype. Simple TMRCA calculators apply an average mutation rate across the marker panel, while more sophisticated calculators take account of which particular markers have mutated; if all the variant markers are fast ones, a closer relationship is indicated than if some of them are slow mutators. Another factor which may be taken into consideration is to adjust for the positive knowledge that there is no common ancestor back a certain number of generations from the present; this factor has the effect of pushing TMRCA farther back into the past. See my paper Deconstructing TMRCA & Genetic Distance for an extended discussion of TMRCA and GD (Genetic Distance).
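Here is a minimal Python sketch of the probabilistic approach described above. The assumptions are mine: a flat prior over 1 to `max_gen` generations, both testees the same number of generations below the MRCA, the infinite-alleles model (any mutation at a marker shows up as a difference, and back-mutations are ignored), and per-marker mutation rates supplied by the caller. The rates in the usage lines are placeholders, not published values, and the function names are hypothetical. Giving every marker the same rate mimics the simple calculator; giving each marker its own rate makes the result depend on which markers differ, not just how many. Raising `min_gen` encodes the knowledge that there is no common ancestor in the most recent generations.

```python
def tmrca_posterior(differs, rates, max_gen=100, min_gen=1):
    """Posterior probability that the MRCA lived t generations ago, for t = 1..max_gen.

    differs -- one bool per marker: True where the two haplotypes differ
    rates   -- assumed mutation rate per marker per generation (same order)
    min_gen -- earliest generation still possible; generations before it get zero prior
    Returns a list whose index i holds the probability for t = i + 1.
    """
    if len(differs) != len(rates):
        raise ValueError("differs and rates must have the same length")
    weights = []
    for t in range(1, max_gen + 1):
        if t < min_gen:
            weights.append(0.0)
            continue
        likelihood = 1.0
        for differ, mu in zip(differs, rates):
            # Two lines of descent of t generations each: 2t transmission events.
            p_differ = 1.0 - (1.0 - mu) ** (2 * t)
            likelihood *= p_differ if differ else 1.0 - p_differ
        weights.append(likelihood)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observations are impossible under these rates")
    return [w / total for w in weights]


def median_generation(posterior):
    """Smallest t at which the cumulative posterior probability reaches one half."""
    cumulative = 0.0
    for t, p in enumerate(posterior, start=1):
        cumulative += p
        if cumulative >= 0.5:
            return t
    return len(posterior)


# Placeholder example: 37 markers, one assumed average rate, two mismatches
rates = [0.004] * 37
differs = [True, True] + [False] * 35
print(median_generation(tmrca_posterior(differs, rates)))
print(median_generation(tmrca_posterior(differs, rates, min_gen=10)))  # never earlier than the line above
```

The median is only one way of summarizing the result; a calculator might instead report the probability that the MRCA lies within a given number of generations, which is the running sum that `median_generation` accumulates along the way.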

transmission event
the event of male parentage in which the yChromosome of the father is replicated, with the possibility of mutations, and passed on to a son.
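Repeated transmission events are what make the haplotypes of patrilineage cousins drift apart. A minimal Python simulation follows, assuming every marker mutates independently with the same placeholder rate and always by a single step up or down; real rates differ from marker to marker, and the function names and allele values are hypothetical.

```python
import random


def transmit(father, rate, rng=random):
    """Son's ySTR haplotype: each marker mutates with probability `rate`,
    by one repeat up or down (single-step assumption)."""
    son = {}
    for marker, repeats in father.items():
        if rng.random() < rate:
            son[marker] = repeats + rng.choice((-1, 1))
        else:
            son[marker] = repeats
    return son


def descend(ancestor, rate, generations, rng=random):
    """Haplotype after a chain of father-to-son transmission events."""
    haplotype = dict(ancestor)
    for _ in range(generations):
        haplotype = transmit(haplotype, rate, rng)
    return haplotype


# Hypothetical patriarch and a placeholder per-marker rate
patriarch = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
rng = random.Random(2010)  # fixed seed so the run is repeatable
print(descend(patriarch, 0.004, 10, rng))
```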

yChromosome (or “Y Chromosome”)
the yChromosome is that member of the sex chromosome pair which only males possess, and which a father hands down virtually unchanged to each of his sons.

yDNA (or “Y-DNA”)
the DNA of the male yChromosome, which is said to be “non-recombinant” because (except for a tiny “pseudoautosomal” region containing 9 genes) it cannot recombine with its odd-couple partner, the xChromosome inherited from the mother.

ySNP (Single Nucleotide Polymorphism)
a single nucleotide position on the male yChromosome at which a mutation has been found to occur; because such ySNP mutations occur so infrequently, they are used to mark branch points in the male descendancy from the original yAdam.

ySTR (Short Tandem Repeat)
a type of yDNA sequence composed of multiple repeats of the same multi-nucleotide sequence; these areas are also called microsatellites. Sets of these ySTRs are preferred for constructing yDNA haplotypes because they mutate much faster than single point (SNP) loci. Several hundred of these ySTR sites have been identified but only 100 or so are currently being tested, and unfortunately, reliable mutation rate data exist for only a minority of these.


Last updated 1Mar2010
© John Barrett Robb