DNA Testing & Genealogy

DNA Testing: Overview

At root, DNA testing today means reading out portions of the unique genetic code that each of us inherits from our parents. Reading out the whole is usually called “mapping the genome”, misleadingly, because the end product is nothing but a set of meaningless letters. The meaning comes in a bit at a time from painstakingly correlating tiny patches of a small portion of this code (the 5% or so that constitute the genes) with the development and manifestation of interesting traits. Thus, in time, the genetic part of the genome might truly be mapped in a general way onto the observed characteristics of species, or even at the detailed level of particular organisms. Those are the goals of classical DNA testing.

In parallel with the grand goal of mapping the genome, other more limited, but also more focused kinds of DNA testing have arisen. For example, where the testing is thorough enough to identify an individual uniquely, or at least to identify him and his closest blood relatives in a way which distinguishes them from all others on the planet, it can have important forensic applications, for example, in conclusively establishing paternity. And a set of tests performed on an individual DNA sample can, if extensive enough, establish a unique genetic fingerprint, and be used as such to place a suspected perpetrator at the scene of a crime, as with the O.J. Simpson evidence.

A different, though overlapping, kind of DNA testing aims to determine whether a set of tested males probably have a common ancestor within the purview of genealogical research.

Genealogy is concerned with working out the ancestries of people alive today, or, more broadly, with reconstructing the tree of descent of a set of lineage cousins from a common ancestor. The tree metaphor might, if we please, be extended to the whole human race, because it can be shown that all males descend from a single Adam, though the existing evidence instructs us that this forefather of us all was very far from being the first male of the Homo sapiens species. Or, since man is a social animal, distinguished from the other animals perhaps more than in any other way by the elaborate transmissible body of knowledge and values he shares with his tribe (in a word, “culture”), the descent of the human race may also be conceived in ethnographic terms. But ethnographic family trees are of a scale that far exceeds the scope of any genealogical project.

The kind of DNA testing we are primarily concerned with here as would-be “genetic genealogists” has a far narrower focus: estimating the time back to a common paternal ancestor of two males. And its practical scope is confined to a period that one might call “genealogical time”: the period in each culture since written records began to be kept that document the lives of ordinary men, by name. Thus, genealogical time is roughly coincident, at least in Western cultures, with the time period since hereditary surnames came into general use, usually thought to compass the period 1300-1500 in England, for example. To understand how and why DNA testing can be used to predict that two men have a common ancestor who lived not too many centuries ago, we need to review some of the basics of human DNA.

The Basics of Human DNA

In order to explain DNA testing for genealogical purposes, it is first necessary to review some of the basics of human DNA, and its replication to produce offspring. Some of the DNA-related terms found in the following sections, and throughout this website, are defined in the glossary in the left column of this page.

Each of us has developed from a single cell containing our unique DNA blueprint. This DNA is organized into 23 paired “chromosomes”, one chromosome of each pair coming directly from the father, and one from the mother. Every cell in our body contains an exact copy of this complete genetic blueprint, except that when we produce germ cells for replication (sperm for men, eggs for women), our separate parental parts mix and recombine in a new and unique way for each sex cell we produce.

22 of these 23 chromosomal pairs are called “autosomal”, and each consists of matching paternal and maternal parts, perfectly aligned. DNA is a blueprint for producing the proteins of life, and two matched, but differing, versions of each gene set up a genetic competition for determining the offspring’s characteristics, which results in some wins for the father, some for the mother, and a large proportion of compromises.

The remaining chromosomal pair, called the “sex chromosomes”, works quite differently. Instead of a matched pair, we find, at least in males, an odd couple: an X chromosome inherited from the mother, and a runty Y chromosome from the father (I prefer to style these “xChromosome” and “yChromosome”, just as I refer to ySTR, rather than the more conventional “Y-DNA” or “Y DNA”). The yChromosome contains fewer than 100 genes, only 9 of which match to those of the female xChromosome, and most of the remaining genes and the rest of the yChromosome are concerned with the regulation of the developmental process that produces the male variation from the standard (default) female genotype.

The XX female, like the XY male, inherits one xChromosome from her mother, and the other from her father’s mother, so there is plenty of genetic competition between her two sex chromosome genes. However, since the male yChromosome fails to match up to most of the xChromosome, the male inherits most of his mother’s xChromosome genes as is. This can cause problems where the mother transmits a recessive genetic abnormality from one of her xChromosomes to a son; such an abnormality is hemophilia, which rarely occurs in women, but for those who are carriers, crops up in half of their sons. And of course those other genes on the yChromosome that have no xChromosome counterpart also operate to make men more exceptional. Interestingly, even with the autosomal chromosomes, women’s DNA recombines in a much more homogenized way than men’s does, keeping females much closer to the norms of the species, while males, more prone to extreme differentiation, may be considered nature’s experimental sex.

Why Genetic Genealogists Prefer to Test Male ySTR DNA

What matters most for our present purposes about DNA transmission from one generation to the next is that the yChromosome replicates virtually unchanged down the male paternal line. Current models of population genetics hypothesize that all men descend either from a single Adam, or at least a very small set of original progenitors, and women too have their Eves. But if all men descended from the same man, and if the yChromosome never changed at all, then all men would have identical yChromosomes and there would be nothing to be learned from testing. Fortunately for genetic genealogists, mutations creep into the germ cells, or occur during the replication process, and it is these mutations that produce the variations that ySTR testing measures.

I speak here exclusively of ySTR (the DNA of the male yChromosome) only because it is the testing of certain areas of the male yChromosome that has the highest payoff for genealogists. However, there are other kinds of DNA testing, all of which have their interest, and I have more to say about these below.

As it happens, in Western societies, and in many other cultures as well, surname runs with the paternal line, and since tracking surnames is the main preoccupation of genealogists, ySTR testing fits perfectly into their epistemological paradigms. If we test the ySTR of two males with the same surname and find that they are very closely matched, we have strong positive confirmation that they descend from a common ancestor of their patrilineage, while otherwise we may say that although they share a surname, they are probably no more likely to have a common ancestor than if one of them was surnamed Jones, and the other, Smith.

But ySTR can tell us more than just that two males do, or do not, have a common ancestor within the genealogical research horizon (the period since records of individuals began to be kept). Starting from the premises that ySTR is highly stable from generation to generation, but subject to change over very long stretches of time, and at a statistically predictable rate, the differential number of mutations that have accumulated between two tested male yChromosomes (the genetic distance) can serve as a kind of generational clock, measuring the time (in generations) back to their most recent common male ancestor, quite analogous to the archaeologist’s tool, radiocarbon dating. This estimate is called “TMRCA”.
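To make the notion of genetic distance concrete, here is a minimal sketch in Python. It assumes a simple stepwise convention, in which each marker contributes the absolute difference in repeat counts, and that is only one of several conventions in use. The marker names are real ySTR loci, but the haplotype values and the function names are invented for illustration.

```python
# Hypothetical ySTR haplotypes: marker name -> repeat count.
# The values are invented for illustration, not taken from any real test.
hap_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11}
hap_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10, "DYS385a": 11}


def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over markers tested in both.

    Assumption: a stepwise counting convention. Other conventions exist.
    """
    shared = sorted(set(a) & set(b))
    return sum(abs(a[m] - b[m]) for m in shared)


def differing_markers(a, b):
    """Names of the markers, tested in both haplotypes, whose values differ."""
    shared = sorted(set(a) & set(b))
    return [m for m in shared if a[m] != b[m]]


print(genetic_distance(hap_a, hap_b))   # total stepwise distance
print(differing_markers(hap_a, hap_b))  # which markers account for it
```

Only markers tested in both haplotypes are compared, so a 37-marker result can be set against a 67-marker one without inflating the distance.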

The sensitivity of the clock depends on the average mutation rate at tested marker sites, and across the genome, these occur at a rate that ranges from 1 per billions of generations, to 1 per several hundred generations. The fastest mutations occur in stretches of DNA called microsatellites, and the most rapidly mutating microsatellites are those on the yChromosome, which are known as ySTRs (yChromosome Short Tandem Repeats). However, even with ySTRs that mutate at a rate of once every several hundred generations, it is still obviously necessary to test many of them in order to generate standard sets of markers that change within the narrow time span of genealogical time (roughly the last 400-1000 years). These standard sets of markers are called haplotypes.
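The need to test many markers can be illustrated with a little arithmetic. The sketch below assumes a single uniform mutation rate of once per 500 generations per marker. That figure is a hypothetical stand-in for “once every several hundred generations”, not a measured value, and the sketch also treats every transmission as independent.

```python
ASSUMED_RATE = 1 / 500  # hypothetical: one mutation per marker per 500 generations


def chance_of_any_mutation(markers, transmissions, rate=ASSUMED_RATE):
    """Probability that at least one tested marker mutates at least once."""
    opportunities = markers * transmissions
    return 1.0 - (1.0 - rate) ** opportunities


# Two lines of 20 generations each back to a common ancestor: 40 transmissions.
p_one_marker = chance_of_any_mutation(markers=1, transmissions=40)
p_37_markers = chance_of_any_mutation(markers=37, transmissions=40)

print(f"1 marker:   {p_one_marker:.2f}")
print(f"37 markers: {p_37_markers:.2f}")
```

Under these assumptions a single marker will usually show no change at all over such a span, while a panel of a few dozen markers will almost always show at least one.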

It follows from this that the more markers tested, the more mutations are likely to show up, and thus the more finely calibrated the resulting TMRCA-measuring generational clock would be. However, it’s a bit more complicated than that because not all ySTR markers are created equal. In recent years, widely varying mutation rates have been observed across these markers, with some of them running at a mutational rate of 10 times those of the stodgiest ones. Thus, the average mutation rate across each of the various ySTR marker panels offered by the half-dozen or so ySTR testing companies is at least as important as the number of markers tested. At present, the best “bang for the buck” test panel, and the one most useful for identifying patrilineage relationships, is the FTDNA 37-marker panel.

Incidentally, the marker sites sampled for the ySTR tests do not involve genes, per se. If they did, they might be subject to natural selection bias that would reduce the predictability of their mutation rates. Only about 5% of the genome actually codes for the genes that define our unique traits. The purpose and function of the rest of the genome, often called "junk DNA", is largely unknown at present.

The Payoffs of ySTR Testing

What kinds of things can genealogists infer from sets of tested marker haplotypes for men bearing the same surname?

Identifying, or Confirming, Your Patrilineage

The term “patrilineage” generally means the set of (male) patrilineal descendants of a common patriarch. However, it can reasonably be appropriated for purposes of ySTR DNA-based genealogy to mean the set of patrilineal descendants of a patriarch who lived within the period of genealogical time—usually the time since the patrilineage first came to be identified in the records by a particular hereditary surname, or family name.

However, I would argue that the term “genealogical time” should be loosely construed to mean the mere possibility of identifying a remote patrilineal patriarch, either by name, or by fixing him in time and place. Thus, the DNA-identified Uí Néill group, the descendancy of the semi-legendary 4th century Irish patriarch, Niall Noigiallach (“Niall of the Nine Hostages”), might be considered a patrilineage in the DNA-based genealogical context, even though there seems little or no prospect of ever tracing a particular genealogical line back to him.

I advocate for this loose construction of “genealogical time” and “patrilineage” because knowing something about the ultimate DNA-based patriarch of a patrilineage, or about the hereditary surname that he bore, can provide clues or leads to records-based genealogical investigation. Thus, if one’s haplotype falls within the broadly construed patrilineage of the Uí Néill clan, the chances are very high that an ancestor who lived during the period when surname-based records were kept can be traced back to the small set of counties in northern Ireland or southwestern Scotland where that clan once prevailed, and where as many as 15% of the population falls into this broad patrilineage.

I have deliberately chosen an extreme example to illustrate the benefits of broadly construing patrilineage, and in general the concept should be restricted to cover just the period when hereditary surnames, or at least unique bynames, began to come into general use, raising at least the possibility of tracing one’s particular ancestor back through records. And because the advent of hereditary surnames was so disjointed, and so variable from country to country, it is almost the usual case that patrilineages thus broadly construed will comprise multiple surnames. Indeed, patrilineal descendants of the Uí Néill clan bear scores of different surnames, though in the more typical case, where classical NPEs (Non-Paternity Events) have occurred since the adoption of permanent hereditary surnames, one can expect at least several different surnames but not as many as 10.

In classifying haplotypes by patrilineage I therefore give “genealogical time” a broad construction to bring to one’s specific, records-based, genealogical research as much of the historical context as may be useful. Thus, for England, where permanent hereditary surnames began to appear in the 12th century, became moderately common in the 13th, and were the norm by about 1400, I allow patrilineages to run back as far as the 12th century or so, even though the 2009 King & Jobling study, “Founders, Drift, and Infidelity: The Relationship between Y Chromosome Diversity and Patrilineal Surnames” has given us reason to suppose that many English surname lines are shallowly rooted, going back only a few hundred years. Even in such cases, given the remarkable tendency of English families to remain rooted in the same local area over many centuries, there are excellent prospects of uncovering local records that document the occurrence of NPEs, as when a man inherited landed property through his wife and changed his surname accordingly. By the same token, where a deeply rooted (broadly construed) genealogical patrilineage bundles together several surnames into the same DNA-patriline, the less-common surnames that may be known to have originated in certain areas may usefully focus one’s research on the more common (and more recent) surnames of the same patrilineage.

In any patrilineage project based on ySTR haplotypes it’s important to recognize that the immediate focus of the project must be, not on the ultimate patriarch who first assumed a particular hereditary surname but on the actual MRCA (Most Recent Common Ancestor) of the particular set of tested haplotypes. However, over time, as new and more far-flung cousins (perhaps of different surnames) are tested and brought into the fold, the project MRCA may be pushed farther back into the shadowy realm where records begin to fail. My argument is that this should be regarded, not as a calamity, but as an opportunity that may open up fruitful new research territory.

Disconfirming Your Patrilineage

ySTR testing can be expected occasionally to disconfirm membership in an expected patrilineage group. Although a disruption of one’s established assumptions or conclusions may seem to be a negative outcome of ySTR testing (and it is certainly likely to be at least mildly disturbing), the possibility of such unexpected results actually represents one of the chief benefits of ySTR testing. Where a fair amount of research has been done on a line, and testing is merely confirmatory of membership in an expected patrilineage, it may be reassuring, but it adds only a little to the strength of one’s much more focused genealogical theses. On the other hand, if an unexpected break in the lineage can be demonstrated, it is likely to set one off on new and productive research tracks.

What is the best way to interpret and deal with a failure of one’s ySTR haplotype to fall into an established and expected lineage? The first thing to suspect is that somewhere one has drawn a plausible, but invalid inference from one’s data; consequently, one needs to probe each link of the paternal ancestral chain for weaknesses. And where the lineage one expected to match is itself not thoroughly established, with a number of matching haplotypes backed by high quality research, the research behind the established lineage needs to be carefully scrutinized as well.

Second, if one is, after all, able to make a solid case from the evidence for each link of the paternal ancestral chain, it may be the evidence itself that is erroneous or incomplete. Record keepers made errors, then as now, which is why one tries to assemble several independent pieces of evidence, rather than relying on just one.

However, the greatest challenge facing all genealogists most of the time is the problem of incomplete records. That is why the BCG Genealogical Proof Standard calls for a “reasonably exhaustive search” of all the possibly pertinent records. Assuming that this standard has been met, the most likely remaining explanation for the failed match is that an NPE has occurred. An NPE may be due to an adoption, an out-of-wedlock birth, or perhaps just an elective name change, none of which are likely to be reflected in the records. Such omissions may be due to inadvertence, or simply to a disinclination to publicize an unfortunate family or personal event.

Incidentally, while an extensive haplotype mismatch (and a careful re-examination of one’s evidence) can raise the possibility that an NPE has occurred, determining whether this is in fact the explanation for the divergence remains a conventional research task. Nor can even a perfect match between two haplotypes conclusively rule out the possibility that an NPE might have occurred—for example, a child might be fathered by a man everyone thought was his uncle, who would almost certainly have ySTR identical to the man everyone thought (mistakenly) was his real father. I’ve recently heard of such a case that turned up in one of the current DNA surname projects.

Making the Most of your ySTR test: Patrilineage Projects
—The Uselessness of TMRCA Estimates

To address the latter topic first, although FTDNA and others make much of pairwise probabilistic TMRCA estimates that purport to show when the most recent common patrilineal ancestor of two members of a patrilineage lived, the most that can be said for such estimates, without factoring in the genealogy, is that they are likely to be accurate only to within a century or two either way. An estimate that imprecise is effectively useless for genealogical purposes. The problem is that these estimates are based on the number of mutational divergences that have accrued since the two lines split off from the common patriarch, but the mutational process is so variable and sporadic that a sample of, say, two 37-marker haplotypes over 8 generations yields only 2 x 37 x 8 = 592 opportunities for mutation, which is far too small a sample size to keep the variance down.

However, if instead an analysis is made of the mutational differences across a pairwise matrix of all the members of a large patrilineage group, and especially if these have been extended to 67, or better 111, markers, TMRCA estimates for the MRCA of all the members can achieve genealogically useful accuracy. Thus, for example, with a set of 15 haplotypes that have been extended to 111 markers, over 8 generations there are 15 x 111 x 8 = 13,320 opportunities for mutation (a sample size over 20x larger than in the previous example), and an estimate of when their patriarch probably lived can be narrowed down to a generation or three.
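The effect of sample size on the variance can be sketched numerically. The example below assumes an average rate of 0.004 mutations per marker per generation (a hypothetical round figure, not a value taken from this article). It also treats the number of observed mutations as a Poisson count with every transmission independent, which overstates the information in a real project because related lines share part of their descent. On those assumptions the relative spread of the count is 1 divided by the square root of the expected number of mutations.

```python
import math

ASSUMED_RATE = 0.004  # hypothetical average mutations per marker per generation


def opportunities(haplotypes, markers, generations):
    """Number of marker transmissions in which a mutation could occur."""
    return haplotypes * markers * generations


def relative_spread(n_opportunities, rate=ASSUMED_RATE):
    """Standard deviation divided by mean for a Poisson count of mutations."""
    expected_mutations = n_opportunities * rate
    return 1.0 / math.sqrt(expected_mutations)


pair = opportunities(haplotypes=2, markers=37, generations=8)
group = opportunities(haplotypes=15, markers=111, generations=8)

print(pair, group, round(group / pair, 1))
print(f"pair:  relative spread about {relative_spread(pair):.0%}")
print(f"group: relative spread about {relative_spread(group):.0%}")
```

Whatever rate is assumed, the ratio between the two spreads depends only on the ratio of opportunities, so the larger sample narrows the estimate by a factor of roughly the square root of 22.5.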

The rule here is that the more tested members of a patrilineage the merrier, and it applies not only to TMRCA calculations, but also to the much more fruitful approach to DNA analysis that has been touched on briefly above under the head “Why Genetic Genealogists Prefer to Test Male ySTR DNA”. This is the search for shared mutations within the set of patrilineage haplotypes that may be indicative of particular family sub-branches. But there is much more to making the most of one’s confirmed membership in a particular patrilineage than just DNA analysis.

The principal value of ySTR DNA testing is to bring together as many as possible of the serious genealogists for a particular patrilineage, for sharing and collaborative research; and that purpose can best be served by the organization of a DNA-based patrilineage project. Moreover, the value of the project to its members depends on two factors: (1) the number of tested members; and (2) the quality and extent of the genealogical research they have arrived at, either through their own efforts or by finding quality published research, for their particular ancestral line.

On the DNA side, the more patrilineage project members who have tested or extended their haplotypes to a particular level, the more likely it is that mutations shared by two or more members will turn up—shared mutations that more often than not mean that those who share them belong to a particular family sub-branch of the extended patrilineage.
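A search for shared mutations might be sketched as follows. This is only one plausible approach, not a description of any particular project’s method: it takes the most common value at each marker (the modal haplotype) as a stand-in for the ancestral value, and reports any off-modal value carried by two or more members. The member labels and marker values are invented.

```python
from collections import Counter, defaultdict

# Hypothetical project data: member id -> {marker: repeat count}.
haplotypes = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "B": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "C": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "D": {"DYS393": 13, "DYS390": 24, "DYS19": 15, "DYS391": 11},
    "E": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
}


def modal_haplotype(haps):
    """Most common value at each marker that every member has tested."""
    markers = sorted(set.intersection(*(set(h) for h in haps.values())))
    return {
        m: Counter(h[m] for h in haps.values()).most_common(1)[0][0]
        for m in markers
    }


def shared_mutations(haps):
    """Off-modal (marker, value) pairs carried by at least two members."""
    modal = modal_haplotype(haps)
    carriers = defaultdict(list)
    for member in sorted(haps):
        for marker, modal_value in modal.items():
            value = haps[member][marker]
            if value != modal_value:
                carriers[(marker, value)].append(member)
    return {key: who for key, who in carriers.items() if len(who) >= 2}


print(shared_mutations(haplotypes))
```

In this invented data, members B and C share an off-modal value at one marker and so are candidates for a common sub-branch, while D’s lone deviation is a private mutation that says nothing yet about branching.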

The chances of discovering shared markers, like the accuracy of TMRCA estimates, are primarily a function of the number of members who have extended their haplotypes to a particular level, with the depth of their separated lineages being a secondary factor. The standard 37-marker test is usually sufficient to yield some shared mutations given enough haplotypes, but greatly increasing the number of markers tested, ideally to 111, is that much better.

Unfortunately, all or most members must extend to reap the benefits of the additional markers, and that can be expensive. However, a collective synchronized extension effort can be planned to coincide with FTDNA’s traditional December sale, and strategic selection can cut down the number of haplotypes that actually need to be extended in order to identify shared mutations that are far enough upstream to benefit the membership at large. If such upstream markers are found, certain other haplotypes can then be extended on a case by case basis, and with some markers, FTDNA offers inexpensive individual marker tests, that obviate the need for a full extension.

If your ySTR test has resulted in a few reported patrilineage matches (and all the matches that FTDNA reports at 37 markers or better can be considered patrilineage cousins, regardless of their possibly divergent surnames) the best way to capitalize genealogically on your test results is to organize or join a specific patrilineage project for your line, not just the omnibus project for the surname in general. Any number of patrilineage cousins can benefit from contacting, and collaborating with, each other genealogically, and by forming a patrilineage project, though the opportunities for discovering shared mutations indicative of particular family sub-branches begin to emerge only when there are 5-7 haplotypes to work with, and as always, the more the better. Thus, successful patrilineage projects need to be constantly on the lookout for ways to increase their membership.

In fact, so important is the acquisition and testing of new members, particularly of distant cousins of existing members, that it can be worthwhile to focus on a likely brother or patrilineal cousin of a known early ancestor of surname X, try to trace this possible patrilineal relative down to a living male descendant surnamed X, and offer to sponsor a 37-marker test for him. If the test shows that this person is indeed a patrilineal cousin you will have made a significant addition to the genealogical knowledge of your patriline, and also quite likely helped to clarify the mutational pattern that marks your particular family branch. If the test results do not match, you will have eliminated a red herring from genealogical consideration, and contributed to both the genealogical and DNA knowledge of the other surname X patrilineage. Such genealogically-directed pre-emptive ySTR testing is likely to contribute more to your genealogical knowledge than merely extending your haplotype to 67 or 111 markers.

In summary, a focused patrilineage project can bring together the best and most knowledgeable genealogists of the patriline, identify the best published resources, and promote both collaborative research and planned testing projects. Once enough members of a patrilineage project have accrued, mutational patterns begin to emerge that characterize particular family branches, and synchronized upgrades of strategically selected project haplotypes to 67 or 111 markers can be planned to discover more shared mutations. A patrilineage project website can serve as a place to post the ancestral pedigrees of all the members, based on the work of the best and most knowledgeable genealogists for the line, and the internet presence established thereby can be an effective method of attracting additional patrilineage cousins to ySTR testing and project membership.

The ALLEN (I) Patrilineage Project, and the several patrilineage projects linked to the FTDNA DENNISON surname project, exemplify successful patrilineage projects of various sizes.

Other kinds of DNA Testing: Autosomal Genealogical

This is a brand new kind of DNA testing for genealogical purposes, which opens up the possibility of identifying all of one’s reasonably close cousins and relatives, not just those of the patrilineal and matrilineal lines. These tests scan the autosomal chromosomes, noting the values of hundreds of thousands, or even millions, of particular SNPs, assumed to be representative of particular chromosomes. Then, a comparison is made with the set of SNP values of others in the test database, looking for long stretches of SNPs on the same chromosome that are at least half-identical, and thus indicative of a possible close cousin, or relative, relationship.

Because shared inheritance decreases by a factor of 3-1 with each generation, these shared segments decrease rapidly in size and numbers with each generation, and in practice, identification of cousins beyond the 3rd cousin level becomes increasingly problematical. The blocks of half-identical DNA, often called HIRs (or Half-Identical Regions) are chopped into ever smaller fragments due to a phenomenon called crossover, which can affect a particular chromosome each time it is copied for replication to the next generation. The length of HIRs is measured in cMs (centimorgans), or sometimes in the number of SNPs sampled across the HIR, and both the length of the largest HIR shared between two people, and the number of HIRs they share are relevant to assessing the probability of descent from a common ancestor, and in estimating the genetic distance back to that common ancestor.

The two principal companies offering autosomal SNP testing for genealogical purposes, 23andMe and FTDNA, set the threshold of possible significance at 7 cM and 5 cM, respectively. Across those HIR ranges, both companies test over 500,000 SNPs, all of which have to be half-identical for the segment to qualify as an HIR (although both companies do make a very limited allowance for the occasional testing glitch). Both companies provide to their customers software tools for identifying possible cousin or relative matches in their databases, and for estimating the closeness of the relations between them.
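The idea of a half-identical region can be illustrated with a toy sketch. It assumes each genotype is written as an unordered pair of allele letters, treats two people as half-identical at a SNP when they share at least one allele there, and measures a region simply by counting consecutive SNPs. The testing companies’ own tools work in centimorgans and tolerate an occasional mismatch, neither of which is modelled here, and the genotypes shown are invented.

```python
def half_identical(genotype1, genotype2):
    """True if the two genotypes share at least one allele."""
    return bool(set(genotype1) & set(genotype2))


def longest_half_identical_run(person1, person2):
    """Return (start index, length) of the longest run of half-identical SNPs."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (g1, g2) in enumerate(zip(person1, person2)):
        if half_identical(g1, g2):
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a mismatch ends the current run
    return best_start, best_len


# Invented genotypes at eight consecutive SNPs on one chromosome.
person1 = ["AA", "AG", "CC", "TT", "CT", "GG", "AA", "CC"]
person2 = ["AG", "GG", "CT", "CC", "TT", "AG", "AC", "CG"]

print(longest_half_identical_run(person1, person2))
```

Here the two people mismatch at the fourth SNP (TT against CC), so the longest half-identical stretch is the run of four SNPs that follows it.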

While this approach to genetic genealogy has great potential, its value at present is limited by the relatively small sizes of the databases, and by the fact that few pairs of possible tested cousins have both worked out their descendancies sufficiently to identify the ancestor, or ancestors, they have in common. I say “or ancestors”, because in cases where there has been much intermarriage within a small set of families, the estimates of relationship closeness are likely to be skewed as well by the fact that there are probably many different shared ancestors of various degrees.

In the end, the value of autosomal testing, as with all DNA testing, is largely dependent on the quantity and quality of the genealogy that has been done. Nonetheless, autosomal testing provides in principle a way for male surname lines that have “daughtered out” to be validated, by the targeted testing of one of the surviving females of such a line, and the corresponding autosomal testing of a proven male same-surname descendant of the same line. And as the databases grow in size, and as the genealogical enterprise advances, the value of this kind of testing should increase in proportion. However, the pricing needs to come down from the $300 range where it is currently situated.

Other kinds of DNA Testing: Haplogroups & Clades

Another important kind of yChromosome testing is the determination of a man’s patrilineal haplogroup. Although of little use for genealogical purposes, the haplogroup can give one an idea of where a man’s remote male ancestors originally came from, going back thousands and tens of thousands of years.

Just as haplotype is determined by testing ySTR microsatellites on the yChromosome, so haplogroup is determined by testing SNP (Single Nucleotide Polymorphism) sites on the yChromosome (or ySNPs). Compared to ySTRs, ySNPs mutate very rarely, so rarely that when a ySNP mutation happens to occur in a particular father-son transmission event, it is considered practically unique, and is therefore sometimes called a UEP (Unique Event Polymorphism), although the chances are that many of these mutations aren't really unique, just so rare that it's unlikely a second occurrence of one will ever be found.

A Terminological Digression: “Haplogroup”, “Clade”, and “Patrilineage”

As I have explained elsewhere, the collection of ySTR values that constitute a man’s haplotype place him fairly reliably within a particular patrilineage, which I have defined narrowly to mean all the male descendants of a man who lived within genealogical time, or roughly the timespan since a particular hereditary surname came into use for that patrilineage. However, I must confess here to having somewhat hijacked the term “patrilineage” to represent this vitally important genealogical concept. In reality, “patrilineage” as it is generally used, has a wider application, meaning all the male descendants of any arbitrarily chosen male. In fact, since it can be shown that all living males descend from a single male yAdam who lived perhaps 40-60,000 years ago, all living males are ipso facto members of the same patrilineage, but at this point the term ceases to have much value.

This broader sense of patrilineage does help one in understanding haplogroups, though, because the first male bearer of a unique SNP mutation on the yChromosome becomes thereby the patriarch of his own patrilineage, and the founder of a new sub-haplogroup. Except that the terms “sub-haplogroup” and “patrilineage” aren’t much used in this context, but another term is: “clade”, or more often “subclade”. The term “clade” also has a broader meaning, but it is used (and understood, in this deep ancestry context) without qualification as a synonym for branching haplogroups, just as I have used patrilineage without qualification to represent the small recent portion of an ancestry that falls within the scope of genealogical research.

But why, exactly, has it been deemed necessary to bring in the alternate term “clade”, when “haplogroup” is meant (both terms being defined in a special restrictive sense)? I think it’s because what we really need to talk about is the way haplogroups are constantly branching off into subhaplogroups, and “subclades” sounds a little less awkward.

And, for that matter, why do we need the terms “haplogroup” or “clade”, when “patrilineage” (defined with a different scope from my “(genealogical) patrilineage” usage) would do as well? I suppose that it’s because as it is, when we see the words “haplogroup” or “clade”, as used by genetic genealogists, they invoke the deep ancestral context defined by SNP testing, just as “patrilineage”, in my usage, is meant to invoke its specifically genealogical meaning.

Back to the Concept of a Haplogroup or Clade

What’s important is to understand the underlying concepts: there is a single tree of descent from one ancient patriarch to all living men, which branches each time a ySNP occurs—a ySNP that we know about. Each such branch point defines a new subclade (or subhaplogroup—take your pick); thus every living man belongs to a set of nested subclades of an original haplogroup. Haplogrouping is a way of classifying a man’s kinship group from the top down.

Meanwhile, classifying a man into a patrilineage on the basis of a set of ySTR marker values called a haplotype represents the bottom-up approach. Eventually, as more and more ySNPs are found, these approaches may converge in many cases, but in the meantime it is useful to distinguish them, and this can most economically be done by differential terminology. Thus I reserve the term “patrilineage” for genealogical purposes (implying a reference to ySTR testing and haplotypes), and otherwise refer by preference to “clades and subclades” (implying thereby a reference to ySNP testing and haplogroups).

Although the ascertainment of one’s lowest order subclade requires SNP testing in most cases, membership in a more general clade, or haplogroup, can usually be inferred with a high degree of confidence from one’s haplotype. Thus, the general clade, or haplogroup, for the DENNISON DNA Surname Project Patrilineage 1 group is R1b1a2, perhaps the single most common subclade of the R1b haplogroup, shared by about 65-85% of all men who have British ancestry, depending on where in Britain they live. A haplogroup predictor program for inferring broad haplogroup from haplotype is available online. Besides that, the FTDNA testing company is committed to performing free SNP testing for any of its haplotype customers whose broad haplogroup cannot be inferred with confidence from their haplotype; beyond that, FTDNA and other companies offer detailed SNP testing for a more fully resolved subclade determination.

Progress in this haplogroup classification field has been so rapid that new, more recent SNPs (further articulating the tree) are being added constantly. The best way to keep abreast of new developments is to check the ISOGG Haplogroup Tree from time to time. Even the nomenclature has been changing so frequently in recent years that the old “Henry System” style of nomenclature, in which one of the more articulated branches on R1b has now become R1b1a2a1a1a4a1a1, is giving way to the more compact (and stable) terminology, R-L237, where the “R” refers to the master haplogroup clade, and the “L237” to the most recent mutation in one of its particular branches.

Thus R1b itself has become R-269, and the DENNISON Patrilineage 1 group mentioned above is now best designated, not as R1b1a2a1a1a, but as U106*, with the SNP mutation at U106 being the defining mutation of this haplogroup subclade, and the “*” meaning that all the currently known SNPs downstream of U106 have been tested and come up negative. If a new SNP more recent than U106 were to be discovered and added to the tree, the U106* designation would have to be changed to U106+, to indicate that at least one additional SNP test remains to be performed.

One’s haplogroup can harbor surprises. One DNA Surname project administrator I know, who thought that his surname was of German origin, instead came up with a Norse (Viking) haplogroup, while my Robb genealogical patrilineage, which is clearly Scotch-Irish, turned out to come originally from northern Germany, probably as part of the wave of Anglo-Saxon settlement that swept southern England in the wake of the Romans in the 5th Century, although my particular ancestors could have come over many hundreds of years either before or after that period.

All this is quite interesting in its own right, though it takes us far afield from genealogy, per se. However, one thing we may infer from the fact that two people with a common surname have different haplogroups, is that they have no common ancestor for at least thousands of years, and thus can hardly be of the same patrilineage.

Other kinds of DNA Testing: Mitochondrial DNA (mtDNA)

Besides the diploid (from two parents) DNA that lies coiled in the cell nucleus in a double-helical spiral, there is the mitochondrial DNA in the cell’s cytoplasm. Nearly every cell has mitochondria that both liaise with the nuclear DNA and act as the cell’s high-volume energy factories, relying on the nuclear DNA for much of their factory plan. Mitochondria, which are thus crucial to the life cycle, also have their own DNA blueprints that are independent of the diploid nuclear DNA, and these are inherited directly from the mother, via the egg cell that plays host to the fertilizing sperm.

Thus, analogous to the patrilineal yChromosome, the same mitochondrial DNA that your mother got from her mother, and so forth, is also subject to mutations, which allows one’s maternal line ancestors to be classified into one of a handful of deep matrilineages descended from a small number of Eves who lived several tens of thousands of years ago. Mutations in two tested “hypervariable control regions” have been used to define mtDNA haplogroups, which in turn have been mapped onto the human population dispersion out of Africa. Thus, the general patterning of mtDNA, ascertainable with even a minimal mtDNA test, can tell you which branch of the deep ancestral human population tree your mother’s remote matrilineal ancestor got her mtDNA from. Given the known articulation of this tree so far, it’s likely that your matriline diverged from the main trunk of the tree many thousands, or even tens of thousands, of years ago. My own mtDNA haplogroup, I1a1, is general but fairly rare (about 1%) throughout the Middle East, the Caucasus, and Europe to Scandinavia, and extends even into Africa and Eurasia.

However, this form of testing has only limited genealogical value (in fact virtually none, I would say) unless you order the FMS (Full Mitochondrial Sequence) test from Family Tree DNA. FTDNA claims that a perfect match to someone else on your full mtDNA genome predicts a Most Recent Common (matrilineal) Ancestor shared with your match who lived within the last 22 generations (95% confidence interval), and a 50-50 chance that she was born within the last 5 generations. Since this places the common ancestor most likely within the span of genealogical time, in principle this test could be used to identify common matrilineal ancestors in the same way that ySTR haplotype tests can be used to determine that two tested males are, or are not, of the same patrilineage.

But there is more to it than that. In the first place, the other numbers you will find in FTDNA’s referenced article are misleading. The range given for the 50-95% confidence intervals, 5-22 generations, doesn’t mean 125-550 years (as it would at 25 years per generation). The average length of a patrilineal generation is about 34 years, and for a matrilineal generation it’s 29 (see this paper), which would translate to a woman born, say, 1805 (50% chance), or at least one born by 1300 (95% chance).
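The generation arithmetic above can be checked with a few lines of code. This is only a sketch: the 29-year matrilineal generation is the figure quoted in the text, but the tester's birth year of 1950 is my own assumption, chosen because it reproduces the 1805 figure, and the function name is hypothetical.

```python
# Figure quoted in the text for the average matrilineal generation.
MATRILINEAL_YEARS_PER_GENERATION = 29
# Assumption made only for this illustration: the tested person was born in 1950.
TESTER_BIRTH_YEAR = 1950


def ancestor_birth_year(generations,
                        years_per_generation=MATRILINEAL_YEARS_PER_GENERATION,
                        tester_birth_year=TESTER_BIRTH_YEAR):
    """Approximate birth year of an ancestor the given number of generations back."""
    return tester_birth_year - generations * years_per_generation


for label, generations in (("50%", 5), ("95%", 22)):
    span = generations * MATRILINEAL_YEARS_PER_GENERATION
    print(f"{label}: {generations} generations = {span} years, "
          f"ancestor born about {ancestor_birth_year(generations)}")
```

Under these assumptions 5 generations come to 145 years and 22 generations to 638 years, which is where the rough dates of 1805 and 1300 come from.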

Worse, few genealogists have been able to trace their earliest known matrilineal ancestor back more than a few generations. My own goes back just 8, to a woman born in 1719 (for an average generational length of 29), and I have been able to trace back that far only because the line runs back through New England, where complete vital records are available almost back to the very beginnings of settlement. In examining some of the mtDNA matches for people in my DNA projects, I happened to notice that the earliest known matrilineal ancestor of Elizabeth Shown Mills, who has largely southern roots, goes back only to a woman born about 1750 in North Carolina, a much more difficult area to research because vital records or their surrogates are so scant. The problem is that there are far fewer records that name women by their maiden names (often just a birth record and a marriage record, if that) than for men, and with female surnames changing every generation, the trail goes quickly cold.

Although the prospects for linking up with others are thus much dimmer for descendants of a common matriline, there is no reason in principle why those with very tight matches (say either 0 or 1 mutational differences at the FMS level of testing) might not form matrilineage groups with posted matrilineal genealogies that, if they can be extended deeply enough, might point to a common matriarchal origin.

Beyond these practical genealogical considerations, you will find plenty of material on mtDNA and its mutation process by exploiting the online resources linked to the ISOGG Wiki on mtDNA testing, and particularly FTDNA’s FAQ on the subject.

Other kinds of DNA Testing: Ethnographic Testing

Still another kind of DNA testing with a wide time horizon is ethnographic testing, which doesn’t bother trying to construct a mutational tree of descent, but rather simply samples DNA from all over the genome looking for characteristic markers associated with various ethnic populations. This kind of testing, which takes into account all of one’s ancestors, and not just those at either edge of the tree (the purely patrilineal and matrilineal lines), has little to offer genealogy, but it does provide an estimate of the percentage contribution of various ethnic groups to one’s overall ancestry. It is thus one way to explore the popular tradition in many American families of Native American ancestry.

However, there are some caveats that come with this kind of testing. First, unless the Native American ancestor, for example, is rather recent, there is a significant chance that in the sampling, his/her DNA will be altogether missed. Second, many Europeans whose ancestors have never left the continent also have Native American ancestry in their makeup. How can that be? Because, besides the original east Asians who crossed the Bering Strait during a period when the land bridge to America opened, and thus became “Indians”, many of their own deep ancestors, out of Africa, went north and west to the Middle East and Europe instead of east into Asia, and so influenced the DNA on the opposite side of the world.

Thus, like all the other DNA tests, these ethnographic tests too need to be interpreted in the light of other, more conventional sorts of evidence; in this case, the evidence provided by archaeology.

Paternity and Forensic DNA Testing

This type of testing alone depends on no wider context than that of father and son. And because it aims for the maximum degree of certainty, testing both ySTR sites and SNPs, it is conclusive beyond any sane person’s definition of reasonable doubt. Much of the mutation rate literature relied on by genetic genealogists is predicated on paternity test databases, which have the advantage, thus, of completely eliminating the NPE factor. Paternity testing is itself the father of all the other kinds of DNA testing, with their varied purposes, and it is still the “gold standard” for measuring mutation rates; the only problem is that, like gold, paternity test results are relatively scarce, so they need to be fleshed out by data derived more problematically from genealogical DNA databases, applying sophisticated statistics and the mathematics of probability to try to compensate for the many unknowns in the equations.

A Brief DNA Glossary

GD (Genetic Distance), haplotype, haplogroup,
MRCA (Most Recent Common Ancestor), MHT (Mutation History Tree),
NPE (Non-Paternity Event), patrilineage, RPH, TMRCA (Time to MRCA),
ySNP, and ySTR.

These and other terms follow alphabetically.
For a more extensive glossary, see the ISOGG Wiki Glossary

autosomal
pertaining to the numbered human chromosomes, 1-22; all the human chromosomes except the “sex chromosomes”, the yChromosome, and the xChromosome

chromosome
one of 46 strands of the complete human DNA that constitute the genetic blueprint for each individual, organized into pairs, with one member of each pair inherited from the father, the other from the mother. 22 of these 23 chromosomal pairs are called autosomal chromosomes, while the remaining pair, made up of the xChromosome and the yChromosome, are called the sex chromosomes. Other species have variant numbers of chromosomes. The chromosomes of an organism taken as a whole are called the “genome”.

clade
a (once) living organism and all of its descendants; in the context of genetic testing of the male yChromosome, a common patriarch and all his male descendants.

deep clade testing
the testing for particular ySNP values to determine a man’s most specific (closest to the present) haplogroup, also called a clade or subclade.

crossover
a process that occurs during the replication of one of a parent’s two chromosomal strands to pass on to the next generation, in which part of the genetic material is taken from the other chromosomal strand instead; since crossover is likely to occur at some point on most chromosomes each generation, over time the segments of DNA passed on from ancestors get smaller and smaller, and eventually frustrate attempts to demonstrate relationship through autosomal DNA testing.

genealogical time
the time period within which genealogical research is possible and practical—roughly coincident with the time since written records began to be kept identifying individuals by name, and especially by hereditary surname.

genetic distance (GD) (in the context of ySTR surname projects)
the number of mutation events that have occurred to a panel of tested ySTR markers in the descent of two male line cousins from their common male ancestor.
     Each generational passing of the male yChromosome from father to son represents a transmission event (an opportunity for one or more mutation events to occur amongst the set of tested ySTR markers on that chromosome), and the GD is a count of the number of mutation events that have occurred down the generations in both male descendants. So, given that the tested markers mutate at widely varying but roughly predictable rates, GD provides an estimate of the closeness of the genetic relationship between two male patrilineal cousins.
     Usually, the genetic distance between the ySTR haplotypes of two men is simply the sum of the absolute differences between their marker values (the stepwise mutation model), but a simpler way of measuring GD is to count the number of markers that are different (the infinite alleles model), which usually provides a close approximation to the number of mutation events. Markers only occasionally differ by more than one number, and when they do, the current scientific evidence says that this is usually due to multiple mutations to the same marker, although multistep mutations do seem to occur about once in 50 mutations. One rare kind of mutation that simultaneously affects the values of several markers is the reclOH mutation event.
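The two ways of counting GD described above can be sketched in a few lines. This is an illustration only: the marker names are real ySTR markers, but the repeat values are invented for the example, the function names are hypothetical, and multi-copy markers and reclOH events are not handled.

```python
def gd_stepwise(haplotype_a, haplotype_b):
    """Stepwise model: sum of absolute differences in repeat counts."""
    return sum(abs(haplotype_a[marker] - haplotype_b[marker])
               for marker in haplotype_a)


def gd_infinite_alleles(haplotype_a, haplotype_b):
    """Infinite alleles model: count of markers whose values differ at all."""
    return sum(1 for marker in haplotype_a
               if haplotype_a[marker] != haplotype_b[marker])


# Invented repeat values for two men tested on the same five markers.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS388": 12}
man_b = {"DYS393": 13, "DYS390": 26, "DYS19": 14, "DYS391": 10, "DYS388": 12}

print(gd_stepwise(man_a, man_b))          # 3: two steps at DYS390, one at DYS391
print(gd_infinite_alleles(man_a, man_b))  # 2: two markers differ
```

The two counts diverge only where a marker differs by more than one repeat, as DYS390 does in this made-up pair.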

half-identical
said of two humans who share at least one allele value at a particular SNP. Long consecutive stretches of half-identical sampled SNPs, measured in cMs (centimorgans, which adjust for the variant rates of crossover in different chromosomes), are indicative of a shared descent from a common ancestor. The term HIR is sometimes used to mean half-identical region, whose length may be quantified either in cMs or in the number of SNPs. The principal testing companies at present, 23andMe and FTDNA, consider anywhere from 5-7 cMs (or about 500-700 SNPs) to be the minimum length that might indicate a reasonably close cousin relationship.
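To make the half-identical idea concrete, here is a minimal sketch that measures the longest half-identical stretch in SNP counts. The genotypes are invented, the function names are hypothetical, and nothing here converts to cMs, which would need a genetic map; it is not meant to represent how any testing company actually does it.

```python
def is_half_identical(genotype_a, genotype_b):
    """True if the two genotypes (e.g. "AG" and "GG") share at least one allele."""
    return bool(set(genotype_a) & set(genotype_b))


def longest_half_identical_run(person_a, person_b):
    """Length, in SNPs, of the longest consecutive half-identical stretch."""
    best = current = 0
    for genotype_a, genotype_b in zip(person_a, person_b):
        if is_half_identical(genotype_a, genotype_b):
            current += 1
            best = max(best, current)
        else:
            current = 0  # a SNP with no shared allele breaks the stretch
    return best


# Invented genotypes at six consecutive SNPs on one chromosome.
person_a = ["AA", "AG", "CC", "CT", "GG", "TT"]
person_b = ["AG", "GG", "CT", "CC", "AA", "CT"]

print(longest_half_identical_run(person_a, person_b))  # 4
```

In this toy data the first four SNPs are half-identical, the fifth (GG against AA) breaks the run, and the sixth starts a new one.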

haplogroup
the deep ancestry of a particular individual
The common male ancestor of the members of a ySNP haplogroup usually goes back many thousands, or even tens of thousands of years. Haplogroups should not be confused with the ySTR-based haplotypes that are used for genealogical purposes, where the common male ancestor goes back only hundreds of years.
     Haplogroups have a branching tree structure, dividing meta-groups like R, called “clades”, into “subclades” like R1b, or R1b1a2, with each subclade branch defined by the particular sequence of SNP mutations that have accumulated in the genome of the common male ancestor of members of that subclade. Thus, a subclade like R1b1a2 is defined by the chain of sequential SNP mutations: M173, M343, P25, P297, M269.
     As the haplogroup tree has been progressively articulated over the years, the original Henry System nomenclature for subclades has become increasingly unwieldy. There’s now a subclade of R1b1a2 denominated R1b1a2a1a1c2b2a1a1b2a1. For that reason, this old nomenclature is now deprecated in favor of one that appends to the first, defining, letter of the human haplotree, the name of the lowest level SNP that has tested positive.
     Thus, R1b1a2 is now preferably called R-M269, and its subclade R1b1a2a1a1c2b2a1a1b2a1 is called R-S3334. Since new SNPs are constantly being found, most people haven’t tested the latest of their line, and this is recognized by designating their haplogroup, e.g. R-M269+, while in cases where all the more recent (subordinate) SNPs have been tested, but come up negative, their haplogroup would be designated, e.g. R-M269*.
     For much more about haplogroup classification check out this section.
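The “+” and “*” suffix convention described in this entry can be expressed as a small rule. The sketch below is my own illustration, not any official algorithm: the function name is hypothetical and the downstream SNP names are placeholders rather than real entries in the haplotree.

```python
def designate(root_letter, terminal_snp, downstream_results):
    """Build a short-form haplogroup label such as "R-M269*" or "R-M269+".

    downstream_results maps each known downstream SNP to True (tested
    positive), False (tested negative), or None (not yet tested).
    """
    if any(result is True for result in downstream_results.values()):
        # A positive downstream SNP should itself become the terminal SNP.
        raise ValueError("a downstream SNP tested positive; relabel from that SNP")
    all_negative = all(result is False for result in downstream_results.values())
    suffix = "*" if all_negative else "+"
    return f"{root_letter}-{terminal_snp}{suffix}"


# Placeholder downstream SNP names, for illustration only.
print(designate("R", "M269", {"snp_a": False, "snp_b": False}))  # R-M269*
print(designate("R", "M269", {"snp_a": False, "snp_b": None}))   # R-M269+
```

Adding a newly discovered, untested SNP to the dictionary is enough to turn a “*” label back into a “+” label.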

haplotype
a set of ySTR/mtDNA marker values associated with a particular individual (haplotypes are only rarely unique)
ySTR marker values (also called alleles) are determined by testing a subset of highly mutable microsatellite sites on the yChromosome called ySTRs.

IBD (Identical By Descent)
obfuscatory jargon for “inherited”, typically used to characterize a particular stretch of DNA that is known to have been inherited from some relatively recent ancestor (and perhaps shared with another descendant), as opposed to the same stretch of DNA that is IBS (Identical By State), meaning simply “identical” between two individuals and not known to have been inherited from a common ancestor.

infinite alleles mutation model
The assumption that each difference between the values of a marker in a panel of tested ySTR markers is due to a single mutation, even when there may have been a gain or loss of several repeats. This model of the way mutations work is a considerable simplification of the complex reality of the mutation process, but it provides a reasonable quantitative approximation to it over the period of genealogical time.

microsatellite
a stretch of DNA characterized by multiple repeats of the same sequence of 2-6 nucleotide bases, the letters in which the genetic code is written. Microsatellites occur throughout the genome, but the ones most useful for genealogical testing purposes are located on the yChromosome.

marker (in the context of DNA testing)
a stretch of DNA whose allele values are sampled as a means of identifying individuals or placing individuals within (deep) patrilineages

MRCA (Most Recent Common Ancestor)
the MRCA is relative to a particular set or subset of ySTR-tested subjects, and is not, therefore, necessarily the same as the ultimate patriarch of a patrilineage, as I have defined it here. As a DNA surname project grows in scope with the addition of more distant patrilineal cousins, the MRCA moves backwards in time and may eventually become identical with the patrilineage patriarch, but even if this does not happen, the patriarch is the ultimate genealogical focus of the project. For that reason projects are best subdivided into patrilineages, rather than clusters of descendants of a more recent common ancestor. Nonetheless, it should be kept in mind that the TMRCA estimates for the current set of patrilineage members all point back only as far as they need to, to coalesce in a common ancestor.

Mutation History Tree
a schematic tree of descent constructed for a set of descendant haplotypes of the same patrilineage that shows when particular mutations within the patrilineage tree of descent occurred, and thus how the tested members of the set are related. Here is a sample mutation history tree.

NPE (Non-Paternity Event)
in Western cultures, an unexpected disjunction somewhere in the paternal ancestral chain between the inherited surname and the inherited ySTR, due to a replacement of a son’s biological father (with his inherited surname) by a surrogate father with (usually) a different surname. The most frequent cause of NPEs historically was probably adoption, but there are many other possible causes, including out-of-wedlock births. See Identifying/Confirming Your Patrilineage and Disconfirming your patrilineage for more on NPEs.

nucleotide
one of the four nucleic acid bases, denominated “A”, “G”, “C”, and “T”, that constitute the alphabet of the genetic code

(genealogical) patrilineage
the male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time.
     The patriarch of a patrilineage, thus defined, is typically the first of his male line to adopt a particular surname and pass it on to his children. The most recent common ancestor (MRCA) of any particular set of ySTR tested descendants is likely to be well downstream of the original patriarch. The methods (and pitfalls) of sorting people into genealogical patrilineages are discussed at length under Identifying/Confirming Your Patrilineage and Disconfirming your patrilineage.

patrilineage cousins
a set of tested or testable (male) paternal line cousins who are members of a patrilineage as defined above; more loosely (for genealogical purposes), any individuals with ancestors belonging to this patrilineage.

reclOH mutation
An uncommon, but not rare, kind of mutation to a portion of the yChromosome that can affect more than one of a set of ySTR markers that usually mutate separately and independently. Read this article, and this one, to learn more.

repeat
one iteration of a sequence of nucleotide letters that is repeated a number of times to make up a ySTR marker; when the marker mutates, it usually gains or loses a single repeat.

RPH (Root Prototype Haplotype)
the hypothetical haplotype of the ancestor of a common patrilineage
RPH may also be defined as the haplotype of that member of a set of tested patrilineage cousins who is most closely related to all of the others, collectively. For a fuller discussion of RPH (a term, and concept, developed by yours truly), see this paper.

SNP (Single Nucleotide Polymorphism)
an observed difference in allele values between single nucleotides on the chromosomal strands of two individuals of the same species. The term is also used to refer to the paired nucleotides, or "base pair" of the nuclear DNA of an individual of a diploid species, like we humans, who inherit a copy of each chromosome from each of our parents.
     In autosomal testing for genealogical purposes, large numbers of SNPs (base pairs) are sampled across whole chromosomes in two individuals, with the aim of identifying long half-identical stretches that are likely indicative of shared DNA from a common ancestor.

stepwise mutation model
The assumption that each unit of difference between measured ySTR marker values is due to the gain or loss of a single repeat. This model of the way mutations work provides a close approximation to the complex reality of the mutation process.

TMRCA (Time to the Most Recent Common Ancestor)
TMRCA, like genetic distance, is a measure of the closeness of relationship between two haplotypes. TMRCA may be measured in generations, or in years, where the number of years/generation is defined. TMRCA is calculated as a probabilistic function of the number of marker variations between the two haplotypes, and the calculation depends crucially on the estimated mutation rates for the particular markers that constitute the haplotype. Simple TMRCA calculators apply an average mutation rate across the marker panel, while more sophisticated calculators take account of which particular markers have mutated; if all the variant markers are fast ones, a closer relationship is indicated than if some of them are slow mutators. Another factor that may be taken into consideration is to adjust for the positive knowledge that there is no common ancestor back a certain number of generations from the present; this factor has the effect of pushing TMRCA farther back into the past. See my paper Deconstructing TMRCA & Genetic Distance for an extended discussion of TMRCA and GD (Genetic Distance).
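A “simple” TMRCA calculator of the kind described here can be sketched as follows. Everything in it is a simplifying assumption of mine rather than the method of any particular calculator: a single average mutation rate across the panel (the 0.004 used below is illustrative only), a Poisson count of mutation events over the two lines of descent, and a flat prior over the number of generations. The optional argument shows the adjustment for positive knowledge that no common ancestor exists within the first few generations.

```python
import math


def tmrca_posterior(gd, n_markers, mutation_rate,
                    max_generations=200, known_unrelated_generations=0):
    """Posterior probability that the MRCA lived exactly g generations back.

    Returns a list whose entry at index g - 1 is the probability for g,
    for g = 1 .. max_generations.
    """
    weights = []
    for g in range(1, max_generations + 1):
        if g <= known_unrelated_generations:
            weights.append(0.0)  # ruled out by the paper genealogy
            continue
        # Two lines of descent, each g transmission events long.
        expected_mutations = 2 * g * n_markers * mutation_rate
        weights.append(math.exp(-expected_mutations)
                       * expected_mutations ** gd / math.factorial(gd))
    total = sum(weights)
    if total == 0:
        raise ValueError("no generations left to consider")
    return [w / total for w in weights]


def generations_at_confidence(posterior, confidence):
    """Smallest g for which the cumulative probability reaches the confidence."""
    cumulative = 0.0
    for g, probability in enumerate(posterior, start=1):
        cumulative += probability
        if cumulative >= confidence:
            return g
    return len(posterior)


posterior = tmrca_posterior(gd=2, n_markers=37, mutation_rate=0.004)
print(generations_at_confidence(posterior, 0.50),
      generations_at_confidence(posterior, 0.95))
```

Passing known_unrelated_generations=5, say, zeroes out the first five generations and so pushes the estimates farther back into the past, as the entry describes. A more sophisticated calculator would replace the single average rate with per-marker rates.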

transmission event
the event of male parentage in which the yChromosome of the father is replicated, with the possibility of mutations, and passed on to a son.

yChromosome (or “Y Chromosome”)
the yChromosome is that one of the 23 paired human chromosomes that is possessed only by the male, and which is handed down virtually unchanged to each of his sons.

yDNA (or “Y-DNA”)
the DNA of the male Y-Chromosome (or yChromosome), which is said to be “non-recombinant” because (except for a tiny “pseudoautosomal” region containing 9 genes) it cannot combine with its odd couple partner, a female xChromosome.
     Tests of panels of ySTR markers are offered by Family Tree DNA, and other companies, for genealogical purposes, and FTDNA and others also offer tests of particular ySNPs that are being used to reconstruct the deep ancestry of humankind.

ySNP (Single Nucleotide Polymorphism)
a single nucleotide on the male yChromosome for which a mutation has been found to occur; because such ySNP mutations occur so infrequently, they are used to mark branch points in the male descendancy from the original yAdam.

ySTR (Short Tandem Repeat)
a type of (male) yChromosome DNA sequence composed of multiple copies (or repeats) of the same multi-nucleotide sequence; another name for one of these sequences of repeats is microsatellite, and in the context of testing for genealogical purposes they are more familiarly called “markers”. Sets of these ySTR markers are preferred for constructing test haplotypes for genetic genealogical purposes, because they mutate much faster than single point (SNP) loci. Several hundred of these ySTR sites, or markers, have been identified but only 120 or so are currently being tested for genealogical purposes.


Last updated 27Sep2014
© John Barrett Robb