In its root meaning, DNA testing today means reading out portions of the unique genetic code which each of us inherits from our parents. Reading out the whole is usually called, misleadingly, “mapping the genome”: misleading because the end product is nothing but a long string of meaningless letters. Meaning comes in a bit at a time, from painstakingly correlating tiny patches of a small portion of this code (the 5% or so which constitute the genes) with the development and manifestation of interesting traits. Thus, in time, the genetic part of the genome might truly be mapped in a general way onto the observed characteristics of a species, or even, at a detailed level, onto those of particular organisms. Those are the goals of classical DNA testing.
In parallel with the grand goal of mapping the genome, other more limited, but also more focused, kinds of DNA testing have arisen. Where the testing is thorough enough to identify an individual uniquely, or at least to distinguish him and his closest blood relatives from everyone else on the planet, it can have important forensic applications, for example in conclusively establishing paternity. And a set of tests performed on an individual DNA sample can, if extensive enough, establish a unique genetic fingerprint, which can be used to place a suspected perpetrator at the scene of a crime, as with the O.J. Simpson evidence.
A different, though overlapping, kind of DNA testing aims to determine whether a set of tested males probably have a common ancestor within the purview of genealogical research.
Genealogy is concerned with working out the ancestries of people alive today or, more broadly, with reconstructing the tree of descent from a common ancestor of a set of lineage cousins. The tree metaphor might, if we please, be extended to the whole human race, because it can be shown that all males descend from a single Adam, though the existing evidence tells us that this forefather of us all was very far from being the first male of the species Homo sapiens. Or, since man is a social animal, distinguished from the other animals perhaps more than in any other way by the elaborate transmissible body of knowledge and values he shares with his tribe (in a word, “culture”), the descent of the human race may also be conceived in ethnographic terms. But ethnographic family trees are of a scale which far exceeds the scope of any genealogical project.
The kind of DNA testing we are primarily concerned with here as would-be “genetic genealogists” has a far narrower focus: estimating the time back to a common paternal ancestor of two males. Its practical scope is confined to a period which one might call “genealogical time”: the period in each culture since written records began to be kept which document the lives of ordinary men, by name. Thus, genealogical time roughly coincides, at least in Western cultures, with the period since hereditary surnames came into general use, usually dated to about 1300-1500 in England, for example. To understand how and why DNA testing can be used to predict that two men have a common ancestor who lived not too many centuries ago, we need to review some of the basics of human DNA.
In order to explain DNA testing for genealogical purposes, it is first necessary to review some of the basics of human DNA and its replication to produce offspring. Some of the DNA-related terms found in the following sections, and throughout this website, are defined in the glossary on this page.
Each of us has developed from a single cell containing our unique DNA blueprint. This DNA is organized into 23 paired “chromosomes”, one chromosome of each pair coming directly from the father, and one from the mother. Nearly every cell in our body contains an exact copy of this complete genetic blueprint. The exception is the germ cells we produce for reproduction (sperm for men, eggs for women): each carries just one chromosome of each pair, and in forming them our separate parental parts mix and recombine in a new and unique way for each sex cell we produce.
Twenty-two of these 23 chromosomal pairs are called “autosomal”, and each consists of matching paternal and maternal chromosomes, perfectly aligned. DNA is a blueprint for producing the proteins of life, and having two matched, but often differing, versions of each gene sets up a genetic competition for determining the offspring’s characteristics, which results in some wins for the father, some for the mother, and a large proportion of compromises.
The remaining chromosomal pair, called the “sex chromosomes”, works quite differently. Instead of a matched pair, we find, at least in males, an odd couple: an X chromosome inherited from the mother, and a runty Y chromosome from the father (I prefer to style these “xChromosome” and “yChromosome”, just as I refer to yDNA, rather than the more conventional “Y-DNA” or “Y DNA”). The yChromosome contains fewer than 100 genes, only 9 of which match those of the female xChromosome. A large proportion of the remaining yChromosome genes code for the specific attributes of the more specialized male (the female is the default type of the species).
The XX female, like the XY male, inherits one xChromosome from her mother; her other xChromosome comes from her father, who inherited it from his mother. So there is plenty of genetic competition between the genes on her two xChromosomes. However, since the male yChromosome fails to match up to most of the xChromosome, the male inherits most of his mother’s xChromosome genes as is. This can cause problems where the mother passes a recessive genetic abnormality from one of her xChromosomes to a son. Hemophilia is one such abnormality: it rarely occurs in women, but in women who are carriers it crops up, on average, in half of their sons. And of course those other genes on the yChromosome which have no xChromosome counterpart also operate to make men more exceptional. Interestingly, even with the autosomal chromosomes, women’s DNA recombines in a much more homogenized way than men’s, keeping females much closer to the norms of the species, while males, more prone to extreme differentiation, may be considered nature’s experimental sex.
What matters most for our present purposes about DNA transmission from one generation to the next is that the yChromosome is passed down the paternal line virtually unchanged. Current models of population genetics hypothesize that all men descend either from a single Adam, or at least from a very small set of original progenitors, and women too have their Eves. But if all men descended from the same man, and if the yChromosome never changed at all, then all men would have identical yChromosomes and there would be nothing to be learned from testing. Fortunately for genetic genealogists, mutations creep into the germ cells, or occur during the replication process, and it is these mutations which produce the variations that yDNA testing measures.
I speak here exclusively of yDNA (the DNA of the male yChromosome) only because it is the testing of certain areas of the male yChromosome which has the highest payoff for genealogists. However, there are other kinds of DNA testing, all of which have their interest, and I have more to say about these below.
As it happens, in Western societies, and in many other cultures as well, surnames pass down the paternal line, and since tracking surnames is the main preoccupation of genealogists, yDNA testing fits perfectly into their epistemological paradigms. If we test the yDNA of two males with the same surname and find that they are very closely matched, we have strong positive confirmation that they descend from a common ancestor of their patrilineage. Otherwise, we may say that although they share a surname, they are probably no more likely to have a common ancestor than if one of them were surnamed Jones, and the other, Smith.
But yDNA can tell us more than just that two males do, or do not, have a common ancestor within the genealogical research horizon (the period since records of individuals began to be kept). yDNA is highly stable from generation to generation, but it changes over very long stretches of time, and at a statistically predictable rate. Starting from those premises, the number of mutations by which two tested male yChromosomes differ (the genetic distance) can serve as a kind of generational clock, measuring the time (in generations) back to their most recent common male ancestor, quite analogous to the archaeologist’s tool, radiocarbon dating. This estimate is called “TMRCA” (Time to the Most Recent Common Ancestor).
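The clock idea can be made concrete with a minimal sketch. It rests on assumptions made here for illustration, not taken from this page: genetic distance is counted as the sum of repeat-count differences per marker (a "stepwise" count), every marker is given one hypothetical average mutation rate per generation, and the two lines of descent mutate independently, so the expected distance after t generations is about 2 × t × markers × rate. The marker names M1-M4 and their values are invented.

```python
# Sketch only: genetic distance (GD) and a naive TMRCA point estimate.
# Assumptions (not from the text): GD is the sum of repeat-count differences
# per marker, every marker mutates at one hypothetical average rate, and
# both lines of descent mutate independently.


def genetic_distance(hap_a: dict[str, int], hap_b: dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over markers both men tested."""
    shared = hap_a.keys() & hap_b.keys()
    return sum(abs(hap_a[m] - hap_b[m]) for m in shared)


def tmrca_generations(gd: int, n_markers: int, rate_per_marker: float) -> float:
    """Invert the expectation E[GD] = 2 * t * n_markers * rate to solve for t."""
    return gd / (2 * n_markers * rate_per_marker)


# Invented four-marker haplotypes, for illustration only.
man_1 = {"M1": 13, "M2": 24, "M3": 14, "M4": 11}
man_2 = {"M1": 13, "M2": 25, "M3": 14, "M4": 12}
print(genetic_distance(man_1, man_2))  # 2

# Suppose a 37-marker comparison also gave GD = 2, with an assumed rate of 0.004.
print(round(tmrca_generations(2, 37, 0.004), 1))  # 6.8 generations
```

A point estimate like this hides how uncertain the answer is; it is only the centre of a wide range.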
The sensitivity of the clock depends on the average mutation rate at the tested marker sites. Across the genome, mutation rates range from about 1 per billions of generations to 1 per several hundred generations. The fastest mutations occur in stretches of DNA called microsatellites, and the most rapidly mutating microsatellites are those on the yChromosome, which are known as ySTRs (yChromosome Short Tandem Repeats). However, even though ySTRs mutate as often as once every several hundred generations, it is still necessary to test many of them, grouped into standard sets (panels) of markers, for enough mutations to show up within the narrow span of genealogical time (roughly the last 400-1000 years). The set of values a man’s yDNA shows across such a panel is called his haplotype.
It follows that the more markers tested, the more mutations are likely to show up, and thus the more finely calibrated the resulting TMRCA-measuring generational clock will be. However, it’s a bit more complicated than that, because not all ySTR markers are created equal. In recent years, widely varying mutation rates have been observed across these markers, with some of them mutating at 10 times the rate of the stodgiest ones. Thus, the average mutation rate across the various ySTR marker panels offered by the half-dozen or so yDNA testing companies is at least as important as the number of markers tested. At present, the best “bang for the buck” test panel, and the one most useful for identifying patrilineage relationships, is the FTDNA 37-marker panel.
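A back-of-the-envelope sketch shows why the average rate can matter as much as the marker count. It compares two invented panels (the rates are round numbers chosen for illustration, not published estimates) by the expected number of mutations separating two men whose common ancestor lived ten generations back, again assuming markers and lines of descent mutate independently.

```python
# Sketch: expected number of mutations separating two men whose common
# ancestor lived `generations` ago, summed over a panel of markers.
# Assumes markers and the two lines of descent mutate independently.


def expected_mutations(rates: list[float], generations: int) -> float:
    return 2 * generations * sum(rates)


# Hypothetical panels; rates are invented round numbers, not published values.
slow_panel = [0.002] * 25                # 25 "stodgy" markers
mixed_panel = [0.002] * 12 + [0.02] * 3  # only 15 markers, 3 of them 10x faster

for name, panel in [("slow", slow_panel), ("mixed", mixed_panel)]:
    print(name, len(panel), round(expected_mutations(panel, 10), 2))
```

Over ten generations the smaller mixed panel is expected to show about 1.68 mutations against 1.0 for the larger slow panel, so it registers more change within genealogical time despite testing fewer markers.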
Incidentally, the marker sites sampled for the yDNA tests do not involve genes, per se. If they did, natural selection might act on them and make their mutation rates less predictable. Only about 5% of the genome actually codes for the genes which define our unique traits. The purpose and function of the rest of the genome, often called “junk DNA”, is largely unknown at present.
What kinds of things can genealogists infer from sets of tested marker haplotypes for men bearing the same surname?
By patrilineage, I mean the exclusively male-line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time: usually the time since this lineage first came to be identified in the records by a particular hereditary surname, or family name. More loosely, the term may be understood to include all the cousins of these male descendants (all the descendants of the patriarch), male or female.
The concept which I have labelled “patrilineage” is fundamental to genealogy, and I have deliberately given it a restrictive meaning in order to capture that concept. The unqualified term “patrilineage” might more usually be defined as any male-only tree of descent. It might be a tree as foreshortened as a father and his sons, or as deep as the tree of all living males (all of whom, it can be shown, descend from a common male ancestor). However, neither of these extremes, nor indeed the broad sweep of the unqualified term, is of much use to genealogists or family historians.
There is, though, a narrower sense of “patrilineage” from which my usage needs to be distinguished: the descendants of the MRCA (Most Recent Common Ancestor) of a set of yDNA-tested male cousins. That MRCA is very much the focal point of any current yDNA project, and rightly so. But as new, more remote cousins join the project, the MRCA gets pushed further and further back until it begins to approach the original patriarch of the patrilineage, and it is this wider scope which I think needs to be kept firmly in mind as we proceed with our investigations, whether in the documentary archives or in the testing lab.
What we can say about yDNA testing is that subsets of reasonably closely matched haplotypes can be grouped into DNA surname patrilineages, each comprising a group of men who almost certainly descend not only from a common ancestor, but from an original patriarch of the patrilineage (perhaps even the first bearer of the common hereditary surname) who lived anywhere from several hundred to as many as 1000 years ago. At the same time, and as a corollary of this, a pair of men thought to be closely related on the basis of genealogical research, but whose yDNA is widely disparate, may be said conclusively to be unrelated in the paternal line, at least within the time frame of the genealogical researcher, and thus to be of different patrilineages.
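How such grouping might be automated can be sketched with single-linkage clustering: two men fall into the same candidate patrilineage if a chain of pairwise matches, each within a genetic-distance threshold, connects them. The threshold, kit names and values here are hypothetical; in practice a sensible threshold would depend on how many markers were compared.

```python
# Sketch: single-linkage grouping of project members into candidate
# patrilineages. Two men share a group if a chain of pairwise matches,
# each within max_gd, links them. The threshold is a hypothetical choice.


def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of repeat-count differences over markers both men tested."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def group_patrilineages(haplotypes: dict[str, dict[str, int]], max_gd: int) -> list[set[str]]:
    parent = {kit: kit for kit in haplotypes}  # union-find forest

    def find(kit: str) -> str:
        while parent[kit] != kit:
            parent[kit] = parent[parent[kit]]  # path halving keeps trees shallow
            kit = parent[kit]
        return kit

    kits = list(haplotypes)
    for i, a in enumerate(kits):
        for b in kits[i + 1:]:
            if genetic_distance(haplotypes[a], haplotypes[b]) <= max_gd:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for kit in kits:
        groups.setdefault(find(kit), set()).add(kit)
    return list(groups.values())


# Invented three-marker results, for illustration only.
project = {
    "Kit1": {"M1": 13, "M2": 24, "M3": 14},
    "Kit2": {"M1": 13, "M2": 25, "M3": 14},
    "Kit3": {"M1": 14, "M2": 22, "M3": 17},
}
print(group_patrilineages(project, max_gd=2))
```

Single linkage mirrors how a project grows: a remote cousin can join a group through an intermediate match even when he differs more from some of its other members.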
yDNA testing is therefore likely to occasionally disconfirm membership in an expected patrilineage group. Although a disruption of one’s established assumptions or conclusions may seem to be a negative outcome of yDNA testing (and it is certainly likely to be at least mildly disturbing), the possibility of such unexpected results actually represents one of the chief benefits of yDNA testing. Where a fair amount of research has been done on a line, and testing is merely confirmatory of membership in an expected patrilineage, it may be reassuring, but it adds only a little to the strength of one’s much more focused genealogical theses. On the other hand, if an unexpected break in the lineage can be demonstrated, it is likely to set one off on new and productive research tracks.
What is the best way to interpret and deal with a failure of one’s yDNA haplotype to fall into an established and expected lineage? The first thing to suspect is that somewhere one has drawn a plausible but invalid inference from one’s data; consequently, one needs to probe each link of the paternal ancestral chain for weaknesses. And where the lineage one expected to match is itself not thoroughly established, with a number of matching haplotypes backed by high-quality research, the research behind that lineage needs to be carefully scrutinized as well.
Second, if one is, after all, able to make a solid case from the evidence for each link of the paternal ancestral chain, it may be the evidence itself which is erroneous or incomplete. Record keepers made errors, then as now, which is why one tries to assemble several independent pieces of evidence, rather than relying on just one.
However, the greatest challenge facing all genealogists most of the time is the problem of incomplete records. That is why the BCG Genealogical Proof Standard calls for a “reasonably exhaustive search” of all the possibly pertinent records. Assuming that this standard has been met, the most likely remaining explanation for the failed match is that an NPE (Non-Paternity Event) has occurred. An NPE may be due to an adoption, an out-of-wedlock birth, or perhaps just an elective name change, none of which is likely to be reflected in the records. The omission may be due to inadvertence, or simply to a disinclination to publicize an unfortunate family or personal event.
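Why an NPE becomes a serious possibility over a long paternal chain can be shown with simple arithmetic. The sketch assumes each father-son link independently carries the same NPE rate; the 1% and 2% rates tried here are hypothetical values for illustration, not figures from this page.

```python
# Sketch: chance that at least one link in an unbroken chain of father-son
# links is an NPE, assuming every link independently carries the same rate.
# The rates tried below are hypothetical, not figures from this page.


def prob_at_least_one_npe(rate_per_generation: float, generations: int) -> float:
    return 1 - (1 - rate_per_generation) ** generations


for rate in (0.01, 0.02):
    print(rate, round(prob_at_least_one_npe(rate, 15), 3))
```

Over 15 generations these rates give roughly a 14% and a 26% chance, respectively, that the paper chain and the genetic chain part company somewhere.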
Incidentally, while an extensive haplotype mismatch (and a careful re-examination of one’s evidence) can raise the possibility that an NPE has occurred, determining whether this is in fact the explanation for the divergence remains a conventional research task. Nor can even a perfect match between two haplotypes conclusively rule out an NPE. For example, a child might be fathered by a man everyone thought was his paternal uncle, whose yDNA would almost certainly be identical to that of the man everyone (mistakenly) thought was his real father. I’ve recently heard of such a case which turned up in one of the current DNA surname projects.
Besides disconfirmation, the positive value of yDNA testing is a function of two factors: (1) the number of people of the same patrilineage who have been tested; and (2) the quality of the genealogical research into their ancestral lines.
Independent of any genealogical research, the most that may be said for sure about two men whom yDNA testing shows to be of a common patrilineage is that they have a male ancestor in common within the last, say, 1000 years (at the outside). Their exact genetic distance (the number of divergent marker mutations between their haplotypes) can yield a rough estimate of the TMRCA, accurate only to within a century or two either way.
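The imprecision can be illustrated with one simple probabilistic treatment, sketched here under stated assumptions: a flat prior over 1 to 100 generations, a Poisson count of mutations with mean 2 × t × markers × rate, a hypothetical average rate of 0.004 per marker per generation, and an assumed 30 years per generation. This is a textbook-style illustration, not the method any testing company uses.

```python
import math

# Sketch of why TMRCA estimates are so wide. Assumptions (all hypothetical):
# a flat prior over 1..max_gen generations, a Poisson count of mutations with
# mean 2 * t * n_markers * rate, and 30 years per generation.


def tmrca_posterior(gd: int, n_markers: int, rate: float, max_gen: int = 100) -> list[float]:
    """Return P(t | gd) for t = 1..max_gen (list index 0 is t = 1)."""
    weights = []
    for t in range(1, max_gen + 1):
        lam = 2 * t * n_markers * rate
        weights.append(math.exp(-lam) * lam ** gd / math.factorial(gd))
    total = sum(weights)
    return [w / total for w in weights]


def quantile(posterior: list[float], q: float) -> int:
    """Smallest t whose cumulative posterior probability reaches q."""
    cumulative = 0.0
    for t, p in enumerate(posterior, start=1):
        cumulative += p
        if cumulative >= q:
            return t
    return len(posterior)


YEARS_PER_GENERATION = 30  # assumed
post = tmrca_posterior(gd=2, n_markers=37, rate=0.004)
for q in (0.05, 0.5, 0.95):
    print(q, quantile(post, q) * YEARS_PER_GENERATION, "years")
```

With a genetic distance of 2 on 37 markers, the range between the 5% and 95% points spans well over a dozen generations, which is several centuries.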
A piece of knowledge this imprecise, considered in isolation, doesn’t advance the genealogical enterprise much. However, where there are a number of tested members of a patrilineage, preferably bringing into play several well-researched sub-lineages, one can begin to correlate particular patterns of mutation with particular sub-lineages. And at that point, each addition to the tested pool adds significantly to the accumulated knowledge of the genetic tree, the research tree, or both. That is the ultimate desideratum of these surname DNA projects.
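One simple way to begin correlating mutation patterns with sub-lineages is sketched below. It assumes the most common value at each marker (the modal haplotype) stands in for the patriarch's haplotype, lists each tester's departures from it, and flags departures shared by two or more testers as candidate sub-lineage signatures. Kit names and values are invented.

```python
from collections import Counter

# Sketch: take the most common value at each marker as the project's modal
# haplotype (a stand-in for the patriarch's, an assumption), list each
# tester's departures from it, and flag departures shared by 2+ testers.


def modal_haplotype(haplotypes: dict[str, dict[str, int]]) -> dict[str, int]:
    markers = set.intersection(*(set(h) for h in haplotypes.values()))
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in sorted(markers)
    }


def shared_departures(haplotypes: dict[str, dict[str, int]]) -> dict[tuple[str, int], list[str]]:
    modal = modal_haplotype(haplotypes)
    carriers: dict[tuple[str, int], list[str]] = {}
    for kit, hap in haplotypes.items():
        for m, value in hap.items():
            if m in modal and value != modal[m]:
                carriers.setdefault((m, value), []).append(kit)
    return {mutation: kits for mutation, kits in carriers.items() if len(kits) > 1}


# Invented results for one patrilineage, for illustration only.
project = {
    "Kit1": {"M1": 13, "M2": 24, "M3": 14, "M4": 11},
    "Kit2": {"M1": 13, "M2": 25, "M3": 14, "M4": 11},
    "Kit3": {"M1": 13, "M2": 25, "M3": 15, "M4": 11},
    "Kit4": {"M1": 13, "M2": 24, "M3": 14, "M4": 12},
    "Kit5": {"M1": 13, "M2": 24, "M3": 14, "M4": 11},
}
print(modal_haplotype(project))
print(shared_departures(project))  # {('M2', 25): ['Kit2', 'Kit3']}
```

Here Kit2 and Kit3 share a departure at M2, hinting at a sub-lineage worth checking against their paper trails. The same value can also arise independently by parallel mutation, so a shared departure is a lead, not a proof.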
Which is why I’ve defined the prime goal of these DNA surname projects as the bringing together of DNA patrilineage cousins for the purpose of sharing research, and why I try to promote the recruiting of additional male bearers of the common surname to the project, with an emphasis on distant cousins within existing project patrilineages.
Hopefully, we may look forward soon to the day when the costs of testing come down further so that many more bearers of the same surnames may be tested, and on more and/or faster markers than at present. It would also help a great deal if the current mutation rate estimates for individual markers could be more precisely quantified so that the corresponding estimates of TMRCA could be narrowed in range.
In the meantime, these DNA surname projects are already acting as powerful catalysts, not only for the sharing of chunks of heretofore disconnected research, but also for promoting new and broader research activities on the part of those interested in particular patrilineages. One reason most genealogists run into “dead ends” sooner than need be is that they don’t cast their nets wide enough. Accomplished (or professional) genealogists know that where the records are scant, it is usually necessary to research everyone with the same surname in a particular time and place. Surname DNA projects have a way of teaching that lesson and widening everyone’s research horizons.
Autosomal SNP testing is a brand new kind of DNA testing for genealogical purposes, which opens up the possibility of identifying all of one’s reasonably close cousins and relatives, not just those of the patrilineal and matrilineal lines. These tests scan the autosomal chromosomes, noting the values of hundreds of thousands, or even millions, of particular SNPs, assumed to be representative of particular chromosomes. A comparison is then made with the SNP values of others in the test database, looking for long stretches of SNPs on the same chromosome which are at least half-identical, and thus indicative of a possible cousin or other close relationship.
Because the expected share of DNA inherited from a common ancestor halves with each generation, these shared segments decrease rapidly in size and number, and in practice, identification of cousins beyond the 3rd cousin level becomes increasingly problematic. The blocks of half-identical DNA, often called HIRs (Half-Identical Regions), are chopped into ever smaller fragments by a phenomenon called crossover, which can affect a particular chromosome each time it is copied for transmission to the next generation. The length of an HIR is measured in cM (centimorgans), or sometimes in the number of SNPs sampled across it, and both the length of the largest HIR shared between two people and the number of HIRs they share are relevant to assessing the probability of descent from a common ancestor, and to estimating how many generations back that common ancestor lived.
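The rapid fall-off can be quantified with the standard path-counting rule: each meiosis halves the expected share. The sketch computes the coefficient of relationship for full nth cousins and converts it to an expected half-identical length, assuming no intermarriage (so the half-identical fraction is twice the coefficient) and an approximate autosomal map length of 3,400 cM, a round figure used here as an assumption.

```python
# Sketch: expected autosomal sharing for full nth cousins.
# Assumptions: no intermarriage, and a haploid autosomal map length of
# about 3,400 cM (an approximate figure, not a company's value).
HAPLOID_AUTOSOMAL_CM = 3400


def relationship_coefficient(cousin_degree: int) -> float:
    """Full nth cousins share two ancestors; each path runs n + 1 generations
    up and n + 1 down, and each meiosis halves the expected share."""
    meioses_per_path = 2 * (cousin_degree + 1)
    return 2 * 0.5 ** meioses_per_path


def expected_half_identical_cm(cousin_degree: int) -> float:
    """With no double sharing, the half-identical fraction is twice the coefficient."""
    return 2 * relationship_coefficient(cousin_degree) * HAPLOID_AUTOSOMAL_CM


for n in range(1, 6):
    print(n, relationship_coefficient(n), round(expected_half_identical_cm(n), 1))
```

The expected total drops by a factor of four with each step in cousin degree, from about 850 cM for 1st cousins to a few cM by 5th cousins, by which point the average total is already below a 7 cM segment threshold.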
The two principal companies offering autosomal SNP testing for genealogical purposes, 23andMe and FTDNA, set the threshold of possible significance at 7 cM and 5 cM, respectively. Both companies test over 500,000 SNPs across the genome, and every SNP tested within a candidate segment has to be half-identical for the segment to qualify as an HIR (although both companies do make a very limited allowance for the occasional testing glitch). Both companies provide their customers with software tools for identifying possible cousin or relative matches in their databases, and for estimating the closeness of the relationships between them.
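The segment scan can be sketched as follows, as a simplified illustration rather than either company's actual matching algorithm. It assumes genotypes coded as 0, 1 or 2 copies of one allele, SNP positions already converted to centimorgans, and no allowance for testing glitches. Two people can share an allele at a SNP unless they are opposite homozygotes, so a run of SNPs without such a clash is a candidate HIR. The 7 cM default echoes the threshold above; the 500-SNP minimum is a hypothetical parameter.

```python
from dataclasses import dataclass

# Sketch of an HIR scan. Assumptions: genotypes coded 0/1/2 (copies of one
# allele), SNP positions already in centimorgans, no glitch allowance.
# min_snps is a hypothetical parameter; 7 cM echoes the threshold above.


@dataclass
class Segment:
    start_cm: float
    end_cm: float
    snp_count: int

    @property
    def length_cm(self) -> float:
        return self.end_cm - self.start_cm


def half_identical(g1: int, g2: int) -> bool:
    """Two people can share an allele unless they are opposite homozygotes (0 vs 2)."""
    return abs(g1 - g2) < 2


def find_hirs(positions_cm: list[float], geno_a: list[int], geno_b: list[int],
              min_cm: float = 7.0, min_snps: int = 500) -> list[Segment]:
    segments = []
    start = None  # index where the current half-identical run began
    for i, (g1, g2) in enumerate(zip(geno_a, geno_b)):
        if half_identical(g1, g2):
            if start is None:
                start = i
        elif start is not None:
            segments.append(Segment(positions_cm[start], positions_cm[i - 1], i - start))
            start = None
    if start is not None:  # close a run that reaches the end of the chromosome
        last = len(geno_a) - 1
        segments.append(Segment(positions_cm[start], positions_cm[last], last - start + 1))
    return [s for s in segments if s.length_cm >= min_cm and s.snp_count >= min_snps]


# Synthetic chromosome: 2,000 SNPs spaced 0.01 cM apart, identical except for
# one opposite-homozygote SNP at index 1000 that splits the run in two.
positions = [i * 0.01 for i in range(2000)]
person_a = [0] * 2000
person_b = [0] * 2000
person_b[1000] = 2
for seg in find_hirs(positions, person_a, person_b):
    print(round(seg.start_cm, 2), round(seg.end_cm, 2), seg.snp_count)
```

A single clashing SNP splits the run, which is why real matching tools need some tolerance for genotyping errors.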
While this approach to genetic genealogy has great potential, its value at present is limited by the relatively small sizes of the databases, and by the fact that few pairs of possible tested cousins have both worked out their lines of descent sufficiently to identify the ancestor, or ancestors, they have in common. I say “or ancestors” because, where there has been much intermarriage within a small set of families, there are probably many different shared ancestors of various degrees, and the estimates of relationship closeness are likely to be skewed as a result.
In the end, the value of autosomal testing, as with all DNA testing, is largely dependent on the quantity and quality of the genealogy which has been done. Nonetheless, autosomal testing provides in principle a way for male surname lines which have “daughtered out” to be validated, by the targeted testing of one of the surviving females of such a line, and the corresponding autosomal testing of a proven male same-surname descendant of the same line. And as the databases grow in size, and as the genealogical enterprise advances, the value of this kind of testing should increase in proportion. However, the pricing needs to come down from the $300 range where it currently sits.
Another important kind of yDNA testing is the determination of a man’s patrilineal haplogroup. Although of little use for genealogical purposes, the haplogroup can give one an idea of where a man’s remote male ancestors originally came from going back thousands and tens of thousands of years.
Just as a haplotype is determined by testing ySTR microsatellites on the yChromosome, so a haplogroup is determined by testing SNP (Single Nucleotide Polymorphism) sites on the yChromosome. Compared to ySTRs, SNPs mutate very rarely: so rarely that when a ySNP mutation happens to occur in a particular father-son transmission event, it is considered practically unique, and is therefore sometimes called a UEP (Unique Event Polymorphism). The chances are, though, that many of these mutations aren’t really unique, just so rare that it’s unlikely a second occurrence of one will ever be found.
As I have explained elsewhere, the collection of ySTR values which constitutes a man’s haplotype places him fairly reliably within a particular patrilineage, which I have defined narrowly to mean all the male descendants of a man who lived within genealogical time, or roughly the timespan since a particular hereditary surname came into use for that patrilineage. However, I must confess here to having somewhat hijacked the term “patrilineage” to represent this vitally important genealogical concept. In reality, “patrilineage” as it is generally used has a wider application, meaning all the male descendants of any arbitrarily chosen male. In fact, since it can be shown that all living males descend from a single male yAdam who lived perhaps 40,000-60,000 years ago, all living males are ipso facto members of the same patrilineage, but at this point the term ceases to have much value.
This broader sense of patrilineage does help one in understanding haplogroups, though, because the first male bearer of a unique SNP mutation on the yChromosome thereby becomes the patriarch of his own patrilineage, and the founder of a new sub-haplogroup. However, the terms “sub-haplogroup” and “patrilineage” aren’t much used in this context; another term is: “clade”, or more often “subclade”. The term “clade” also has a broader meaning, but in this deep ancestry context it is used (and understood) without qualification as a synonym for branching haplogroups, just as I have used “patrilineage” without qualification to represent that portion of an ancestry which is within the scope of genealogical research.
But why, exactly, has it been deemed necessary to bring in the alternate term “clade” when “haplogroup” is meant (both terms being defined in a special restrictive sense)? I think it’s because what we really need to talk about is the way haplogroups are constantly branching off into sub-haplogroups, and “subclades” sounds a little less awkward.
And, for that matter, why do we need the terms “haplogroup” or “clade” when “patrilineage” (defined with a different scope from my “(genealogical) patrilineage” usage) would do as well? I suppose it’s because, as things stand, the words “haplogroup” and “clade”, as used by genetic genealogists, invoke the deep ancestral context defined by SNP testing, just as “patrilineage”, in my usage, is meant to invoke its specifically genealogical meaning.
What’s important is to understand the underlying concepts: there is a single tree of descent from one ancient patriarch to all living men, which branches each time a ySNP mutation occurs (or at least, each time one occurs that we know about). Each such branch point defines a new subclade (or sub-haplogroup; take your pick), so every living man belongs to a set of nested subclades of an original haplogroup. Haplogrouping is a way of classifying a man’s kinship group from the top down.
Meanwhile, classifying a man into a patrilineage on the basis of a set of ySTR marker values called a haplotype represents the bottom-up approach. Eventually, as more and more ySNPs are found, these approaches may converge in many cases, but in the meantime it is useful to distinguish them, and this can most economically be done by differential terminology. Thus I reserve the term “patrilineage” for genealogical purposes (implying a reference to ySTR testing and haplotypes), and otherwise refer by preference to “clades and subclades” (implying thereby a reference to ySNP testing and haplogroups).
Although the ascertainment of one’s lowest-order subclade requires SNP testing in most cases, membership in a more general clade, or haplogroup, can usually be inferred with a high degree of confidence from one’s haplotype. Thus, the general clade, or haplogroup, for the DENNISON DNA Surname Project Patrilineage 1 group is R1b1b2, perhaps the single most common subclade of the R1b haplogroup, shared by about 65-85% of all men who have British ancestry, depending on where in Britain they live. A haplogroup predictor program for inferring broad haplogroup from haplotype is available online. Besides that, the FTDNA testing company is committed to performing free SNP testing for any of its haplotype customers whose broad haplogroup cannot be inferred with confidence from their haplotype, and FTDNA and other companies also offer detailed SNP testing for a more fully resolved subclade determination.
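How a broad haplogroup might be inferred from a haplotype can be sketched with a naive Bayes calculation. This is one simple approach, not necessarily what any online predictor does: each group gets a table of allele frequencies per marker, markers are assumed to vary independently within a group, and all groups get equal prior weight. The group names, markers and frequencies are invented for illustration.

```python
import math

# Sketch of inferring a broad haplogroup from a haplotype with naive Bayes.
# The groups, markers and allele frequencies below are invented; a real
# predictor would draw on large reference databases.
FREQS: dict[str, dict[str, dict[int, float]]] = {
    "Group X": {"M1": {13: 0.9, 14: 0.1}, "M2": {24: 0.8, 25: 0.2}, "M3": {11: 0.7, 12: 0.3}},
    "Group Y": {"M1": {13: 0.2, 14: 0.8}, "M2": {23: 0.6, 24: 0.4}, "M3": {11: 0.5, 12: 0.5}},
}
UNSEEN = 0.001  # assumed frequency for a value missing from a group's table


def predict(haplotype: dict[str, int]) -> dict[str, float]:
    """Posterior probability of each group, assuming equal priors and
    markers that vary independently within a group."""
    log_scores = {
        group: sum(math.log(table[m].get(v, UNSEEN)) for m, v in haplotype.items() if m in table)
        for group, table in FREQS.items()
    }
    best = max(log_scores.values())
    weights = {g: math.exp(s - best) for g, s in log_scores.items()}  # avoid underflow
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


print(predict({"M1": 13, "M2": 24, "M3": 11}))
```

When no group's probability comes out close to 1, the haplotype alone is not decisive, which is the situation in which a confirming SNP test is called for.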
Progress in this haplogroup classification field has been so rapid that the terminology itself seems to change every year or two. In the latest twist, the old haplogroup designations have been replaced by the first letter of the major haplogroup family, suffixed by the lowest-order (most recent) known SNP mutation of an individual’s particular male line of descent. Thus, the broad DENNISON Patrilineage 1 haplogroup, R1b1b2, has now become R-M269, the SNP mutation at M269 being the defining mutation of this haplogroup subclade. No deep clade testing has yet been done on this line, so it’s likely that there are more recent SNPs than M269, and for that reason the more accurate designator is R-M269+. If deep clade testing showed that M269 was the most recent mutation (that is, that these DENNISONs did not share any subsequent mutations within their subclade), the designator would be R-M269*.
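The top-down classification and the naming convention just described can be sketched as a walk down a tree of nested subclades. The tree is hypothetical: M269 is taken from the text as the starting point and assumed positive (as when it is inferred from the haplotype), while SNP_A, SNP_B and SNP_C are invented placeholders, not real downstream SNPs. Following the convention described above, "+" marks a result whose downstream SNPs have not all been tested and "*" one that tested negative for every known downstream SNP; leaving a branch with nothing known below it unsuffixed is a choice made for this sketch.

```python
from dataclasses import dataclass, field

# Sketch of top-down classification and the "+"/"*" naming convention.
# M269 comes from the text and is assumed positive (e.g. inferred from the
# haplotype); SNP_A, SNP_B and SNP_C are invented placeholders.


@dataclass
class Clade:
    snp: str
    children: list["Clade"] = field(default_factory=list)


def classify(root: Clade, letter: str, results: dict[str, bool]) -> str:
    """results maps SNP name -> True (derived) or False (ancestral); untested SNPs are absent."""
    node = root
    while True:  # descend while some child branch tested positive
        positive = [c for c in node.children if results.get(c.snp) is True]
        if not positive:
            break
        node = positive[0]
    if not node.children:
        return f"{letter}-{node.snp}"  # no known branches below (a choice made for this sketch)
    if all(results.get(c.snp) is False for c in node.children):
        return f"{letter}-{node.snp}*"  # negative for every known downstream SNP
    return f"{letter}-{node.snp}+"  # some downstream SNPs untested


tree = Clade("M269", [Clade("SNP_A", [Clade("SNP_C")]), Clade("SNP_B")])
print(classify(tree, "R", {}))                                # R-M269+
print(classify(tree, "R", {"SNP_A": False, "SNP_B": False}))  # R-M269*
print(classify(tree, "R", {"SNP_A": True}))                   # R-SNP_A+
```

Each positive result moves the classification one level deeper, so more SNP testing can only refine the designation, never contradict the broader clade above it.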
One’s haplogroup can harbor surprises. One DNA surname project administrator I know, who thought that his surname was of German origin, instead came up with a Norse (Viking) haplogroup, while my Robb genealogical patrilineage, which is clearly Scotch-Irish, turned out to come originally from northern Germany, probably part of the wave of Anglo-Saxon settlement which swept southern England in the wake of the Romans’ departure in the 5th century, although my particular ancestors could have come over many hundreds of years before or after that period.
All this is quite interesting in its own right, though it takes us far afield from genealogy, per se. However, one thing we may infer when two men with a common surname have different haplogroups is that their most recent common paternal ancestor lived at least thousands of years ago, so they cannot be of the same patrilineage.
Besides the diploid (from two parents) DNA which lies coiled in the cell nucleus in a double helix, there is the mitochondrial DNA in the cell’s cytoplasm. Nearly every cell has mitochondria, which serve as the cell’s power plants and work in concert with the nuclear DNA, which supplies the blueprints for most of their proteins. Mitochondria, which are thus crucial to the life cycle, also have their own DNA blueprints, independent of the diploid nuclear DNA, and these are inherited directly from the mother, via the egg cell which plays host to the fertilizing sperm.
Thus, analogous to the patrilineal yChromosome, the same mitochondrial DNA which your mother got from her mother, and so forth, is also subject to mutations, which allows one’s maternal-line ancestors to be classified into one of a handful of deep matrilineages descended from a small number of Eves who lived several tens of thousands of years ago. Mutations in two tested “hypervariable control regions” have been used to define mtDNA haplogroups, which in turn have been mapped onto the dispersion of the human population out of Africa. Although the mutation rates of mtDNA are not high enough to be genealogically useful, it could be interesting to learn that one’s mother’s mother’s ... mother, known to have descended from a family rooted in Britain since the time of the Norman Conquest, nonetheless inherited her mitochondrial DNA from a woman who lived 30,000 years ago in what is today the Russian steppes.
Still another kind of DNA testing with a wide time horizon is ethnographic testing, which doesn’t try to construct a mutational tree of descent, but simply samples DNA from all over the genome, looking for characteristic markers associated with various ethnic populations. This kind of testing, which takes into account all of one’s ancestors, and not just those at either edge of the tree (the purely patrilineal and matrilineal lines), has little to offer genealogy, but it does provide an estimate of the percentage contribution of various ethnic groups to one’s overall ancestry. It is thus one way to explore the popular tradition in many American families of Native American ancestry.
However, this kind of testing comes with some caveats. First, unless the Native American ancestor, for example, is rather recent, there is a significant chance that his or her DNA will be missed altogether in the sampling. Second, many Europeans whose ancestors never left the continent also show Native American ancestry in their makeup. How can that be? Because besides the original east Asians who crossed the Bering Strait during a period when the land bridge to America was open, and thus became “Indians”, many other descendants of the same deep ancestors, out of Africa, went north and west to the Middle East and Europe instead of east into Asia, so that markers of this shared ancestry turn up on the opposite side of the world as well.
Thus, like all the other DNA tests, these ethnographic tests too need to be interpreted in the light of other, more conventional sorts of evidence: in this case, the evidence provided by archaeology.
Paternity testing alone depends on no wider context than that of father and son. And because it aims for the maximum degree of certainty, testing both ySTR sites and SNPs, it is conclusive beyond any sane person’s definition of reasonable doubt. Much of the mutation rate literature relied on by genetic genealogists is based on paternity test databases, which have the advantage of completely eliminating the NPE factor. Paternity testing is itself the father of all the other kinds of DNA testing, with their varied purposes, and it is still the “gold standard” for measuring mutation rates; the only problem is that, like gold, paternity test results are relatively scarce, so they need to be fleshed out by data derived more problematically from genealogical DNA databases, applying sophisticated statistics and the mathematics of probability to try to compensate for the many unknowns in the equations.
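To make the “gold standard” concrete: with father-son pairs, every pair is exactly one transmission event, so a per-marker mutation rate can be estimated as the number of observed changes divided by the number of transmissions. The following is a minimal Python sketch under that assumption; the marker labels and repeat counts are hypothetical, any difference is counted as a single mutation, and real ySTR mutation rates are low enough that a useful estimate needs a very large number of pairs.

```python
def estimate_mutation_rates(pairs):
    """Estimate mutation rates from (father, son) ySTR haplotype pairs.

    Each pair is one transmission event. Any difference at a marker is
    counted as one mutation (the infinite alleles simplification).
    Returns (rate per marker, average rate across the panel).
    """
    if not pairs:
        raise ValueError("need at least one father-son pair")
    markers = sorted(pairs[0][0])
    changes = {m: 0 for m in markers}
    for father, son in pairs:
        for m in markers:
            if father[m] != son[m]:
                changes[m] += 1
    n = len(pairs)
    per_marker = {m: changes[m] / n for m in markers}
    panel_average = sum(changes.values()) / (n * len(markers))
    return per_marker, panel_average


# Hypothetical data: marker labels and repeat counts are invented.
PAIRS = [
    ({"M1": 13, "M2": 24, "M3": 14}, {"M1": 13, "M2": 24, "M3": 14}),
    ({"M1": 13, "M2": 23, "M3": 15}, {"M1": 13, "M2": 24, "M3": 15}),
    ({"M1": 12, "M2": 25, "M3": 14}, {"M1": 12, "M2": 25, "M3": 14}),
    ({"M1": 13, "M2": 24, "M3": 16}, {"M1": 13, "M2": 24, "M3": 15}),
]

per_marker, average = estimate_mutation_rates(PAIRS)
print(per_marker)  # {'M1': 0.0, 'M2': 0.25, 'M3': 0.25}
print(average)     # 0.16666666666666666
```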
*** THIS CONTENT WAS STOLEN FROM JOHNBROBB.COM IN VIOLATION OF COPYRIGHT LAW ***
Some Key Terms:
GD (Genetic Distance),
haplotype,
haplogroup,
NPE (Non-Paternity Event),
patrilineage,
RPH, and
TMRCA.
For a more extensive glossary, see Kerchner’s Glossary of DNA Terms.
autosomal
pertaining to the numbered human chromosomes, 1-22: all the human chromosomes except the “sex chromosomes”, the yChromosome and the xChromosome
chromosome
one of the 46 strands of the complete human DNA which constitute the genetic blueprint for each individual, organized into pairs, with one member of each pair inherited from the father, the other from the mother. 22 of these 23 chromosomal pairs are called autosomal chromosomes, while the two chromosomes of the remaining pair, two xChromosomes in a female, or an xChromosome and a yChromosome in a male, are called the sex chromosomes. Other species have different numbers of chromosomes. The chromosomes of an organism taken as a whole are called the “genome”.
clade
a (once) living organism and all of its descendants; in the context of genetic testing of the male yChromosome, a common patriarch and all his male descendants.
deep clade testing
the testing for particular ySNP values to determine a man’s most specific (closest to the present) haplogroup, also called a clade or subclade.
crossover
a process which occurs when a parent passes one chromosome of each pair on to the next generation, in which part of the genetic material is taken from the other chromosome of the pair instead; since crossover is likely to occur at some point on most chromosomes each generation, over time the segments of DNA passed on from ancestors get smaller and smaller, and eventually frustrate attempts to demonstrate relationship through autosomal DNA testing.
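A small simulation can illustrate why inherited segments shrink. This Python sketch assumes Haldane's model (crossovers fall at random, about one per 100 cM, independently of one another), uses an arbitrary 200 cM chromosome with the tracked point at its middle, and assumes the DNA at that point is passed down from the ancestor in every generation; it then follows only the ancestral segment containing that point, which is cut back whenever a crossover falls inside it.

```python
import random


def surviving_segment(chrom_cm, point_cm, generations, rng):
    """Lengths (cM) of the ancestral segment around point_cm, one per generation.

    Assumes crossovers form a Poisson process at 1 per 100 cM (Haldane's
    model) and that the DNA at point_cm is inherited from the ancestor in
    every generation; only the segment containing that point is tracked.
    """
    left, right = 0.0, chrom_cm
    lengths = []
    for _ in range(generations):
        # Walk along the chromosome placing this generation's crossovers.
        x = rng.expovariate(1 / 100)  # mean gap of 100 cM
        while x < chrom_cm:
            if left < x < point_cm:
                left = x    # crossover trims the segment on the left
            elif point_cm < x < right:
                right = x   # crossover trims the segment on the right
            x += rng.expovariate(1 / 100)
        lengths.append(right - left)
    return lengths


rng = random.Random(42)
trials = [surviving_segment(200.0, 100.0, 10, rng) for _ in range(2000)]
for g in (1, 2, 5, 10):
    mean = sum(t[g - 1] for t in trials) / len(trials)
    print(f"after {g:2d} generations: mean segment about {mean:.1f} cM")
```

Because crossovers are placed in order along the chromosome, the last one left of the point and the first one right of it become the new segment boundaries, so a segment can only stay the same or shrink from one generation to the next.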
genealogical time
the time period within which genealogical research is possible and practical, roughly coincident with the time since written records began to identify individuals by name, and especially by hereditary surname.
half-identical
said of two humans who share at least one allele value at a particular SNP. Long consecutive stretches of half-identical sampled SNPs, measured in cMs (centimorgans, which adjust for the varying rates of crossover along different chromosomes), are indicative of shared descent from a common ancestor. The term HIR is sometimes used to mean half-identical region, whose length may be quantified either in cMs or in the number of SNPs. The principal testing companies at present, 23andMe and FTDNA, consider anywhere from 5-7 cMs (or about 500-700 SNPs) to be the minimum length that may indicate a reasonably close cousin relationship.
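The idea of a half-identical region can be sketched in Python as follows. It is a minimal illustration, not any testing company's algorithm: genotypes are assumed to be unphased two-letter strings such as “AG”, SNP positions are assumed to be given in cM, and real matching software must also cope with no-calls and genotyping errors, which this ignores. The default thresholds are simply the upper ends of the ranges quoted above.

```python
def half_identical(g1, g2):
    """Unphased genotypes such as 'AG' and 'GG' are half-identical if they share an allele."""
    return bool(set(g1) & set(g2))


def half_identical_regions(geno1, geno2, positions_cm, min_cm=7.0, min_snps=700):
    """Runs of consecutive half-identical SNPs that meet both length thresholds.

    Returns (first_index, last_index, length_cm, snp_count) tuples, where the
    length is measured from the first SNP of the run to the last.
    """
    regions = []
    start = None
    n = len(positions_cm)
    for i in range(n + 1):
        matches = i < n and half_identical(geno1[i], geno2[i])
        if matches and start is None:
            start = i                      # a new run begins
        elif not matches and start is not None:
            last = i - 1                   # the run ended at the previous SNP
            length = positions_cm[last] - positions_cm[start]
            count = last - start + 1
            if length >= min_cm and count >= min_snps:
                regions.append((start, last, length, count))
            start = None
    return regions


# Tiny invented example, with thresholds lowered to suit it.
person1 = ["AG", "AA", "CT", "GG", "AC", "TT", "AG", "CC"]
person2 = ["GG", "AG", "CC", "AA", "AC", "TT", "GG", "CT"]
positions = [0.0, 1.5, 3.0, 4.0, 5.0, 7.5, 10.0, 12.5]
print(half_identical_regions(person1, person2, positions, min_cm=5.0, min_snps=3))
# [(4, 7, 7.5, 4)]
```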
genetic distance (GD) (in the context of yDNA surname projects)
the number of mutation events which have occurred to a panel of tested ySTR markers in the descent of two male line cousins from their common male ancestor.
Each generational passing of the male yChromosome from father to son represents a transmission event: an opportunity for one or more mutation events to occur amongst the set of tested ySTR markers on that chromosome. The GD is a count of the number of mutation events which have occurred down the generations in both lines of descent. So, given that the tested markers mutate at widely varying but roughly predictable rates, GD provides an estimate of the closeness of the genetic relationship between two male patrilineal cousins.
Usually, the genetic distance between the ySTR haplotypes of two men is simply the sum of the absolute differences between their marker values (the stepwise mutation model), but a simpler way of measuring GD is to count the number of markers whose values differ (the infinite alleles model), which usually provides a close approximation to the number of mutation events. Markers only occasionally differ by more than one repeat, and when they do, the current scientific evidence says that this is usually due to multiple mutations to the same marker, though multistep mutations seem to occur about once in 50 mutations; one rare kind of mutation which simultaneously affects the values of several markers is the recLOH mutation event.
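The two ways of counting GD can be compared directly. In this Python sketch a haplotype is assumed to be a dict from marker name to repeat count; the DYS marker names are real ySTR loci used here only as labels, and the values are invented.

```python
def genetic_distance(h1, h2, model="stepwise"):
    """GD between two ySTR haplotypes given as {marker: repeat count} dicts.

    "stepwise": sum of absolute differences in repeat counts.
    "infinite_alleles": number of markers whose values differ.
    Only markers tested in both haplotypes are compared.
    """
    shared = h1.keys() & h2.keys()
    if model == "stepwise":
        return sum(abs(h1[m] - h2[m]) for m in shared)
    if model == "infinite_alleles":
        return sum(1 for m in shared if h1[m] != h2[m])
    raise ValueError(f"unknown model: {model!r}")


# Invented repeat counts for five ySTR marker names.
cousin_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS439": 12}
cousin_b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10, "DYS439": 14}
print(genetic_distance(cousin_a, cousin_b))                      # 4
print(genetic_distance(cousin_a, cousin_b, "infinite_alleles"))  # 3
```

The two counts disagree only where a marker differs by more than one repeat (DYS439 in this made-up example), which, as noted above, is the less common case.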
haplogroup
the deep ancestry of a particular individual
The common male ancestor of the members of a yDNA haplogroup usually goes back many thousands, or even tens of thousands, of years. Haplogroups have a branching tree structure, dividing meta-groups like R, called “clades”, into “subclades” like R1b or R1b1b2, with each subclade branch defined by the particular SNP mutation which occurred in the common male ancestor of members of that subclade. Thus, a subclade like R1b1b2 is defined by the chain of accumulated SNP mutations: M173, M343, P25, P297, M269. In the new terminology, haplogroups are denominated by their most recent SNP; thus R1b1b2 becomes simply R-M269.
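Deep clade testing, and the newer naming scheme, can be sketched as a walk down the SNP chain given above. In this Python sketch the parent links encode only that one chain (M173, M343, P25, P297, M269); a real haplogroup tree has many more branches at every level, and the function names and the letter parameter are hypothetical.

```python
# Parent links for the single chain named in the text; None marks its top.
PARENT = {"M173": None, "M343": "M173", "P25": "M343", "P297": "P25", "M269": "P297"}


def snp_chain(snp, parent=PARENT):
    """SNPs from the top of the chain down to (and including) snp."""
    chain = []
    while snp is not None:
        chain.append(snp)
        snp = parent[snp]
    return chain[::-1]


def deepest_subclade(positive_snps, parent=PARENT, letter="R"):
    """New-style name of the most specific SNP whose whole chain tested positive."""
    best = None
    for snp in parent:
        chain = snp_chain(snp, parent)
        if all(s in positive_snps for s in chain):
            if best is None or len(chain) > len(snp_chain(best, parent)):
                best = snp
    return None if best is None else f"{letter}-{best}"


print(snp_chain("M269"))                                          # ['M173', 'M343', 'P25', 'P297', 'M269']
print(deepest_subclade({"M173", "M343", "P25"}))                  # R-P25
print(deepest_subclade({"M173", "M343", "P25", "P297", "M269"}))  # R-M269
```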
haplotype
a set of yDNA/mtDNA marker values associated with a particular individual (haplotypes are only rarely unique).
yDNA marker values (also called alleles) are determined by testing a subset of highly mutable microsatellite sites on the yChromosome called ySTRs.
IBD (Identical By Descent)
obfuscatory jargon for “inherited”, typically used to characterize a particular stretch of DNA which is known to have been inherited from some relatively recent ancestor (and perhaps shared with another descendant), as opposed to the same stretch of DNA which is IBS (Identical By State), meaning simply “identical” between two individuals and not known to have been inherited from a common ancestor.
infinite alleles mutation model
The assumption that each difference between two men’s values for a ySTR marker in a tested panel is due to a single mutation, even when the values differ by several repeats. This model of the way mutations work is a considerable simplification of the complex reality of the mutation process, but it provides a reasonable quantitative approximation to it over the period of genealogical time.
microsatellite
a stretch of DNA characterized by multiple repeats of the same sequence of 2-6 nucleotides, the letters in which the genetic code is written. Microsatellites occur throughout the genome, but the ones most useful for genealogical testing purposes are located on the yChromosome.
marker (in the context of DNA testing)
a stretch of DNA whose allele values are sampled as a means of identifying individuals or placing individuals within (deep) patrilineages
MRCA (Most Recent Common Ancestor)
the MRCA is relative to a particular set of yDNA-tested subjects, and is not, therefore, necessarily the same as the ultimate patriarch of a patrilineage, as I have defined it here. As a DNA surname project grows in scope with the addition of more distant patrilineal cousins, the MRCA moves backwards in time and may eventually become identical with the patrilineage patriarch; but even if this does not happen, the patriarch is the ultimate genealogical focus of the project. For that reason projects are best subdivided into patrilineages, rather than clusters of descendants of a more recent common ancestor. Nonetheless, it should be kept in mind that the TMRCA estimates for the current set of patrilineage members point back only as far as they need to for those members to coalesce in a common ancestor.
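The way the MRCA moves back as more distant cousins are added can be seen on a small pedigree. This Python sketch assumes a hypothetical, fully known son-to-father mapping; in practice, of course, the pedigree is exactly what is uncertain.

```python
def paternal_line(person, father_of):
    """The person, then his father, grandfather, ... as far back as known."""
    line = [person]
    while line[-1] in father_of:
        line.append(father_of[line[-1]])
    return line


def mrca(people, father_of):
    """Most recent common male-line ancestor of a set of men, or None."""
    lines = [paternal_line(p, father_of) for p in people]
    shared = set(lines[0]).intersection(*lines[1:])
    for ancestor in lines[0]:  # lines[0] runs from the present backwards
        if ancestor in shared:
            return ancestor
    return None


# Hypothetical pedigree: son -> father; Richard is the surname patriarch.
FATHER_OF = {
    "John": "William", "James": "William", "William": "Thomas",
    "Robert": "Henry", "Henry": "Thomas", "Thomas": "Richard",
    "Edward": "George", "George": "Richard",
}

print(mrca(["John", "James"], FATHER_OF))                      # William
print(mrca(["John", "James", "Robert"], FATHER_OF))            # Thomas
print(mrca(["John", "James", "Robert", "Edward"], FATHER_OF))  # Richard
```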
NPE (Non-Paternity Event)
in Western cultures, an unexpected disjunction somewhere in the paternal ancestral chain between the inherited surname and the inherited yDNA, due to a replacement of a son’s biological father (with his inherited surname) by a surrogate father with (usually) a different surname. The most frequent cause of NPEs historically was probably adoption, but there are many other possible causes, including out-of-wedlock births. See the final paragraphs of Identifying/Disconfirming Your Patrilineage for more on NPEs.
nucleotide
There are four kinds of nucleotide, denominated by their bases “A”, “G”, “C”, and “T” (adenine, guanine, cytosine, and thymine), and they constitute the alphabet of the genetic code.
(genealogical) patrilineage
the male line descendants of the earliest male ancestor, the patriarch, who lived within genealogical time.
The patriarch of a patrilineage, thus defined, is typically the first of his male line to adopt a particular surname and pass it on to his children. The most recent common ancestor (MRCA) of any particular set of yDNA tested descendants is likely to be well downstream of the original patriarch. The methods (and pitfalls) of sorting people into genealogical patrilineages are discussed at length under Identifying/Disconfirming Your Patrilineage.
patrilineage cousins
a set of tested or testable (male) paternal line cousins who are members of a patrilineage as defined above; more loosely (for genealogical purposes), any individuals with ancestors belonging to this patrilineage.
recLOH mutation
A rare kind of mutation (recombinational loss of heterozygosity) to a portion of the yChromosome which can affect more than one of a set of ySTR markers that usually mutate separately and independently. Read this article, and this one, to learn more.
repeat
one iteration of a sequence of nucleotide letters which is repeated a number of times to make up a ySTR marker; when the marker mutates, it usually gains or loses a single repeat.
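A marker's allele value is, in essence, a count of repeats. This Python sketch counts the longest uninterrupted tandem run of a motif in a DNA string; the sequences are invented, and real markers can have interrupted or compound repeat structures and locus-specific counting conventions that this ignores.

```python
def repeat_count(sequence, motif):
    """Longest uninterrupted tandem run of motif in sequence, counted in repeats."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    best = 0
    for start in range(len(sequence)):
        run = 0
        # Extend the run one motif-length at a time while the next copy matches.
        while sequence[start + run * k : start + (run + 1) * k] == motif:
            run += 1
        best = max(best, run)
    return best


# Invented sequences: 12 GATA repeats, an interruption, then 3 more.
allele_12 = "CCA" + "GATA" * 12 + "GACA" + "GATA" * 3 + "TTC"
allele_13 = "CCA" + "GATA" * 13 + "GACA" + "GATA" * 3 + "TTC"  # one repeat gained
print(repeat_count(allele_12, "GATA"), repeat_count(allele_13, "GATA"))  # 12 13
```

The second sequence models the usual single-step mutation: one repeat gained, so the allele value rises by exactly one.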
RPH (Root Prototype Haplotype)
the hypothetical haplotype of the common ancestor of a patrilineage
RPH may also be defined as the haplotype of that member of a set of tested patrilineage cousins who is most closely related to all of the others, collectively. For a fuller discussion of RPH (a term, and concept, developed by yours truly), see this paper.
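The second definition of RPH lends itself to a direct calculation: pick the tested cousin whose haplotype has the smallest total genetic distance to all the others. This Python sketch assumes stepwise GD and hypothetical cousins with invented marker values; it is not necessarily the procedure used in the paper referred to above, and ties are broken alphabetically, an arbitrary choice.

```python
def stepwise_gd(h1, h2):
    """Sum of absolute repeat-count differences over markers tested in both."""
    return sum(abs(h1[m] - h2[m]) for m in h1.keys() & h2.keys())


def rph_member(haplotypes):
    """Cousin whose haplotype has the smallest total GD to all the others."""
    def total_gd(name):
        return sum(
            stepwise_gd(haplotypes[name], haplotypes[other])
            for other in haplotypes
            if other != name
        )
    # sorted() makes ties resolve alphabetically (an arbitrary choice).
    return min(sorted(haplotypes), key=total_gd)


# Hypothetical cousins and invented marker values.
COUSINS = {
    "A": {"M1": 13, "M2": 24, "M3": 14, "M4": 11},
    "B": {"M1": 13, "M2": 24, "M3": 15, "M4": 11},
    "C": {"M1": 13, "M2": 23, "M3": 14, "M4": 11},
    "D": {"M1": 13, "M2": 24, "M3": 14, "M4": 12},
}
print(rph_member(COUSINS))  # A
```

Here cousin A's total GD is 3 while each of the others has 5; swapping in infinite alleles GD would be a one-line change.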
SNP (Single Nucleotide Polymorphism)
an observed difference in allele values between single nucleotides on the chromosomal strands of two individuals of the same species. The term is also used to refer to the pair of nucleotides found at such a site, one on each copy of the chromosome, in the nuclear DNA of an individual of a diploid species, like us humans, who inherit a copy of each chromosome from each of our parents.
In autosomal testing for genealogical purposes, large numbers of SNPs are sampled across whole chromosomes in two individuals, with the aim of identifying long half-identical stretches which are likely indicative of shared DNA from a common ancestor.
stepwise mutation model
The assumption that each unit of difference between measured ySTR marker values is due to the gain or loss of a single repeat. This model of the way mutations work provides a close approximation to the complex reality of the mutation process.
TMRCA (Time to the Most Recent Common Ancestor)
TMRCA, like genetic distance, is a measure of the closeness of relationship between two haplotypes. TMRCA may be measured in generations, or in years, where the number of years per generation is defined. TMRCA is calculated as a probabilistic function of the number of marker variations between the two haplotypes, and the calculation depends crucially on the estimated mutation rates for the particular markers which constitute the haplotype. Simple TMRCA calculators apply an average mutation rate across the marker panel, while more sophisticated calculators take account of which particular markers have mutated; if all the variant markers are fast ones, a closer relationship is indicated than if some of them are slow mutators. Another factor which may be taken into consideration is the positive knowledge that there is no common ancestor within a certain number of generations of the present; adjusting for it has the effect of pushing TMRCA farther back into the past. See my paper Deconstructing TMRCA & Genetic Distance for an extended discussion of TMRCA and GD (Genetic Distance).
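A “simple calculator” of the kind described above can be sketched in Python. Assumptions: one average mutation rate per marker per generation (the 0.004 used below is a hypothetical figure, not a published rate), Poisson-distributed mutations over the 2t transmissions separating two men whose MRCA lived t generations ago, and a flat prior over a fixed range of generations. Raising min_generations encodes the positive knowledge that there is no common ancestor more recently than that.

```python
import math


def tmrca_posterior(gd, markers, rate, min_generations=1, max_generations=100):
    """Posterior probability of each TMRCA t (in generations) given a GD.

    Assumptions: one average mutation rate per marker per transmission,
    Poisson-distributed mutations over the 2*t transmissions separating two
    men whose MRCA lived t generations ago, and a flat prior on
    min_generations..max_generations.
    """
    if rate <= 0 or markers <= 0 or min_generations < 1:
        raise ValueError("rate and markers must be positive; min_generations >= 1")
    weights = {}
    for t in range(min_generations, max_generations + 1):
        expected = 2 * t * markers * rate  # expected mutations on both lines
        weights[t] = math.exp(-expected) * expected**gd / math.factorial(gd)
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def posterior_median(posterior):
    """Smallest t at which the cumulative posterior reaches 0.5."""
    cumulative = 0.0
    for t in sorted(posterior):
        cumulative += posterior[t]
        if cumulative >= 0.5:
            return t
    return max(posterior)


# GD of 2 on a 37-marker panel; 0.004 is a hypothetical average rate.
unconstrained = posterior_median(tmrca_posterior(2, 37, 0.004))
# Suppose the paper trail rules out a common ancestor in the last 11 generations.
constrained = posterior_median(tmrca_posterior(2, 37, 0.004, min_generations=12))
print(unconstrained, constrained)
```

A more sophisticated calculator would replace the single average rate with per-marker rates, so that which markers differ, and not just how many, affects the estimate.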
transmission event
the event of male parentage in which the yChromosome of the father is replicated, with the possibility of mutations, and passed on to a son.
yChromosome (or “Y Chromosome”)
the yChromosome is that member of the 23 human chromosome pairs which is possessed only by the male, and which is handed down virtually unchanged to each of his sons.
yDNA (or “Y-DNA”)
the DNA of the male yChromosome, which is said to be “non-recombinant” because (except for a tiny “pseudoautosomal” region containing 9 genes) it does not recombine with its odd couple partner, the xChromosome a man inherits from his mother.
ySNP (Single Nucleotide Polymorphism)
a single nucleotide on the male yChromosome for which a mutation has been found to occur; because such ySNP mutations occur so infrequently, they are used to mark branch points in the male descendancy from the original yAdam.
ySTR (Short Tandem Repeat)
a type of yDNA sequence composed of multiple repeats of the same multi-nucleotide sequence; these areas are also called microsatellites.
Sets of these ySTRs are preferred for constructing yDNA haplotypes because they mutate much faster than single point (SNP) loci. Several hundred of these ySTR sites have been identified, but only 100 or so are currently being tested, and unfortunately, reliable mutation rate data exist for only a minority of these.