A Hundred Up

Three years ago I submitted a DNA sample to FamilyTreeDNA, and sat back and waited for the test results. The genetic 'matches' have duly been coming in, week by week since then, over 5000 of them and counting. Today I received my 100th match at the level of 2nd-to-4th Cousin. A cause for celebration, you would think - a host of new connections, new cousins, the expansion of our family tree, new family stories to hear and tell.

Well, I have not been able to establish a connection with a single one of them, with the exception of one I already knew, since before she was born so to speak - she's my cousin's daughter.

The problem is my ancestors, and those of my matches. We are Ashkenazi Jews, and belong to a group that has been endogamous - ie, has intermarried within the group - not just for generations, but for centuries. So the DNA testing companies, and the science that underlies them, struggle to fit us into the pattern that works well for most other populations. They say they compensate to take account of the effects of endogamy, and I'm sure they do, but in my experience I have to say they end up grossly over-estimating the closeness of our relationships.

Here's the listing of my top 5 matches. Katy, the first one, is a 1st Cousin Once Removed - my cousin's daughter. So I know her. The next four are classed as probable 2nd-3rd Cousins, which means we should share great-great-grandparents, or closer. 

The thing is, I know my family quite well. I know all the descendants of all my grandparents, most of them personally. I know the given and family names of all 8 of my great-grandparents, and where most of them were born and where they lived. I know the names and places for the vast majority of their descendants - ie, the brothers and sisters of my own grandparents, and their children in turn, who are my 2nd Cousins.

Moving back to the previous generation, my great-great-grandparents should in theory be the source of my 2nd-3rd Cousin matches. I know the given and family names of 12 out of 16 of them, including all 8 men and 4 of the women, and many of their places; I also know the given names of the other 4 women. My knowledge of their descendant lines - ie, those of my great-grandparents' siblings - is much more sketchy. In some cases I know only the name of my own ancestor, and have no information at all on possible siblings. Some of my DNA 2nd-4th Cousin matches will undoubtedly come from these unknown lines, maybe most of them. But surely not all 99 of them?

From my grandparents to my great-great-great-grandparents
To take this one step further, the above implies that I actually know the family names of 12 out of 16 of the families of my great-great-great-grandparents - in other words, I know the family names that all of the siblings of my great-great-grandparents would have had, even if I don't actually know whether they existed or not. And these, of course, are the family names that the men would have passed on to the next generation.

At this point, let's make a few uncontroversial, generalising assumptions:

i) that any descendants that married and had children would be more or less equally divided between male and female
ii) that most women would take on their husband's surname on marriage, and thereby not pass on their own
iii) that any children they had would again be 50% male and 50% female, and so on

In this scenario, my knowledge of the surnames of any potential cousins would more or less halve with each generation, as the women don't pass on the known family name. However, I do actually know who the siblings are in some cases, in particular who the women were, and who they married, and this knowledge increases the closer we get to the present day - so the halving process I am suggesting here is an exaggeration. I know much more than half of the names in my grandparents' generation, but the calculation is easier to follow like this - let's just bear in mind we're being severe with the numbers.

So whereas I know all the family names of my great-great-grandparents' generation - the source of my 3rd Cousins - I will only know about half of those of my great-grandparents' generation, and a quarter of those of my grandparents'. Which means I should expect to recognise the names of an eighth - 12.5% - of my parents' generation. And 6.25% of my own. Not 0% of any of them, which is where I am with my DNA matches at the moment.

We can halve again to get the picture for 4th Cousins - I should recognise fewer names in each succeeding generation: 12.5% of my 4th Cousins in my grandparents' generation, 6.25% in my parents', and 3.125% in my own. But again, not 0% of any of them. Especially considering we're just being theoretical, and not taking my actual knowledge into account.
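For anyone who wants to check the arithmetic, here is a minimal sketch of the halving in Python. It assumes nothing beyond the simplification above: the share of surnames I would recognise starts at 100% in my great-great-grandparents' generation for the 3rd Cousin lines (50% for the 4th Cousin lines) and halves with every generation after that. The names `GENERATIONS` and `halving_shares` are my own, purely for illustration.

```python
from fractions import Fraction

# Generations, oldest first, as named in the text above.
GENERATIONS = [
    "great-great-grandparents",
    "great-grandparents",
    "grandparents",
    "parents",
    "own",
]


def halving_shares(start):
    """Return the share of recognisable surnames per generation.

    Assumption (from the text): the share halves with each generation,
    because on average half the descendants are women who do not pass
    on the known family name.
    """
    shares = {}
    share = start
    for generation in GENERATIONS:
        shares[generation] = share
        share = share / 2
    return shares


# 3rd Cousin lines: all surnames known at the great-great-grandparent level.
third_cousin_lines = halving_shares(Fraction(1))
# 4th Cousin lines: one generation further back, so halve once more.
fourth_cousin_lines = halving_shares(Fraction(1, 2))

for generation in GENERATIONS:
    third = float(third_cousin_lines[generation])
    fourth = float(fourth_cousin_lines[generation])
    print(f"{generation:26} {third:8.3%} {fourth:8.3%}")
```

This is the severe version of the numbers. It takes no account of the married names I actually do know, which would push every figure upwards.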

And it's not just me recognising names in my own family tree - I am sharing trees with a number of my closer matches - and I don't recognise what's on theirs, nor they what's on mine.

So my conclusion is that FTDNA's match estimates exaggerate the closeness of our relationships. My guess at the moment is that they are a couple of generations out at least. I'm in touch this week with a couple of the 2nd-3rd Cousin matches in the list above, and I'll be surprised if we manage to confirm FTDNA's ratings. More than surprised - I'll be overjoyed! But I'm not expecting anything closer than 4th-5th.

A major issue of course is that most of us are finding it very difficult to trace our families back more than two or three generations, which is where we need to be to locate 3rd Cousins and further. In many areas the documentary trail has been disrupted, by emigration, war, revolution and the Holocaust, not to mention those inconsiderate ancestors who wilfully changed their names when it suited them. And all this makes it even more difficult to trace the descendants of those generations. But I still think I should be able to recognise one or two of them, at least.

On the plus side, it is useful having Katy in the list, as I can do a check on whether the people that match me also match her - she's on my father's side, so this gives me a rough orientation as to which side the others probably match me on. If they match Katy, they're probably on my Schreibman-Ilyutovich side; if they don't, they're probably on my Frankenstein-Waxman side. Reassuringly, across all 5000 matches, there's more or less half on each side.
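The Katy check can be sketched in a few lines of Python. This is only an illustration of the sorting idea, not anything FTDNA provides: the function `split_by_side` and the match names are made up, and it assumes I have simply typed out two lists of names, my matches and Katy's.

```python
def split_by_side(my_matches, katy_matches):
    """Sort my matches by whether they also appear among Katy's matches.

    Assumption: Katy is a known cousin on my father's side, so a match we
    share is *probably* paternal, and one we don't share is *probably*
    maternal. It is a rough orientation, not proof.
    """
    shared_with_katy = set(katy_matches)
    paternal = [name for name in my_matches if name in shared_with_katy]
    maternal = [name for name in my_matches if name not in shared_with_katy]
    return paternal, maternal


# Hypothetical names, for illustration only.
my_matches = ["Match A", "Match B", "Match C", "Match D"]
katy_matches = ["Match B", "Match D", "Match E"]

paternal, maternal = split_by_side(my_matches, katy_matches)
print("Probably Schreibman-Ilyutovich side:", paternal)
print("Probably Frankenstein-Waxman side:", maternal)
```

The same comparison could be repeated with any other known cousin who tests, which is what would let the two halves be narrowed down further.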

It would help even more to have a few more known cousins do the test, as this would enable us to refine the analysis further, and get closer to identifying how our matches connect to us. Ideally I would like to have one of each line - a Frankenstein who's not a Waxman, and a Waxman who's not a Frankenstein, and similarly a Schreibman who's not an Ilyutovich, and an Ilyutovich who is not a Schreibman. That would help us identify matches for each of my four lines.

And of course it's not just for 'my' family - they would all get matches on the other sides of their own families, as well.

Any offers?

