Blog Post

Geneology with SQL Graph – Part II

,

In the previous post I explored a query on how to construct a basic family tree with sql graph – including table set up and queries. In this post am going to explain how to query this to get relationships and some clues on DNA/ethnic mix.
1 Relationships: We can use the ‘shortest path’ clause to find how people are related in this database. So, if I want all the people a character named Jerusha Hale is related to – I can say this –

SELECT fromperson.name as Fromperson,STRING_AGG(f.remarks+' '+toPerson.Name, '->') WITHIN GROUP (GRAPH PATH) AS FriendConnections,
          LAST_VALUE(toPerson.Name) WITHIN GROUP (GRAPH PATH) AS FriendName,
          COUNT(toPerson.Name) WITHIN GROUP (GRAPH PATH) AS levels
  
FROM
          PersonNode AS fromPerson,
          PersonRelatedTo FOR PATH AS f,
          PersonNode FOR PATH AS toPerson
WHERE
        
 MATCH(SHORTEST_PATH((toPerson<-(f)-)+fromPerson))
 AND fromPerson.Name = 'Jerusha Hale'
    

This is perfectly cool – it tells us who is she is married to, who her parents are, who her grandparents are, cousins are and so on. There are a few things to watch for though, here –

1 I chose not to store derived relationships like grandparents, cousins etc, because that is too much work. So what is pulled has some roundabout derived relationships like row 8, Malama Kanakoa is her grandmother but is being referred to in two steps, she is the daughter of her mother who is the daughter of her grandmother. In a real geneological database, these relationships may be defined more clearly/straightforward ways.
2 If you choose to store derived relationships, it might also be a good idea to store some kind of criteria to it that indicate if the person is immediate family or extended family or something similar. SQL Server will not know that a spouse is a closer relationship than a cousin, for example. And there may be many situations where this priority is important to distinguish.
3 Shortest path only pulls one relationship between two people(or ‘nodes’). For example, Jerusha could be someone’s sister and her sister could be married to her husband’s brother, say, then you will only get one relationship show up here. We have no control over what SQL Server chooses to show. This is a disadvantage and something to be aware of.

2 How to calculate ethnicity percentage, knowing the family tree?

Quite a lot of us think of ethnicity percentage as a 50-50 or an even match between parents somewhat. This is actually very erroneous. We can’t exactly tell how much of our ancestor’s DNA we receive, and the mileage may vary depending on many factors. There is a reason why Geneology is a business and a science in of itself. It is very complex and difficult to understand inherited patterns across many years. That said, 50% may be considered an average from each parent for mundane understanding. So the farther we go, the lower this percentage gets across multiple grandparents. This is a useful table as below:

Level# from Me100%DNA per parent
12Parents50%
24Grandparents25%
38Great-grandparents12.50%
416Great-great-grandparents6.25%
532Great-great-great-grandparents3.12%
664Great-great-great-great-grandparents1.56%
7128Great-great-great-great-great-grandparents0.78%

It needs a very detailed family tree to have all this information on every ancestor, even going back 5 levels. But given what is available – in this test data set for example – a character named Bromley Hale has some ancestors 7 levels away. I used below query to arrive at this.

DECLARE @Familydaughter TABLE(ID int, daughterID int)
DECLARE @Familyson TABLE(ID int, sonID int)
DECLARE @personid INT, @personsonid INT
SELECT @personid = personid FROM dbo.personnode WHERE name = 'Bromley Hale II'
INSERT INTO @Familydaughter
(id,daughterid)
select p1.personid,p.personid
FROM dbo.personnode p, dbo.personrelatedto r, dbo.personnode p1
where match(p-(r)->p1) and (r.remarks = 'daughter of')

INSERT INTO @Familyson
(id,sonid)
select p1.personid,p.personid
FROM dbo.personnode p, dbo.personrelatedto r, dbo.personnode p1
where match(p-(r)->p1) and (r.remarks = 'son of')

;WITH FamilyCTE
AS
(
SELECT
@personid as ID,
1 AS Level,
null as sonid,
null as daughterid
UNION ALL
SELECT
m.ID,
Level+1,
null,
m.daughterid
FROM @Familydaughter AS M
INNER JOIN FamilyCTE c ON m.daughterID = c.ID
UNION ALL
SELECT
f.ID,
Level+1,
f.sonid,
null
FROM @Familyson AS f
INNER JOIN FamilyCTE c ON f.sonID = c.ID
)

SELECT distinct p.name,p.born,p.died, q.level, p.sex,p.origin
FROM familyCTE Q
INNER JOIN dbo.personnode p ON q.id = p.personid
LEFT JOIN dbo.personnode M ON q.daughterid = m.personid
LEFT JOIN dbo.personnode F ON q.sonid = f.personid
order by q.level,p.name ;

Two of his ancestors 7 levels away seem hawaiian/native, so he may be perhaps, 1.56% Hawaiian, or has a wee bit of hawaiian blood and the rest of him is probably American or significantly unknown. The more data we have the easier it will be to calculate this, although this is only mundane approximation and not by any means an accurate measure of DNA from ancestors.

SQL Graph offers some tools to play with this kind of data and perhaps arrive at some interesting conclusions for our personal database. Thanks for reading.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating