Relational database or graph database? Why not have both?

  • Comments posted to this topic are about the item Relational database or graph database? Why not have both?


    Tony

  • Excellent Article Tony. I have been playing with Neo4j myself for awhile, and you have me very excited as well. I am going to have to try this out. Thanks.

  • Hi Toni,

    very interesting paper.

    I could be that I could use these ideas in the near future (I hope !).

    Since I would like to do some tests it would be important for me to work with some realistic (sample) data. Are those publicly available ?

    Thanks

    Teodoro Marinucci

  • teodorom (1/8/2014)


    Hi Toni,

    very interesting paper.

    I could be that I could use these ideas in the near future (I hope !).

    Since I would like to do some tests it would be important for me to work with some realistic (sample) data. Are those publicly available ?

    Thanks

    Teodoro Marinucci

    Teodoro, unfortunately not. The SnomedCT thesaurus distribution files are only available to download under licence from the International Health Terminology Standards Development Organisation (http://ihtsdo.org) or affiliated national organisations


    Tony

  • Good work. Good doc.

    Can you, please, let me know when you have CRUD working?

    Thanks 🙂

  • Interesting and very eclectic.

  • Great work. On the one hand, I'd be really excited to try this out. On the other hand I hope I never have to.:hehe:

  • It is important to note that:

    SNOMED more facilitates information input into an electronic record during patient care (OLTP in concept)

    Conversly, ICD facilitates data retrieval for purposes outside of patient care (i.e. reporting/OLAP).

    This is a nice solution for extending SNOMED to serve in the latter role; although, I believe there are efforts going on to map SNOMED to ICD (if it is even possible?).

  • I like to think of NOSQL as meaning Not Only SQL rather than No SQL

    Haha - perfect.

    Thank you for writing this; one of the most interesting articles I've seen for a while.

    I am interested to know how scalable this approach is. Have you performed any tests with, say, a hundred or more simultaneous connections all using your TVF in queries?

  • sneumersky (1/8/2014)


    It is important to note that:

    SNOMED more facilitates information input into an electronic record during patient care (OLTP in concept)

    Conversly, ICD facilitates data retrieval for purposes outside of patient care (i.e. reporting/OLAP).

    This is a nice solution for extending SNOMED to serve in the latter role; although, I believe there are efforts going on to map SNOMED to ICD (if it is even possible?).

    Mapping is available from SnomedCT to ICD10, ICD-O and OPCS4.

    There is much work being done to facilitate the analytical role of SnomedCT and the approach described hereis part of that.


    Tony

  • I am interested to know how scalable this approach is. Have you performed any tests with, say, a hundred or more simultaneous connections all using your TVF in queries?

    Not yet. As yet I haven't deployed it to a proper server situation. The only tests were done on my somewhat under-resourced laptop!


    Tony

  • teodorom (1/8/2014)


    Hi Toni,

    very interesting paper.

    I could be that I could use these ideas in the near future (I hope !).

    Since I would like to do some tests it would be important for me to work with some realistic (sample) data. Are those publicly available ?

    Thanks

    Teodoro Marinucci

    I've seen people use Twitter as a source for this kind of data. Tweets are considered public, so you should be able to do some testing with them

  • Good work .

    Considering performance gain I'd make some query optimization first. Looks like CTE may be a bit shorter, eliminating join which is only needed at final step

    WITH children (ConceptID, Level) AS

    (

    SELECT r.ConceptID1, 1

    FROM dbo.Concepts c

    INNER JOIN Relationships r ON r.ConceptID2=c.ConceptID

    AND r.RelationshipType='116680003' -- conceptID for 'IS_A' relationship

    WHERE c.ConceptID='371984007' -- conceptID for malignant neoplasm of oesophagus

    UNION ALL

    SELECT r.ConceptID1, c.Level+1

    FROM children c

    INNER JOIN Relationships r ON r.ConceptID2=c.ConceptID

    AND r.RelationshipType='116680003' -- conceptID for 'IS_A' relationship

    )

    SELECT DISTINCT children.*, c.FullySpecifiedName

    FROM children

    INNER JOIN dbo.Concepts c ON c.ConceptID=children.ConceptID

    Next one may load relationships data as 81 separate tables. So the query can be rewritten as

    WITH children (ConceptID, Level) AS

    (

    SELECT r.ConceptID1, 1

    FROM dbo.Concepts c

    INNER JOIN IS_A r ON r.ConceptID2=c.ConceptID

    WHERE c.ConceptID='371984007' -- conceptID for malignant neoplasm of oesophagus

    UNION ALL

    SELECT r.ConceptID1, c.Level+1

    FROM children c

    INNER JOIN IS_A r ON r.ConceptID2=c.ConceptID

    )

    SELECT DISTINCT children.*, c.FullySpecifiedName

    FROM children

    INNER JOIN dbo.Concepts c ON c.ConceptID=children.ConceptID

    After the optimization NoSQL perfomance gain may be not so dramatic. And mind extra efforts to keep two DBs in sync.

  • The thing I like about graph databases is that the relationship between nodes is a whole concept in its own right. It performs the equivalent of a link table between two objects and the necessary foreign keys all at once!

    I saw a discussion on Neo4J where the original use case was described as an accidental discovery that they had cracked the problem of the relationships that take place within a content management system.

    To be honest, all the types of NOSQL DB have their place. The challenge is to understand what their use case is and where it adds significant benefit to offset the cost of having to manage the technical diversity.

    I'm wondering if a natural development path for future developments of SQL Server will be as a general data management platform that can use relational, key-value, document and distributed data all from a common framework.

    Blending in Hadoop, column store indexes and in-memory OLTP may be the first step in this process.

  • David.Poole (1/11/2014)


    I'm wondering if a natural development path for future developments of SQL Server will be as a general data management platform that can use relational, key-value, document and distributed data all from a common framework.

    I couldn't agree more. That was partly the driver for getting me into Neo4j. SQL Server remains the bedrock of the RDBMS development, but using the power of other systems to deliver functionality that is best performed outside tSQL. What we really need is a set of ODBC drivers for the various NoSQL offerings, then my SQLCLR approach would be redundant!


    Tony

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply