Graphing Performance

  • Comments posted to this topic are about the item Graphing Performance

  • I saw recently a paper comparing the performance of Postgres, MySql, Oracle and SQL Server.  As loads got more complex so the difference in performance between the proprietary and the open source became astounding.  That shouldn't surprise us.  The resources and time dedicated to the query engines of both Oracle & SQL Server are immense.  The problems that RDBMS' have had to solve remain intractably complex.
    That graph DBs, particularly distributed graphs like Titan/Aurelius are slower than expected should not surprise us for the same reason.  Oracle's approach to Graph DBs appears to be to implement graph algorithms over the top of an Oracle DB.
    Personally I have enjoyed playing with Neo4J.  I particularly like the Cypher query language.  It shares many syntax constructs with SQL while adding relatively straightforward graph extensions.  SQL Server 2017 seems to have recognised Cypher and implemented something with a high degree of similarity.
    I have no doubt that we could write SQL queries that produce the desired results that a graph database has been designed to produce. However such queries are non-trivial in SQL while straightforward in Cypher.  A graph database may satisfy the performance criteria while offering a simpler and more easily understood implementation.  RDBMS may offer performance far beyond the use case but require top notch SQL skills and associated cost to maintain and develop

  • Fair points, David. Certainly simpler coding might be a benefit that makes sense for the development side, even if performance were similar (or worse).

    I thought  SQL 2017 had  Gremlin, not Cypherm, as the query language.

  • There are similarities between Gremlin and Cypher.  I'm not confident to say one way or the other but have noticed the pictorial style of the MATCH statement looks just like Cypher.
    Gremlin is an Apache foundation language and I'm not sure what that would mean for licencing within a closed source product.
    I really think SQL Server DBA's are going to love the graph syntax.  It solves some really interesting problems

  • The article referenced in Steve's writing is completely biased and unfair in my opinion.  Graph databases are for graph data but the data used in the study was simple two level relational data.  Graphs become exponentially more beneficial when finding routes of four or more hops.

  • Bill Talada - Monday, August 28, 2017 11:23 AM

    The article referenced in Steve's writing is completely biased and unfair in my opinion.  Graph databases are for graph data but the data used in the study was simple two level relational data.  Graphs become exponentially more beneficial when finding routes of four or more hops.

    They used the benchmark for graphing data. Perhaps that's not fair, but I think it's a fair tests of one aspect of the platforms.

  • When I was experimenting with Neo4J I found that consistent performance was a strength.  A 4 hop query seems to take the same amount of time regardless of data volumes

  • It seems to me that NoSQL database engines rely more heavily on just-in-time record parsing and computation, spreading an order of magnitude more reads across multiple nodes rather than relying on indexes or predefined schema. That makes database design more flexible, not necessarily more performant. The thing about old-school relational B-tree indexes is that they are powerful things; they can pack a lot of structured information in a relatively small amount of space, and when a single covering index is optimized to facilitate a specific problem domain, it's hard to beat. Likewise, flat de-normalized ColumnStore tables can compress data by 90% and facilitate a broad range of queries very efficiently. Likewise again for OLAP databases like SQL Server Analysis Services; for specific types of analytical processing it can't be beat.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • The thing with NOSQL is that it is not a single thing, it's a massive range of things.  Each one is intended to solve a particular problem.  The trick is to understand which problem they are designed to solve and then do an honest assessment of
    a. Do you have the problem
    b. Is the problem big enough to be worth solving
    C. Will solving it cause much bigger problems elsewhere (most likely with integration)

    In the analytics world ORC & Parquet formats are becoming more prevalent.  These use column store techniques to achieve acceptable performance.  Parquet allows nested structures in the same way that Postgres has an array data type.
    The question I have is at what point do the facilities in SQL Server fall into the NOSQL camp?  MDX? Full-Text search? XML?  JSON?

  • Well, keep in mind that NoSQL has meant "Not-Only-SQL" for some, and "Not (relational) SQL" for others.

  • I recall that MySpace used SQL Server, and had scale issues. Regardless, they are tools, it is just that with technology, sometimes individuals want to use the latest tool before understanding what it can and cannot do well.  And several years back, a well know company was 'trying' to migrate is multi-client system off a relational platform over to Cassandra.  Until they realised that it was not the correct tool for key parts of the system (especially the accounting related parts). Know thy tool!

    The more you are prepared, the less you need it.

  • Steve Jones - SSC Editor - Tuesday, August 29, 2017 10:34 AM

    Well, keep in mind that NoSQL has meant "Not-Only-SQL" for some, and "Not (relational) SQL" for others.

    Point taken but hasn't SQL Server become a Not Only SQL system?

  • David.Poole - Tuesday, August 29, 2017 11:29 AM

    Point taken but hasn't SQL Server become a Not Only SQL system?

    Yep

  • David.Poole - Tuesday, August 29, 2017 11:29 AM

    Steve Jones - SSC Editor - Tuesday, August 29, 2017 10:34 AM

    Well, keep in mind that NoSQL has meant "Not-Only-SQL" for some, and "Not (relational) SQL" for others.

    Point taken but hasn't SQL Server become a Not Only SQL system?

    I guess NoSQL usually means non-relational, or on a more physical level, non RowStore/B-tree. Funny that SQL is becoming the defacto query language for NoSQL databases like Hadoop and Cassandra.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Certainly HIVE gives a sort of SQL called HQL which is based on the MySQL dialect.
    Cassandra has CQL, Aerospike has AQL and Couchbase has NiQL.
    Apache Spark goes for ANSI SQL 2011 but many of the SQL on distributed platforms do not support ROLLUP and CUBE operations.
    Here's a tip.  When a vendor of an amazing new data technology claims ANSI SQL compliance ask them which one.  Remember without stating a year its a pretty meaningless  feature.  You'd hope for ANSI SQL 2008 or higher.I think the NOSQL movement started off as something akin to teenage rebellion and its now reached a certain maturity to find out that "by the time a man thinks his father may have been right he usually has a son who thinks he is wrong".  I think some really good ideas have come out of the NOSQL movement and new ways of thinking.  But equally I think that over a beer or two the NOSQL champions would concede that some of their complaints against SQL were because they didn't grasp the full complexity of what a traditional database has to do i.e. transactions, multi-concurrent locking, security etc.  
    I think both the SQL and the NOSQL worlds have a lot to teach each other but I would be wary of trying to build a "one true solution".  To misquote Terry Prattchet the use of the NOSQL crowd is that they have not learned the limits of the possible

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply