Research on possibility of RDBMS to have performance benchmarks like no-sql Databases

  • I have created a blog series on WordPress:

    http://ratneshparihar.wordpress.com/2014/07/28/research-on-possibility-of-rdbms-to-have-performance-benchmarks-like-no-sql-databases/

    Feel free to disagree.

    Let's discuss this.

    Please put your thoughts, ideas, criticism and comments on the blog itself or here.

  • The 6 points raised in the first post of the series are all a bit boring/mundane and don't, for me, address the title (which in any case is ambiguous) at all.

    Taking the title first, it's not clear whether you are looking at the possibility of benchmark results for RDBMS similar to those obtained for no-sql databases, or at benchmarks of similar workloads on no-sql databases and on RDBMS.

    I'll comment on your 6 numbered points in order:

    1) Abandoning a persistent redo log means abandoning the ability to recover from system failures, unless the system instead secures redo data in the main persistent data store at every point where a log-based system would secure it in the log. The performance penalty of securing data in the main persistent data store early enough to provide an acceptable recovery capability is far greater than the penalty of securing the data in a redo log, unless the recovery requirement is extremely weak.

    2) The relational model makes no statement about what business logic can be included in the database, other than that those business rules which can be used to determine normality of schemas must be in the database. There are of course people who claim that no business rule should ever be in the database, which of course is nonsense because that would mean that no normalisation could ever be carried out. There are others who claim that no business logic other than that implemented as normalisation (in the form of keys and constraints) should ever be in the database, but that is not an RDBMS vs No-Sql distinction; the same people who make that claim about RDBMS make it also about no-sql databases. It will often be nonsense for RDBMS, since a single SQL statement carrying out a MERGE and a PIVOT, a couple of joins and a projection or two actually embodies a large amount of business logic, and the proponents of "no business logic in the database" are careful to stress that this business logic doesn't count as business logic lest they be seen as totally inept. Principles of modular design tend to separate logic in such a way as to maximise the narrowness of inter-component interfaces, and also to ensure that components which run in different contexts are sufficiently large that the context switching overhead is not unacceptably great, and modular design is equally important in no-sql systems and in RDBMS. So I find it hard to believe that point 2 says anything at all about any difference between RDBMS and No-Sql.

    3) An undo log can be eliminated only when it is guaranteed that the system never fails, unless it is acceptable for part of a transaction to be done and the rest not; moving money from one account to another is a nice example of where this is usually not acceptable. The RDBMS view is that it is so rare for this to be acceptable that the simplicity of always having the undo log, rather than sometimes having it and sometimes not, is better than the complication of not having it in the rare cases where it is not needed (for example where no transaction ever updates more than one value in the database). A different argument can be made for writing the undo log as rarely as possible - that nothing should be written to the undo log until it is necessary to write data which may need undoing to permanent store; but that's a considerably weaker position than your point 3, and it's still possible that on the whole it is better to keep things simple and always log undo information.

    4) SQL Server currently provides a range of isolation levels, not all of which use traditional locking. Depending on the workload, traditional locking may deliver better or worse performance than maintaining multiple consistent views. The idea that traditional locking always costs more than the alternatives is pure nonsense.

    5) Single-threaded execution will only deliver decent performance when all IO latencies are very low indeed. Asynchronous IO at some level is necessary for most workloads to run efficiently on most hardware, and asynchronous IO won't deliver the required concurrency unless it is used to assist multi-threading. This is true whether there is a formal RDBMS or No-Sql database or just a bunch of ad-hoc files with no formal database concept.

    6) "Two-phase commit should be avoided where possible" is an overstatement; it should not be avoided when it is more efficient and/or more acceptable than the alternatives. It is always possible to avoid it, although it may or may not be efficient or acceptable to do so (for example the system could be made to hold all data at one central location, so that there would be no requirement at all for two-phase commit).
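
    To make point 3 concrete, here is a minimal Python sketch of the money-transfer case. It is only an illustration under loud assumptions: the `transfer` function, the `accounts` dict and the `fail_after_debit` flag are hypothetical names invented for this example, the undo log lives in memory, and the failure is simulated with an exception. A real RDBMS has to persist undo information so that it survives a crash, which this sketch does not attempt.

```python
def transfer(accounts, src, dst, amount, fail_after_debit=False):
    """Move `amount` from src to dst, rolling back on any failure.

    Hypothetical illustration only: the undo log is an in-memory list,
    not the persistent log a real database would keep.
    """
    undo_log = []  # (account, old_balance) pairs, recorded before each change
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")

        undo_log.append((src, accounts[src]))
        accounts[src] -= amount

        if fail_after_debit:
            # Simulated failure between the debit and the credit.
            raise RuntimeError("simulated failure mid-transaction")

        undo_log.append((dst, accounts[dst]))
        accounts[dst] += amount
    except Exception:
        # Undo in reverse order so the oldest saved value is restored last.
        for account, old_balance in reversed(undo_log):
            accounts[account] = old_balance
        raise


accounts = {"a": 100, "b": 50}
transfer(accounts, "a", "b", 30)          # both halves applied
try:
    transfer(accounts, "a", "b", 30, fail_after_debit=True)
except RuntimeError:
    pass                                  # debit was undone, nothing lost
```

    Without the undo entries, the simulated failure would leave the debit applied and the credit missing, which is exactly the half-done transaction that is usually not acceptable.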

    Tom

  • Quick thought

    😎

  • Thank you, Tom, for providing feedback.

    Yes, the title is a little out of context, and I put it that way intentionally (it backfired though :-D), because I keep hearing that it will be difficult for an RDBMS to reach the web scale that no-sql databases provide, but it is difficult to understand why it can't. There are now vendors in the OLTP market who claim to have done so, and in order to evaluate them we have to understand how a future RDBMS could work when modern hardware (mostly memory and CPU) is cheap.

    Simply put, I am not benchmarking anything here, just putting down some thoughts on which criteria will help us choose the next OLTP system for our future applications. SQL Server and Oracle are built on a 30-year-old architecture, and a complete redesign is required if they are to compete with modern OLTP systems.

    I completely agree with your feedback on all 6 points as far as the current state of RDBMSs is concerned, but you can envisage an OLTP system held completely in main memory, with the non-OLTP work moved to a (columnar) data warehouse. Then all transactions will be very small, all processing will indeed occur in main memory, there will be no need for file handles either for accessing data pages or for accessing log files, and the whole database will be a cluster of multiple PCs.

    If the above scenario makes sense to you, look again at the 6 points I mentioned; they might then make sense. Since both distributed processing and memory are getting cheap and feasible, the whole ecosystem around the RDBMS is likely to change, and vendors have started to build systems without long-lasting transaction logs, two-phase commits and locking (of course).

    Thank you again, and please let me know whether I have answered your feedback.

  • ratneshsinghparihar-1130833 (8/6/2014)


    but you can envisage an OLTP system held completely in main memory, with the non-OLTP work moved to a (columnar) data warehouse. Then all transactions will be very small, all processing will indeed occur in main memory, there will be no need for file handles either for accessing data pages or for accessing log files, and the whole database will be a cluster of multiple PCs.

    Which is fine if you're happy with all of your data vanishing in a power outage, but that's not a very common situation.

    You need some form of persistent storage, whether it be disk or flash, unless you can afford multiple redundant power sources (from multiple suppliers), have perfect UPSs and infallible staff.

    As soon as you have a chain of PCs, you either need to accept 'eventual consistency', which is fine for something like Twitter, not so fine for my bank account, or have some form of quorum/master server/multi-commit to ensure accuracy of data. This is not easy.
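
    As a rough illustration of the quorum option, the Python sketch below checks the usual overlap rule: with n replicas, a write acknowledged by w of them and a read that consults r of them, the read is guaranteed to see the latest write only when w + r > n. All names here (`write`, `read`, `always_reads_latest`) are hypothetical, and the sketch assumes a single writer and no failures or network partitions, which is exactly the part that is not easy in practice.

```python
from itertools import combinations


def write(replicas, chosen, version, value):
    """Store (version, value) on the chosen replicas only."""
    for i in chosen:
        replicas[i] = (version, value)


def read(replicas, chosen):
    """Consult the chosen replicas and keep the highest-versioned copy."""
    return max(replicas[i] for i in chosen)


def always_reads_latest(n, w, r):
    """Brute-force check: does every r-replica read see a w-replica write?

    Assumes one writer and no failures (a deliberate simplification).
    """
    for write_set in combinations(range(n), w):
        replicas = [(0, "old")] * n           # every replica starts stale
        write(replicas, write_set, 1, "new")  # only w replicas get the update
        for read_set in combinations(range(n), r):
            if read(replicas, read_set)[1] != "new":
                return False                  # some read missed the write
    return True


print(always_reads_latest(3, 2, 2))  # w + r > n: read and write sets overlap
print(always_reads_latest(3, 1, 1))  # w + r <= n: a stale read is possible
```

    The overlap only covers the easy case. Choosing which replicas count, handling replicas that are down, and ordering concurrent writers are where the real cost sits.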

    vendors have started to build systems without long-lasting transaction logs, two-phase commits and locking (of course).

    That's fine, as long as you don't need durability, consistency, isolation and atomicity. If you have an application that doesn't require those, then such a system will serve your needs. Right tool for the job.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • @Gail Shaw,

    Thanks for your feedback.

    The vendors are putting every effort into synchronizing main memory with the persistent data layer (be it log file or data pages). In my personal opinion, power systems these days are indeed far better than a decade ago, and if you are in the cloud then power reliability should never be a criterion for OLTP selection.

    Now, for distributed processing, one thing is fairly certain: the information generated by social media, the web and sensors will be far greater than the information held in bank accounts, so businesses will focus more on storing this data, and our research and concerns will mostly be about new OLTP systems. You can keep your banking system in a current RDBMS, but it's a drop in the ocean. And you are correct that for these distributed systems to work, they will be built on BASE, not ACID.

    http://ratneshsinghblogs.blogspot.in/2014/07/from-acid-to-base.html

    In modern applications that collect huge volumes of data, BASE is the reality. ACID is way overrated for distributed and scalable applications.

  • A common misconception about the idea of "NO-SQL" is that such products must be "non-relational". That isn't what NO-SQL implies however. The NO-SQL wave of products tries to address limitations of current SQL DBMSs and they do it partly by moving away from the SQL data model and SMP server architecture. That doesn't mean those NO-SQL products have to be non-relational. Remember that NO-SQL is relatively new and immature technology. The products in that category are currently more or less at the place where mainframe DBMSs were 40+ years ago - using data models which are technically simpler to implement but in many ways inferior to the relational model. It could be that NO-SQL software will truly come of age when it fully embraces the relational model and we start to see more RDBMS NO-SQL products (i.e. truly relational RDBMSs that have left behind the SQL paradigm and the SMP/spinning disc architecture). At that point NO-SQL really would be a viable alternative to SQL, whereas right now NO-SQL is a realistic alternative only in a small number of situations.

  • @david-2

    Thanks for your feedback.

    My intention is not to replace the RDBMS with a no-sql database.

    I put down some aspects (logs, locking, two-phase commit, etc.) which are preventing current RDBMSs from performing (given that cheap memory and distributed processing are available).

  • ratneshsinghparihar-1130833 (8/6/2014)


    @David

    Thanks for your feedback.

    My intention is not to replace the RDBMS with a no-sql database.

    I put down some aspects (logs, locking, two-phase commit, etc.) which are preventing current RDBMSs from performing (given that cheap memory and distributed processing are available).

    You missed out one important thing: SQL. Get rid of the SQL model and its associated baggage and many of the things you want to achieve could become a lot easier.

  • @david-2

    Yes, I missed SQL as well.

    30 years ago, when RDBMSs were taking shape and winning over file systems, the choice for developers was to use a language like BASIC or to develop a scripting language like SQL. SQL won at that time because of its simplicity and the smaller amount of code required. But modern languages are far superior to SQL (like LINQ in .NET), and languages are even emerging that process data faster and make use of more memory. So I think yes, the coin has flipped for languages and SQL has to go.

  • The other thing you are leaving out is the vendor ecology. RDBMS technology and SQL had great success partly because it provided a consistent data manipulation method across multiple products. All the report writer vendors and analytics vendors didn't have to tailor their products for 15 different languages and data storage mechanisms. If you have to write a generic tool for 15 different No-SQL languages and 15 different data storage paradigms, it costs more and your tool will cost more.


    And then again, I might be wrong ...
    David Webb

  • ratneshsinghparihar-1130833 (8/6/2014)


    @David

    Yes, I missed SQL as well.

    30 years ago, when RDBMSs were taking shape and winning over file systems, the choice for developers was to use a language like BASIC or to develop a scripting language like SQL. SQL won at that time because of its simplicity and the smaller amount of code required. But modern languages are far superior to SQL (like LINQ in .NET), and languages are even emerging that process data faster and make use of more memory. So I think yes, the coin has flipped for languages and SQL has to go.

    I just can't even get past your equating SQL to BASIC. Just like there are more tools than simply a sledgehammer, there are multiple flavors of programming tools, each to be used for their own specific purpose, and yes each with their own strengths and weaknesses. Sorry to say it this way - but there will never be the one language to rule them all, so we should stop pretending there will be.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • David Webb-CDS (8/6/2014)


    The other thing you are leaving out is the vendor ecology. RDBMS technology and SQL had great success partly because it provided a consistent data manipulation method across multiple products. All the report writer vendors and analytics vendors didn't have to tailor their products for 15 different languages and data storage mechanisms. If you have to write a generic tool for 15 different No-SQL languages and 15 different data storage paradigms, it costs more and your tool will cost more.

    I'm pretty sure you are right there.

    But we do need a replacement for SQL; IBM screwed up badly when they pushed Ted Codd out of the way and redesigned his relational calculus wheel to have several very sharp corners and the odd completely flat or even concave section of outer surface here and there. The trouble is that now, about 4 decades on, everyone has been living with the faults of SQL and has got used to them, and fixing SQL would be a nightmare - it was already just about impossible to get anything non-trivial changed in the SQL standard 25 years ago - so we need a new language and we need all the RDBMS suppliers to commit to offering it if it is not to be a waste of time. So we may have to do without.

    Tom
