Pitfalls of cross-database views?

  • Hi,
    I currently work for myself and provide IT services to a hedge fund with around 30 traders and analysts who have direct access to a home-grown operational analytics database.  I've wanted to rip through and clean up the database to improve performance long-term, because every time I land a big performance win, my hand giveth and the firm's growth taketh away: they've grown about 50% since I started helping them.
    My current idea is to take everything they currently read over direct SQL connections to the operational analytics database and expose it through views in a new database, OperationalAnalyticsNew; then rename the operational database to OperationalAnalyticsCore and rename the new database to OperationalAnalytics.  In this way, I won't break anyone's SQL connections when they come into work on Monday morning.
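
    For reference, here's a minimal sketch of the swap itself - assuming the operational database is currently named OperationalAnalytics, that there's a maintenance window, and that the databases aren't involved in replication, mirroring, or an availability group:

    USE master;
    GO
    -- Kick out active connections so the renames can take effect
    ALTER DATABASE OperationalAnalytics SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE OperationalAnalytics MODIFY NAME = OperationalAnalyticsCore;
    ALTER DATABASE OperationalAnalyticsCore SET MULTI_USER;
    GO
    -- Promote the view-only database into the name the traders' connections expect
    ALTER DATABASE OperationalAnalyticsNew SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE OperationalAnalyticsNew MODIFY NAME = OperationalAnalytics;
    ALTER DATABASE OperationalAnalytics SET MULTI_USER;
    GO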

    I'm trying to think through everything that could go wrong with this:
    1. Is it too clever by half?  If so, why?  ~12 years ago when I got my first professional job, I had the "bright idea" to use "INSTEAD OF" triggers, which backfired on me.  Since then, I've learned to try not to be too clever.
    2. You lose support for SCHEMABINDING on views, as the example below demonstrates:

    CREATE DATABASE OperationalAnalyticsNew;
    GO
    USE OperationalAnalyticsNew;
    GO
    CREATE VIEW dbo.Accounts
    WITH SCHEMABINDING
    AS
    SELECT
        Account_CD,
        Version
    FROM OperationalAnalyticsCore.dbo.Accounts;
    GO
    /*
    Msg 4512, Level 16, State 3, Procedure Accounts, Line 4 [Batch Start Line 48]
    Cannot schema bind view 'dbo.Accounts' because name 'OperationalAnalyticsCore.dbo.Accounts' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.
    */

    Interestingly, the error message is correct but not complete: even a two-part name will fail if that name is a synonym (I knew this, but I'm documenting it here so that I can refer my teammates to this thread later on when they ask):

    USE OperationalAnalyticsNew;
    GO
    CREATE SYNONYM dbo.CoreAccounts FOR OperationalAnalyticsCore.dbo.Accounts;
    GO
    CREATE VIEW dbo.Accounts2
    WITH SCHEMABINDING
    AS
    SELECT
        Account_CD,
        Version
    FROM dbo.CoreAccounts  -- two-part name, but a synonym, so SCHEMABINDING still rejects it
    GO

    3. Because you can't use SCHEMABINDING, you can't create indexed views, since indexed views require SCHEMABINDING. https://docs.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017
    4. Potential for confusing wait types and lock resources - I'm not really sure how lock escalation behaves on a cross-database join, for example, because I've never really used this approach except for unifying yearly archive databases with barely any activity.
    5. Both databases should probably have AUTO_CLOSE OFF, which is a SQL Server best practice anyway.
    6. Both databases should probably use the same collation.  Ideally, any joins would take place on non-character columns, but "if 'if' was a skiff, we'd all be sailing", right?  (A quick check for points 5 and 6 is sketched after this list.)
    7. I would need synonyms so that code doesn't break in development environments.
    8. Somebody might later get the "bright idea" to enable cross-database ownership chaining.  Users would then need to exist in both databases.  Rather than go over the security risks and administrative burden of this, I'll just link to Solomon Rutzky's blog post (scroll down to "Reasons to Avoid These Methods"): https://sqlquantumleap.com/2017/12/30/please-please-please-stop-using-impersonation-execute-as/
    9. Even heeding good advice and using certificate signing, after I am gone, developers might find certificate signing to be too much of a hassle to maintain.  I may just create a situation where someone will look at it in 10 years from now and think, "This guy was a cowboy/idiot for doing it this way with all these certificates".  As Erland summarizes: "If you have a multi-database application where there are cross queries all over the place, it is likely that you will find certificates to be a bit too much hassle. Even if you can mitigate the situation with good automation, it will have to be admitted that certificates are good for the few and exceptional cases, but if you need to sign everything to handle permissions it gets a little out of hand."  http://www.sommarskog.se/grantperm.html
    10. Cross-database referential integrity is not possible, because foreign keys cannot span databases - but that is also not an issue if the Excel-facing database contains ONLY views.
    11. If someone writes business logic with a BEGIN TRANSACTION wrapping the view, then conceivably this could cause the resource manager to think it needs to escalate the transaction into a distributed transaction (a contrived illustration follows after the quote below). https://docs.microsoft.com/en-us/sql/t-sql/language-elements/commit-transaction-transact-sql?view=sql-server-2017

    If the transaction committed was a Transact-SQL distributed transaction, COMMIT TRANSACTION triggers MS DTC to use a two-phase commit protocol to commit all of the servers involved in the transaction. When a local transaction spans two or more databases on the same instance of the Database Engine, the instance uses an internal two-phase commit to commit all of the databases involved in the transaction.
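
    For points 5 and 6, here's the quick sanity check I would run against sys.databases (database names are from my examples above; after the rename, OperationalAnalyticsNew becomes OperationalAnalytics):

    SELECT name,
           collation_name,
           is_auto_close_on
    FROM   sys.databases
    WHERE  name IN (N'OperationalAnalyticsNew', N'OperationalAnalyticsCore');

    -- If either flag is wrong:
    -- ALTER DATABASE OperationalAnalyticsCore SET AUTO_CLOSE OFF;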
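
    And a contrived sketch of the point-11 scenario: a single local transaction that reads through the view and therefore spans both databases on the same instance, so COMMIT uses the internal two-phase commit described in the quote (per that quote, MS DTC only comes into play when a distributed transaction spans servers, e.g. via a linked server):

    USE OperationalAnalyticsNew;
    GO
    BEGIN TRANSACTION;

    -- The view resolves to OperationalAnalyticsCore.dbo.Accounts, so this
    -- transaction now spans two databases on the same instance.
    SELECT Account_CD, Version
    FROM dbo.Accounts;

    -- ...imaginary business logic here...

    COMMIT TRANSACTION;
    GO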

    Have I missed anything else?

  • If you move to the cloud, cross-database queries are even slower (you have less control over where the databases are placed).

  • Why not separate functionality using different schemas rather than databases?

  • johnzabroski - Thursday, March 14, 2019 10:50 AM

    9. Even heeding good advice and using certificate signing, after I am gone, developers might find certificate signing to be too much of a hassle to maintain.  I may just create a situation where someone will look at it in 10 years from now and think, "This guy was a cowboy/idiot for doing it this way with all these certificates".  As Erland summarizes: "If you have a multi-database application where there are cross queries all over the place, it is likely that you will find certificates to be a bit too much hassle. Even if you can mitigate the situation with good automation, it will have to be admitted that certificates are good for the few and exceptional cases, but if you need to sign everything to handle permissions it gets a little out of hand."  http://www.sommarskog.se/grantperm.html

    Just commenting on this aspect.
    With a combination of a metadata database/tables, certificates and DDL triggers (and associated procs/functions) it is possible to fully automate certificate signing/counter-signing and granting permissions to both code and data without developers worrying too much about it.
    As it stands, they only need to add a new entry to the metadata specifying whether a proc is an entry proc.

    I've implemented this recently; the only permissions developers have on the db are read/write and DDL - permissions are out of their hands and are managed through the setup above.
    In prod, users have access to a subset of the main entry-point procs, associated with read-only access.
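
    To make that a bit more concrete, here is a bare-bones sketch of just the manual signing step that this kind of metadata/DDL-trigger machinery wraps (the proc name and password are purely illustrative, not my actual setup):

    USE OperationalAnalytics;
    GO
    CREATE CERTIFICATE CrossDbSigningCert
        ENCRYPTION BY PASSWORD = 'Illustrative password only 1!'
        WITH SUBJECT = 'Signs entry-point procs that read OperationalAnalyticsCore';
    GO
    -- Sign a (hypothetical) entry-point proc
    ADD SIGNATURE TO dbo.usp_GetAccounts
        BY CERTIFICATE CrossDbSigningCert
        WITH PASSWORD = 'Illustrative password only 1!';
    GO
    -- The certificate (public key only) is then copied into OperationalAnalyticsCore,
    -- a user is created FROM CERTIFICATE there, and permissions are granted to that user,
    -- so the signed proc carries its own permissions across the database boundary.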

  • frederico_fonseca - Monday, March 18, 2019 11:23 AM

    Just commenting on this aspect.
    With a combination of a metadata database/tables, certificates and DDL triggers (and associated procs/functions) it is possible to fully automate certificate signing/counter-signing and granting permissions to both code and data without developers worrying too much about it.
    As it stands, they only need to add a new entry to the metadata specifying whether a proc is an entry proc.

    I've implemented this recently; the only permissions developers have on the db are read/write and DDL - permissions are out of their hands and are managed through the setup above.
    In prod, users have access to a subset of the main entry-point procs, associated with read-only access.

    Interesting. Please write an article about it and let me know when it's published.  Your comments are intriguing but too abstract for me to wrap my head around. No offense; I'm just getting old, and when a developer starts off by saying "it is possible to fully automate" (something I think is hard to do), I take it with a grain of salt until I see a working demo.

  • Jo Pattyn - Monday, March 18, 2019 10:48 AM

    If you move to the cloud, cross-database queries are even slower (you have less control over where the databases are placed).

    This is an excellent point, but not an issue in my case.  Still, I think this sort of comment is exactly what I am looking for to add to my checklist.  I can say, "Hey, here are 11 things I thought of, and then Jo Pattyn and a few others suggested a few more pitfalls, but I still think we should do it, because it will save us 3 months of development time and not interrupt high-touch investment professionals' experience in Excel at all."

  • ChrisM@Work - Monday, March 18, 2019 10:58 AM

    Why not separate functionality using different schemas rather than databases?

    How would you do that?  Would you create a "dbo2" schema?  In this case, I don't think that is worth the effort, because I would have to move every object in "OperationalAnalyticsCore" into dbo2, if I understand you correctly.  I think that's extremely risky, and it puts me more or less in the same boat I'm in now: worrying about interrupting Excel users and not having a strategy other than chasing issues if and when they occur.
    In my current workflow, I've had DPA running long enough that I know exactly which objects traders have access to.  But if I screw up moving references across databases, that could be harder to catch.  While I can run Redgate's "Find Invalid Objects" and my own script to fix broken dependencies, the risk seems rather high/unknown to me.
    The other risk is: what if you already have other schemas?
    Maybe I'm just being obtuse and don't see the benefits of your approach.

  • johnzabroski - Monday, March 18, 2019 5:34 PM

    How would you do that?  Would you create a "dbo2" schema?  In this case, I don't think that is worth the effort, because I would have to move every object in "OperationalAnalyticsCore" into dbo2, if I understand you correctly.  I think that's extremely risky, and it puts me more or less in the same boat I'm in now: worrying about interrupting Excel users and not having a strategy other than chasing issues if and when they occur.
    In my current workflow, I've had DPA running long enough that I know exactly which objects traders have access to.  But if I screw up moving references across databases, that could be harder to catch.  While I can run Redgate's "Find Invalid Objects" and my own script to fix broken dependencies, the risk seems rather high/unknown to me.
    The other risk is: what if you already have other schemas?
    Maybe I'm just being obtuse and don't see the benefits of your approach.

    In your original post you state "Have I missed anything else?" Yes - your purpose for performing this exercise.
    I guess two common reasons would be separation of functionality into two or more groups and the application of different security mappings to each. If these are your reasons, then you can do this using schemas, avoiding the pitfalls you've identified with separating the objects (or the objects referencing them) into separate databases.
    For instance, you mention OperationalAnalyticsCore.dbo.Accounts, which could become OperationalAnalytics.Core.Accounts instead. The 'Core' schema could have its own security model, making it invisible or unusable to reporting users.
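
    Something like this minimal sketch (the ReportingUsers role is illustrative):

    USE OperationalAnalytics;
    GO
    CREATE SCHEMA Core AUTHORIZATION dbo;
    GO
    -- Move a base table out of dbo and into the Core schema
    ALTER SCHEMA Core TRANSFER dbo.Accounts;
    GO
    -- Reporting users can see dbo objects but nothing in Core
    DENY SELECT ON SCHEMA::Core TO ReportingUsers;
    GRANT SELECT ON SCHEMA::dbo TO ReportingUsers;
    GO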

  • ChrisM@Work - Tuesday, March 19, 2019 2:36 AM

    In your original post you state "Have I missed anything else?" Yes - your purpose for performing this exercise.
    I guess two common reasons would be separation of functionality into two or more groups and the application of different security mappings to each. If these are your reasons, then you can do this using schemas, avoiding the pitfalls you've identified with separating the objects (or the objects referencing them) into separate databases.
    For instance, you mention OperationalAnalyticsCore.dbo.Accounts, which could become OperationalAnalytics.Core.Accounts instead. The 'Core' schema could have its own security model, making it invisible or unusable to reporting users.

    I guess the "safety net" is to write a query after all database migrations are finished, and verify via INFORMATION_SCHEMA.TABLES that no BASE_TYPE is 'TABLE' in the 'dbo' table_schema.
    In this sense, I would at least verify all objects are moved - but there is the additional risk of lots and lots of stored procedures that would need to be updated all at once from 'dbo' to 'Core'.
    If I'm moving a house, I want my caterpillar to move as little dirt as possible, without knocking over any endangered plants like cacti.  That's the purpose - to do as little work as possible, while not taking any shortcuts, documenting the risks, and minimizing any disruptions.
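
    Something like this, assuming the goal is simply "no base tables left in dbo" (with the catalog-view equivalent as a cross-check):

    SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_TYPE = 'BASE TABLE';

    -- Equivalent check against the catalog views:
    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE s.name = 'dbo';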

  • johnzabroski - Tuesday, March 19, 2019 11:04 AM

    I guess the "safety net" is to write a query after all database migrations are finished, and verify via INFORMATION_SCHEMA.TABLES that no BASE_TYPE is 'TABLE' in the 'dbo' table_schema.
    In this sense, I would at least verify all objects are moved - but there is the additional risk of lots and lots of stored procedures that would need to be updated all at once from 'dbo' to 'Core'.
    If I'm moving a house, I want my caterpillar to move as little dirt as possible, without knocking over any endangered plants like cacti.  That's the purpose - to do as little work as possible, while not taking any shortcuts, documenting the risks, and minimizing any disruptions.

    Right, but don't you have the same problem with any jobs/procedures, whatever, that actually need to access the base tables, regardless of which solution you take?  It seems like it would be safer just to make a new database with all the views in it, keep the view names the same as the table names, and tell all the Excel users to update their connections.  That should be pretty simple at that point, as all they would need to do is change a database name.
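
    One rough way to stamp those views out: generate the CREATE VIEW statements from the core database's catalog and review the output before running any of it (explicit column lists would be better than SELECT * for anything long-lived):

    SELECT 'CREATE VIEW dbo.' + QUOTENAME(t.name)
         + ' AS SELECT * FROM OperationalAnalyticsCore.dbo.' + QUOTENAME(t.name) + ';'
    FROM OperationalAnalyticsCore.sys.tables AS t
    ORDER BY t.name;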

  • Excel users have thousands of investments, and each investment is tracked as an Excel spreadsheet.

    Asking the traders, who are paid more than me, to fix their thousands of sheets is a non-starter.
