Separating Historic and Current Information in the Cloud

  • In another thread, someone was asking about having separate or combined databases for historic and current information. I suggested that keeping them separate gives you the ability to perform better DR, among other benefits.

    However, I also noted that current data could be moved to the cloud while older data stays local. Someone asked why, and here are my thoughts.

    First, current data is usually a much smaller amount of information. Since you pay by the space used and per transaction, as well as for data transfers, I'd think the cloud would give you smaller charges for current rather than older data. Current data is also usually accessed from more places, and with a quicker expected response time, which a service like SQL Azure should give you if you work in a distributed environment. It's common for companies to have multiple offices and force everyone to connect back to the main server in one location. Connectivity, theoretically, should be better in the cloud.

    However, if you are actively analyzing or working with older data, maybe running data mining, you might want that data in the cloud instead. Either way, the separation of databases makes sense.

    Does this sound logical to anyone?

    Edit: Fixed logic

  • Interesting opinion on moving things to the cloud.

    I'm for keeping the current and historical data separate for disaster-recovery situations (smaller database -> faster recovery, and even a different backup model, full vs. differential).
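
    A minimal sketch of that split backup model, assuming hypothetical database names CurrentDB and HistoryDB and a local backup path:

        -- Small, volatile current database: frequent full backups
        BACKUP DATABASE CurrentDB
            TO DISK = N'D:\Backups\CurrentDB_Full.bak'
            WITH INIT;

        -- Large, mostly static historical database: rare fulls plus differentials
        BACKUP DATABASE HistoryDB
            TO DISK = N'D:\Backups\HistoryDB_Diff.bak'
            WITH DIFFERENTIAL;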

    If it needs to be combined, there are replication/mirroring options to have the current database on the historical site.
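
    If you go the mirroring route, the core of it is just the partner setup (a sketch only; the server names are hypothetical, and the mirroring endpoints plus an initial restore WITH NORECOVERY on the historical site have to be in place first):

        -- On the historical-site server (the mirror), after restoring CurrentDB WITH NORECOVERY:
        ALTER DATABASE CurrentDB
            SET PARTNER = N'TCP://currentsite.example.com:5022';

        -- On the current-site server (the principal):
        ALTER DATABASE CurrentDB
            SET PARTNER = N'TCP://historicalsite.example.com:5022';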

    For the cloud, I'm confused by your explanation:

    Since you pay by the space used and transaction, as well as data transfers, I'd think the cloud would give you smaller charges for older data.

    My view (no experience):

    Option 1: Move current data to the cloud

    Pro: accessible from nearly everywhere, handy for disaster recovery. Pricing/speed?

    Cons: strain on your internet connection. What if the cloud decides to vaporize for x time?

    Option 2: Move historical data to the cloud

    Pro: accessible from nearly everywhere, BI/data mining optimized for the cloud (leveraging cloud CPU?), usually not so critical.

    Cons: prices increase with the volume of data stored (special pricing available for large volumes).

    Option 3: Keep everything local

    Pro: a complete overview of the implications.

    Cons: the cost of implementing business continuity.

    Sorry, I've corrected my statement. I meant to say lower charges for current rather than historical data.

    Jo Pattyn (9/13/2011)

    Cons: strain on your internet connection. What if the cloud decides to vaporize for x time?

    There shouldn't be a strain. Most transactional applications pull back relatively little data, but YMMV, depending on the application.

    In terms of the cloud going down, the providers usually have as much incentive to stay up as the local IT folks do. People act as if in-house IT has incredible reliability, but in my experience, semi-competent admins sometimes cause a decent amount of downtime for various reasons. I'm definitely concerned about this in the cloud, but so many web hosts, and companies like Salesforce, have proven this isn't necessarily a shaky proposition.

  • If you moved your app to the same cloud, I might be able to see it. At least then you're not doing double bandwidth, even for OLTP systems. A lot of the combobox stuff and the like could be cached, but even in OLTP systems a lot of data gets transferred around. Search queries, product lookups, patient records, whatever.

    My biggest concern with cloud data, particularly in this scenario, is how difficult it is to interact with clouds, Azure in particular. No backups, and silly limitations like that. Offloading OLTP data back to the OLAP system could become rather annoying.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I'm with you. I think it makes a lot of sense. Especially when you consider how small this particular data set seemed to be. The one big issue for me was that they said they frequently had combined data sets where they queried both databases at the same time. I haven't tried that yet with the cloud, but I can't imagine it works all that well, if at all.
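
    For what it's worth, the kind of combined query that's trivial across two local databases is exactly what SQL Azure wouldn't allow at the time, since you can't reference a second database by a three-part name there. A sketch, with hypothetical table names:

        -- Works fine when both databases live on one local instance...
        SELECT OrderID, OrderDate FROM CurrentDB.dbo.Orders
        UNION ALL
        SELECT OrderID, OrderDate FROM HistoryDB.dbo.Orders;
        -- ...but fails on SQL Azure, which rejects cross-database references.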

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The app definitely needs to be in the cloud, or it needs to be mindful of bandwidth.

    Regarding combining data sets, I can think of a few ways to do this. One is to pull data from the cloud through a linked server/OPENROWSET into the local server and process it there. The other is to combine things on the client. Many of the reports I see are some listing of data or an aggregation that could easily be pulled through two connections and then combined on the client with a little intelligence in the programming.
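
    A sketch of the linked-server approach, assuming a linked server named CloudAzure already points at the cloud database, and with made-up table and column names:

        -- Pull current rows from the cloud and combine them with local history
        SELECT OrderID, OrderDate, OrderTotal
        FROM   CloudAzure.SalesDb.dbo.Orders      -- cloud, via the linked server
        UNION ALL
        SELECT OrderID, OrderDate, OrderTotal
        FROM   HistoryDB.dbo.Orders;              -- local historical database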

    You could also pull the data regularly from the OLTP side and put it in the historic database with an incremental load of some sort, so the local server always has the whole data set.
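
    That incremental load could be as simple as a watermark-driven insert (again just a sketch; the watermark table and the column names are hypothetical):

        DECLARE @LastLoaded datetime;
        SELECT @LastLoaded = LastLoadedDate FROM dbo.LoadWatermark;

        -- Copy only the rows added since the last run
        INSERT INTO HistoryDB.dbo.Orders (OrderID, OrderDate, OrderTotal)
        SELECT OrderID, OrderDate, OrderTotal
        FROM   CloudAzure.SalesDb.dbo.Orders
        WHERE  OrderDate > @LastLoaded;

        -- Move the watermark forward to the newest row actually loaded
        UPDATE dbo.LoadWatermark
        SET    LastLoadedDate = (SELECT MAX(OrderDate) FROM HistoryDB.dbo.Orders);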

    Note that I'm not necessarily saying the cloud is the best place, but even with 2 local servers, you have flexibility if you keep the data sets separate.
