designing portal so we don't run into performance issues.

  • My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

  • NewBornDBA2017 - Thursday, April 26, 2018 2:51 PM

    My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

    That's hard to say with no context on any of this.
    What's a PotatoesBand and a TotalLivestockBand? What is the the user doing when entering something like potatoes=200 and potatoes > 200? And what is the difference and what happens when they select one or the other?

    Sue

  • NewBornDBA2017 - Thursday, April 26, 2018 2:51 PM

    My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

    How about:
    CREATE TABLE [Crops](
      [FarmID] [int] NOT NULL,
      CropID INT NOT NULL,
       Quantity INT NOT NULL
       Units VARCHAR(10) NOT NULL
    CONSTRAINT pkCrops PRIMARY KEY (FarmID, CropID) );

    But without more detail, it's hard to know for sure what the proper design should be.

  • pietlinden - Thursday, April 26, 2018 3:48 PM

    NewBornDBA2017 - Thursday, April 26, 2018 2:51 PM

    My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

    How about:
    CREATE TABLE [Crops](
      [FarmID] [int] NOT NULL,
      CropID INT NOT NULL,
       Quantity INT NOT NULL
       Units VARCHAR(10) NOT NULL
    CONSTRAINT pkCrops PRIMARY KEY (FarmID, CropID) );

    But without more detail, it's hard to know for sure what the proper design should be.

    I was wondering if the bands were for quantities. If you need to do ranges, you'd probably want to use integers or some type of numeric - min, max or lowest, highest something along those lines. Similar to having something like Begin date and end date. I still don't really get the context to answer anything in relation to your question.
    But don't let the front end drive your database design. If you need to display a range, it's just a matter of a label or something with: Please enter a value between (code that selected the min value) and (code for max value) If you get the database design to a good point, most of the time much of what you need to do on the front end won't matter in terms of performance. Then you can focus on the User Experience side of things.  I said most of the time as there will always be some one from the business that wants things like every column selected from your 1 billion row table. 

    Sue

  • When thinking about performance of the database, two things should drive your processes. First, you only get one clustered index and it defines data storage. Therefore, the most common path to the data should, generally, be your clustered index. That is frequently the primary key, but it isn't always the primary key. It's frequently a single column, but it isn't always a single column. It's frequently a unique value, but it isn't always a unique value. Identify that common path to the data and create an appropriate clustered index. Step one.

    Next, normalize your storage. Yes, joins are not cost-free. However, the data accuracy and storage savings that they provide more than make up for the cost they add to your querying. Add that to the fact that the query engine is designed and built around the concept that you will have relationships between multiple tables, you get an improvement in performance.

    As  Sue has already pointed out, getting you specifics without more requirements is tough. However, I can very much recommend you focus on these two ideas, even without knowing anything about your situation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sue_H - Thursday, April 26, 2018 3:36 PM

    NewBornDBA2017 - Thursday, April 26, 2018 2:51 PM

    My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

    That's hard to say with no context on any of this.
    What's a PotatoesBand and a TotalLivestockBand? What is the the user doing when entering something like potatoes=200 and potatoes > 200? And what is the difference and what happens when they select one or the other?

    Sue

    The DDL I shared is incomplete because I was just trying to make my audience understand the structure. There are way more columns than what I had mentioned. Besides potatoes, we have 15+ vegetables as well as columns about cows, goats, etc. The portal will allow users to enter values such as potatoes = 20 (there is OR criteria, there is only AND) if they want to see the list of farmers who are growing 20 potatoes or more than 20 potatoes or less than 20 potatoes. Based on their selection, the report will be generated which will have farmers details.
    Anything which ends with band is the range value we are using which I have shared in a form of an attachment. 
    Totallivestock would be a farmer having 50 or 100 cows and the column Totallivestockband is going to be a range like 1-100 or 101-200. I am not sure if I have answered your question correctly?

  • NewBornDBA2017 - Friday, April 27, 2018 12:35 PM

    Sue_H - Thursday, April 26, 2018 3:36 PM

    NewBornDBA2017 - Thursday, April 26, 2018 2:51 PM

    My team is building a self service portal for users who want to purchase the data from us. Registered users will log into our portal and make the purchase depending on the criteria they specify. For example,  I am a registered users who wants to know who is growing what crops or if they have any livestock etc. The table which contains all the details about any crops and livestock has about 5 millions records. The server we have has enough resources. 128GB of RAM, 64 cores (don't know about storage).
    I am pretty sure that we won't run into any performance related issues but I still want to make sure we build this portal in a way where we don't run into issues later. So the question is that I am not sure which is going to be optimal for performance if we let users put in the number where they specify potatoes = 200 or potatoes > 200 or is it better to have them select from one of the options from the drop down menu where they will have an option to choose from these values (101 - 250, 251 - 500,501 - 1000.1000+) 
    Table structure:
    CREATE TABLE [Crops](
        [FarmID] [int] NOT NULL,
        [Potatoes] [int] NULL,
        [PotatoesBand] [varchar](10) NOT NULL,
        [TotalLivestock] [int] NULL,
        [TotalLivestockBand] [varchar](10) NOT NULL,
    ) ON [PRIMARY]

    That's hard to say with no context on any of this.
    What's a PotatoesBand and a TotalLivestockBand? What is the the user doing when entering something like potatoes=200 and potatoes > 200? And what is the difference and what happens when they select one or the other?

    Sue

    The DDL I shared is incomplete because I was just trying to make my audience understand the structure. There are way more columns than what I had mentioned. Besides potatoes, we have 15+ vegetables as well as columns about cows, goats, etc. The portal will allow users to enter values such as potatoes = 20 (there is OR criteria, there is only AND) if they want to see the list of farmers who are growing 20 potatoes or more than 20 potatoes or less than 20 potatoes. Based on their selection, the report will be generated which will have farmers details.
    Anything which ends with band is the range value we are using which I have shared in a form of an attachment. 
    Totallivestock would be a farmer having 50 or 100 cows and the column Totallivestockband is going to be a range like 1-100 or 101-200. I am not sure if I have answered your question correctly?

    Then you have a performance issue.  The data  needs to be normalized.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sounds like a reporting type of database to me, not an OLTP.  Perhaps a star schema may make more sense.  Just another observation.

  • Thanks for the additional info NewBorn. I think everyone is saying similar things in different ways.
    None of us know the business end of things as well as you do so we aren't going to be much help in the database design. Bits and pieces here or there sure but nothing along the lines of how to design the database for a new piece like your portal. I know you weren't asking or expecting that but you really don't want to design a database "by forum". That phase is of design is often more difficult than any other so our first inclination is to ask how to do that since it can result in such a headache from thinking too hard. But no one here is going to know it as well as you do.
    The best way to avoid performance issues is just a decent overall database design so spend the time on that as it will make a significant impact on performance. You won't end up with a table that has columns potatoes, corn, 15+ vegetables, etc and their respective quantities by farm. So a farm has some new crop type that you don't have a column for, you don't want to change tables to accommodate things like that. And wouldn't each of those crop types essentially be like repeating groups? What if they sell firewood and some farms only sell by full cords, some by bundles and all also sell 1/2 cords? (I don't know farming). That leads you towards things like I need a table for farms, I need one for crops, I need one of Unit Of Measurement, etc
    So then if they say the users really want to see a report by farm and what they have available listed in columns across the report, that's still doable but it's a presentation issue, not a database issue. That's some of what I meant by not having the front end, web page whatever drive the database design. You can get to where you need to be with a decent database design. Plenty of people up here would certainly help you out with different areas but you're the one who really knows it and will be able to pull together the pieces you need to get this done.

    Sue

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply