Need to return query result in 0.02 sec

  • To all DBA guru,

    My company is an software company to design ecommerce website. I have to design a shopper database and my boss requires that the website will be hit 5000 times in a sec to require shopper info and I have to return the query result in 0.02 sec.

    BTW each client has its database and there are as many databases as they can to put in the NT box.

    How do I design the database (shopper table) to achieve this performance? The shopper table contains from 5000 to 10 million records depends on clients.

    I was thinking to have the shopper table contains the main info only (the frequent data that the website requires) and the rest data stores in another shopper info table.

    I have designed database before but I never cares too much about this kind of performance issue.;)

    Can someone help me?

    Thanks

  • Steady... don't let this shake you, Loner...

    And 5000 hits per second does not equal .02 seconds... on a single cpu system, that would be .0002 seconds. Multiply that times the number of CPU's and maybe the number of read-write heads on the disk system.

    Dunno what would go into such a table... you do, though. You know what would go into your shopper table. Show us what it looks like in the form of a create statement and show us some sample data (you know the rules... see the URL in my signature line)... then, I'll show you how to generate 10 million rows of test data and we can do some hard-core performance tuning...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Build the table, and determine what the query needs to return. You can index on key fields to return things quickly in an index seek, though you need hardware to support multiple hits.

    You also can cheat. Move stuff not likely to get hit to an archive table automatically. Denormalize data if you need to get performance and take the IO/resource hit for stuff that isn't being hit hard, or can be moved in other processing.

    The key is to think it through, build what your instincts and experience tell you, test it, come back and ask questions.

    Also remember that planning for 5000 hits/sec is a lot. Not many sites get that. Granted you want to design for performance, but work on building it to run quick and efficiently, then test what hardware gets you 1000/sec, 2000/sec, 5000/sec, etc.

  • In addition to what everyone else says, be sure to focus on carefulling picking the clustered index. Because you only get one, you want to be sure it's storing the data in an optimal fashion for later retrieval. Don't assume that the default of a clustered primary key on an identity column is necessarily the right place for the cluster (or that an identity column is the right answer for the PK for that matter). Testing, testing and more testing will be the most important thing you can do.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The current shopper table was developed by the web developers so it was terrible, no foreign key to other tables that used shopper id.

    Here is the old shopper table.

    CREATE TABLE Shopper (

    [shopper_id] [char](32) NOT NULL Primary key,

    [shopper_guid] [char](32) NULL,

    [date_created] [datetime] NOT NULL,

    [name] [varchar](100) NULL,

    [first_name] [varchar](100) NULL,

    [last_name] [varchar](100) NULL,

    [company] [varchar](100) NULL,

    [password] [varchar](100) NULL,

    [street] [varchar](100) NULL,

    [street1] [varchar](100) NULL,

    [city] [varchar](100) NULL,

    [state] [varchar](100) NULL,

    [zip] [varchar](100) NULL,

    [country] [varchar](100) NULL,

    [phone] [varchar](100) NULL,

    [fax] [varchar](100) NULL,

    [varchar](100) NOT NULL,

    [shopper_pref_1] [varchar](100) NULL,

    [shopper_pref_2] [varchar](100) NULL,

    [shopper_pref_3] [varchar](100) NULL,

    [shopper_pref_4] [varchar](100) NULL,

    [shopper_text_info] [text] NULL,

    [shopper_text_history] [text] NULL,

    [discount_code] [varchar](100) NULL,

    [preferred_code] [varchar](100) NULL,

    [orders] [int] NULL Default 0,

    [orders_value] [int] NULL DEFAULT 0,

    [pos_synchronized] [tinyint] NULL,

    [rpro_cust_sid] [varchar](30) NULL,

    [langpref] [varchar](30) NULL,

    [emailformatpref] [tinyint] NULL,

    [passwordReminder] [varchar](100) NULL,

    [passwordReqid] [varchar](100) NULL,

    [CustomerID] [varchar](100) NULL,

    [Prefix] [varchar](30) NULL,

    [Gender] [tinyint] NULL,

    [ShopperBirthDate] [datetime] NULL,

    [Custom1] [varchar](100) NULL,

    [Custom2] [varchar](100) NULL,

    [Custom3] [varchar](100) NULL,

    [Custom4] [varchar](100) NULL,

    [Custom5] [varchar](100) NULL,

    [lastupdate] [datetime] NULL,

    [updatesource] [varchar](30) NULL,

    [Memberstatus] [tinyint] NULL,

    [Membernumber] [varchar](32) NULL,

    [Memberpointstodate] [int] NULL Default 0,

    [Memberpointbalance] [int] NULL Default 0,

    [Memberpointout] [int] NULL DEFAULT 0,

    [Membershipjoindate] [datetime] NULL,

    [Groupid] [varchar](30) NULL,

    [taxstatus] [varchar](30) NULL,

    [taxid] [varchar](100) NULL,

    [Giftpointstodate] [int] NULL DEFAULT 0,

    Giftpointsbalance] [int] NULL DEFAULT 0,

    [Giftpointsout] [int] NULL DEFAULT 0,

    [Coupon_history] [text] NULL,

    [date_modified] [datetime] NOT NULL ,

    [date_modified_by] [varchar](100) NULL)

    Email is unique non-cluster index.

    This is what I want to modify.

    CREATE TABLE Shopper (

    shopper_id INT IDENTITY(1,1) NOT NULL,

    shopper_guid char(32) NULL,

    date_created datetime NOT NULL,

    first_name varchar(100) NULL,

    last_name varchar(100) NULL,

    company varchar(100) NULL,

    email varchar(100) NOT NULL,

    shopper_password varchar(100) NULL,

    CustomerID varchar(100) NULL,

    Prefix varchar(30) NULL,

    Gender tinyint NULL,

    ShopperBirthDate datetime NULL,

    rpro_cust_sid varchar(30) NULL,

    langpref varchar(30) NULL,

    emailformatpref tinyint NULL,

    passwordReminder varchar(100) NULL,

    passwordReqid varchar(100) NULL,

    Groupid varchar(30) NULL,

    taxstatus varchar(30) NULL,

    taxid varchar(100) NULL,

    date_modified datetime NOT NULL DEFAULT (getdate()),

    date_modified_by varchar(100) NULL)

    Email non-cluster index

    CREATE TABLE Address_type (

    Address_type_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

    Address_type VARCHAR(100))

    CREATE TABLE Shopper_address(

    ShopperAddressID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

    Address_type_id INT Foreign key AddressType(AddressTypeID),

    Address1 VARCHAR(225) NULL,

    Address2 VARHCAR(225) NULL,

    City VARCHAR(100) NULL,

    State VARCHAR(100) NULL,

    ZIP varchar(50) NULL,

    Country VARCHAR(100) NULL,

    Phone VARCHAR(50) NULL,

    Fax VARCHAR(50) NULL,

    ModifiedDate DATETIME DEFAULT GETDATE(),

    MOdifiedBy VARCHAR(100))

    The rest of the shopper fields I am thinking to put in a Shopper_supplimentary_table. I asked the business analyst to make sure how many shopper actually has those fields populated.

    With concurrent user = 5000, Assume the table has 5 million records and I need to return the shopper info in 0.002 sec. Is this design ok?

    Actually Email is the sign on ID, should I make it into cluster index and make the PK to non-cluster index?

    Thanks

  • I asked the business analyst to make sure how many shopper actually has those fields populated.

    Wait a minute... are you saying that a system already exists and already has customers loaded into tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the emailaddress is the primary means of accessing this table, then yeah, I'd make it the clustered index. If you have it as a unique non-clustered index, you'll have to do a lookup to the clustered index to get the data, which will kill performance. You'll want to test that.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • It also looks like you are using SQL Server 2005, so when considering indexes, also look at covered indexes using included columns. Remember there is a trade off using the included columns as the database engine has to update redundant data stored in the indexes as well as the data pages.

  • Steve Jones - Editor (3/4/2008)


    Build the table, and determine what the query needs to return. You can index on key fields to return things quickly in an index seek, though you need hardware to support multiple hits.

    You also can cheat. Move stuff not likely to get hit to an archive table automatically. Denormalize data if you need to get performance and take the IO/resource hit for stuff that isn't being hit hard, or can be moved in other processing.

    The key is to think it through, build what your instincts and experience tell you, test it, come back and ask questions.

    Also remember that planning for 5000 hits/sec is a lot. Not many sites get that. Granted you want to design for performance, but work on building it to run quick and efficiently, then test what hardware gets you 1000/sec, 2000/sec, 5000/sec, etc.

    Another easy way to cheat: small static tables (like reference tables) should be turned into XML files and cached on the IIS box. No sense in having to go back to SQL for stuff that you don't need to. With that much activity - any request you can fulfill without a round trip to SQL should be done some other way.

    New values trigger something to recreate the XML file on the IIS box.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Grant Fritchey (3/5/2008)


    If the emailaddress is the primary means of accessing this table, then yeah, I'd make it the clustered index. If you have it as a unique non-clustered index, you'll have to do a lookup to the clustered index to get the data, which will kill performance. You'll want to test that.

    I would just steer clear of making it the Primary key. Email addresses change, and sometimes really often. I have 7 right now, and there's no telling which one I might favor using at any given time (I've actually been trying to get rid of 2 for some time, but I keep getting e-mail I want on them); I also know I am not unique in that. Having to propagate PK changes will wreck your perf, in my experience. Keep them as surrogate keys, etc.... but don't base your relations on them.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • What data in your shopper table actually gets used frequently?

    A one-to-one relationship is not necessarily a bad thing so if you put your main data in one table and the largely superfluous stuff in a subsidiary table then at least the database will have more records per page and should be faster.

    Just because a shopper is a database table doesn't necessarily mean that your solution should be purely database driven.

    A lot of information is mainly static so get the app developers to load it into cache that way a large propotion of requests for data won't even require database access!

  • Matt Miller (3/5/2008)


    Grant Fritchey (3/5/2008)


    If the emailaddress is the primary means of accessing this table, then yeah, I'd make it the clustered index. If you have it as a unique non-clustered index, you'll have to do a lookup to the clustered index to get the data, which will kill performance. You'll want to test that.

    I would just steer clear of making it the Primary key. Email addresses change, and sometimes really often. I have 7 right now, and there's no telling which one I might favor using at any given time (I've actually been trying to get rid of 2 for some time, but I keep getting e-mail I want on them); I also know I am not unique in that. Having to propagate PK changes will wreck your perf, in my experience. Keep them as surrogate keys, etc.... but don't base your relations on them.

    Oh, if I wasn't clear, my bad. I wouldn't make it the primary key either. I'm just thinking the clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The system is already existed but my company tries to re-write it. The reason is the existing system cannot handle more than 5 million shoppers.

    The system is a basically a database that contains a whole bunch of info that when you go to the web site eg Amazon.com.

    The shopper table is the big one that contains the people sign on to the web site. Then there is basket table. When you click something to put in the basket. Of course there are other tables liked product and order.

    However the first query when you go to the web site and sign on, that is the one retrieves the shopper info, probably use the cookie stores in your pc.

    The existing DB is terrible, the basket table has no foreign key to the shopper table, so I find out some shopper id in the basket table does not exist in the shopper table. The order table shopper id is not a foreign key to shopper table either.

    God knows who designed this database. But the worst thing is somehow it cannot handle the company has more than 5 million shoppers. The query came back too slow.

    Also I find out my manager would design something to return faster query than using standard normalize form. For example if you need to get something using a join, he would rather create a table. They want to capture what the shopper's favorite product. I would create the following table.

    ShopperFavorite

    shopperfavoriteID INT

    ShopperID INT foreign key to shopper table

    ProductID INT foreign key to product table

    My manager would rather create the table

    Shopperfavorite

    ShopperID INT

    ProductName

  • Ok...when a shopper logs in, what are the key field(s) that are looked at in the "old" table? Also, how long is it currently taking to validate the shopper?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Loner (3/5/2008)


    God knows who designed this database. But the worst thing is somehow it cannot handle the company has more than 5 million shoppers. The query came back too slow.

    Can you post that query and its exec plan? Might help us suggest solutions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply