SQLServer2008-Multiple Import Process Simultaneous

  • Requirement:

    The query is regarding data modeling for core functionality of my application.

    I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients. Also, at the same time client's data could be shown to the user and can be modified/inserted too, while the import process for the same or different client is in process.

    I have 2 core tables which get the most hits whenever import processes run. I have 2 options now

    1. To have 2 core tables and do the sequential imports by making queue for the import processes.

    Table 1

    ID

    ClientID

    SourceID

    Count

    AnotherCol1

    AnotherCol2

    AnotherCol3

    Table 2

    ID

    ClientID

    OrderID

    Count

    AnotherCol4

    AnotherCol5

    AnotherCol6

    2. To have 1000 core table, 2 for each client (I may have maximum 500 clients), it will allow the users to work on the import processes simultaneously without waiting for one another.

    More information about the import process:

    1. These table is not going to be used in any Reporting.

    2. Each import process will insert 20k-30k records (7 columns) in these each table. And there will be around 40-50 such imports in a day.

    3. While the import process is going on, data could be retrieved from these tables by some other user and INSERT OR UPDATED too.

    4. These are going to be one of the most usable tables in the application.

    5. BULK INSERT will be used for insertion.

    6. Clustered index is on the Primary Key which is an Identity column.

    7. We are considering the table partitioning too.

    Can you please suggest which option is better and why?

    Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?

  • You don't need to create 1,000 tables.

    If you create a staging table then you can minimize the impact on performance.

    You could create a job to load the data from the staging table and control when it got loaded, how many which client records got loaded, etc.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi Welsh

    Thanks for that. It helped.

  • I would start out by creating the objects, code, etc in the development environment.

    Try simulating on a smaller scale the intended scenario on the production environment.

    Then as you move to the Test & QA Environment you can hit it with a heavy load.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I'll be following the same to see if it works for me.

  • Sounds like a fun project.

    I would probably write a program that sequentially processing one file at a time.

    You could create a table that has a column to set the process to active or inactive (Active Boolean).

    If you want to end the program and not process the next file you change the value in the table.

    When the current file completes it checks the value of Active it determines that the program (SQL) should end.

    You should wrap your code inside a transaction so that if you need to kill the process before the import of a particular completed processing, etc.

    You should also have error handling and transaction in the staging table as well.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • What I realized is that importing 40k-50k records (with 10-15 columns) into a staging table using BULK INSERT take 2-3 seconds.

    Once the data is in staging then I can transform it and then use SQL Server 2008's MERGE statement to make changes in my main table which also does not take long (2-3 seconds) on a heavy database.

    Hence, BULK INSERT->Transformations->MERGE works for me.

    Anyways, Thanks for giving attention to me problem.

    You can still provide your comments on my above method.

  • Sounds good.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi Welsh

    Before finalizing my DB design, I seek you advice again to make sure I am going the right way. To give you more insight into my database schema so that you can address the issue properly, please have a look at the below:

    1.Firstly, please have a look at the attached database design.

    •MS_HospitalTrusts - This refers the actual trust.

    •MS_Hospitals - TrustID is the foreign key in this table. There will be many hospitals under one trust, hence ther 1-n relationship.

    •MS_HospitalUnits - HospitalID is the foreign key in this table. There will be 3-4 units under one hospital, hence ther 1-n relationship.

    •MS_HospitalDepartments: UnitID is the foreign key in this table. There will be 30-40 departments under one unit, hence ther 1-n relationship.

    •MS_HospitalPatients: DepartmentID is the foreign key in this table. This is the table in which acutal import has to happen. There will be 20K-30 records/import i.e. per department, hence ther 1-n relationship.

    2.Secondly, data in table MS_HospitalPatients is going to be of same format for all trusts.

    Main Problem

    As there will be multiple imports simultaneously and also at the same time system will be reading data (dirty read), showing in UI where user can make changes in the data. All these things will happen on MS_HospitalPatients table.

    My Analysis

    What I analyzed is that importing 40k-50k records into a staging table using BULK INSERT take 2-3 seconds.

    Once the data is in staging then I can transform it and then use SQL Server 2008's MERGE statement to make changes in my main table i.e. MS_HospitalPatients which should not take long even on a heavy database.

    Hence, BULK INSERT->Transformations->MERGE seems to be the solution.

    Questions for you

    1.Can you please comment on my proposed solution explained above? Should I go with one common table i.e MS_HospitalPatients for all hosipitals/units/departments OR should I create separate MS_HospitalPatients table for each trust (which will be created when a new trust is added) and then use dynamic queries to insert/update/delete/select from that trust specific table.

    2.Data from MS_HospitalPatients table would be accessed based on UnitID wise most of the times and DataID wise sometimes. So, should the table MS_HospitalPatients be de-normalized by adding field UnitID into it to avoid joins with table MS_HospitalDepartments which contains the UnitID. In either cases what should be the clustered index for table MS_HospitalPatients table and why?

    3.On what basis (columns) the table partioning should be implemented in MS_HospitalPatients table? Would it be good idea to add another column HospitalID in MS_HospitalPatients table and to do the table partioning on that column?

    4.What other techniqiues (like indexing, table partioning) can be implemented to make the database design efficient, robust so that there are NO performance issues later on?

    Kindly provide your reply at the earliest.

  • With respect to the design thanks for the information. You provided the Foreign Keys but what table and column does the child tables relate to (Parent Table & Column)?

    I'm familiar with US Health Care but I'm not not familiar with Health Care Trust? I looked it up but it was very generic.

    What is the difference between a unit and a department?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • imadiamond2016 (8/23/2011)


    1. Can you please comment on my proposed solution explained above? Should I go with one common table i.e MS_HospitalPatients for all hosipitals/units/departments OR should I create separate MS_HospitalPatients table for each trust (which will be created when a new trust is added) and then use dynamic queries to insert/update/delete/select from that trust specific table.

    You definitely should not create a separate table you probably should create one table and add a column to identify the trust but I'm not sure about your relationships.

    I would not denormalize the table. Keep it in the same structure but add a column to identify the source of the input. I suggested staging tables to minimize the impact on performance. If you denormalize the data then you are going to have to turn around and normalize the data.

    On what basis (columns) the table portioning should be implemented in MS_HospitalPatients table? Would it be good idea to add another column HospitalID in MS_HospitalPatients table and to do the table portioning on that column?

    Yes add another column. When you say partition I think you mean to be able to distinguish the record, not an actual partition.

    What other techniques (like indexing, table partioning) can be implemented to make the database design efficient, robust so that there are NO performance issues later on?

    Why don't you start on this and refine your design and then we can provide recommendations with respect to partitioning. You need to define your process and make sure you are happy with your design.

    Once your basic design is good then you can worry about partitioning.

    As far as your diagram, I don't open attachments and I can tell what the diagram looks like by looking at the DDL.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi Hugo

    I am providing the below things which you can find at the end of my reply:

    A snap shot of how the data would look like once the data for all of the departments is imported.

    Scripts of Database Schema

    Points to consider while analyzing are:

    1. Trust is an organization which runs the hospitals. And there will be many trusts in the system.

    2. Though, I have shown only 1 hospital under a trust, but in actual scenario there will be many hospitals under a trust (That’s why we have TrustID 1:N in table MS_Hospitals) and for each hospital we will have a different set of values for Units, Departments and Amounts.

    3. There will be around 2-3 units under each hospital. That’s why we have HospitalID 1:N in table MS_HospitalUnits.

    4. There will be around 20-30 deparments under each unit. That’s why we have UnitID 1:N in table MS_HospitalDepartments.

    5. There will be around 20K-30K patient ids in each deparment. That’s why we have DepartmentID 1:N in table MS_HospitalPatients.

    AND THE IMPORT IS DONE FOR EACH DEPARTMENT, HENCE THERE WILL BE ONLY ONE ENTRY IN TABLE MS_HospitalDepartments AND 20K-30K ENTERIES IN MS_HospitalPatients TABLE FOR EACH IMPORT.

    This is going to be the core data which we will receive in XML and will be imported in the system. So, we will be importing patient’s data for each Department which belongs to a Unit which further belongs to a Hospital & which further belongs to a Trust.

    6. DataID column in table MS_HospitalPatients is a surrogate key and it can be removed.

    Agreed Points

    I agreed that I would need only one table NOT multiple tables (accessed with dynamic queries)

    Challanges

    The database schema prepared by me is a normalized one. Now, as I have explained the business logic and provided the schema, please help in fixing my next two challenges i.e. De-Normlazation and Index

    1. De-Normlization:

    Data from MS_HospitalPatients table would be accessed on the basis of HospitalID and UnitID mostly and DepartmentID sometimes. So, should the table MS_HospitalPatients be de-normalized by including fields HospitalID, UnitID into it to avoid joins with table MS_HospitalDepartments and MS_HospitalUnits which contains the UnitID and HospitalID.

    So, I strongly feel that HospitalID and UnitID should be included. This de-normalization is a very-2 important point here in terms of performance, it could save a couple of joins in my core screen which could result in a better user experience then. Please emphasize on that.

    2. Indexes:

    a. What should be the clustered index for table MS_HospitalPatients table and why?

    b. What non-clustered indexes be added for table MS_HospitalPatients table and why?

    Once I am clear about these two, I would be able to look into “Table Partioning” and some other features which can further help improve the performance.

    How the data would look like after import, please see the attached file.

    Scripts for Database Schema

    /****** Object: Table [dbo].[MS_HospitalTrusts] Script Date: 08/25/2011 00:56:10 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalTrusts]') AND type in (N'U'))

    DROP TABLE [dbo].[MS_HospitalTrusts]

    GO

    /****** Object: Table [dbo].[MS_HospitalTrusts] Script Date: 08/25/2011 00:56:10 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[MS_HospitalTrusts](

    [TrustID] [int] IDENTITY(1,1) NOT NULL,

    [TrustName] [nvarchar](100) NOT NULL,

    CONSTRAINT [PK_MS_Trusts] PRIMARY KEY CLUSTERED

    (

    [TrustID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_Hospitals_MS_Trusts]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_Hospitals]'))

    ALTER TABLE [dbo].[MS_Hospitals] DROP CONSTRAINT [FK_MS_Hospitals_MS_Trusts]

    GO

    /****** Object: Table [dbo].[MS_Hospitals] Script Date: 08/25/2011 00:56:31 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_Hospitals]') AND type in (N'U'))

    DROP TABLE [dbo].[MS_Hospitals]

    GO

    /****** Object: Table [dbo].[MS_Hospitals] Script Date: 08/25/2011 00:56:31 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[MS_Hospitals](

    [HospitalID] [int] IDENTITY(1,1) NOT NULL,

    [HospitalName] [varchar](400) NOT NULL,

    [TrustID] [int] NOT NULL,

    CONSTRAINT [PK_MS_Hospitals] PRIMARY KEY CLUSTERED

    (

    [HospitalID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[MS_Hospitals] WITH CHECK ADD CONSTRAINT [FK_MS_Hospitals_MS_Trusts] FOREIGN KEY([TrustID])

    REFERENCES [dbo].[MS_HospitalTrusts] ([TrustID])

    GO

    ALTER TABLE [dbo].[MS_Hospitals] CHECK CONSTRAINT [FK_MS_Hospitals_MS_Trusts]

    GO

    IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalUnits_MS_Hospitals]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalUnits]'))

    ALTER TABLE [dbo].[MS_HospitalUnits] DROP CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals]

    GO

    /****** Object: Table [dbo].[MS_HospitalUnits] Script Date: 08/25/2011 00:56:57 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalUnits]') AND type in (N'U'))

    DROP TABLE [dbo].[MS_HospitalUnits]

    GO

    /****** Object: Table [dbo].[MS_HospitalUnits] Script Date: 08/25/2011 00:56:57 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[MS_HospitalUnits](

    [UnitID] [int] IDENTITY(1,1) NOT NULL,

    [HospitalID] [int] NOT NULL,

    [UnitName] [nvarchar](100) NOT NULL,

    CONSTRAINT [PK_MS_HospitalUnits] PRIMARY KEY CLUSTERED

    (

    [UnitID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[MS_HospitalUnits] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals] FOREIGN KEY([HospitalID])

    REFERENCES [dbo].[MS_Hospitals] ([HospitalID])

    GO

    ALTER TABLE [dbo].[MS_HospitalUnits] CHECK CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals]

    GO

    IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalDepartments_MS_HospitalUnits]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalDepartments]'))

    ALTER TABLE [dbo].[MS_HospitalDepartments] DROP CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits]

    GO

    /****** Object: Table [dbo].[MS_HospitalDepartments] Script Date: 08/25/2011 00:57:15 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalDepartments]') AND type in (N'U'))

    DROP TABLE [dbo].[MS_HospitalDepartments]

    GO

    /****** Object: Table [dbo].[MS_HospitalDepartments] Script Date: 08/25/2011 00:57:15 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[MS_HospitalDepartments](

    [DepartmentID] [int] IDENTITY(1,1) NOT NULL,

    [DepartmentName] [varchar](100) NULL,

    [UnitID] [int] NOT NULL,

    CONSTRAINT [PK_MS_HospitalDepartments] PRIMARY KEY CLUSTERED

    (

    [DepartmentID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[MS_HospitalDepartments] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits] FOREIGN KEY([UnitID])

    REFERENCES [dbo].[MS_HospitalUnits] ([UnitID])

    GO

    ALTER TABLE [dbo].[MS_HospitalDepartments] CHECK CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits]

    GO

    IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalPatients_MS_Sources]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalPatients]'))

    ALTER TABLE [dbo].[MS_HospitalPatients] DROP CONSTRAINT [FK_MS_HospitalPatients_MS_Sources]

    GO

    /****** Object: Table [dbo].[MS_HospitalPatients] Script Date: 08/25/2011 00:57:32 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalPatients]') AND type in (N'U'))

    DROP TABLE [dbo].[MS_HospitalPatients]

    GO

    /****** Object: Table [dbo].[MS_HospitalPatients] Script Date: 08/25/2011 00:57:32 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[MS_HospitalPatients](

    [DataID] [int] IDENTITY(1,1) NOT NULL,

    [DepartmentID] [int] NOT NULL,

    [PatientID] [int] NOT NULL,

    [Amount] [bigint] NOT NULL,

    CONSTRAINT [PK_MS_HospitalPatients] PRIMARY KEY CLUSTERED

    (

    [DataID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[MS_HospitalPatients] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalPatients_MS_Sources] FOREIGN KEY([DepartmentID])

    REFERENCES [dbo].[MS_HospitalDepartments] ([DepartmentID])

    GO

    ALTER TABLE [dbo].[MS_HospitalPatients] CHECK CONSTRAINT [FK_MS_HospitalPatients_MS_Sources]

    GO

  • Are users going to be entering data into the HospitalPatients Table?

    If so you should not Denormalize.

    This will impact on your selection of a Clustered Index.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Can you please provide a detailed reply for both of my questions regarding De-normalization and Indexes, so that I can understand the logic behind that.

  • imadiamond2016 (8/24/2011)


    Can you please provide a detailed reply for both of my questions regarding De-normalization and Indexes, so that I can understand the logic behind that.

    Please answer my questions in the previous post.

    If you are going to enter data then you should not denormalize.

    The HospitalPatients Table Primary Key should include the HospitalID and The PatientID.

    I'm not sure why you have the DepartmentID on that Table.

    You Database is not in the 3rd normal form.

    Please don't name every table with a prefix of Hospital. If it is an intersection table than use the two entity tables involved.

    The naming conventions need to be corrected.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply