Refresh question

  • I have a web directory that will be populated with the most current employees at my institution. The employee information comes from our ERP system, which is not SQL Server. My question is: is it better to query the other system, compare each field for differences, and update the SQL web directory, or should I truncate the table and repopulate it with all of the new data? What is the best practice?

    Thanks,

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • There's no single best practice here. If the truncate works and doesn't impact your system, then that's the simpler option to me. Be aware, though, that TRUNCATE TABLE resets IDENTITY values and won't run against a table referenced by foreign keys, so be sure that isn't a problem.

    If it is, I'd load the data completely from the other system into a staging table, and do the comparisons in SQL Server.
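    As a sketch of that approach, assuming the ERP rows are bulk-loaded into a hypothetical dbo.Staging_People table keyed on an ERP employee number (EmpNo) — both names are illustrative — EXCEPT is a convenient way to find the changed or new rows, and unlike column-by-column <> tests it treats two NULLs as equal:

    ```sql
    -- Rows in staging that differ from, or are missing in, People.
    -- EXCEPT compares whole rows and handles NULLs sensibly,
    -- so you don't need a NULL check per column.
    SELECT s.EmpNo, s.LastName, s.FirstName, s.EmpEmail
    FROM   dbo.Staging_People AS s
    EXCEPT
    SELECT p.EmpNo, p.LastName, p.FirstName, p.EmpEmail
    FROM   dbo.People  AS p;
    ```

    The result set is exactly the set of rows that need an update or an insert.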

  • Try a couple of solutions.

    The SQL Server 2008 MERGE statement might do exactly what you need. Try that.

    Truncate-and-reload works, but it gets slower as the data volume grows, and it isn't necessary if most of the data is static. It's the best choice in some situations, and it's easy to build and manage, but it also has definite drawbacks.
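    As a minimal sketch (SQL Server 2008 or later; the staging table and EmpNo key column are illustrative assumptions, not from the thread):

    ```sql
    -- Single-statement upsert-plus-delete from a staging table.
    MERGE dbo.People AS target
    USING dbo.Staging_People AS source
        ON target.EmpNo = source.EmpNo
    WHEN MATCHED AND (target.LastName  <> source.LastName
                   OR target.FirstName <> source.FirstName) THEN
        UPDATE SET target.LastName  = source.LastName,
                   target.FirstName = source.FirstName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (EmpNo, LastName, FirstName)
        VALUES (source.EmpNo, source.LastName, source.FirstName)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
    ```

    Drop the NOT MATCHED BY SOURCE clause if rows that exist only in SQL Server must be preserved.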

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks Steve. If I were to load the data into a staging table in sql server, do you think the best way to do the comparison would be to update every field in the record where, for example,

    a.lastname <> b.lastname or a.firstname <> b.firstname, etc.?

    I'm thinking that would capture the record if any field doesn't match its corresponding field in the staging table; then I'd be blanket-updating every field in that record.

    Is there a way to efficiently do a field by field comparison and only update that field which is different?

    The way I just mentioned seems to be clean and shouldn't be too taxing on the system, right?

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • GSquared, you're right, that may be what I need. Thanks.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • The MERGE statement suggested by GSquared works well.

    However, slowness depends on how you do things. If you have a way to pull across just the changes from your source system, I'd do that. You can load those into a staging table and then MERGE, or, if you aren't familiar with MERGE, I've done an UPDATE of the matching rows, then a DELETE of those matches from the staging table, then an INSERT of everything left.
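    That three-step pattern, sketched against the same hypothetical staging table and EmpNo key (names are assumptions for illustration), works on SQL Server 2005 where MERGE isn't available:

    ```sql
    -- 1) UPDATE rows that exist in both the target and staging:
    UPDATE p
    SET    p.LastName  = s.LastName,
           p.FirstName = s.FirstName
    FROM   dbo.People AS p
    JOIN   dbo.Staging_People AS s ON s.EmpNo = p.EmpNo;

    -- 2) DELETE the matched rows from staging:
    DELETE s
    FROM   dbo.Staging_People AS s
    WHERE  EXISTS (SELECT 1 FROM dbo.People AS p WHERE p.EmpNo = s.EmpNo);

    -- 3) INSERT everything left in staging -- by construction, new rows only:
    INSERT INTO dbo.People (EmpNo, LastName, FirstName)
    SELECT EmpNo, LastName, FirstName
    FROM   dbo.Staging_People;
    ```

    Wrapping the three statements in one transaction keeps the directory consistent if a step fails.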

    I was assuming you didn't know what had changed, which is common with simple directories. In that case, if you need to check all entries, a line-by-line comparison doesn't make sense; just move it all. If you can tell what changed, move only the changes.

    One more thing: I'd store the last update time in a table in SQL Server somewhere so you know which changes to get. Pulling "yesterday's changes" every day works until a run fails, and then you have to figure out how far to go back. Instead, if you always query for DateChanged > LastXferDate, you'll be fine.
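    A sketch of that bookkeeping, with an illustrative one-row dbo.SyncLog table (the table, the linked-server source, and the DateChanged column are assumptions):

    ```sql
    -- One-row watermark table recording the last successful transfer.
    CREATE TABLE dbo.SyncLog (LastXferDate datetime NOT NULL);
    INSERT INTO dbo.SyncLog (LastXferDate) VALUES ('19000101');

    -- Each run, pull only rows changed since the last successful transfer:
    DECLARE @since datetime;
    SELECT @since = LastXferDate FROM dbo.SyncLog;

    SELECT *
    FROM   ErpSource.dbo.Employees      -- illustrative linked-server source
    WHERE  DateChanged > @since;

    -- Only after the load succeeds, advance the watermark:
    UPDATE dbo.SyncLog SET LastXferDate = GETDATE();
    ```

    Advancing the watermark to the maximum DateChanged actually pulled, rather than GETDATE(), avoids missing rows committed at the source while the transfer was running.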

  • I realized that I cannot use MERGE, as we are running SQL Server 2005, so let me throw another question out there. Let's say I have three tables: People, Positions, and People_Positions. I am thinking of truncating the data for People and Positions and reloading it each night, but certain people and positions will not be part of the dataset that's coming over. They exist only in the SQL Server database, and therefore I want to preserve those records. So let's say I go so far as to write a query that reads

    DELETE FROM People WHERE id > 25

    with records 1-25 being the ones I want to preserve, then do an insert with the dataset that's coming over in the refresh. That should solve problem number one: preserving the records that exist only in SQL Server. However, I think it will cause problems with the foreign keys in the People_Positions table, as the newly generated ID values in the People and Positions tables will differ from the old ones and cause mismatched positions, right?

    I guess my question is: if I'm going to go the delete-everything-and-reload route, do I have to rebuild the bridge tables too, and if so, how would I do that? I manually entered the records into the bridge table to say Person X (ID 1) has Position Y (ID 15), so in People_Positions, PersonID = 1 and PositionID = 15.

    The heart of the issue is that there's some data currently in the SQL Server database that should remain untouched, and possibly more in the future, and then there's data from the ERP system that should be updated on a nightly basis, and I'm looking for the way to do it that makes the most sense.

    I'm really close; I just need an expert to help me get past this roadblock. I'm not sure if this will help, but here are my table structures. Thanks.

    CREATE TABLE [dbo].[People](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [LastName] [varchar](25) NOT NULL,
        [FirstName] [varchar](25) NOT NULL,
        [MiddleName] [varchar](25) NULL,
        [EmpEmail] [varchar](50) NULL,
        [IsHiddenFromEmpDir] [bit] NULL,
        [StuEmail] [varchar](50) NULL,
        [IsHiddenFromStuDir] [bit] NULL,
        CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED ([ID] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[Positions](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [Title] [varchar](200) NOT NULL,
        CONSTRAINT [PK_Positions_1] PRIMARY KEY CLUSTERED ([ID] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[People_Positions](
        [PersonID] [int] NOT NULL,
        [PositionID] [int] NOT NULL,
        [PreferredTitle] [varchar](200) NULL,
        [PhoneNo] [varchar](14) NOT NULL,
        [PhoneExt] [varchar](4) NULL,
        [AltPhoneNo] [varchar](14) NULL,
        [AltPhoneNoExt] [varchar](4) NULL,
        [FaxNo] [varchar](14) NULL,
        [OfficeBldgId] [int] NOT NULL,
        [OfficeRoom] [varchar](10) NULL,
        [PositionType] [varchar](50) NOT NULL,
        [DepartmentID] [int] NOT NULL,
        CONSTRAINT [PK_Positions] PRIMARY KEY CLUSTERED ([PersonID] ASC, [PositionID] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    PositionType in People_Positions is a way that I can distinguish which records should remain untouched.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • I assume that your ERP system also has some sort of ID column that can be used to uniquely identify each person. Why not add that as an additional column to your People table? You should be able to use it in combination with your PositionType column to determine whether to do an update or an insert.
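    A sketch of that idea, assuming a new nullable column (the column name, staging table, and updated fields are illustrative):

    ```sql
    -- One-time schema change: hold the ERP's own key alongside the IDENTITY ID.
    -- ALTER TABLE dbo.People ADD ErpPersonId int NULL;

    -- Update ERP-sourced people in place. Their IDENTITY IDs never change,
    -- so the PersonID foreign keys in People_Positions stay valid.
    UPDATE p
    SET    p.LastName  = s.LastName,
           p.FirstName = s.FirstName,
           p.EmpEmail  = s.EmpEmail
    FROM   dbo.People AS p
    JOIN   dbo.Staging_People AS s ON s.ErpPersonId = p.ErpPersonId;

    -- Insert people seen for the first time:
    INSERT INTO dbo.People (LastName, FirstName, EmpEmail, ErpPersonId)
    SELECT s.LastName, s.FirstName, s.EmpEmail, s.ErpPersonId
    FROM   dbo.Staging_People AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.People AS p
                       WHERE p.ErpPersonId = s.ErpPersonId);

    -- Manually entered rows have ErpPersonId IS NULL and are never touched.
    ```

    Because rows are updated rather than deleted and re-inserted, the bridge table never needs rebuilding; this works on SQL Server 2005.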

  • I can understand how that would make things a little easier, but I'm not sure how it would help in rebuilding the People_Positions table: the Positions table won't have the additional column you spoke of the way the People table will, so won't that connection still be broken?

    Mike Scalise, PMP
    https://www.michaelscalise.com
