Seeking input on Best Method to Join on First or Last Row in a 1:M Join

  • This is correct You got so much wrong in your design. I'm trying to figure out where to start. Your column declarations make no sense; a variable length. 30 characters ZIP penal_code? In what universe? These are even tables because they don't have a key. Having a table named "Person" is absurd; it's the equivalent of having a table named "Things", which would also violate the law of identity (remember your freshman logic course? To be is to be something in particular; to be nothing in particular or something in general, is to be nothing at all). Also, since you have a singular table name that means you have only one entry in the table so it is this person do? What role does he play in your logical data model? Since all of your columns can be null, there's no way you could ever have a key, and this could ever be a table!

    If you get a chance to spend some time reading ISO and other standards, you will notice there is no such thing as a universal magic generic "code", and that the codes they do standardize are usually fixed length. This is so we can figure out how to put them into printed forms, display screens, and so on,

    While they are very boring, you might want to actually read the ISO 11179 naming rules for data elements. You might also want to stop using the formatting that we used to have on punchcards over 50 years ago. All uppercase data element names are hard to read and lead to a high error rate (I spent a couple of years at AIRMICS looking at the research on code readability).

    Let me do a quick rewrite on all of this, using International Postal Union recommendations for the length of various fields in an address label. These standards are based on a 3-1/2 inch label using a 10 characters per inch printer and they been established for several decades. I also arbitrarily decided, since we have no other clues, that you're dealing with criminals! Hey, why not? Are you starting to understand when I say things cannot be so generic?

    Besides not having the key, you don't seem to understand that identifiers are not numeric and can never be numeric by definition. You've never had a course or read a book on scales and measurements, have you? There is no such thing as a generic magical physical identifier, in RDBMS.

    CREATE TABLE Criminals

    (inmate_nbr CHAR(18) NOT NULL PRIMARY KEY, -- required, not optional

    penal_code CHAR(8) NOT NULL, -- no idea what this might be, but it's less vague than just "code"

    inmate_lastname VARCHAR(35) NOT NULL, -- postal Union rules

    inmate_firstname VARCHAR(35) NOT NULL,

    street_address_1 VARCHAR(35) NOT NULL,

    street_address_2 VARCHAR(35) NOT NULL,

    city_name VARCHAR(25) NOT NULL,

    state_code CHAR(2) NOT NULL,

    zip_code CHAR(5) NOT NULL);

    As I said this a quick rewrite. I did not put in any of the constraints that you would if you were doing a real table. But at least the table is table; it has a key and the datatypes are sensible. Let's go to your second table. Did you know we have a date data type? There is no reason to use the old Sybase DATETIME data type for the last 15 or 20 years.

    CREATE TABLE Criminal_History

    (inmate_nbr CHAR(18) NOT NULL

    REFERENCES Criminals (inmate_nbr),

    event_name VARCHAR (75) NOT NULL,

    PRIMARY KEY (inmate_nbr, event_name),

    event_occurence_date DATE NOT NULL,

    event_start_date DATE NOT NULL,

    event_end_date DATE NOT NULL,

    CHECK (event_occurence_date BETWEEN event_start_date AND event_end_date),

    CHECK (event_start_date <= event_end_date) );

    Please carefully study everything that I added to your DDL. Most important one is a references clause. This means that your two tables are part of an RDBMS schema and not unrelated, disjoint, separate files. They are related. Your SQL engine will handle a lot of the work that you would otherwise have to do by hand. After 35+ years working in SQL, I tell people that 80 to 90% of the work of an RDBMS is done in the DDL; what you have that right. The DML is trivial.

    Please note the CHECK constraints. The first one puts your event between the start and end time stamps of the event. The second constraint guarantees the event interval is correctly ordered. The optimizer will use this.

    I'm not trying to be nasty, but I'm also trying to write a book on why people write bad SQL. Can you give me some idea what your programming back out round was and why you thought this was the right way to code something.

    You can reply to me privately.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    (inmate_nbr CHAR(18) NOT NULL PRIMARY KEY, -- required, not optional.

    Man... talk about a worse practice ever.ย  If you're going to do something this silly, use a GUID... at least it doesn't require a central authority to be generated as is 2 bytes shorter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden wrote:

    jcelko212 32090 wrote:

    (inmate_nbr CHAR(18) NOT NULL PRIMARY KEY, -- required, not optional.

    Man... talk about a worse practice ever.ย  If you're going to do something this silly, use a GUID... at least it doesn't require a central authority to be generated as is 2 bytes shorter.

    No to mention the royal p.i.t.a. of having to make sure you always pad the column to exactly 18 chars with leading zeros -- gadzooks!

    Just a hideously pedantic approach to data typing.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Jeff, have you noticed that most credit cards are 16 digits broken into groups of four? American Express or Diners Club or older credit card systems uses a little bit more; I think 20 for Amex (someone correct me, please). I pulled up 18 digits from memory (have to check on it. I guess after all these decades) from my time doing statistics for the Georgia prison system. The inmate numbers we used displayed only the first part of it, but it actually extended onto 18 or 20 digits. It was very important not to let ax-murdering cannibals out of prison before their time ๐Ÿ™‚ If you work with some of the international banking numbers, you'll find even longer codes. I assume that you are familiar with the VIN on your automobile. It changed a few years ago, but I think it's at least 19 characters now.

    The "G" in GUID stands for "global", and was intended for use in external data, never for local variables. I am seriously bothered by people who use GUIDs to mimic 1950s pointer chains in SQL. No validation, no verification, certainly not an attribute of any entity in the data model and they are oversized!

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • None of that actually has anything to do with the OP's table except that you've done the same unnecessary expansion of a column like you claim he did with the Zip Code column. ๐Ÿ˜‰

    So please tell me how your "Criminals" table and Credit Card Numbers have anything at all to do with what the OP posted.ย  And, seriously, you don't need the problems associated with an 18 byte character based PK... if it really is a number as you've labeled it, just use a BIGINT which will handle 18 Digits just fine.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Since all the people who have ever existed fit very comfortably into 12 digits, I can't imagine why you'd need 18 digits for your person id.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • UPDATE: Since getting back from being out sick I've not had a chance to catch up on all the feedback for this post but I did want to take a minute to provide some information so that the ongoing posts about how poorly a job I've done with the design of these tables can cease.

    1. I did NOT have any involvement with the creation of the DB's design. It was created by the vendor who's accounting software we use
    2. I do NOT have any access to modify the design nor add to it. I am limited to querying the data only

    Thank you

    Kindest Regards,

    Just say No to Facebook!
  • After all these decades. I don't remember why the inmate number was that long. I believe we display the first nine digits because that matches the Social Security number. The other digits, I believe, had to do with aliases, incarceration in other jurisdictions, etc. but again this is 40 years ago. I don't know if you believe this, and I'm sure will come as a shock, but criminals are dishonest and will lie about things. ๐Ÿ™‚

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    After all these decades. I don't remember why the inmate number was that long. I believe we display the first nine digits because that matches the Social Security number. The other digits, I believe, had to do with aliases, incarceration in other jurisdictions, etc. but again this is 40 years ago. I don't know if you believe this, and I'm sure will come as a shock, but criminals are dishonest and will lie about things. ๐Ÿ™‚

    So it violated basic first normal form by putting two pieces of info in the same column.ย  Not to mention the horrific security implications of using an actual soc sec num in an id.ย  But, gee, it's not like known criminals would try to take advantage of that to pull off any fraud, huh?

    My first job (in programming) was at a big university.ย  They used SSN as the student id; EVERYTHING was keyed by it.ย  Royal pain when one changed and you had to update all those (in those days) master file keys.ย  And that happened reasonably frequently.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • >> I did NOT have any involvement with the creation of the DB's design. .. do NOT have any access to modify the design <<

    There is an old poem:

    " I do not drive the train. I do not ring the bell. Let the damn thing jumped the track, and see who catches hell"

    When you post, we have to assume that, much like a Bastard child, the last one we saw with it is responsible ๐Ÿ™‚

    Unfortunately, I found over the 30+ years I've been doing SQL, that 80 to 90% of the problems are the result of bad DDL. Very often you need to tell people to stop what you're doing, and tear everything apart and start over. Nobody wants this is an answer, even if it's true, so they keep looking for horrible kludge is and wonder why things get even worse

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • I almost hesitate to jump in at this point, but if I may.... I generally have an aversion to anything that's a triangular join.ย  Indexes aside, if I were to attempt to get the most recent history row for each person, I would do something like the following.

    Note: Rather than make another quoted reply, I took the script that Jonathan AC Roberts supplied on June 20, 2021 at 4:21 am.ย  Final queries at the end of that script:

    DROP TABLE IF EXISTS #t1
    DROP TABLE IF EXISTS #t2
    DROP TABLE IF EXISTS #t3

    -- FROM ORIGINAL SCRIPT:
    SET STATISTICS IO, TIME ON

    SELECT P.Code,
    P.LastName,
    P.FirstName,
    PH.*
    INTO #t1
    FROM dbo.tmpPERSON P
    LEFT JOIN dbo.tmpPERSON_HISTORY PH
    ON P.MyPerson = PH.hMyPerson
    AND PH.My = (SELECT MAX(PHX.hMyPerson)
    FROM dbo.tmpPERSON_HISTORY PHX
    WHERE PH.hMyPerson = PHX.hMyPerson)
    ORDER BY P.Code, PH.My
    GO

    SELECT P.Code,
    P.LastName,
    P.FirstName,
    PH.*
    INTO #t2
    FROM dbo.tmpPERSON P
    OUTER APPLY (SELECT TOP (1) *
    FROM dbo.tmpPERSON_HISTORY PH
    WHERE P.MyPerson = PH.hMyPerson
    ORDER BY OccurredDate DESC, My DESC) AS PH
    ORDER BY P.Code, PH.My
    GO

    -- MY TWO CENTS:
    ; WITH RankedData AS
    (SELECT P.Code,
    P.LastName,
    P.FirstName,
    PH.*,
    RowNum = ROW_NUMBER() OVER (PARTITION BY PH.hMyPerson ORDER BY PH.OccurredDate DESC)
    FROM dbo.tmpPERSON P
    LEFT JOIN dbo.tmpPERSON_HISTORY PH
    ON P.MyPerson = PH.hMyPerson
    )

    SELECT *
    INTO #T3
    FROM RankedData
    WHERE RankedData.RowNum = 1

    SET STATISTICS IO, TIME OFF

    Results indicate:

    Table 'tmpPERSON_HISTORY'. Scan count 13, logical reads 86894, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmpPERSON'. Scan count 13, logical reads 37611, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 36, logical reads 32000012, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 54668 ms, elapsed time = 12089 ms.

    (2000000 rows affected)
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 2 ms.
    Table 'tmpPERSON_HISTORY'. Scan count 2000000, logical reads 6375300, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmpPERSON'. Scan count 13, logical reads 37763, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 15312 ms, elapsed time = 1647 ms.

    (2000000 rows affected)
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 2 ms.
    Table 'tmpPERSON'. Scan count 13, logical reads 37711, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'tmpPERSON_HISTORY'. Scan count 13, logical reads 86854, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 37684 ms, elapsed time = 8493 ms.

    (2000000 rows affected)

    I welcome feedback from the gurus in this thread.

  • ROW_NUMBER in DESCending order, cool trick. Thank you SoCal_DBD. In spite being a newbie, I knew about this one, but hesitated to apply itย  due to possible slow execution. Live and learn...

  • I've noticed, and been impressed by, the overall great performance of the ROW_NUMBER() functions.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

Viewing 13 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply