query puzzle

  • Hello - I have an existing SQL database that keeps track of student information.  We use a field called STU.NID to store a unique Network ID number (a 5 digit, sequential number).
    Iโ€™m trying to write a query that will determine if a student is missing a value for STU.NID.  If so, the studentโ€™s record should be updated with the next NID number in the sequence.

    At this point the query throws an error about the stu.nid in line three not being bound.  Is that because WHILE only works with variables?  Once I determine what the next NID is, how to update just one student with that one NID then check and go through the loop again?

    Iโ€™ve read that loops take up a lot of resources and should generally be avoided in SQL.  Can anyone suggest a non-loop solution to this puzzle?
    Thank you in advance for your feedback

    David

    DECLARE @nextNID smallint
    select NID from STU
    while stu.nid = ' '
    BEGIN
    set @nextNID = ((SELECT MAX(NID) FROM STU) + 1)
           select top 1 STU.ID, STU.NID
                  FROM STU
                         WHERE
                               STU.NID = ' ' and
                               stu.sc in (11,12,14,15,16,17,27,28) and
                               stu.tg = ' '
           update STU
                  set NID = @nextNID
    END

  • Yes, loops bad. Set-based good. 

    First for some sample data:
    IF OBJECT_ID('tempdb..#stu') IS NOT NULL DROP TABLE #stu;
    CREATE TABLE #stu(nid int NULL, student_name varchar(100) NOT NULL);
    INSERT #stu VALUES (1000, 'billy'), (1109, 'sue'), (NULL, 'fred'), (NULL, 'pete');
    GO

    Now we have:
    SELECT * FROM #stu;
    nid   student_name
    ----------- ------------
    1000   billy
    1109   sue
    NULL   fred
    NULL   pete

    Now for a loop-free solution:

    WITH
    maxid AS (SELECT maxid = MAX(nid) FROM #stu), -- get the highest nid value
    orphins AS
    (
    SELECT #stu.*, new_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+maxid -- starting at the highest nid, add 1
    FROM #stu
    CROSS JOIN maxid
    WHERE nid IS NULL
    )
    UPDATE orphins
    SET nid = new_id; -- set the nid to be the new_id value

    Now we have:

    nid   student_name
    ----------- -------------
    1000   billy
    1109   sue
    1110   fred
    1111   pete

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you for your reply, Alan.  I modified your query to populate the temp table with actual student ID numbers.   When I run, it does create the temp table, and it does include all of the student numbers but when it comes to the second part it doesn't populate any NIDs.  Then I realized that in your example you seeded the NID field with a couple of demo NIDs.  So, for the lowest ID# I set the NID = to 11111 and tried to run the second part (after the table creation) but it doesn't do anything, - no errors, just 0 rows affected.  Any suggestions?
    thanks again ๐Ÿ™‚
    David

    heres the updated script...


    IF OBJECT_ID('tempdb..#stu1') IS NOT NULL DROP TABLE #stu1
    CREATE TABLE #stu1
        (NID varchar(255) NULL,
        ID int NOT NULL)
    INSERT #stu1 (NID, ID)
        select NID, ID
        FROM STU
    GO

    WITH
    maxid AS (SELECT maxid = MAX(nid) FROM #stu1), -- get the highest nid value
    orphins AS
    (
    SELECT #stu1.*, new_id = ROW_NUMBER()
    OVER (ORDER BY (SELECT NULL))+maxid -- starting at the highest nid, add 1

    FROM #stu1
    CROSS JOIN maxid
    WHERE nid IS NULL
    )
    UPDATE orphins
    SET nid = new_id; -- set the nid to be the new_id value

    --***************************
    update #stu1
        set NID = 11111
            where ID = 1101700

    select top 3 * from #stu1

  • Alan.B - Friday, April 14, 2017 3:17 PM

    Yes, loops bad. Set-based good. 

    First for some sample data:
    IF OBJECT_ID('tempdb..#stu') IS NOT NULL DROP TABLE #stu;
    CREATE TABLE #stu(nid int NULL, student_name varchar(100) NOT NULL);
    INSERT #stu VALUES (1000, 'billy'), (1109, 'sue'), (NULL, 'fred'), (NULL, 'pete');
    GO

    Now we have:
    SELECT * FROM #stu;
    nid   student_name
    ----------- ------------
    1000   billy
    1109   sue
    NULL   fred
    NULL   pete

    Now for a loop-free solution:

    WITH
    maxid AS (SELECT maxid = MAX(nid) FROM #stu), -- get the highest nid value
    orphins AS
    (
    SELECT #stu.*, new_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+maxid -- starting at the highest nid, add 1
    FROM #stu
    CROSS JOIN maxid
    WHERE nid IS NULL
    )
    UPDATE orphins
    SET nid = new_id; -- set the nid to be the new_id value

    Now we have:

    nid   student_name
    ----------- -------------
    1000   billy
    1109   sue
    1110   fred
    1111   pete

    This approach requires reading the table twice: once to find the max nid and once to find the null records.  Here is an approach that only requires reading the table once.

    ;
    WITH orphans AS
    (
        SELECT *,
            MAX(nid) OVER() + ROW_NUMBER() OVER(ORDER BY nid) AS new_nid
        FROM #stu
    )
    UPDATE o
        SET nid = new_nid
    FROM orphans o
    WHERE nid IS NULL
    ;

    I should mention that this working correctly depends on the fact that NULL values are sorted before non-NULL values.

    I also corrected the spelling of orphans.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • thanks again - i see the problem, in our database the NID column does not allow NULLS.  So I changed 

    WHERE nid IS NULL
    to
    WHERE nid = ' '
    and it works great!  
    thank you for your help ๐Ÿ™‚

    David

  • David Walker-409102 - Friday, April 14, 2017 3:02 PM

    >> I have an existing SQL database that keeps track of student information. We use a field .sic. called STU.NID to store a unique network_id number (a 5 digit, sequential number). <<

    Please look up what the term "field" means in SQL. You've confused with a column. Why are you still writing in all uppercase and using short names? I used to do that back in the 1960's when we were Fortran programmer and we had limited space on punchcards. You also don't know a row is not anything like a record! That usually covered the first week of any decent SQL class. I also see you expect us to write the DDL that you fail to post; this is not just counterproductive, it's very very rude.

    >> I’m trying to write a query that will determine if a student is missing a value for STU.NID. If so, the student’s record [sic] should be updated with the next NID number in the sequence. <<

    Since you didn't post any DDL, we don't know for using the CREATE SEQUENCE statement in your code. Again, sequence has a special meaning in SQL.

    What is really worse is that you don't understand that, SQL is a declarative language. Declarative languages do not use while loops; that was your old COBOL and FORTRAN stuff which is basically what you're trying to write.

    CREATE SEQUENCE network_id_seq
    AS SMALLINT
    START WITH 00000
    INCREMENT BY 1
    MINVALUE 00000
    MAXVALUE 99999;

    CREATE TABLE STU -- what is a STU?
    (..
    network_id SMALLINT NOT NULL
     CHECK (network_id BETWEEN 00000 AND 99999)
    DEFAULT NEXT VALUE FOR network_id_seq,
    ..

    Having said all of this, the use of numeric values for identifiers is wrong. Get any book on basic data modeling. We do not do math on identifiers; what is the square root of your credit card number? With a little work. You can cast the small integer into a string of five digits (these things are called tag numbers in data modeling) and use it as default in your STU table. By the way, what the heck is an STU? I'm assuming that it's a meaningful name in your trade, but it really doesn't mean much to anybody else. Professionals write code that can be maintained by newcomers who just to the job; amateurs write code that is cryptic and gives them job security ๐Ÿ™‚

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • David Walker-409102 - Monday, April 17, 2017 4:32 PM

    thanks again - i see the problem, in our database the NID column does not allow NULLS.  So I changed 

    WHERE nid IS NULL
    to
    WHERE nid = ' '
    and it works great!  
    thank you for your help ๐Ÿ™‚

    David

    Is your nid column (N)(VAR)CHAR?  This would cause implicit conversion when adding the row number and another when converting back to a string.  This could cause a conversion error if your nid contains any value that is non-numeric.  You could also run into problems where the nid value is unique, but the converted value is not, e.g., '1', '01', and '001'.

    Drew

    PS: Ignore Joe on this.  He's overly concerned with minutiae.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • jcelko212 32090 - Tuesday, April 18, 2017 10:36 AM

    David Walker-409102 - Friday, April 14, 2017 3:02 PM

    >> I have an existing SQL database that keeps track of student information. We use a field .sic. called STU.NID to store a unique network_id number (a 5 digit, sequential number). <<

    Please look up what the term "field" means in SQL. You've confused with a column. Why are you still writing in all uppercase and using short names? I used to do that back in the 1960's when we were Fortran programmer and we had limited space on punchcards. You also don't know a row is not anything like a record! That usually covered the first week of any decent SQL class. I also see you expect us to write the DDL that you fail to post; this is not just counterproductive, it's very very rude.

    >> I’m trying to write a query that will determine if a student is missing a value for STU.NID. If so, the student’s record [sic] should be updated with the next NID number in the sequence. <<

    Since you didn't post any DDL, we don't know for using the CREATE SEQUENCE statement in your code. Again, sequence has a special meaning in SQL.

    What is really worse is that you don't understand that, SQL is a declarative language. Declarative languages do not use while loops; that was your old COBOL and FORTRAN stuff which is basically what you're trying to write.

    CREATE SEQUENCE network_id_seq
    AS SMALLINT
    START WITH 00000
    INCREMENT BY 1
    MINVALUE 00000
    MAXVALUE 99999;

    CREATE TABLE STU -- what is a STU?
    (..
    network_id SMALLINT NOT NULL
     CHECK (network_id BETWEEN 00000 AND 99999)
    DEFAULT NEXT VALUE FOR network_id_seq,
    ..

    Having said all of this, the use of numeric values for identifiers is wrong. Get any book on basic data modeling. We do not do math on identifiers; what is the square root of your credit card number? With a little work. You can cast the small integer into a string of five digits (these things are called tag numbers in data modeling) and use it as default in your STU table. By the way, what the heck is an STU? I'm assuming that it's a meaningful name in your trade, but it really doesn't mean much to anybody else. Professionals write code that can be maintained by newcomers who just to the job; amateurs write code that is cryptic and gives them job security ๐Ÿ™‚

    I would not be surprised if this is a system he did not set up but inherited and it could even be a 3rd party thing that is not controlled by him.  Where I work, I have a table in a system called BM010415.  That may not mean much to you, but to our 3rd party software, it has a lot of meaning.  Renaming the table to something I understand will cause the software to break and since it is 3rd party, I cannot re-write it to fix that.
    The first sentance in the post (after Hello) indicates that it is a Student database.  I would thus assume that STU is student.  Pretty safe assumption, no?

    I was wondering about that "CREATE SEQUENCE" statement that you used though.  I had not seen that in SQL Server 2008 (nor any other version of SQL Server) and got curious and thought I'd try to learn something new.  I ran off to SSMS, connected to a SQL 2008 R2 instance and tried it only to have it fail.  Am I missing something?

    Also, what is wrong with using an integer or numeric for an ID?  We frequently use that where I work and I see it in a lot of 3rd party software as well.  An ID column of datatype INT IDENTITY NOT NULL.  And it would have an identity parameters of (0,1).  If there is going to be a lot of inserts without truncates, then we sometimes use bigint.  Or if there are going to be too many inserts without being able to truncate it, then we will use a GUID and not use an IDENTITY on it.

    Going back to the original question, I agree with drew.allen about the (N)(VAR)CHAR issue.  We have a table which has a "result" column that is USUALLY ints, but sometimes has characters.  I have written some bad queries with things like "WHERE charCOL = 12345" and it'll work during development (due to the smaller data set that is in our test system) but fail on live.  Mixing data types (and doing implicit conversion) is risky.

    The above is all just my opinion on what you should do.ย 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.ย  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply