Practice SQL querying with dates and aggregations

  • jcelko212 32090 wrote:

    But, by definition, an identifier is on a nominal scale and nominal scales are never numeric.

    That's really a mistake on your part to think so in the world of RDBMSs, Joe.  Of course, we've had this discussion before so I'm not posting here for your benefit.  I'm posting here so others won't take that comment as some sort of gospel.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff - I am going to create another thread on this issue. Could you please do me a favor and NOT participate? I would really appreciate it.

     

    • This reply was modified 3 years, 4 months ago by  Lord Slaagh.
    • This reply was modified 3 years, 4 months ago by  Lord Slaagh.

    Everyone has a plan until they get punched in the mouth. --Mike Tyson

  • @Lord Slaagh ,

    You asked "Jeff - Do you have any challenge queries that you can provide for practice?"

    Yes I do.  Anywhere from "pretty simple" to "OMG! How can this possibly be done?"  The question is, what position are you applying for and what level do you consider yourself to be with temporal queries.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lord Slaagh wrote:

    Jeff - I am going to create another thread on this issue. Could you please do me a favor and NOT participate? I would really appreciate it.

    Your original reply said that I turned it into a "pissing contest" and that you were pissed off that it was turned into an "ego contest".  I'm really sorry that you took it that way because that's not how it was intended.  I'll be happy to honor your request, though.  Good luck with your interview.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lord Slaagh wrote:

    Scott - I agree with Jeff in the sense that this test is somewhat unrealistic because you have CTE's and sub queries at your disposal. However, if this is a test to take the candidate out of their comfort zone and landing the job does not depend on the right answer to this specific query, then this could be a one off question. Perhaps you can ask the candidate to write this query  with and without CTE's and sub queries? That would be a nice test.

    Jeff - Do you have any challenge queries that you can provide for practice? I have all versions of Adventure Works DB's if that helps. I would really appreciate it and thank you both for preparing me for this interview!

    No, the test is to see if (1) you can write a query simply, without the unnecessary hedge of a cte (no cte is needed here at all) and (2) focus on the core of the q, the query itself, rather than fret about having to write a single query or other irrelevancies?  Might fretting about ctes and other restrictions make one wonder if you have difficultly writing even a single query without use of ctes and other added structures?

    If an applicant insisted that the q setup was terribly wrong because they know better than I do, I would very definitely consider that long and hard before I hired such a person.  Time in interviews in limited, and frankly I don't want to waste it hearing you expound on your views for a proper interview q.

    Yes, I provide a test table structure (no data).  Very sorry for not providing a table to go with the test.

    I don't want DDL from the applicant, just the SQL query.  This is a q for a developer who has some degree of experience, so one assumes they can determine the table structure from the q or query the system itself for the info (again, sorry I didn't provide a table def so you could do that).

    Yes, assignment_number is unique.  Presumably that could be implied, but if not, that's a valid q, I guess, albeit still irrelevant.

    However, how the assignment_number is assigned is not.  Again, the staying focused, not wasting time thing about an interview.  But, if a person truly insists on knowing: assignment_number is a SEQUENCE, and every test or assignment given by the teacher (any teacher perhaps) to any student is assigned the next assignment_number.  Ok, go ahead and lecture me now on how wrong this method of assigning numbers is.  I still consider that irrelevant to writing the query.

    Btw, this comes from a real-life example when I was teaching a db class at college at night.  One of the other instructors needed help averaging out her test scores.

    • This reply was modified 3 years, 4 months ago by  ScottPletcher.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • As a drastic example of what I've talking about, say you asked an applicant to list all the columns from the table.  Say the table has key_col, data_col1, data_col2.  No doubt you'd be expecting:

    SELECT * FROM table_name --bit lazy
    --or
    SELECT key_col, data_col1, data_col2 --best(?) response

    --What if instead they used 1 or 2 ctes? Say like this?:

    WITH cte1 AS (
        SELECT key_col, data_col1
        FROM table_name
    ),cte2 AS (
    SELECT key_col, data_col2
    FROM table_name
    )
    SELECT cte1.key_col, cte1.data_col1, cte2.data_col2
    FROM cte1
    INNER JOIN cte2 ON cte2.key_col = cte1.key_col

    --Is that query just as good as the original queries, since it does answer the q?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jeff Moden wrote:

    ScottPletcher wrote:

    This is the main test q I use to test query-writing skill.  It's not as trivial / obvious as it sounds at first.

    A table contains 1, 2 or 3 rows for each student, each row with one test score.  For example: student_scores ( student_id int NOT NULL, assignment_number int NOT NULL, score int NOT NULL )

    Write a single query (no ctes, joins, subqueries, etc.) that lists the following:

    student_id, score_count, avg_score, highest_score, middle_score, lowest_score

    If there are fewer than 3 scores, list from left-to-right (i.e., highest first, then middle, finally lowest), leaving missing scores NULL.

    Quick question... when you provide this question, do you provide any test data to go with it or do you leave that up to the interviewee like what might happen in real life?

    In the latter case, do you allow for the interviewee to ask clarification questions?  For example, what is the domain of "assignment_number".  Is it sequential with no gaps starting at 1 for each student?  If not, what is the possible domain of numbers for that column?  I'd ask the same question if such requirements were given to me in real life.

    Also, since the problem description contains no defined PK nor a Clustered Index (a favorite issue of yours and I'm surprised to see this omission in your problem description), as a person taking this test I'd have to ask if the combination of student_id and assignment_number form a unique combination or not.  Do you allow your interviewees to ask for/discuss such things?

    ScottPletcher wrote:

    Write a single query (no ctes, joins, subqueries, etc.) that lists the following:

    Really?  So you're not interested in finding someone that might have an innovation that might blow the status-quo solution out of the water?  I went through that during an interview once... it wasn't a written test but a discussion.  The interviewers were hell bent on an incremental rCTE being the correct answer and they argued vehemently towards it.  When I explained both why incremental rCTEs are a bad idea and that the use of a Tally table or function is a highly efficient replacement in the given problem , they thought it was a preposterous pipe dream and wouldn't even allow me to explain how it works.  Even though (at the time) I actually needed a job (company layoff first time in my life), I explained why I was going to leave, thanked them for their time, and left because I didn't want to work in a place where innovation is stifled by such a ridiculous posture of the interviewers.

    During that same short out-of-work adventure, I interviewed with another company and the Lead DBA was "too busy" to part in the interview process.  That's a first troubling sign.  I answered a whole bunch of questions and the final question was "Where is the one place where you wouldn't want to call a stored procedure even if you could"?  What they were looking for was "in a function", which I didn't get because... guess what?  I know how to call stored procedures from a function and it's absolutely the right thing to do in certain cases and so I'd have never guess they were looking for that answer.  It also turned out that was the make-it-or-break-it question for the interview... they didn't accept anyone that didn't answer "in a function".

    The fun part was that some good time after that, I ran into the new Lead DBA that replaced the old one and we had a discussion about that question and that's when he told me the answer was "in a function".  It turns out the old Lead DBA was fired.  I explained the question and, thinking about it, explained how I actually do use stored procedure calls in functions and why.  He was totally gob smacked and said "That's a part of why the old DBA was let go... he suppressed that type of innovation".

    I provide the table structure, that's it.  They are in SSMS session, so they can use whatever SSMS tools there are to view that structure.

    As for your qs on assignment_number, they are irrelevant to writing the query.  Finding the high, middle and low score of those rows for a given student have nothing to do with assignment_number.  So why would you care?  And more importantly, why would you waste my interview time asking me about it?  I.e., if you want to wonder to yourself about it, go ahead (although, again, why would you care), but have the self-restraint not to take away valuable interview time prattling to me about it.

    Part of the purpose of an interview is to see if you can stay goal focused and ignore extraneous matters (particularly when time is of importance).

    You literally can't execute a (standard) stored proc from within a function.  And I can't really imagine why you'd ever really need to.

    I guess, maybe, I've exec'd an extended stored proc from within a function, but it would have to have been an extraordinary situation.  I guess back in SQL 7.0 (shudder) or SQL 2000 days we might have to do that to get something done that otherwise wasn't easily doable then.

    I'd be interested in having you relate to me how you exec'd a proc from a function -- and it sounds like you're talking about a standard db proc -- and what value it provided in that context.

    When I was being interviewed at one place, they were not SQL Server higher-level DBA folks (one of the reasons they were hiring me).  They asked me about how to track/debug something, and my answer was different, and thus supposedly "wrong".  They said to use Profiler when in fact newer and better methods were available.  I simply moved on with the interview.  At the end, when they inevitably give you the "Do you have any questions for us?", that is when I brought up the issue.  "Have you ever considered looking at xxx rather than Profiler to do yyy?  That's considered preferred now, since 20xx came out."

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jeff Moden wrote:

    jcelko212 32090 wrote:

    But, by definition, an identifier is on a nominal scale and nominal scales are never numeric.

    That's really a mistake on your part to think so in the world of RDBMSs, Joe.  Of course, we've had this discussion before so I'm not posting here for your benefit.  I'm posting here so others won't take that comment as some sort of gospel.

    I would hate having to be the DBA going to a place the Celko had designed / created.  What a nightmare of missing leading zeros or added leading zeros that should be missing or non-digit chars on all the character identifiers :shudder:.  You'd need a few drinks every night to relax from that nightmare.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I have a major problem with ego-maniacs that like to hijack meetings. Which is what we had here. I asked for a few practice tests so I can prepare for my interview, and as always, the ego-maniac jumps in and starts trying to make everyone look bad. I call these people energy vampire's. I have struggled in my career to avoid falling into the energy vampires trap, but time and time again, it happens. I should not have responded to this post, but yet I did, which means that the EV won and I lost again. The energy vampire is intelligent, but he uses his smarts to set people up so he can look good in front of his boss and higher ups instead of trying to be a team player.

    The energy vampire loves causing shit in meetings so he can force people into a debate. This is my biggest challenge. At this stage all of my good energy is consumed by some jack-wagon telling me that I need to learn how to write DDL scripts. These people are toxic to team environments. Who are you trying to impress? Are you angry because you are out of work? Are you getting old like the rest of us and are afraid that you might not have a job tomorrow? Teamwork is what YOU need to learn, not DDL scripts and fancy SQL coding.

    Forget about the practice tests... My new question is... "How do you co-exist with an energy vampire who is just waiting to throw you under the bus?" Perhaps I should focus on that?

     

    Everyone has a plan until they get punched in the mouth. --Mike Tyson

  • Lord Slaagh wrote:

    Scott - If you are not looking for the test results to be listed in a string, then I think this is what you are looking for.

    create table student_scores 
    (
    student_id int NOT NULL,
    score_date date NOT NULL,
    score int NOT NULL
    );

    insert into student_scores VALUES (1,'11/27/2020',80);
    insert into student_scores VALUES (1,'12/01/2019',60);
    insert into student_scores VALUES (1,'09/27/2018',95);
    insert into student_scores VALUES (2,'11/27/2020',80);
    insert into student_scores VALUES (2,'06/24/2019',100);
    insert into student_scores VALUES (3,'07/04/2020',65);
    insert into student_scores VALUES (3,'06/25/2019',82);
    insert into student_scores VALUES (3,'10/17/2018',76);
    insert into student_scores VALUES (4,'10/17/2018',70);



    select student_id
    ,count(score) as score_count
    ,avg(score) as avg_score
    ,max(score) as highest_score
    ,case when count(student_id) <3 then NULL else (sum(score)/count(*)) end as middle_score
    ,case when count(student_id) <2 then NULL else min(score) end as lowest_score
    from student_scores
    group by student_id;

    Nice, you've made some real progress here, adding the all-important (to this query) CASE.

    count and max scores look good.

    avg might take some more thought.  You might consider the avg having a decimal place, as that is fairly common with avgs (and can show when it might be a good time to round up the student's avg).

    The case for MIN score should be < 3 rather than < 2.  Remember, values are listed left to right, so all three values are needed to have a value to supply for MIN.

    The middle score is the top prize here, and it takes some thought and maybe ingenuity to get it.  Walk thru this in your mind.  Hmm, if there are only two scores, then the MIN score must become the middle score.

    Just let me know when you'd like me to post my full, preferred query for this.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • >> would hate having to be the DBA going to a place that Celko had designed / created. What a nightmare of missing leading zeros or added leading zeros that should be missing or non-digit chars on all the character identifiers :shudder:. You'd need a few drinks every night to relax from that nightmare <<

    Scott, it would actually work the opposite way. In my world, an identifier would be a string with check constraints, etc. Let me give an actual example from something which I ran into many years ago.

    CREATE TABLE Personnel

    (emp_ssn BIGINT NOT NULL PRIMARY KEY,

    ..);

    Using an INTEGER or BIGINT for an identifier that consist of digits is a pretty common design error. What it should have been is:

    CREATE TABLE Personnel

    (emp_ssn CHAR(9) NOT NULL

    CHECK (emp_ssn LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

    PRIMARY KEY,

    ..);

    Originally, the Social Security number was a bit more complicated being divided into three fields. Some rules about the allowed values. That changed a few years back, and not any nine digits can be a Social Security number (I think it's because we had too many duplicates, and too many fake numbers in the system. It might even be because we finally ran out of numbers under the old syntax. If anybody knows I'd like to hear about it)

    The regular expression makes it impossible not to put in nine digits. But it was declared as a big integer, it's quite easy. The same logic applies to ZIP Codes and the majority of common ANSI/ISO standard encodings.

    I have urged a fixed-length encoding for decades so that we can predict how it will display on screens or forms. I also strongly believe in regular expressions for validation. But with integers or just loose strings of characters, validation is nearly impossible. The worst example of his bad design which I've seen is the British postal codes. They are based on the names of old post offices which no longer exist, specially reserved string for offices within the British government, etc. I've actually seen a five-line regular expression to validate this scheme, but most people just go ahead and use table lookup if they have to. The UK has added a five-digit "ZIP Code look-alike" for bulk mailing because their system is so bad.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Here's another q I like to use:

    "Bad data is getting into a prod table from an app.  Developers have full data modification authority to that data and currently fix it by hand.  What method(s) would you recommend to correct this problem?"

    In this case, I don't mind an applicant's comment that "developers should not have full DELETE / INSERT / UPDATE authority to prod data."  That's quite true and serious enough to be a valid point.  But in this case, accept what is and explain what you would do to avoid bad data -- or least be made aware of it when it happens -- in the future.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Lord Slaagh wrote:

    I have a major problem with ego-maniacs that like to hijack meetings. Which is what we had here. I asked for a few practice tests so I can prepare for my interview, and as always, the ego-maniac jumps in and starts trying to make everyone look bad. I call these people energy vampire's. I have struggled in my career to avoid falling into the energy vampires trap, but time and time again, it happens. I should not have responded to this post, but yet I did, which means that the EV won and I lost again. The energy vampire is intelligent, but he uses his smarts to set people up so he can look good in front of his boss and higher ups instead of trying to be a team player.

    The energy vampire loves causing shit in meetings so he can force people into a debate. This is my biggest challenge. At this stage all of my good energy is consumed by some jack-wagon telling me that I need to learn how to write DDL scripts. These people are toxic to team environments. Who are you trying to impress? Are you angry because you are out of work? Are you getting old like the rest of us and are afraid that you might not have a job tomorrow? Teamwork is what YOU need to learn, not DDL scripts and fancy SQL coding.

    Forget about the practice tests... My new question is... "How do you co-exist with an energy vampire who is just waiting to throw you under the bus?" Perhaps I should focus on that?

    Heh... lordy.  Ok... just to be sure, I was trying to help while I was waiting for you to tell me what position you were applying for so that I could tailor some of the questions for such a position, which you still haven't done.  If I actually were an ego-maniac that had thrown you under the bus, you'd still be under the bus. 😉

    You don't have to worry about me on the other thread you said you were going to create.  Have fun prepping for and good luck on your interview.  I do, however, suggest that you leave the attitude you're currently displaying at home.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher wrote:

    I provide the table structure, that's it. They are in SSMS session, so they can use whatever SSMS tools there are to view that structure.

     

    Ah... therein lies the key.  Didn't know it was going to be an actual SSMS sesssion.  Thanks for the feedback, Scott... totally agreed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 wrote:

    >> would hate having to be the DBA going to a place that Celko had designed / created. What a nightmare of missing leading zeros or added leading zeros that should be missing or non-digit chars on all the character identifiers :shudder:. You'd need a few drinks every night to relax from that nightmare <<

    Scott, it would actually work the opposite way. In my world, an identifier would be a string with check constraints, etc. Let me give an actual example from something which I ran into many years ago.

    CREATE TABLE Personnel (emp_ssn BIGINT NOT NULL PRIMARY KEY, ..);

    Using an INTEGER or BIGINT for an identifier that consist of digits is a pretty common design error. What it should have been is:

    CREATE TABLE Personnel (emp_ssn CHAR(9) NOT NULL CHECK (emp_ssn LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' PRIMARY KEY, ..);

    Originally, the Social Security number was a bit more complicated being divided into three fields. Some rules about the allowed values. That changed a few years back, and not any nine digits can be a Social Security number (I think it's because we had too many duplicates, and too many fake numbers in the system. It might even be because we finally ran out of numbers under the old syntax. If anybody knows I'd like to hear about it)

    The regular expression makes it impossible not to put in nine digits. But it was declared as a big integer, it's quite easy. The same logic applies to ZIP Codes and the majority of common ANSI/ISO standard encodings.

    I have urged a fixed-length encoding for decades so that we can predict how it will display on screens or forms. I also strongly believe in regular expressions for validation. But with integers or just loose strings of characters, validation is nearly impossible. The worst example of his bad design which I've seen is the British postal codes. They are based on the names of old post offices which no longer exist, specially reserved string for offices within the British government, etc. I've actually seen a five-line regular expression to validate this scheme, but most people just go ahead and use table lookup if they have to. The UK has added a five-digit "ZIP Code look-alike" for bulk mailing because their system is so bad.

    SSN is a different thing, and the vast majority of places I've been also use char(9) with a CHECK to store it.  Same for zip codes, they are stored as char(5) or char(6) or char(9) or varchar(9) or [var]char(whatever), not as a numeric data type, particularly since not all zip codes (i.e. those outside the U.S.) are numeric.

    Customer_id is what should be an int (or bigint if you could potentially have that many customers over time) not char.  Char is too many wasted bytes and much higher overhead on joins.  Not to mention the massive headaches if it was defined as char(9) and you needed to make it char(10).

    As data gets moved around, someone, somewhere will define it as int.  It may be someone exchanging data with your company, and not  your company itself.  Then you get into the massive pain of having to converting the ints they {EDI or JSON'or whatever} into you to char and prefix with the appropriate number of '0's.  What a royal pain, and for no gain!  In fact, a loss, as you have extra overhead by insisting on using chars.  No thanks.

    No one expects to do math on customer_ids, but that's not the point.  Often practicality wins out when going from a logical design to a physical one.  Not just on this, but when, say, values are de-normalized and stored in additional tables and/or totals which can be derived are directly stored instead.  The real world sometimes doesn't hew strictly to theory.

    • This reply was modified 3 years, 4 months ago by  ScottPletcher.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply