Coding Questions for Advanced DB Developer Candidates

  • I was recently re-reading an old article on conducting programming interviews, when I came across this gem:

    "Serge Lang, a math professor at Yale, used to give his Calculus students a fairly simple algebra problem on the first day of classes, one which almost everyone could solve, but some of them solved it as quickly as they could write while others took a while, and Professor Lang claimed that all of the students who solved the problem as quickly as they could write would get an A in the Calculus course, and all the others wouldn’t. The speed with which they solved a simple algebra problem was as good a predictor of the final grade in Calculus as a whole semester of homework, tests, midterms, and a final."

    I thought this sounded great, so I came up with a coding problem which I believe anyone should be able to solve as quickly as they can write if they have been doing this job for ten years:

    Coding Problem:  Given the data entry table below

                    CREATE TABLE dbo.DataEntry
                    ( RecordID                           INT IDENTITY      PRIMARY KEY
                    , DataEntryColumnA       VARCHAR(30)    NULL
                    , DataEntryColumnB       VARCHAR(30)    NULL
                    , DataEntryColumnC       VARCHAR(30)    NULL
                    , DataEntryColumnD       VARCHAR(30)    NULL
                    , DataEntryColumnE        VARCHAR(30)    NULL
                    , DataEntryUser                SYSNAME            NOT NULL           DEFAULT( SUSER_SNAME() )
                    , DataEntryDate                DATETIME           NOT NULL           DEFAULT( GETDATE() )
                    );

    Write a query to remove duplicate data entry records.  Where there are duplicates, be sure to retain the first entry.

     

    Can anyone suggest a similar coding question for an interview?  It should be something beyond the typical SELECT query, but that solves a common problem that an experienced candidate should have solved many times in their career.

    You need only post the question, not the answer.

  • This was removed by the editor as SPAM

  • Using the same data table, return the most recent non NULL value for each data entry column for each user.   I would solve this by UNPIVOTing the data, filter out NULL values, group by User, DataEntryColumn, MAX(ID) [assuming IDs are entered in DataEntryDate.  Then join this back to the unpivoted data to get the value of the DataEntryColumn and then PIVOT back into original format (dropping the DataEntryDate as it is no longer relevant)

     

     

  • Using the same data table, return the most recent non NULL value for each data entry column for each user.   I would solve this by UNPIVOTing the data, filter out NULL values, group by User, DataEntryColumn, MAX(ID) [assuming IDs are entered in DataEntryDate].  Then join this back to the unpivoted data to get the value of the DataEntryColumn and then PIVOT back into original format (dropping the DataEntryDate as it is no longer relevant)

     

     

  • OK, that is cool, but I am not looking for an answer to this coding problem.  This is a coding problem that I am using for interviews.  I am looking for different coding problems that I also can use for interviews.  I want coding problems that are similar to this one, in that they represent common situations that an experienced database engineer will have run into many times and which are simple enough to solve in a few minutes.

    Nevermind.  I just read the reply more closely. (facepalm)

  • Personally, coding problems in interviews is usually a red flag for me. You may end up asking questions that someone doesn't do frequently and you MAY filter out a really good candidate. For example, if you ask a question around "MERGE" or "PIVOT/UNPIVOT" to someone who doesn't use those frequently, you may lose out a candidate who is really good at doing set based data manipulation and accidently hire someone who has a love for CURSORs and NOLOCK.

    If I was hiring for a Jr. position, I would want them to be able to understand the basics such as the difference between the different join types, but that can also be easily googled, so as long as they are thinking about the problem correctly and looking to solve the problem with a set based solution, I can teach and train that Jr. to be a Sr.

    Now if I am hiring for a Sr. level position, I don't want to try to stump them on queries as that isn't the point of the interview. I may show them some code and an error message and ask how they would start troubleshooting it, but I'm not giving them a table and asking them to write queries by hand against it. If I am applying for a Sr. level position as a developer and you want me to write code, my first thought is that you are having a problem and I am coming in to be "free help" and I don't really want a job where I would be working with people like that anyways. Now, a better interview question is to get the interviewee "thinking" about the problem. Like ask them how they would plan to write a query to get the data and what sort of questions they would ask the end user before getting the data. That "plan" is the important part of the question because depending on what it is going to be used IN will determine how I write the query. Like is this a "one-off" and will never be used again or is this a repeated query and should be persisted to disk? Also, in your above example, what do you consider a "duplicate data entry"? Does just 1 column have to have a duplicate value or is it 2 identical rows? I would want clarification on what you consider a duplicate data entry before I start writing the query. Plus, what column are you using to determine what is "first"? Your question, while it MAY seem easy to you, has a lot of unknowns to someone who doesn't know the data and without knowing the data, you can end up with a lot of different answers that are all technically correct, but not what you were looking for. For a Sr. level developer position, I would HOPE that they are asking all the right questions and then providing a verbal explanation of how they would get the data, and for a Sr. level position, I would HOPE that they would be asking about data volume and expected data growth too so that they show they are planning for the future not solving an immediate problem. And even then, at a Sr. level, I would want them to ask things like "how are you handling this currently?" and "what problem do you hope to solve with this?" and "is there a workaround while we develop this?". Things I ask end users all the time.

    Now, if you were hiring me as a DBA, I would be annoyed that you are asking me "developer" level questions for a DBA position and I'd definitely turn down the position as developer and DBA are 2 completely different roles and if you are merging them into 1, I would want to be compensated nicely for working 2 jobs. The ONLY exception to asking "developer" related questions to a DBA (in my mind) is if you are looking for how they would tune the query. Like what is their "performance tuning" plan for a query? There are TONS of ways to do performance tuning and it really depends on the system usage. Like is the query slow but used 1 time per year? then I probably wouldn't waste time trying to tune it. If the query run time is 1 second and is run millions of times per day, getting that to 0.5 seconds would be a pretty big win. Then reading the execution plan, reviewing the code, reviewing indexes used and missing, reviewing data volume, etc... lots of places to look AND nice DBA related question.

    The TL;DR version - if I am hiring an "Advanced DB Developer" (which I read as a Sr. Database Developer), I am not going to throw code at them. I want to know how they will work with the end users to get the answers to the questions being asked. If they made it to the interview, then I already trust their resume, so I want to know how they "think". I want to ask questions to see their thought process. If they go straight to code when I ask them a question, they are not a good candidate as they will provide answers that don't meet the questions intent. I have a TON of end users who ask questions that are EASY to misinterpret and result in reports giving bad data IF you don't ask the correct questions. Jr. level you can train them to ask the correct questions. Advanced/Sr. level they should be able to ask the questions without coaching.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    Personally, coding problems in interviews is usually a red flag for me. You may end up asking questions that someone doesn't do frequently and you MAY filter out a really good candidate. For example, if you ask a question around "MERGE" or "PIVOT/UNPIVOT" to someone who doesn't use those frequently, you may lose out a candidate who is really good at doing set based data manipulation and accidently hire someone who has a love for CURSORs and NOLOCK.

    To Brian's good point above...

    I would most certainly fail for coding problems if I were required to use MERGE, PIVOT, or UNPIVOT because I totally avoid them.  MERGE was fraught with errors when it first came out and I've lately seen the mess it makes of things in some third party software that we have.  PIVOT has gotten faster on single column pivots but still hits the table separately for every column that's pivoted.  It also can't easily do things like give you a row total and returning zeros instead of nulls is a pain.  Then try to add column totals.  Most people I work with also avoid UNPIVOT, as well.  Also, you cannot use PIVOT in things like Indexed Views, which is an incredible, very high performance, pre-aggregation tool.

    If I'm allowed to explain all of that and then show them how to do it with the likes of a good ol' fashioned "UPSERT" for the MERGE question an the how and why to use CROSSTABs for PIVOTs and Table Valued Constructors instead of UNPIVOT, etc, then I'd likely be hired.  I can even provide them with the PowerPoint presentation and the demo code that I give on the subject.

    When conducting code interviews, ask people how they would solve a problem and don't require them to use YOUR idea of what the correct answer should be.  As Brian states, you may be inadvertently be eliminating a coding Ninja.

    Other similar examples include telling someone to write a recursive CTE to do the FizzBuzz problem or to use a Calendar table to expand date ranges for each row of a table that has a start and end date in it or someone saying that you can't use a function to solve a given problem because they cause performance issues. 😀  Heh... or how about someone that marks your coded answer to combine a DATE and TIME column because you didn't first convert them to DATETIME and then just add them together. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yep - I agree with that. Personally, having a non-technical recruiter ask screening questions for a Sr. level job is never going to go over well. I mean I wouldn't ask my mechanic to build me a house; that's just crazy.

    For a Jr. level position I can see some questions that you could ask, but if a recruiter was asking me questions like "what are the different join types?" for a Sr. level position, it would be a red flag for me even applying for the job unless I was desperate for work because the question is something a Sr. level DB developer should already know. Any questions I could have for a recruiter would be too trivial for a Sr. level employee or would be too complicated for the recruiter to understand even if I had the answer written down for them. Like Jeff said, there is more than 1 way to do a task. If I asked a question and was looking for them to say that they would use PIVOT (for example) and they came back with CROSSTAB or temp tables or CTE's or table variables or some other voodoo, their answer MAY still be correct and how will you get the recruiter to know if it is correct or not?

    Mind you, you said that 90% of your staff would fail your quiz is a bit alarming to me. That would mean to me that I would want to re-train (in house or externally) some of the staff. I know one former developer I worked with was more of a C#/VB developer and their SQL was a bit weak. They could make the SQL do what they wanted, but it wasn't always the most efficient. They would use cursors in their code and the code worked, gave the results they wanted, and with the tight timelines for release, sometimes their code went live with a note about technical debt. Then a project came along that didn't have as tight of a deadline and I worked with them to show them how while the cursor gave them the results they wanted, doing it without the cursor was 100 times faster.

    If MOST your team would fail your recruitment quiz, I wonder if you would get a team performance boost by getting some training in place and you MAY not need the extra person you are looking to hire?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • You both make good points.

    However, what I am looking for is a simple problem I can pose to a candidate of the sort that they surely have had to solve if they truly have 10+ years of diverse experience working with SQL Server.

    I believe the sample I gave meets those criteria.  I have had to delete duplicate records or filter out duplicate records many times in my career.  I could write that query on a whiteboard at 2:00 a.m. if I were blind drunk.

    I am looking for other problems of the sort.  I don't care HOW the candidate solves the problem, only that they can solve it as fast as their fingers can type it out, because they have done it so many times before.

    This would be only one section of an interview, during which a panel digs into

    • a candidate's past projects and accomplishments,
    • poses problem scenarios for the candidate to solve,
    • the candidate's Spirit Animal,
    • et cetera, et cetera and so forth....

    That said, I understand the desire to avoid giving the interweb more "interview questions" for candidates to study.

  • "Other similar examples include telling someone to write a recursive CTE to do the FizzBuzz problem or to use a Calendar table to expand date ranges for each row of a table that has a start and end date in it or someone saying that you can't use a function to solve a given problem because they cause performance issues.    Heh... or how about someone that marks your coded answer to combine a DATE and TIME column because you didn't first convert them to DATETIME and then just add them together. "

    I have terrible trouble using recursive CTE's myself.  I've needed to use them occasionally, but every time I have to look it up on the interweb and even then I struggle with them.  Somehow, my brain just does not work that way.  For me, it is like trying to write legibly with my left hand.  I have similar problems with PIVOT and UNPIVOT.  There is one particular case where I use UNPIVOT all the time - I need to resort to a code template I created back in the day when I was able to cram UNPIVOT into my brain (temporarily).

    I also would not ask about the FizzBin problem, because that is not a common problem I would expect someone to have run into a lot.  I might ask about using a Calendar table, because that is a pretty common problem, particularly for someone who has spent time working on data warehouses or with ETL.   And I would be mighty disappointed in a candidate who could not describe a Tally table and how it replaces a loop.

    Regarding writing simple code in an interview:  if you are an experienced candidate (10+ years) applying for a senior SQL engineer position, and you can't write simple code off the top of your head, then you had best be a rock-star in other areas.  Similarly, if you have been working with SQL Server for 10+ years and you don't have a good knowledge of the Internals, then I will question the value of your experience.

    On the other hand, I care very little about your experience with tools:   whether Azure, Snowflake, Grapnark or whatever.  In my opinion, if you are smart, have a track record of getting things done and have a solid grasp of the fundamentals, then you should have no trouble learning to use a new tool.

    On a personal note, this is my punishment for having completed a few tasks competently.  I get asked to write up the documentation for stuff.  (I hate writing.  Writing words is hard work.  Can I go back to working on code, please?)  I get asked to participate in interviews.  (What a massive time sink they are!  Hiring the right people is incredibly important, of course, but can't I just let someone else do it and then complain when they muck it up?)  Then, when the corporate recruiter complains about the candidates I reject (because his income is 100% commission), I am asked to write a list of screening questions.  Seriously, I ask you, how is this fair?

  • "Mind you, you said that 90% of your staff would fail your quiz is a bit alarming to me. That would mean to me that I would want to re-train (in house or externally) some of the staff."

    The quiz is quite brutal.  My team is actually quite good.  If you are interested, send me a message.  We can get on a Skype call and see how you fare!  (Then you can tell me how stupid my questions are.)

    But that is why I posted my question here.  The quiz I came up with is too hard for an initial screen.  However, at this point we are just chatting about finding and interviewing good candidates.  I understand why no one would want to share good interview questions of the sort I want to find.  They are rather a precious commodity.

    Oh, and ChatGPT 4.0 is very good at answering interview questions.  If you are conducting remote interviews, watch your candidates carefully to make sure they aren't typing or looking off to the side at another screen.

  • "If MOST your team would fail your recruitment quiz, I wonder if you would get a team performance boost by getting some training in place and you MAY not need the extra person you are looking to hire?"

    I do think we need a training program.  However, I would end up running it.  Once again, no good deed goes unpunished.

  • "Also, in your above example, what do you consider a "duplicate data entry"? Does just 1 column have to have a duplicate value or is it 2 identical rows? I would want clarification on what you consider a duplicate data entry before I start writing the query. Plus, what column are you using to determine what is "first"? Your question, while it MAY seem easy to you, has a lot of unknowns to someone who doesn't know the data and without knowing the data, you can end up with a lot of different answers that are all technically correct, but not what you were looking for."

    Candidates are welcome to ask questions about the problems posed.  In fact, that is often a good sign.  However, I'd like to address your other questions.  I think the thing is more obvious to an experienced SQL engineer than you think.  Again, here is the table I give in the problem:

    CREATE TABLE dbo.DataEntry
    ( RecordID INT IDENTITY PRIMARY KEY
    , DataEntryColumnA VARCHAR(30) NULL
    , DataEntryColumnB VARCHAR(30) NULL
    , DataEntryColumnC VARCHAR(30) NULL
    , DataEntryColumnD VARCHAR(30) NULL
    , DataEntryColumnE VARCHAR(30) NULL
    , DataEntryUser SYSNAME NOT NULL DEFAULT( SUSER_SNAME() )
    , DataEntryDate DATETIME NOT NULL DEFAULT( GETDATE() )
    );

    I want you to remove "duplicate records".  Which column do I care about, you ask?  Well, let's think about this a bit, shall we?

    RecordID is an INT IDENTITY primary key.  There will be no duplicate records here.

    DataEntryUser & DataEntryDate with the default values given are clearly just timestamps.  Why would I care about duplicates here?

    Clearly I care only about the VARCHAR(30) columns, and I also think it obvious that I care about all of them.  A more subtle point is that this sort of table represents a sadly common anti-pattern:  a table with an INT IDENTITY primary key and no natural key.

    Surely you have experienced the frustration of trying to determine what the data in a table such as this actually represents.  In my career I have spent many hours writing one query after another like  GROUP BY X, Y, Z  HAVING COUNT(*) > 0 on a deep and wide table, desperately trying to find some combination of columns that has unique values.  Typically I can get close, but inevitably, there are some duplicate records in there.  I even have made it into a personal meme:  "If your table allows some data to be entered, eventually some user will enter that data, no matter how unlikely it may seem."  I also have a finely-tuned soap-box rant about this:  "Add a surrogate key if you insist, but always, always, always define a natural key!"  (There are a handful of exceptions, and those exceptions make good follow-up questions, but they also end up proving the rule.)

    I think an experienced engineer with 10+ years of diverse, worthwhile experience working with SQL will have run into this sort of thing many times, and I have follow-up questions designed to pry into that.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply