This blog post summarizes the type of technical questions I would ask candidates for a Microsoft SQL Server data platform administrator and database developer role.
Hopefully this helps both candidates and managers prepare for interviews. I have no qualms in providing the brief answers because your interview, like mine, should be:
- behavioral: based on scenarios, not multiple choice answers.
- open ended: ask for an explanation, not a single word answer.
- conversational: testing how the candidate would explain this to a client or colleague.
- applicable: only ask questions relevant to your environment and in the job description.
When I was the manager of a SQL Server managed service provider and a principal consultant at a consulting company, I interviewed and hired database administrators to be consultants, remote DBAs, and database developers for our app dev internal projects.
I've divided the 30 questions into four categories.
Use these questions to prompt an open-ended explanation that should reveal not only the candidate's depth of knowledge in the area, but how they explain the concept. As a candidate, you should be familiar enough with basic concepts like this to give a concise, confident explanation that doesn't wander. As a consultant, explaining these type of concepts to clients and colleagues may be a regular part of the job.
1. Speaking generally, what is the basic relationship between a database, a table, and a column? Yes, this is a super simple question. The point is to hear how the candidate organizes their thoughts and explains the concepts. You'd be surprised how many candidates struggled with explaining even basic concepts. This appeared to be especially true of candidates who had more server admin experience than data experience.
2. What is the SQL Server transaction log and how does it work? How does it play a role in DR? This answer should hit the important notes about recovery models, log backups, RTO and RPO.
3. Tell me about the different kind of backups that can be performed on a SQL Server database, and when do you typically perform them? Continuing from a conversation about the transaction log and the database recovery model, the qualified candidate should definitely mention the transaction log, perhaps mentioning how a transaction log backup is an incremental backup, as opposed to a differential backup, or a full backup. The full backup is the start of a chain which then includes differential and transaction log backups.
4. Explain the difference between primary and foreign keys. The qualified candidate who understands the concepts should start talking about the integrity of data between tables, and perhaps given an example of the type of data that would be constrained by a foreign key. I've heard some pretty far off answers to this question that expose when a candidate has clearly never designed tables before.
5. What is an IDENTITY column and when would you use one? The qualified candidate should talk about the advantages of an auto-sequencing number as a primary key in relational table design. Bonus points for a rant about clustered keys on GUIDs being an antipattern.
6. Tell me the difference between a database in FULL recovery model and a database in SIMPLE recovery model? I would only ask this question as a final chance for a candidate to explain this, if they haven't already done so in the previous two opportunities. By this point, the crucial concept of a point-in-time restore made possible by transaction log backups should have been discussed.
7. What's the difference between truncating and shrinking the transaction log? Shrinking should be discussed as a pejorative here, something that occurs once only if needed and never on a schedule. Truncating the log happens regularly, every time a transaction log backup happens. Sometimes, it is necessary to truncate a log without backing it up only in an emergency.
8. You get a request to copy a single table to a database on another server. It needs to be a readable copy of the table that is always up to date with no or very little latency. What would you do? Multiple alternatives here that can be discussed. If replication, what kind of replication? Or, change data capture (CDC) or change tracking (CT) and custom SSIS packages, more details on how this would be implemented, hopefully drawing from experiences? Minus points for mentioning deprecated features like database mirroring, or features like database snapshots or log shipping that do not fit the latency requirements. Followup: what about if I wanted to do this with an entire database? Availability groups come into play, or some kind of third party product.
9. SQL Server availability groups: what are some of the advantages of AG (Availability Groups) vs FCI (Failover Cluster Instance)? Tell me about the kind of experience you have? Advantages of an AG: readable secondary, built-in error detection, backups on a secondary replica, synchronous or asynchronous replication. Bonus points for mentioning basic availability groups or distributed availability groups, or the Azure SQL Managed Instance Sync to SQL Server that leverages distributed availability groups.
10. What are some of the factors for availability groups spanning two subnets? There may be other discussion that is valuable here to learn about the candidates experience, but the key point is that the Listener has an IP address in each subnet. Applications need to use MultiSubNetFailover in the connection strings, so that application connections can failover immediately, regardless of which subnet hosts the primary nodes.
11. Tell me about the difference between users and logins in SQL Server. Server logins, database users. Logins handle authentication, users handle database access. Bonus points for mentioning partially contained database users, like in Azure SQL Database, where the user handles both authentication and database access.
12. Tell me about situations in which you've used PowerShell to administer a SQL Server. Lots of reasons a candidate has hopefully used PowerShell. Open to a lot of approaches here, including PowerShell scripts for deployment, remoting, SQL Agent job steps. Bonus points for experience using dbatools.io suite of custom SQL Server PowerShell cmdlets. Further would like to dig into how much of the PowerShell the candidate actually wrote.
13. Tell me about some of the differences between Azure SQL Database and a SQL Server instance? Lots of answers here
. Would like for them to speak experientially about these. Perhaps starting with "You can't use USE" or "You can't manually take a backup".
14. What is a DTU? Azure SQL Database has two purchasing models, a vCore purchasing model and a distributed transaction unit (DTU), which is used for scaling/governing the all the performance resources in a single number. Perhaps discuss the Basic/Standard/Premium tier, but try not to ask questions about how Microsoft sells Azure SQL Database, but how the candidate has used Azure SQL Database.
15. What are some things you have to do differently with maintenance on 10 TB database vs a smaller database? If applicable, ask this question and look for particular experience with large databases. The candidate should mention things like partitioned tables, columnstore tables, partitioned index maintenance and custom index maintenance strategies. Use replicas for alternative DBCC CHECKDB strategies. Would like to hear relevant lessons learned from the applicant’s experience as it applies to their experience with very large databases.
Database design questions
16. Tell me about the different types of indexes in the SQL Database Engine. I'm looking for clustered, nonclustered, and columnstore indexes as the big three I want to hear about. Bonus points for other types of indexes like xml or ordered clustered columnstore indexes. I'm not looking for a list of indexes of course, but a short discussion on the role of each in the table, how many of each, what kind of queries they benefit.
17. What can you do with columnstore indexes, when would you use them, and what restrictions do they add to a table? I'd ask this question if it wasn't discussed in the previous question. Most helpful for large tables (like millions of rows). Columnstore indexes are highly compressed data for heavy reporting workloads, and for queries that would otherwise be scanning millions of rows of data. There is no need to order the keys. Columnstore indexes can be the clustered index, or a "nonclustered" index. In SQL Server 2022, ordered clustered columnstore indexes can benefit from partition elimination to greatly improve performance. Would like to hear in what scenarios the candidate has used or considered columnstore indexes and why.
18. Suppose we have a simple table for products where we have a product name and a price. Tell me how you'd write a query to return the ten most expensive products. Some candidates might go straight to a convoluted subquery or WHERE clause, and eventually realize they have overthought it. SELECT TOP 10 name FROM product ORDER BY Price DESC; is the correct answer, with the TOP and the ORDER BY being the key points here.
19. We still have the simple table for products... lets say that the client wants to start tracking price changes. What architecture changes or features of SQL Server would you use to accomplish this? Interested to hear how the candidate has accomplished this common database development task. Possible answers (in order of descending age) include database triggers, CDC, change tracking (CT), or temporal tables. An answer that is entirely application-dependent is missing the mark here for a database-focused role. Bonus points for mentioning temporal tables, introduced in SQL Server 2016. Bonus points for mentioning the data warehousing concept of a slowly changing dimension.
20. What are the differences between the numeric and decimal data types? Float and real? There is no difference between numeric/decimal. Float and real is an approximate floating point data type that may round values to the right of the decimal point, and could result in rounding errors. Bonus points for discussing the significant danger of float and real data types when used for decimal data, especially when used as unique decimal data.
Database development questions
21. Tell me how you'd write a query to return all the records in table A that match a common key record in table B? The answer is an INNER JOIN, something like SELECT ... FROM TableA inner join TableB on TableA.Key = TableB.Key; Answers involving a UNION or a comma join or a WHERE clause or some convoluted subquery are not what I am looking for.
22. Tell me how you'd write a query that given a datetime variable, would strip away any hour and minute information and just return the date? Convert to DATE data type, introduced in 2008. Other solutions here requiring string manipulation are dated or too elaborate. Bonus points for mentioning that we should be using datetime2 or datetimeoffset instead of the datetime data type.
23. Assume you have a SQL Server 2016+ instance with an application that has been in production for months. The instance has not been restarted recently. What steps would you take to determine what new indexes may need to be created or dropped? To discover indexes to be created or dropped, two sets of DMV’s respectively: the set of missing indexes DMVs, and the index usage stats to determine if any indexes are unused.
24. What are the differences between DELETE and TRUNCATE? DELETE can be filtered, TRUNCATE impacts whole table, may be faster because it’s efficiently logged (but it is still logged). TRUNCATE is technically DDL, could be blocked by FK’s, permissions.
25. What are some of your favorite new features of SQL Server 2017? 2019? 2022? Would like to hear new features in use. Optionally, I often ask for this answer in a short writing assignment as "homework" before a first or second interview.
26. Tell me about the difference between a scalar and table function. When would you use either one of them? Scalar functions return one value, table functions return a rowset. Neither are very good for performance, though new features in SQL Server 2019 and 2022 have reduced the negative impact. Functions are different from a stored procedure in that you can SELECT or JOIN directly to them. Functions are typically used for more object-oriented database development, emphasizing code re-use.
27. Explain the difference between blocking and deadlocking? Locks create both blocking and deadlocking, and are necessary for a pessimistic concurrency database. Blocking may go on forever., until the application times out. Deadlocks are logical conflicts where a victim process or processes are killed immediately. The READ UNCOMMITTED isolation level (NOLOCK) bypasses locking, but minus points if a proper caveat/warning about this isn't mentioned. Bonus points for mentioning memory-optimized tables.
28. Tell me about situations in which you've used Extended Events. Candidates have hopefully used XEvents instead of traces, as traces as deprecated. XEvents are superior to traces because they can be configured to be asynchronous and have less impact and overhead on the instance. XEvents are used to capture app activity, diagnostic activity, performance metrics, or most commonly, recent deadlock history. There are already XEvent sessions running on a server by default.
29. Have you ever used SQL Server Integration Services (SSIS)? What kind of tasks did you accomplish and how? Would like to hear technical specifics about controls used, and also how it was deployed. Ask probing questions. Try to figure out if candidate actually did the work, or just was aware of it or supported it. Follow-up question: What are some of the advantages of the project deployment model (first introduced in SQL Server 2012.) As opposed to the older package deployment model, the project deployment model allows for environment variables for running the same code with different project variables, also code version history, built-in logging via SSISDB.
30. Have you ever used Azure Data Factory (ADF)? What kind of tasks did you accomplish and how? Would like to hear technical specifics about controls used, and also how it was deployed. Ask probing questions. Try to figure out if candidate actually did the work, or just was aware of it or supported it. ADF changed a lot in 2017 with the release of ADF v2, which is the current version on Azure, and generally ADF v1 was an incomplete product.
Optional homework for the candidate
1. Ask for a writing sample on their favorite new feature of SQL Server, one page on what we would show a client or colleague who was asking about the new feature.
2. A script from their personal "toolbox" of scripts that they developed, either in T-SQL or PowerShell.
Legal stuff that HR wants you to know
Do not ask questions that have nothing to do with the candidate's job qualifications. In your interviews with candidates, you don't need to know, and should not ask, about:
- their family, kids, or plans for having kids,
- relationship status,
- date of their education graduation,
- race or cultural background, including the origin or meaning of their name,
- place or country of birth,
- sexual orientation or gender identity,
- political persuasion,
- or private medical history.
- You don't need to know about their disability status, only if they need an accommodation.
- You don't need to know if they are citizens or immigrants to your country, only if they can legally work in the country.
- You don't need to know much much money they made at their last job.
If you have questions about the above, ask a qualified human resources professional. Remember that if the candidate seems like someone you want to have a beer with, that has nothing to do with the job.
Questions? Your experience? Other good questions you ask? Feel free to add them in the comments below.