February 25, 2011 at 10:13 am
Hi, I am very new to the SQL world. I have a table that contains transactional data. I created the table with a primary keys (employee and transaction date). When i tried to load the table with data, it failed because it said it found duplicate keys.
My questions are:
1. Can a primary key be non unique?
2. Can I create a non unique index without a primary key?
3. Should I have no keys on the table and create a view which would sort the data the way I want?
There will be approximately 70,000 records added to the table each year. I will need to generare reports from this data. What is the best way to proceed from a performance point of view?
Any advice is appreciated. Thanks.
February 25, 2011 at 10:27 am
1. Can a primary key be non unique?
NO. This is the definition of a primary key.
2. Can I create a non unique index without a primary key?
Yes.
3. Should I have no keys on the table and create a view which would sort the data the way I want?
I would think this is not the best approach. You don't have to have a primary key on a table but there is probably a reason you have this data stored. If you know what columns you want to sort on those are VERY likely candidates to have some indexing on.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 25, 2011 at 10:38 am
Thanks Sean.
Lets say I just create a non unique index on employee and date with no primary keys defined. When I write queries or programs for the users to extract this data, will not having primary keys hurt the performance of the system (will the queries take longer to run versus if they read a view of the table)?
February 25, 2011 at 10:44 am
Not really sure what your are meaning about a non-unique index in the context that you are referring to it.
With 70,000 rows i would think you want to have a primary key. Remember that it can be more than one column.
When I write queries or programs for the users to extract this data, will not having primary keys hurt the performance of the system (will the queries take longer to run versus if they read a view of the table)?
Not having proper indexing on your table will have a negative impact on performance with or without a primary key. Remember that a view is just a precompiled query with an established execution plan.
Just curious, why are so against having a primary key on this data?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 25, 2011 at 11:01 am
I would like a key or index on employee and transaction date. However, the data is not unique, so I can't make them primary keys. If SQL could have non unique primary keys, that would be ideal. I could do this in my old IBM mid-range days.
February 25, 2011 at 11:05 am
Steve.hitch (2/25/2011)
I would like a key or index on employee and transaction date. However, the data is not unique, so I can't make them primary keys. If SQL could have non unique primary keys, that would be ideal. I could do this in my old IBM mid-range days.
I'm not sure about the IBM system, but I think the issue here is what these items are supposed to do.
A Primary Key, by definition, is a way to find a specific row in a table. That's its purpose.
Now, most people tend to use the PK as the Clustered Index. The clustered index is the primary seeking mechanism on the table, and sorts the physical full table according to it's definition. However, these don't have to be the same thing.
I would recommend creating a clustered index on your two fields to assist in querying, and perhaps adding an identity column as a non-clustered primary key, just so you can explicitly find specific records.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 25, 2011 at 11:21 am
Steve.hitch (2/25/2011)
I would like a key or index on employee and transaction date. However, the data is not unique, so I can't make them primary keys. If SQL could have non unique primary keys, that would be ideal. I could do this in my old IBM mid-range days.
A Primary Key constraint is by definition unique. This is one of the foundations of referential integrity.
I've myself worked for many years with IBM technologies - could you please point me to the specific document where any IBM technology allows for a non-unique, enabled Primary Key?
Also, it may happen your PK needs an additional column to enforce uniqueness - not knowing the specific table of application something like (employeeId, transactionDate, TransactionNumber) may work.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.February 25, 2011 at 11:26 am
So, on this logging table, how do you identify a single row ?
If you cannot: Is this worth a concern for your data system ?
If that is a concern, one of the common techniques of creation a primary key without hurting the business rules is to add a sequence number (int identity) and make that the primary/unique key.
However, you should always check if your system can handle such modification. ( by default a column with the identity property is not to be mentioned in an insert statement for sqlserver )
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
February 25, 2011 at 11:41 am
Steve:
I think what's happening is you're using the term "primary key", but what you mean is "clustered index". They are often the same columns in an SQL database, but they aren't the same thing.
A "clustered index" defines the order the rows are stored.
A "key" defines the row of data. The "primary key" is simply the key used to best define which row is which. By definition in SQL Server (and in every relational database I've worked with), the "primary key" has to be unique, because otherwise it doesn't really define the row.
In many tables, they are defined on the same columns, and that's the default behavior in SQL Server. That often causes people to think they're the same thing. But they don't have to be defined that way.
If, for example, you want the data to be stored in the sequence of employee and transaction date, you would define the clustered index on that. That doesn't require that they be unique, it just defines how the table physically stores the data. It's very convenient for selecting ranges of rows. If you frequently want all of an employee's records, sorted by transaction date, clustering on those columns will make that query very efficient, because it optimizes for I/O off of a spinning disk (the most common storage used).
Having a primary key on the table is mainly useful for making sure that updates/deletes and single-row selects get "the right row". If you don't have something defining the rows, one from the other, you can easily end up accidentally operating on multiple rows when you intend to operate only on one. That's one of they uses of a primary key (or any other unique index, for that matter). Another is enforcement of the rule that no two rows in a relational table should ever be identical to each other.
Does that help?
On another point: Having a view defined "to order the rows" doesn't really work in SQL. A view with an "order by" clause doesn't really force the data into a particular order. Neither does an index on it. If you want data in a particular order, you have to include an "order by" clause in your query of the data. Data will often be returned in the order of the index that's being queried (clustered index or not), but it's not guaranteed. You mentioned something that made me think you may not know that, so forgive me if I'm telling you what you already know.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2011 at 11:46 am
Hi Paul,
I worked on the IBM AS/400 system for years. When I created a file and built key fields, the default was unique keys. You could override that and say unique keys no. I didn't need to use this often, but it was a nice feature.
Unfortunately, the way this data is arranged doen't allow for unique keys. The data captures project data for employee by date. The payroll system will allow multiple entries for an employee/project per day.
That's why I thought maybe in this case to not include primary keys on the table and build a view that would give queries/report programs the qiuck access path it would need to run efficiently.
Thanks.
February 25, 2011 at 11:50 am
Steve.hitch (2/25/2011)
Hi Paul,I worked on the IBM AS/400 system for years. When I created a file and built key fields, the default was unique keys. You could override that and say unique keys no. I didn't need to use this often, but it was a nice feature.
Unfortunately, the way this data is arranged doen't allow for unique keys. The data captures project data for employee by date. The payroll system will allow multiple entries for an employee/project per day.
That's why I thought maybe in this case to not include primary keys on the table and build a view that would give queries/report programs the qiuck access path it would need to run efficiently.
Thanks.
Put the clustered index on the columns you're querying by most frequently (employee and transaction date), and don't bother with a primary key, in that case.
Since inserts will be non-sequential in this case, make sure the clustered index has enough padding in it to avoid heavy fragmentation.
If the data will fragment too much that way, you might try clustering on transaction date and employee (instead of the other way around), as that will force new transaction dates to be added "at the end of the table", which has significant advantages in speed and efficiency for the inserts.
If the table gets added to frequently, you'll almost certainly be better off with the date column being the leading edge of the clustered index.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2011 at 1:28 pm
Steve.hitch (2/25/2011)
I worked on the IBM AS/400 system for years. When I created a file and built key fields, the default was unique keys. You could override that and say unique keys no. I didn't need to use this often, but it was a nice feature.
...and, AS/400 will build a non-unique index on those "key fields" - not a primary key.
Steve.hitch (2/25/2011)
Unfortunately, the way this data is arranged doen't allow for unique keys. The data captures project data for employee by date. The payroll system will allow multiple entries for an employee/project per day.
...and, why you have to have a primary key? there is no obligation to have one if you don't need it.
Question... how is the system going to avoid duplicate information from coming into it? like if capture process is ran twice one day, by mistake.
Steve.hitch (2/25/2011)
That's why I thought maybe in this case to not include primary keys on the table and build a view that would give queries/report programs the qiuck access path it would need to run efficiently.
A view is nothing but a catalogued query so, anything you can put in a view you can also put in a query. In general views are used to restrict access to certain columns, not to provide an "order by"
What kind of query is sent against this table?
Are queires looking for one specific row or are they looking for a range of rows? based on which columns?
Finally, how many rows are those queries going to retrieve from the table? just tell if less than 15% or more than 15% of the total population 🙂
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.February 25, 2011 at 1:41 pm
Steve.hitch (2/25/2011)....
Unfortunately, the way this data is arranged doen't allow for unique keys. The data captures project data for employee by date. The payroll system will allow multiple entries for an employee/project per day.....
And is there a datetime/time registration kind of column in that table?
How do you identify distinct rows ?
Your primary / unique key can be a multiple columns constraint !
How do you avoid actual duplicate registrations in that table?
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 13 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply