December 13, 2005 at 2:26 pm
Hello -
Normally I am on these forums answering questions whenever I can and trying to learn by helping someone else out, but now I need help!
We have a couple of tables that have gone through an already established import process that pulls data from a legacy system. In the legacy system a company record was broken down into account groups. In the current system the account groups don't exist, so a company could have had multiple account groups, and for one of the tables could have had different prices per account group. The import process, which was written before I started, just grouped by the columns that were unique across a company and collapsed the two columns that could have differed per account group (price and comments). So now I am tasked with creating an exception list. I can do it slowly and in several steps, and have already started doing so: basically I use derived tables and grouping joins to create another temp table. This table looks like:
CREATE TABLE [tmpDupesCSP] (
[companyID] [int],
[CompanyServiceId] [int],
[CompanyName] [varchar] (100) ,
[Description] [nvarchar] (75) ,
[prgm_code] [nvarchar] (4),
[prgm_desc] [nvarchar] (40),
[serviceID] [int],
[Service] [varchar] (250),
[price] [money] NULL ,
[prgm_notes] [nvarchar] (250)
)
Within this data are only those records that were duplicated amongst (CompanyID, CompanyServiceID, CompanyName, Description, Prgm_Code, Prgm_Desc, ServiceID and Service) - for the rest of this post I am going to call the columns in the parentheses above "The Grouped Columns" - and that could have had different or duplicate Price and Prgm_Notes. I only selected the data where the grouped columns had a count of more than one, so only dupes are in here.
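To give a rough idea of that step, here is a trimmed-down sketch. The source table name [ImportedCompanyService] is just a placeholder for my real staging table, and the join back to the derived table is shown on the key columns only:

-- Keep only the rows whose grouped-column combination appears more than once.
-- [ImportedCompanyService] is a placeholder name; the real table has more columns.
INSERT INTO tmpDupesCSP
    (companyID, CompanyServiceId, CompanyName, Description,
     prgm_code, prgm_desc, serviceID, Service, price, prgm_notes)
SELECT i.companyID, i.CompanyServiceId, i.CompanyName, i.Description,
       i.prgm_code, i.prgm_desc, i.serviceID, i.Service, i.price, i.prgm_notes
FROM ImportedCompanyService i
JOIN (
    SELECT companyID, CompanyServiceId, serviceID, prgm_code
    FROM ImportedCompanyService
    GROUP BY companyID, CompanyServiceId, serviceID, prgm_code
    HAVING COUNT(*) > 1              -- only the combinations that repeat
) dupes
  ON  dupes.companyID        = i.companyID
  AND dupes.CompanyServiceId = i.CompanyServiceId
  AND dupes.serviceID        = i.serviceID
  AND dupes.prgm_code        = i.prgm_code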
Now the data in this table can meet one of a few different criteria relating to price (If someone can help with price, I can use the same process for prgm_notes, so I'll just talk about price). These criteria could be:
1) The Grouped Columns appear n times (where n is 2 - 6 in the current set, but could theoretically be much higher), but the price is the same amongst those appearances.
2) The Grouped Columns appear n times and the price is different for one or more of those appearances.
Ideally my query would give a result set of:
[All Grouped Columns] [Price1] [Price2] [Price3] [Price n] etc. etc... If the largest number of matches is 6, then the query would spit out 6 price columns; if the largest were only 3, then the query would spit out only 3 price columns...
I would like to use the process to go back up one level, to before the data was fully grouped, and add the AccountGroup information so I could list:
[All Grouped Columns] [AccountGroup1] [Its Price] [AccountGroup2] [Its Price] etc. etc.
I have been looking at this too long, so maybe there is an easy solution out there; then again, maybe there isn't.
Thanks in advance for the help. By the way, this is a great site - I think I'll be using it much more often, and hopefully will help contribute to it.
December 13, 2005 at 3:35 pm
Which of the prices should be Price1, which Price2, etc.?
_____________
Code for TallyGenerator
December 13, 2005 at 4:17 pm
And how do we get the account group? I don't see it in the table DDL you posted...
Is your ultimate goal to put this into a spreadsheet, or do you wish to use this output to programmatically resolve the conflicts?
If we spit out 6 price columns for 1000 of these duplicates, but only one of them has 6 unique prices, is the Price6 field set to NULL for all of the other rows?
If you can provide more information about your ultimate end goal, perhaps we can provide a better path to it...
Look forward to your comments
December 13, 2005 at 7:27 pm
Fair questions. I didn't post the account groups because I can get to those through joins, with the basic idea for what I am trying to do gleaned from an answer here... Basically the account group is figured out through a join on the CompanyID and the Prgm_Code to two tables. That isn't my concern; my concern is how to build this.
Basically to give an answer, just add a varchar(50) Column to my DDL called Account_No. This will not be part of the "Grouped Columns" but will be different for every instance of the grouped columns. So for instance:
Company  Account      Service           Prgm_Code  Price  Notes
IBM      National     'Tooth Cleaning'  75         5.00   'Only for full time'
IBM      Local Sales  'Tooth Cleaning'  75         5.75   '2x /yr'
And I would want the report to look something like this (but with all the fields I mentioned):
COMPANY-SERVICE-PRGM_CODE-ACCOUNT GROUP1-PRICE1-ACCOUNT GROUP2-PRICE2-ETC
IBM-'Tooth Cleaning'-75-National-5.00-Local Sales-5.75
My ultimate end point is going to be putting this into Excel and giving the files to people who are familiar with the data, so they can decide which of the n prices is correct.
Regarding the NULLs: if out of 1000 duplicates the most any one set of duplicates had with different prices was 6, but the majority had 2, then I would be fine with Account Group and Price 3-6 being NULL for the majority. I can take care of that quickly enough in Excel. What I don't want to do is make up a bunch of Account Group and Price placeholders for some fixed maximum. In other words, if for one particular run there are at most 7 different prices for a record, then there should be Price1-Price7 and AccountGroup1-AccountGroup7, and if on another run there are only 3, then that run should only have columns Price1-Price3 and AccountGroup1-AccountGroup3.
I suppose I could use a reporting engine like Crystal or Access (we are a SQL Server 2000 shop not using Reporting Services) and do a subreport, now that I think about it, but I don't want to. I should be able to do this with SQL code, and I'm mad that I can't figure it out, so I am hoping someone else can and I can learn something in the process. That's what it's all about: learning.
December 13, 2005 at 8:52 pm
SQL itself is not designed to output a variable number of columns based on its input. The idea is that a SQL statement will always produce a consistent result set in terms of the columns, their types, their positions, etc...
To get what you want you need to use dynamic SQL and build up a somewhat ugly select statement...
You could construct a very wide temporary table (eg, with 100 price columns all defaulted to null). Then you would have some while loop to insert into this table - the looping variable would be your price number. On each pass through the loop, your price number would increase - so your first loop pass would get the lowest distinct price for each group, then the second pass would get the second lowest distinct price, etc... The insert statement in the loop would need to use dynamic SQL as the column into which it is inserting would change each time.
Whilst looping you would also check @@ROWCOUNT to see how many rows were inserted - if zero rows were inserted you could stop looping immediately. You should also store the value of the loop counter at this stage.
Finally, build a simple select string to select the data from your temp table, adding as many columns as your loop counter covered. Then you have your summarised data - this approach should at least avoid the need for messy cursors...
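Something like the following is what I have in mind - treat it as a rough sketch rather than finished code. Only three of your grouped columns are shown, the cap of ten price columns is an arbitrary assumption, NULL prices are simply ignored, and the first pass inserts while later passes update (you can't insert into a different column of the same row on each pass):

-- Wide worktable: one row per group, distinct prices spread across PriceN columns
CREATE TABLE #PriceSummary (
    companyID int,
    serviceID int,
    prgm_code nvarchar(4),
    Price1 money NULL, Price2 money NULL, Price3 money NULL, Price4 money NULL,
    Price5 money NULL, Price6 money NULL, Price7 money NULL, Price8 money NULL,
    Price9 money NULL, Price10 money NULL
)

-- Pass 1: one row per group holding the lowest distinct price
INSERT INTO #PriceSummary (companyID, serviceID, prgm_code, Price1)
SELECT companyID, serviceID, prgm_code, MIN(price)
FROM tmpDupesCSP
GROUP BY companyID, serviceID, prgm_code

DECLARE @n int, @rows int, @sql nvarchar(4000)
SELECT @n = 2, @rows = 1

WHILE @rows > 0 AND @n <= 10
BEGIN
    -- Pass @n fills Price<@n> with the next-lowest price above Price<@n - 1>
    SET @sql =
          N'UPDATE s SET Price' + CAST(@n AS nvarchar(2))
        + N' = (SELECT MIN(d.price) FROM tmpDupesCSP d'
        + N'    WHERE d.companyID = s.companyID AND d.serviceID = s.serviceID'
        + N'      AND d.prgm_code = s.prgm_code'
        + N'      AND d.price > s.Price' + CAST(@n - 1 AS nvarchar(2)) + N')'
        + N' FROM #PriceSummary s'
        + N' WHERE EXISTS (SELECT * FROM tmpDupesCSP d'
        + N'    WHERE d.companyID = s.companyID AND d.serviceID = s.serviceID'
        + N'      AND d.prgm_code = s.prgm_code'
        + N'      AND d.price > s.Price' + CAST(@n - 1 AS nvarchar(2)) + N')'
        + N'; SET @rc = @@ROWCOUNT'
    EXEC sp_executesql @sql, N'@rc int OUTPUT', @rc = @rows OUTPUT
    SET @n = @n + 1
END

-- If the loop stopped because no rows were updated, the widest group needed
-- (@n - 2) price columns, so build the final SELECT string with that many
-- PriceN columns and EXEC it.

The cap of ten is only there to stop a runaway loop; make the table as wide as you can stomach.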
I can see a way to use a cursor on your data though, which might be ok - you'll have to try it out. Have the cursor walk the records sorted in grouped order and compare each record to the one prior. If the groups match, compare the prices: if the prices match, skip the row; if the prices do not match, update the value of the next price column and increment the price column number by one. If the groups do not match, set your price number back to 1 and insert a new row containing the new group values and a value in Price1.
Depending on how many rows you have, the cursor may be faster, as you needn't write the SQL to get the Nth lowest price, which can be inefficient.
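Here is a hedged sketch of that cursor; again only companyID, serviceID and prgm_code stand in for the full set of grouped columns. Rather than updating a wide table row by row, this variation records (group, price number, price) in a narrow worktable, which is then easy to pivot:

-- Narrow worktable: one row per distinct price within a group
CREATE TABLE #GroupPrice (
    companyID int,
    serviceID int,
    prgm_code nvarchar(4),
    PriceNo   int,
    price     money
)

DECLARE @companyID int, @serviceID int, @prgm_code nvarchar(4), @price money
DECLARE @prevCompanyID int, @prevServiceID int, @prevPrgm nvarchar(4), @prevPrice money
DECLARE @priceNo int

DECLARE cDupes CURSOR FAST_FORWARD FOR
    SELECT companyID, serviceID, prgm_code, price
    FROM tmpDupesCSP
    ORDER BY companyID, serviceID, prgm_code, price

OPEN cDupes
FETCH NEXT FROM cDupes INTO @companyID, @serviceID, @prgm_code, @price

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @prevCompanyID IS NULL
       OR @companyID <> @prevCompanyID
       OR @serviceID <> @prevServiceID
       OR @prgm_code <> @prevPrgm
    BEGIN
        -- New group: reset the counter and record its first price
        SET @priceNo = 1
        INSERT INTO #GroupPrice VALUES (@companyID, @serviceID, @prgm_code, @priceNo, @price)
    END
    ELSE IF @price <> @prevPrice
    BEGIN
        -- Same group, different price: record it as the next price number
        SET @priceNo = @priceNo + 1
        INSERT INTO #GroupPrice VALUES (@companyID, @serviceID, @prgm_code, @priceNo, @price)
    END
    -- Same group and same price: skip the row (NULL prices are treated as "same" here)

    SELECT @prevCompanyID = @companyID, @prevServiceID = @serviceID,
           @prevPrgm = @prgm_code, @prevPrice = @price
    FETCH NEXT FROM cDupes INTO @companyID, @serviceID, @prgm_code, @price
END

CLOSE cDupes
DEALLOCATE cDupes

SELECT MAX(PriceNo) FROM #GroupPrice then tells you how many Price columns the final output needs, and the wide Price1...PriceN layout can be built from #GroupPrice with a dynamically generated list of MAX(CASE WHEN PriceNo = 1 THEN price END) AS Price1 ... expressions.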
Hope that is clearer than mud
December 14, 2005 at 1:48 am
Here is an example of getting n columns from rows; maybe it will ignite some sparks?
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=169&messageid=237882#bm238057
You also might check out these examples:
How to dynamically number rows in a SELECT Statement
http://support.microsoft.com/?id=186133
Andy
December 14, 2005 at 6:07 am
Quite clear in fact. I like both suggestions, and thank you. I am not scared of dynamic SQL; I use it when I have to. I actually like the cursor idea though: we are only importing 250 - 1000 clients at a time, so at most there would really only ever be 250 - 10000/150000 individual cases where there are dupes. The majority of those would have the same price, leaving somewhere around 250-7000ish that need to go into the report.
Thank you for your suggestions. I will work on that this morning and try to take care of this all inside of SQL. I am not a Crystal guy; I can make the reports work, but they are ugly and I probably add a lot of superfluous steps. As for Access reports... well, need I say more?
December 14, 2005 at 6:08 am
Andy - thank you for both of your links. They, combined with Ian's post, have sparked some ideas, so I am off to write the query fresh from a night's sleep instead of staring at QA.
Thanks!
December 14, 2005 at 9:43 pm
Good to hear you are on track. Let us know how you get on - I am interested to see the final solution and how well it works for you.
December 15, 2005 at 7:59 am
I will do so just as soon as I get back to the task. I have been diverted 85 times since that was made my priority 1 task. Management always has more priority 1 tasks that are even more important than the other priority 1 tasks. Everything is a fire, didn't you know?
December 15, 2005 at 10:56 am
Alright - so I had some time today to sit down and play with this, attempting to use the cursor... and I choked. I don't use cursors often, and the ones I do use are fairly simple. Thankfully, for over 6 years now I have managed to steer clear of really complicated, slow, hoggy cursors.
Well, now I need to create one, and I keep tripping up on the logic. Do I need a nested cursor here, Ian? Basically I am confusing myself over how best to compare the rows, especially when there are more than two cases of matching grouped columns with different prices.
I suppose I also want to save a variable with the number of different prices for the current group, so I can build my insert off of that number. I also believe I would need to find out the maximum possible for this particular table before building any inserts, so I could use that to determine dynamically how to build the table as a first step.
That I could do with a query before the cursor, so the help I need is with the actual cursor. Sadly, I am unable to wrap my head around this one.