Question -Find rows where only some fields are duplicate-

  • Ok I got another fun one I got a table Site, which contains sites (Duh)

    ([Sell-to Customer No_] varchar(20),

    [Name] varchar(50),

    [Name 2] varchar(50),

    [Address] varchar(50),

    [Address 2] varchar(50),

    [City] varchar(30),

    [Contact] varchar(30),

    [Country Code] varchar(10),

    [Phone No_] varchar(30),

    [GSM] varchar(30),

    [Fax No_] varchar(30),

    [PrintOnInvoice] tinyint,

    [Post Code] varchar(20),

    [Global Dimension 1 Code] varchar(20),

    [Global Dimension 2 Code] varchar(20),

    [NoSeries] varchar(10),

    [Territory Code] varchar(10),

    [Contact No_] varchar(20),

    [Creation Date] datetime,

    [Tender Date] datetime,

    [Contact No_ 1] varchar(20),

    [Business Relation 1] varchar(10),

    [Contact No_ 2] varchar(20),

    [Business Relation 2] varchar(10),

    [Contact No_ 3] varchar(20),

    [Business Relation 3] varchar(10),

    [30bis Identification] varchar(30),

    [Comments] tinyint)

    This table is filled up with the help of the following query

    select distinct

    oh.Customer,ISNULL(UPPER(LEFT(od.Name1,50)),''),ISNULL(LEFT(od.Name2,50),''),ISNULL(UPPER(LEFT(od.Street1,50)),''),

    ISNULL(LEFT(od.Street2,50),''),ISNULL(UPPER(LEFT(od.Place,30)),''),ISNULL(UPPER(LEFT(od.Name1,30)),'')

    ,ISNULL(UPPER(LEFT(od.Country,10)),''),ISNULL(LEFT(od.Telephone,30),''),'','',1,

    ISNULL(LEFT(od.Postcode,20),''),'','','','','','1753-01-01 00:00:00.000','1753-01-01 00:00:00.000','','','','','','','',0

    from dbo.Orderdelivery od

    inner join dbo.Orderheader oh on od.OrderId = oh.OrderID

    where 1 = 1

    and od.Street1 IS NOT NULL

    Now comes the fun part I get records that the same information except for a few fields

    So I get for instance a site with Adress, City,Country filled in but with an empty postcode,

    but i can also get the site again but then without a postcode.

    In this case I want to keep the one with a postcode, but not all sites will have a postcode

    Anyone got any ideas how to do this

  • Why you have used here where 1=1

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • It's a habit of mine to use where 1 =1 cause then when i need to add/remove filters I don't have to remove the where only 'and'

  • I would create 2 select statements as subqueries in your main query and join them using the columns that you know must be equal to uniquely define a record and then in the SELECT part of your statement, use CASE to return the fields with value.

    HTH,

    B

  • It's not entirely clear what you want, but I think you are after selecting the "best" version of each delivery address, where "best" means it has a postcode.

    So, use a

    row_number() over(partition by oh.Customer order by od.Postcode DESC) as [Best]

    then wrap your whole query inside a subquery (or CTE) and select WHERE [Best]=1

    This will give you the first address for each customer that has a Postcode - while still returning an address for those that don't have a Postcode.

    If you need to prioritise between the addresses (more than one with a postcode, more than one without), just add more columns / expressions to the ORDER BY part of the ROW_NUMBER function.

    Of course, this is simplistic and some validation of the Postcode may be required to make sure that a simple value of "Z" doesn't get priority over "W12 8QT"

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Can you post the DDL for Site, OrderHeader and OrderDetail as we might then understand the problem a little better.

    What is the process for entering data into the OrderHeader and OrderDetail tables, is it a Manually typed in through a UI and then used to update the Site?

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • mister.magoo (11/7/2012)


    It's not entirely .... may be required to make sure that a simple value of "Z" doesn't get priority over "W12 8QT"

    That's indeed what's required, the sites are also 'unique' per customer', but a site can have a different name.

    Basically this is for an order delivery application for delivery of building materials to construction sites, on an order multiple sites can be present but each site belongs to 1 order and 1 customer.

    The new application is in place so this is to pull the data over from the old system, the old system did not have a table to store sites since they were stored in the order delivery detail table which also contains what was being delivered. The old system didn't check whether the site is present or not. SO the Name of the site could differ as could the postcode and contact, although the latter really isn't that important in the new system.

    This problem started affecting the link table between orders and sites where wrong sites got linked.

    so I came up with the following

    I use a row number to get the best postcode RNPC, and a row number for the best name RNC

    WITH CTE

    (

    [Sell-to Customer No_],[Name],[Name 2],[Address],[Address 2],[City],[Contact],[Country Code],

    [Phone No_],[GSM],[Fax No_],[PrintOnInvoice],[Post Code],[Global Dimension 1 Code],[Global Dimension 2 Code],

    [NoSeries],[Territory Code],[Contact No_],[Creation Date],[Tender Date],[Contact No_ 1],[Business Relation 1],

    [Contact No_ 2],[Business Relation 2],[Contact No_ 3],[Business Relation 3],[30bis Identification],[Comments],

    RNPC,RNC) AS

    (

    select distinct

    oh.Customer,ISNULL(LEFT(od.Name1,50),''),ISNULL(LEFT(od.Name2,50),''),ISNULL(UPPER(LEFT(od.Street1,50)),''),

    ISNULL(LEFT(od.Street2,50),''),ISNULL(UPPER(LEFT(od.Place,30)),''),ISNULL(LEFT(od.Name1,30),'')

    ,ISNULL(UPPER(LEFT(od.Country,10)),''),ISNULL(LEFT(od.Telephone,30),''),'','',1,

    ISNULL(LEFT(od.Postcode,20),''),'','','','','','1753-01-01 00:00:00.000','1753-01-01 00:00:00.000','','','','','','','',0,

    ROW_NUMBER() OVER(PARTITION BY oh.Customer,od.Street1,od.Place,od.Country ORDER BY od.Postcode) As RNPC,

    ROW_NUMBER() OVER(PARTITION BY oh.Customer,od.Street1,od.Place,od.Country ORDER BY od.Name1) As RNC

    from VMSER_TP7VM.dbo.Orderdelivery od

    inner join VMSER_TP7VM.dbo.Orderheader oh on od.OrderId = oh.OrderID

    where 1 = 1

    and od.Street1 IS NOT NULL

    )

    select [Sell-to Customer No_],Name,[Address],City,[Country Code],

    ISNULL(CASE WHEN CTE.[Post Code] = ''

    THEN (select top 1 [Post Code] from CTE t

    where 1 = 1

    and t.City = CTE.City

    and CTE.City <> ''

    and t.[Post Code] <> '')

    ELSE CTE.[Post Code]

    END,'')

    [Post Code],

    RNPC,RNC

    from CTE

    where 1 =1

    --and RNPC = 1

    and RNC > 1

    --and Name = 'WERF BOSTOEN'

    --and CTE.Address = 'DEPOT'

    order by city,address

    This also showcases why I use where 1 = 1, if anyone thinks this can be better please tell.

  • Resender (11/7/2012)


    It's a habit of mine to use where 1 =1 cause then when i need to add/remove filters I don't have to remove the where only 'and'

    +1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Viewing 8 posts - 1 through 7 (of 7 total)

    You must be logged in to reply to this topic. Login to reply