How to Get The Most Filled Records from Similar Records

  • How to Get The Most Filled Records from Similar Records

    I have table called "Locations" and it contains 4 Columns "Neighborhood,City,Governorate,Country"

    Neighborhood, City, Governorate, Country

    , , , Egypt

    , , Alexandria, Egypt

    , East, Alexandria, Egypt

    , West, Alexandria, Egypt

    MoharemBek, West, Alexandria, Egypt

    MoharemBek, North, Alexandria, Egypt

    , , Cairo, Egypt

    , , , Morocco

    Here is the first Record filled only with Country Value and nothing else ...The Next Record Contains 2 filled Columns Governorate and Country and the country value the same as The First Record ...so here the winner will be The Second Record and we shall ignore First Record.

    The Same On the Third Record ..Contains 3 filled Columns City, Governorate and Country ... and Governorate the same as the second record and Country the same as First and Second ...so here the winner will be the third record and we shall ignore the first 2 records .

    The Output I wish to get it :

    , East, Alexandria, Egypt

    MoharemBek, West, Alexandria, Egypt

    MoharemBek, North, Alexandria, Egypt

    , , Cairo, Egypt

    , , , Morocco

  • Something like this to get you started perhaps?

    WITH SampleData AS (

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,MyString=REPLACE(MyString, ', ', '.')

    FROM (

    -- Neighborhood, City, Governorate, Country

    SELECT ', , , Egypt'

    UNION ALL SELECT ', , Alexandria, Egypt'

    UNION ALL SELECT ', East, Alexandria, Egypt'

    UNION ALL SELECT ', West, Alexandria, Egypt'

    UNION ALL SELECT 'MoharemBek, West, Alexandria, Egypt'

    UNION ALL SELECT 'MoharemBek, North, Alexandria, Egypt'

    UNION ALL SELECT ', , Cairo, Egypt'

    UNION ALL SELECT ', , , Morocco') a(MyString))

    ,Split2Cols AS (

    SELECT ID

    ,Neighborhood=PARSENAME(MyString, 4)

    ,Governorate=PARSENAME(MyString, 3)

    ,City=PARSENAME(MyString, 2)

    ,Country=PARSENAME(MyString, 1)

    FROM SampleData)

    SELECT ID

    ,Neighborhood=CASE WHEN Neighborhood IS NULL

    THEN (

    SELECT TOP 1 Neighborhood

    FROM Split2Cols b

    WHERE Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City AND a.Governorate = b.Governorate

    ORDER BY ID)

    ELSE Neighborhood END

    ,Governorate=CASE WHEN Governorate IS NULL

    THEN (

    SELECT TOP 1 Governorate

    FROM Split2Cols b

    WHERE Governorate IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City

    ORDER BY ID)

    ELSE Governorate END

    ,City=CASE WHEN City IS NULL

    THEN (

    SELECT TOP 1 City

    FROM Split2Cols b

    WHERE City IS NOT NULL AND a.Country = b.Country

    ORDER BY ID)

    ELSE City END

    ,Country

    FROM Split2Cols a

    You just need to now combine the four columns returned to get a single comma delimited string by concatenation.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks Dwain

    I think with that modification it can give me what I want

    DECLARE @Temptbl TABLE( Neighborhood nvarchar(50),Governorate nvarchar(50),City nvarchar(50),Country nvarchar(50));

    WITH SampleData AS (

    SELECT ID=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,MyString=REPLACE(MyString, ', ', '.')

    FROM (

    -- Neighborhood, City, Governorate, Country

    SELECT ', , , Egypt'

    UNION ALL SELECT ', , Alexandria, Egypt'

    UNION ALL SELECT ', East, Alexandria, Egypt'

    UNION ALL SELECT ', West, Alexandria, Egypt'

    UNION ALL SELECT 'MoharemBek, West, Alexandria, Egypt'

    UNION ALL SELECT 'MoharemBek, North, Alexandria, Egypt'

    UNION ALL SELECT ', , Cairo, Egypt'

    UNION ALL SELECT ', , , Morocco') a(MyString))

    ,Split2Cols AS (

    SELECT ID

    ,Neighborhood=PARSENAME(MyString, 4)

    ,Governorate=PARSENAME(MyString, 3)

    ,City=PARSENAME(MyString, 2)

    ,Country=PARSENAME(MyString, 1)

    FROM SampleData)

    INSERT INTO @Temptbl

    (Neighborhood,Governorate,City,Country)

    (

    SELECT

    Neighborhood=CASE WHEN Neighborhood IS NULL

    THEN (

    SELECT TOP 1 Neighborhood

    FROM Split2Cols b

    WHERE (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City AND a.Governorate = b.Governorate)

    OR

    (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City)

    OR

    (Neighborhood IS NOT NULL AND a.Country = b.Country)

    ORDER BY ID)

    ELSE Neighborhood END

    ,Governorate=CASE WHEN Governorate IS NULL

    THEN (

    SELECT TOP 1 Governorate

    WHERE (Governorate IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City)

    OR

    (Governorate IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City)

    OR

    (Governorate IS NOT NULL AND a.Country = b.Country)

    ORDER BY ID)

    ELSE Governorate END

    ,City=CASE WHEN City IS NULL

    THEN (

    SELECT TOP 1 City

    FROM Split2Cols b

    WHERE City IS NOT NULL AND a.Country = b.Country

    ORDER BY ID)

    ELSE City END

    ,Country

    FROM Split2Cols a

    )

    SELECT Neighborhood,Governorate,City,Country

    FROM @Temptbl

    GROUP BY Neighborhood,Governorate,City,Country

    but The Question now .. In the first Column Part "Neighborhood" Condition

    I wrote that

    WHERE (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City AND a.Governorate = b.Governorate)

    OR

    (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City)

    OR

    (Neighborhood IS NOT NULL AND a.Country = b.Country)

    ORDER BY ID)

    Which i think not totally correct ..

    I want to handle it in that way

    IF (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City AND a.Governorate = b.Governorate)

    Record Count equal zero use the next part which is

    (Neighborhood IS NOT NULL AND a.Country = b.Country AND

    a.City = b.City)

    And IF That return no record use that part

    (Neighborhood IS NOT NULL AND a.Country = b.Country)

    ORDER BY ID)

    I want To Get The Top 1 Neighborhood Value

    If The Governorate= Governorate and If City = City and If Country = Country

    If the Above Condition not return anything so use the next filter part

    If City = City and If Country = Country

    If The Above Condition not return anything so use the next filter part

    If Country = Country

    ---Reduction Conditions If Not Return Records ----

  • [EDIT: Mickey's solution in a post below is MUCH better. Even though this solution "worked," it's admittedly convoluted. Mickey's method also avoids use of a cursor which is almost always better. Don't use this...use the code below. :blush: ]

    I tried but couldn't get the last part of this without using a cursor to do a final comparison of rows. But this works at least with the sample data Dwain set up to copy your original list of locations.

    I don't know if this is any better than what you did above or not, but you can give it a try. I think with some more work it could be trimmed down perhaps and the cursor replaced with a CTE or Tally table join somehow. Right now for me it's very late and way past bedtime. 🙂

    Sample data

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [Jurisdictions] NVARCHAR(500) NULL,

    PRIMARY KEY (ID))

    IF OBJECT_ID('tempdb..#ResultsTable') IS NOT NULL

    DROP TABLE #ResultsTable

    CREATE TABLE #ResultsTable (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [Jurisdictions] NVARCHAR(500) NULL,

    [ItemNumber] INT NULL,

    [NumElements] INT NULL,

    PRIMARY KEY (ID))

    ;WITH SampleData AS

    (

    SELECT

    Jurisdictions

    FROM

    (

    -- Neighborhood, City, Governorate, Country

    SELECT ',,,Egypt' AS Jurisdictions

    UNION ALL SELECT ',,Alexandria,Egypt'

    UNION ALL SELECT ',East,Alexandria,Egypt'

    UNION ALL SELECT ',West,Alexandria,Egypt'

    UNION ALL SELECT 'MoharemBek,West,Alexandria,Egypt'

    UNION ALL SELECT 'MoharemBek,North,Alexandria,Egypt'

    UNION ALL SELECT ',,Cairo,Egypt'

    UNION ALL SELECT ',,,Morocco'

    ) a

    )

    INSERT INTO #TempTable

    SELECT

    Jurisdictions

    FROM

    SampleData

    Now some script to parse the data and make up a "candidate" list.

    INSERT INTO #ResultsTable

    SELECT

    r1.Jurisdiction

    ,r1.ItemNumber

    ,r1.NumElements

    FROM

    (

    SELECT DISTINCT

    (CASE

    WHEN CHARINDEX(',,,',dsk2.Item,1) = 1

    THEN STUFF((dsk2.Item),1,3,'')

    WHEN CHARINDEX(',,',dsk2.Item,1) = 1

    THEN STUFF((dsk2.Item),1,2,'')

    WHEN CHARINDEX(',',dsk2.Item,1) = 1

    THEN STUFF((dsk2.Item),1,1,'')

    ELSE dsk2.Item

    END) AS Jurisdiction

    ,dsk3.ItemNumber

    ,(SELECT

    COUNT(*)

    FROM dbo.DelimitedSplit8K(

    STUFF((REVERSE(REVERSE(REPLACE(dsk2.Item,',,','')))),1,1,'')

    ,',')

    ) AS NumElements

    FROM

    (

    SELECT DISTINCT

    STUFF((

    SELECT

    '|'+CAST(ID AS VARCHAR)+'~'+CAST(Jurisdictions AS VARCHAR(50))

    FROM

    #TempTable

    FOR XML PATH('')

    ),1,1,'') AS List

    FROM

    #TempTable t

    ) r

    CROSS APPLY

    dbo.DelimitedSplit8K(r.List,'~') dsk1

    CROSS APPLY

    dbo.DelimitedSplit8K(dsk1.Item,'|') dsk2

    CROSS APPLY

    dbo.DelimitedSplit8K(dsk2.Item,',') dsk3

    WHERE

    1=1

    AND dsk2.ItemNumber = 1

    AND dsk2.Item <> '1'

    ) r1

    WHERE

    r1.ItemNumber > 0

    ORDER BY

    r1.NumElements --DESC

    ,r1.ItemNumber DESC

    ,r1.Jurisdiction

    Finally, check the resultant rows against the other rows to remove those

    that are substrings of another row.

    DECLARE

    @Jur VARCHAR(50)

    ,@Ele INT

    ,@Cnt INT

    DECLARE UpdateList CURSOR

    FOR

    SELECT DISTINCT

    rt2.Jurisdictions

    ,rt2.NumElements

    FROM

    #ResultsTable AS rt1

    INNER JOIN

    #ResultsTable AS rt2

    ON rt2.ID = rt1.ID + 1

    WHERE

    rt2.ItemNumber = 1

    OPEN UpdateList

    FETCH NEXT FROM UpdateList

    INTO @Jur,@Ele

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @Cnt = 0

    SELECT

    @Cnt = COUNT(DISTINCT rt2.Jurisdictions)

    FROM

    #ResultsTable AS rt1

    INNER JOIN

    #ResultsTable AS rt2

    ON rt2.ID = rt1.ID + 1

    WHERE

    rt2.ItemNumber = 1

    AND rt2.NumElements > @Ele

    AND @Jur = RIGHT(rt2.Jurisdictions,LEN(@Jur))

    IF @Cnt > 0

    BEGIN

    DELETE FROM #ResultsTable

    WHERE Jurisdictions = @Jur

    END

    FETCH NEXT FROM UpdateList

    INTO @Jur,@Ele

    END

    CLOSE UpdateList

    DEALLOCATE UpdateList

    SELECT DISTINCT

    MIN(ID) OVER (PARTITION BY Jurisdictions) AS ID

    ,j.Jurisdictions

    FROM

    (

    SELECT DISTINCT

    ROW_NUMBER() OVER (ORDER BY ID DESC,Jurisdictions) AS ID

    ,Jurisdictions

    FROM

    #ResultsTable

    ) j

    The output:

    IDJurisdictions

    1MoharemBek,West,Alexandria,Egypt

    2MoharemBek,North,Alexandria,Egypt

    9East,Alexandria,Egypt

    13Cairo,Egypt

    17Morocco

     

  • Hi

    This seems to work, but would need some serious testing I think.

    ;WITH data as (

    SELECT * FROM (VALUES

    (null,null ,null , 'Egypt' )

    ,(null,null , 'Alexandria', 'Egypt')

    ,(null, 'East', 'Alexandria', 'Egypt')

    ,(null, 'West', 'Alexandria', 'Egypt')

    ,('MoharemBek', 'West', 'Alexandria', 'Egypt')

    ,('MoharemBek', 'North', 'Alexandria', 'Egypt')

    ,(null,null , 'Cairo', 'Egypt')

    ,(null,null ,null , 'Morocco')

    ) AS A(Neighbourhood, City, Governorate, Country)

    )

    select *

    from (

    select *

    ,RANK() OVER (

    PARTITION BY City

    ORDER BY Neighbourhood desc, City desc, Governorate desc, Country desc

    ) CityRank

    ,RANK() OVER (

    PARTITION BY Governorate

    ORDER BY Neighbourhood desc, City desc, Governorate desc, Country desc

    ) GovernateRank

    ,RANK() OVER (

    PARTITION BY Country

    ORDER BY Neighbourhood desc, City desc, Governorate desc, Country desc

    ) CountryRank

    from data

    ) a

    where CountryRank = 1 or GovernateRank = 1 or CityRank = 1

  • Mickey...that sure beats my solution by a mile. I don't know why I didn't split the data into columns right from the beginning instead of trying to parse the strings on the fly. Doh! I thought about the possibilities with the RANK function but with my (poor) approach I could see how to make that work.

    Good job and something for my toolbox!

     

  • Steven Willis (3/14/2013)


    Mickey...that sure beats my solution by a mile. I don't know why I didn't split the data into columns right from the beginning instead of trying to parse the strings on the fly. Doh! I thought about the possibilities with the RANK function but with my (poor) approach I could see how to make that work.

    Good job and something for my toolbox!

     

    Thanks ... took a bit of experimenting to get the result. Originally looked at it trying to use a single rank. It wasn't until I was tried various options in a single query that I saw the pattern

  • Thanks a lot Micky .. Less Code ..Amazing + Perfect +That is what i want (y)

    https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server/

  • ahmedhussein874 (3/14/2013)


    Thanks a lot Micky .. Less Code ..Amazing + Perfect +That is what i want (y)

    You're welcome

  • Out of the game last night, so sorry I couldn't contribute more. But it appears MickyT and Steven Willis came to the rescue / problem solved.

    I did say my offering was just to get you started. 🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply