How to query out duplicate rows (without deleting)

  • Hi - I have a script listed below this message that pulls up many duplicate row of the same name and site. I also included a sample of that.

    Can anyone help me on how I could (not delete) but query just one entry for each tp.login with title and date?

    URL tp.login Title tp.created

    http://clientzone.redblock.com/sites/botoxROSETTA\eileen.frameContributor2/11/09 6:43 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\eileen.frameGuest 2/11/09 6:43 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\eileen.frameReader 2/11/09 6:43 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\Bon.oliverReader 2/12/09 1:49 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\Bon.oliverGuest 2/12/09 1:49 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\Bon.oliverContributor2/12/09 1:49 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\adam.backContributor2/12/09 2:33 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\adam.backGuest 2/12/09 2:33 PM

    http://clientzone.redblock.com/sites/botoxROSETTA\adam.backReader 2/12/09 2:33 PM

    .................

    select

    'http://clientzone.redblock.com/' + w.FullURL as ,

    w.Title,

    ui.tp_login,

    wg.Title,

    ud.tp_Created

    FROM

    UserInfo ui

    inner join WebGroups wg on wg.SiteId = ui.tp_SiteId

    inner join WebGroupmembership wgm on wgm.WebId = wg.WebId and wgm.GroupID = wg.[ID] and ui.tp_ID = wgm.memberID

    inner join Sites s on s.[Id] = ui.tp_SiteID

    inner join Webs w on w.SiteId = ui.tp_SiteID

    inner join UserData ud on ud.tp_SiteID = ui.tp_SiteID and ud.tp_id = ui.tp_ID

    where

    --Remove the Portal Areas and Personal Sites

    ud.tp_Created BETWEEN '2009-01-01' AND '2009-02-28'

    and

    w.FullURL not like '%personal%'

    and

    w.Title not in

    (

    'name of area you dont want',

    'Administration',

    'another area you dont want to list'

    )

    and

    w.FullUrl <> 'MySite'

    order by

    Thanks...

  • Wouldn't

    select DISTINCT yourqueryhere

    remove your duplicates?

  • I guess I could. But where in my current script where that line go? I have tried it there and cannot get the right area to place it.

    Thanks

  • I would assume it goes here...

    select DISTINCT

    'http://clientzone.redblock.com/' + w.FullURL as ,

    w.Title,

    ui.tp_login,

    wg.Title,

    ud.tp_Created

    FROM ...

    This assumes the same values for title/tp_login/title/tp_created. If that is not true you could use max() on those columns...

  • it looks like you want to get duplicate rows, but ignore the Title, right? I'd probably not partition on the date, either, but look at this and see if it is close to what you want.

    using row_number, here's what i get for results:

    URL tp.login Title tp.created

    ------------------------------------------- --------------------- ------------ -----------------------

    http://clientzone.redblock.com/sites/botox ROSETTA\adam.back Contributor 2009-02-12 14:33:00.000

    http://clientzone.redblock.com/sites/botox ROSETTA\Bon.oliver Reader 2009-02-12 13:49:00.000

    http://clientzone.redblock.com/sites/botox ROSETTA\eileen.frame Contributor 2009-02-11 18:43:00.000

    and my code:

    CREATE TABLE #EXAMPLE (

    varchar(1000),

    [tp.login] varchar(50),

    [Title] varchar(100),

    [tp.created] datetime )

    INSERT INTO #EXAMPLE

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\eileen.frame','Contributor','2/11/09 6:43 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\eileen.frame','Guest ','2/11/09 6:43 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\eileen.frame','Reader ','2/11/09 6:43 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\Bon.oliver ','Reader ','2/12/09 1:49 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\Bon.oliver ','Guest ','2/12/09 1:49 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\Bon.oliver ','Contributor ','2/12/09 1:49 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\adam.back ','Contributor ','2/12/09 2:33 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\adam.back ','Guest ','2/12/09 2:33 PM' UNION ALL

    SELECT 'http://clientzone.redblock.com/sites/botox','ROSETTA\adam.back ','Reader ','2/12/09 2:33 PM'

    select ,[tp.login],[Title],[tp.created]

    FROM (SELECT ROW_NUMBER() OVER(PARTITION BY ,[tp.login],[tp.created] ORDER BY ) AS RW,

    ,[tp.login],[Title],[tp.created]

    FROM #EXAMPLE

    )MyAlias

    WHERe RW = 1

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Your right.. Your results is what I'm looking for. But I have a question. I have 65736 rows of data.

    Do I have to enter the

    INSERT INTO #EXAMPLE

    SELECT

    for all of them to get the results you came up?

    Thanks,

  • basically, you'd replace #EXAMPLE with the query(in parenthesis) you were going to use; without all the CREATE TABLE info, there;s nothing i could test based on the sql you posted; remember, it's a best practice to give us everything we need, similar to my post: CREATE TABLE, INSERT INTO, any sqls you've tried.

    here's my best guess, untested, based on the query you pasted:

    SELECT ,

    [Title],

    [tp_login],

    [Title2],

    [tp_Created]

    FROM (

    SELECT ROW_NUMBER() OVER(PARTITION BY 'http://clientzone.redblock.com/' + w.FullURL,[ui.tp_login],[ud.tp_Created] ORDER BY 'http://clientzone.redblock.com/' + w.FullURL) AS RW,

    'http://clientzone.redblock.com/' + w.FullURL as ,

    w.Title,

    ui.tp_login,

    wg.Title AS Title2,

    ud.tp_Created

    FROM UserInfo ui

    inner join WebGroups wg on wg.SiteId = ui.tp_SiteId

    inner join WebGroupmembership wgm on wgm.WebId = wg.WebId and wgm.GroupID = wg.[ID] and ui.tp_ID = wgm.memberID

    inner join Sites s on s.[Id] = ui.tp_SiteID

    inner join Webs w on w.SiteId = ui.tp_SiteID

    inner join UserData ud on ud.tp_SiteID = ui.tp_SiteID and ud.tp_id = ui.tp_ID

    --Remove the Portal Areas and Personal Sites

    where ud.tp_Created BETWEEN '2009-01-01' AND '2009-02-28'

    and w.FullURL not like '%personal%'

    and w.Title not in

    (

    'name of area you dont want',

    'Administration',

    'another area you dont want to list'

    )

    and w.FullUrl <> 'MySite' ) MyAlias

    WHERE RW = 1

    ORDER BY

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Some things I found out is that my compatibility mode is set to 80. So I went on the server that has SQL 2000 install and ran from query analyzer

    SELECT ,

    [Title],

    [tp_login],

    [Title2],

    [tp_Created]

    FROM (

    SELECT ROW_NUMBER() OVER(PARTITION BY 'http://clientzone.brulant.com/' + w.FullURL,[ui.tp_login],[ud.tp_Created] ORDER BY 'http://clientzone.brulant.com/' + w.FullURL) AS RW,

    'http://clientzone.brulant.com/' + w.FullURL as ,

    w.Title,

    ui.tp_login,

    wg.Title AS Title2,

    ud.tp_Created

    FROM UserInfo ui

    inner join WebGroups wg on wg.SiteId = ui.tp_SiteId

    inner join WebGroupmembership wgm on wgm.WebId = wg.WebId and wgm.GroupID = wg.[ID] and ui.tp_ID = wgm.memberID

    inner join Sites s on s.[Id] = ui.tp_SiteID

    inner join Webs w on w.SiteId = ui.tp_SiteID

    inner join UserData ud on ud.tp_SiteID = ui.tp_SiteID and ud.tp_id = ui.tp_ID

    --Remove the Portal Areas and Personal Sites

    where ud.tp_Created BETWEEN '2009-01-01' AND '2009-02-28'

    and w.FullURL not like '%personal%'

    and w.Title not in

    (

    'name of area you dont want',

    'Administration',

    'another area you dont want to list'

    )

    and w.FullUrl <> 'MySite' ) MyAlias

    WHERE RW = 1

    ORDER BY

    I received the following error message.

    Server: Msg 195, Level 15, State 10, Line 7

    'ROW_NUMBER' is not a recognized function name.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply