Optimizing IF EXISTS

  • I'm in the process of optimizing an application that aggregates news from RSS feeds. I need to check if an item already exists and if not, insert it.

    I have a table "News" which contains more than 1,3 million rows. There's a console application that runs at set intervals to aggregate the news, but I have a feeling there's a bottleneck in my insert statement. I use the following stored procedure to check if the record already exists:

    ALTER PROCEDURE [dbo].[sp_InsertNews]

    (

    @Title varchar(250),

    @Description varchar(2000),

    @Link varchar(250),

    @FeedId int

    )

    AS

    SET NOCOUNT ON;

    IF NOT EXISTS(SELECT Title, FeedId FROM News WHERE FeedId = @FeedId AND Title = @Title)

    INSERT INTO News

    (Title, Description, Link, Title)

    VALUES (@Title,@Description,@Link,@FeedId)

    I somehow have the idea this query can be more effecient, because basically on every insert it needs to check if the @Kop variable exists and comparing it against 1,3 million rows.

    I am a bit confused here. Does the query limit the records to compare first, by limiting it to the FeedId? If so, would it be wise to also limit it for example to the records inserted the last day? Or am I not correct in the way the execution plan works?

    Thanks in advance for your replies!

  • There is nothing wrong with what you have showed us, but I don't think that you have showed us everything. The IF..Select.. will perform fine if you have an appropiate index on FeedId and Title. I suspect that the real problem is that you're inserting records one at a time, when you should be using Set-based Inserts.

    As for this question:

    Rino (6/29/2008)


    I somehow have the idea this query can be more effecient, because basically on every insert it needs to check if the @Kop variable exists and comparing it against 1,3 million rows.

    This just leads me to believe that you are not showing us everything, because there is no @Kop variable in what you've showed us.

    I encourage you to show us the rest of this procedure and to show us the calling procedure. We can only help you with what you show us.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks for your reply!

    You are right I am inserting records one at a time. I accidentaly mentioned the @Kop variable, because I 'translated' some of the variables from Dutch. Anyway, here's the full stored procedure I am using and the VB.Net code I am calling the sp with. - without the 'translation' 🙂 -

    The stored procedure:

    ALTER PROCEDURE [dbo].[sp_InsertNews]

    (

    @Kop varchar(250),

    @Omschrijving varchar(2000),

    @Link varchar(250),

    @BronId int

    )

    AS

    SET NOCOUNT ON;

    IF NOT EXISTS(SELECT NieuwsKop, BronId FROM tblNieuws WHERE BronId = @BronId AND NieuwsKop = @Kop)

    INSERT INTO tblNieuws

    (NieuwsKop, NieuwsOmschrijving, NieuwsLink, BronId)

    VALUES (@Kop,@Omschrijving,@Link,@BronId)

    And here's the console application which basically loops through all the rows with the feed urls and then checks if there are new records:

    Imports AggregateNL.dsNL

    Imports AggregateNL.dsNLTableAdapters

    Module AggregateNL

    Dim _adapterNieuws As New NieuwsTableAdapter

    Dim _adapterBronnen As New BronnenTableAdapter

    Dim dtBronnen As BronnenDataTable

    Dim stpWatchInfo As New Stopwatch()

    Private m_feedXML As XDocument

    Public Property feedXML() As XDocument

    Get

    Return m_feedXML

    End Get

    Set(ByVal value As XDocument)

    m_feedXML = value

    End Set

    End Property

    Private m_feeds As Collections.IEnumerable

    Public Property feeds() As Collections.IEnumerable

    Get

    Return m_feeds

    End Get

    Set(ByVal value As Collections.IEnumerable)

    m_feeds = value

    End Set

    End Property

    Sub Main()

    stpWatchInfo.Start()

    GetFeeds()

    End Sub

    Sub GetFeeds()

    dtBronnen = _adapterBronnen.GetData()

    For Each dr As BronnenRow In dtBronnen.Rows

    Try

    feedXML = Xdocument.Load(dr.BronRssFeed)

    feeds = From feed In feedXML.Descendants("item") _

    Select title = feed.Element("title").Value, _

    link = feed.Element("link").Value, _

    description = feed.Element("description").Value

    InsertNews(dr.BronId)

    Catch ex As Exception

    Console.WriteLine("Ongeldige feed: " & dr.BronId)

    End Try

    Next dr

    ' Count execution time

    stpWatchInfo.Stop()

    Dim ts As TimeSpan = stpWatchInfo.Elapsed

    Dim elapsedTime As String = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10)

    Console.WriteLine("Execution time: " & elapsedTime)

    End Sub

    Sub InsertNews(ByVal BronId As Integer)

    Try

    For Each item In feeds

    Dim myTitle As String = item.title.ToString()

    Dim myDescription As String = item.description.ToString()

    Dim myLink As String = item.link.ToString()

    _adapterNieuws.InsertNews(myTitle, myDescription, myLink, BronId)

    Next item

    Catch ex As Exception

    Console.WriteLine("Fout: " & ex.Message & " - BronId: " & BronId)

    End Try

    End Sub

    End Module

  • First, let me say that I am not a programmer and don't really know VB.Net that well. With that said, I don't see anything wrong with the T-SQL that you have, other than the fact that you have no error checking or validation being performed.

    However, there is one thing you can do - but, it will not make any difference on the T-SQL side.

    IF NOT EXISTS(SELECT NieuwsKop, BronId FROM tblNieuws WHERE BronId = @BronId AND NieuwsKop = @Kop)

    INSERT INTO tblNieuws

    (NieuwsKop, NieuwsOmschrijving, NieuwsLink, BronId)

    VALUES (@Kop,@Omschrijving,@Link,@BronId)

    In the EXISTS - there is no need to specify the columns. It can be changed to:

    IF NOT EXISTS(SELECT * FROM tblNieuws WHERE BronId = @BronId AND NieuwsKop = @Kop)

    INSERT INTO tblNieuws (NieuwsKop, NieuwsOmschrijving, NieuwsLink, BronId)

    VALUES (@Kop,@Omschrijving,@Link,@BronId);

    Where I see a possible problem in the VB.Net code (remember, I am not a programmer) is that you are using a TableAdapter to perform the insert. I am fairly certain that this is overkill for calling a stored procedure.

    Also, when calling the procedure InsertNews you only pass into the procedure dr.BronId. But, you also use feeds - probably should be passing that into the procedure also.

    I would build this as SQL text (using SqlCommand???), build up multiple insert statements and batch those inserts to SQL. This could be batched in the InsertNews procedure based upon all of the information in feeds (maybe smaller all depends upon how many items exist in feeds).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I would rebuild the solution to take better advatage of SQL set-based logic. I'd dump all of the feed data into a separate table from the main table, then do a single insert from that table, using the Except structure. That would probably be faster, overall.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks guys for the advice on this matter...

    I think I need to move everything to SQL Server. This may be an interesting learning experience for that: http://www.simple-talk.com/sql/t-sql-programming/rss-newsfeed-workbench/

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply