Huge table, optimization opinions welcomed

  • hi there, i am trying to come up with a best model for a social networking site. obviously one would need a relationship table between users. i thought i would create a master relationship table with master_user_id and child_user_id that would hold all my relationships (along with other fields). master_user_id would hold the inviting user and child_user_id would hold the user being invited. this way i can use one table to hold all relationships and at the same time know which user initiated the invitation. problem comes when i have to check if one user is friends with another, i would have to do a query with an OR to allow me to see if these guys are related

    ((master = myuser AND child = otheruser) OR (master = otheruser AND child = myuser)).

    my question to you guys is if this is a best way to go about this simple scenario or i should take some other path like creating an indexed view or perhaps creating two records for each relationship and avoid using an "OR". i am trying to be proactive and create someting that would perform very well.

    assuming that one would have a site with say 10 Million users, and on average each user would have 100 friends, then this table would hold 1 Billion records. would the above approach work well for it or would you recommend a better way?

    thanks for your help

  • Depends on your priorities and concerns. If performance is your priority, then probably dual entries are the way to go. If disk space is your top concern, then a view with unions may be best.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • thanks for your response Barry.

    so there are no other alternitives to go about? somehow i think there is gotta be an easier way, but maybe not...

  • I'd add the user twice, as the parent and child, maybe including a flag if it matters who referred who.

    Then if you update/delete, you can easily find all the rows with two delete statements wrapped in a transaction.

  • If the Parent/Child tags are meant to be significant, then storing two rows kind of defeats that purpose.

    An indexed view might be your best bet.

    create view Friends

    with schemabinding

    as

    select parentid as ID1, childid as ID2

    from dbo.relations

    union all

    select childid, parentid

    from dbo.relations

    go

    create index IDX_Friends on dbo.Friends (id1, id2)

    Something like that should handle it pretty efficiently.

    Another thing to consider on this is that you're going to need triggers to manage the relations if you go with the two rows version. That means more logging of more complex transactions. That means a bigger transaction log. That means the space saved by not having an indexed view (which will take up drive space), might not actually be an issue.

    Another solution, which would make for slightly faster inserts/updates/deletes, but slightly slower selects, would be to use SQL 2005, and use a hierarchy-style CTE for resolving these relations. Less disk space, faster transactions, slightly slower selects. Has the advantage of working out "friend-of-friend" connections more efficiently.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • thanks guys, my main concern is performance i dont really care about space. this table will be heavily used (inserts and selects). if i go with the indexed view that view will have double the amount of rows of my relationship table which in my example would be 2 Billion records. Do you guys think it will perform well?

  • wojo (3/21/2008)


    thanks guys, my main concern is performance i dont really care about space. this table will be heavily used (inserts and selects). if i go with the indexed view that view will have double the amount of rows of my relationship table which in my example would be 2 Billion records. Do you guys think it will perform well?

    Make sure the "master" column is the first column of a clustered index on the table. Then, yes, it should perform well on lookups. Slower on writes, but that is usually acceptable as it is both somewhat expected and much less frequent.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I just remembered that you can't create an index on a view with a Union operator in it. I'll have to test with Union All (BOL wasn't expicit about that). This means you'd have to create two indexed views and another view to make a Union on them.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • i tried creating clustered index and it complains that it cannot be done because it contains UNION, any suggestions?

    CREATE UNIQUE CLUSTERED INDEX IDX_Friends ON dbo.friends (user_id, friend_id)

  • GSquared (3/21/2008)


    If the Parent/Child tags are meant to be significant, then storing two rows kind of defeats that purpose.

    Easily remedied, G2, just add an attribute column to indicate the direction or intiator of the relationship. You would probably have other attributes anyway (date, etc.).

    Another thing to consider on this is that you're going to need triggers to manage the relations if you go with the two rows version.

    Oh my, no. I would never use triggers for something like this, unless I had no control over the source. Since the OP implies source code control, I woul definitely go with stored procedures.

    That means more logging of more complex transactions. That means a bigger transaction log.

    True, partly. Larger transaction logs is still the space vs. speed issue, it still depends on what the priority is. More complex transactions may mean that the writes are maybe 2x slower in exchange for making the reads up to 100x faster. And the potential slowness of writing twice could be more than offset by the drastic reduction in contention from the much more frequent reads.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • wojo (3/21/2008)


    i tried creating clustered index and it complains that it cannot be done because it contains UNION, any suggestions?

    CREATE UNIQUE CLUSTERED INDEX IDX_Friends ON dbo.friends (user_id, friend_id)

    OK, my bad. I misread your earlier statement and I didn't realize that you were talking about a View.

    If performance is your priority, then you want a single relationship table with all relations dual-entered. Put a clustered index on the table consisting of [font="System"](master, child)[/font] which will probably also be your primary key. For lookups, ONLY lookup the [font="System"]master='user'[/font], do NOT look up the reciprocal records. They will be used when you are looking up the other user as the master.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Another thought: You will want this Friends table to contain copies of the information that your site will need to display their friends list. So it should look something like this:

    Table Friends (

    UserID int,

    FriendUserId int,

    UserIsInitiator bit,

    FriendName varchar(50)

    {friend's other display list info}

    )

    With [font="System"](UserID, FriendUserID)[/font] being a clustered primary key.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yep, just tested it and you cannot create an indexed view on Union All.

    Testing other solutions now.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • You know, now that I think about it I realize: this is a classic de-normalization exercise.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • hey guys, hope you all had a great easter weekend. i wanted to see if you had any chance to think of a better way to do this, i am not really sure i want to create two views and then combine them to get what i am looking for...

    let me know what you thoughts are, thanks for your help 🙂

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply