Large Object Data

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/ssabin/2639.asp


    Simon Sabin
    SQL Server MVP

    http://sqlblogcasts.com/blogs/simons

  • Hi Simon,

    The point you are making is absolutely correct, but your example isn't.

    The query you posted would probably use a scan on the IX_Person_DateOfBirth table, followed by a bookmark lookup to get forename and surname. When you write: "So your query would then read 2 or 3 pages", you seem to assume that all rows with DOB in the required range are in the same or adjacent pages. Since the clustered index is on the IDENTITY column, this would be highly coincidental - definitely not the kind of coincidence a DBA would want to rely on!

    Of course, with the LOB data moved out of row, there is indeed a better chance that some of the matching rows happen to be on the same page - though that chance decreases as the table gets bigger (to wit, if the table totals 1,000 rows, the chance is pretty good; if the table totals a few billion rows - well, you get the picture). So you will probably see SOME improvement, but not as much as your article implies.

    The queries that REALLY would take advantage of having more rows per page are the queries that use a (partial or complete) clustered index scan. For instance:

    SELECT Forename, Surname
    FROM   Person
    WHERE  PersonID BETWEEN '623412' AND '624211'

    Assuming no gaps in the identity sequence, this query would have to read 800 + X pages if the LOB is stored in the row, but only 10 + X with the LOB stored out of row, with X being equal to the number of levels in the clustered index (probably 2 or 3, maybe 4 if the table is extreme large).

    Of course, if you often query the data based on a range of birthdates, you'll probably want to change the index on PersonID to nonclustered and IX_Person_DateOfBirth to clustered - and in that case, your example applies exactly as you wrote it.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Point taken. The 80 rows was due to that being the almost maximum that could fit on one page, so the bookmark lookup would only need to read 1 page rather than 80 pages if the avatar was include in row.

    If in your last point about changing the clustered index, a user could end up in the siuation where they filter on dateOfBirth and join to another table on personID i.e

    SELECT Address.AddressLine1 ...

    FROM Address

    JOIN Person on Address.PersonId = Person.PersonId

    WHERE Person.DateOfBirth BETWEEN '1 jan 1973' and '1 Feb 2006'

    The inclusion of the Avatar in row would really screw performance. If you have a clustered index, changing it from the PK is something that should be done with very careful consideration. Especially in SQL 2000 where you can't include columns in an index at the leaf level.

     


    Simon Sabin
    SQL Server MVP

    http://sqlblogcasts.com/blogs/simons

  • Hi guys, in my company is being developed a web application base on SQL Server 2008 and .NET framework 3.5, where it extracts information regarding employees like Name, Age, Position, ExtractedDate and some other data. Recently was needed to include the employee's pictures that it doen't matter the sizes, and I knew a little bit about the FILESTREAM Data type, so I started to investigate regarding it and, I try to implement it, but reading this article I saw that it impacts to the Query performance, so I have the question on how to implement this kind of Data in order to impact less posible the preformance?? sholud I do another table where store only the FILESTREAM Data type and have a FK column from my Employee table in order to use it only in the queries that is necessary?? Or could I have the FILESTREAM Data type in the same Employee table and only create an index??

    this is my Employee table:

    CREATE TABLE [tln].[T_TRABAJADOR](

    [curp] [char](18) NOT NULL,

    [id_rpe] [varchar](5) NOT NULL,

    [nombre] [varchar](100) NOT NULL,

    [edad] [tinyint] NOT NULL,

    [fecha_ingreso_cfe] [smalldatetime] NOT NULL,

    [puesto_titular] [smallint] NOT NULL,

    [id_proceso] [char](2) NOT NULL,

    [id_area] [char](5) NOT NULL,

    [Fecha_Extraccion] [smalldatetime] NOT NULL

    ) ON [PRIMARY]

    ALTER TABLE [tln].[T_TRABAJADOR] ADD CONSTRAINT [PK_T_TRABAJADOR] PRIMARY KEY CLUSTERED

    (

    [curp] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    So I don't know if include the Photo Column in the same table or in another table something like this:

    Opt 1 :

    CREATE TABLE [tln].[T_TRABAJADOR](

    [curp] [char](18) NOT NULL,

    [id_rpe] [varchar](5) NOT NULL,

    [nombre] [varchar](100) NOT NULL,

    [edad] [tinyint] NOT NULL,

    [fecha_ingreso_cfe] [smalldatetime] NOT NULL,

    [puesto_titular] [smallint] NOT NULL,

    [id_proceso] [char](2) NOT NULL,

    [id_area] [char](5) NOT NULL,

    [Fecha_Extraccion] [smalldatetime] NOT NULL,

    [GUID]UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,

    [Photo]VARBINARY(MAX) FILESTREAM NULL,

    ) ON [PRIMARY]

    Opt 2:

    CREATE TABLE [tln].[T_TRABAJADOR_RECURSOS]

    (

    [GUID]UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,

    [curp] CHAR (18) NOT NULL FK,

    [Photo]VARBINARY(MAX) FILESTREAM NULL,

    )

    Regards guys 🙂 and I hope u could help me, I'm trying to consume these new features.

    sorry for my bad english :(.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply