Filtered index prerequisites: how to get one used

  • Hello,

    After watching a few videos showing how filtered indexes work and what the prerequisites for using them are, I am still not able to make my query use one.

    Now, I have a table, CDPOS, which has over 100 million rows. There is a statement which will run automatically; we are still deciding whether it will run every week or every day. The statement won't change, and it is part of a bigger configuration. We want to tune all of these statements, because this will be the main workload on this database. The table will probably be used at most once a year for anything else.

    The whole logic came from this: the data we need from the table is about 25M rows, and all of it has the exact same value in one of the fields. So we came up with the idea of creating a filtered covering index: the query would read everything it needs from that index, and it would be faster because it would work with 1/4 of the data. Unfortunately, after 6-7 different types of indexes, moving the fields around in the JOIN and WHERE clauses, and trying different setups, we still haven't been able to make the query use any of the filtered indexes; it always uses one of the other indexes that cover the whole table.

    Here is the query. We are open to changing it, as long as the logic remains the same, if that will help with performance.

    SELECT "CDPOS"."OBJECTCLAS", "CDPOS"."OBJECTID", "CDPOS"."CHANGENR",  "CDPOS"."MANDANT"
    --SELECT COUNT (*)
    FROM "dbo"."CDPOS"
    JOIN dbo.MM_EKKO_STAGING EKKO
    ON EKKO.MANDT = CDPOS.MANDANT
    AND EKKO.OBJECTCLAS = CDPOS.OBJECTCLAS
    AND EKKO.OBJECTID = CDPOS.OBJECTID

    WHERE 1=1
    AND (EKKO.BSTYP = 'L' OR LEN('L')=0)
    AND EKKO.MANDT = '400'
    AND EKKO.EKKO_YEAR < 2017
    AND CDPOS.OBJECTCLAS = 'EINKBELEG' -- This predicate matches about 1/4 of the table

    GROUP BY "CDPOS"."MANDANT", "CDPOS"."OBJECTCLAS", "CDPOS"."OBJECTID", "CDPOS"."CHANGENR"

    -- This whole query will eventually DELETE the rows it finds in CDPOS, but we are actually first testing the read, to see how long it takes; only later will it delete.

    Here is an example of the data from CDPOS. It's a typical SAP table, no customizations. Only the relevant columns are shown here:

    OBJECTCLAS OBJECTID CHANGENR MANDANT

    EINKBELEG 0062000000 0002365048 400

    EINKBELEG 0064000000 0000971162 400

    EINKBELEG 0064000001 0000971163 400

    Actually, while I am thinking about it: the other value that is always the same is MANDANT = 400. But that MANDANT value covers about 98% of the data in the table, and even combined with OBJECTCLAS it narrows the result by less than 0.5%.

    Maybe someone will ask: since you are going to delete records, why are you reading CHANGENR? Well, we are going to keep data in different tables for statistics and logs which will be used later on.

    Basically, the index I expected to work is:

    CREATE INDEX [IX_CDPOS_Filtered] ON [dbo].[CDPOS] (MANDANT,OBJECTCLAS,OBJECTID) 
    INCLUDE (CHANGENR)
    WHERE OBJECTCLAS = 'EINKBELEG'

    I have clustered index:

    CREATE CLUSTERED INDEX [IX_CLSTR_OBJCLAS_OBJID] ON [dbo].[CDPOS] (OBJECTCLAS,OBJECTID);

    I have tried probably all possible combinations, moving the order of the fields around in the index and the query, and no matter how everything is set up, I wasn't able to make the query use the filtered index. It always uses a general index on the table which was created early on: not filtered, no INCLUDE, just a plain index on the three fields in the JOIN.
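
    One shape that might be worth testing (a sketch, not a guaranteed fix; the index name is made up): lead with OBJECTID for the join and carry OBJECTCLAS as an included column. A column that appears only in a filtered index's WHERE clause does not automatically cover queries that return it, so keeping OBJECTCLAS in the key or in INCLUDE matters when the SELECT list contains it.

    CREATE INDEX [IX_CDPOS_Filtered_ObjId] ON [dbo].[CDPOS] (OBJECTID, MANDANT)
    INCLUDE (CHANGENR, OBJECTCLAS)
    WHERE OBJECTCLAS = 'EINKBELEG';

    If the optimizer still prefers the unfiltered index, comparing logical reads under SET STATISTICS IO ON, with and without an index hint, will show whether it is actually wrong to do so.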

    I have tried forcing the index manually with a hint (WITH (INDEX(name))). With that index the query seems quicker/better, but obviously SQL Server does not think the same way.

    A little bit of information about the table, the values, and so on:

    1. The first run of this query will remove all data older than 2017 - that's about 17 years of data to be deleted, roughly 600k records.
    2. Together with more similar queries (7 in total), about 1.2M rows are to be deleted on the first run. They differ only in the WHERE clause on EKKO; the CDPOS side stays the same everywhere: same JOIN, same selected fields.
    3. The second and all subsequent runs will be set up to delete data either daily or weekly. That will average about 10,000 records per day; some days it might go up to 40k, sometimes only 1-2k.
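
    When the query graduates from reading to deleting, it is usually safer to delete in batches rather than in one statement, to keep the transaction log and lock footprint small. A sketch using the same join and predicates (the batch size of 10,000 is an arbitrary starting point):

    DECLARE @rows INT = 1;
    WHILE @rows > 0
    BEGIN
        DELETE TOP (10000) CDPOS
        FROM dbo.CDPOS
        JOIN dbo.MM_EKKO_STAGING EKKO
          ON EKKO.MANDT = CDPOS.MANDANT
         AND EKKO.OBJECTCLAS = CDPOS.OBJECTCLAS
         AND EKKO.OBJECTID = CDPOS.OBJECTID
        WHERE EKKO.BSTYP = 'L'
          AND EKKO.MANDT = '400'
          AND EKKO.EKKO_YEAR < 2017
          AND CDPOS.OBJECTCLAS = 'EINKBELEG';

        SET @rows = @@ROWCOUNT;
    END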

    I keep thinking whether there is something else important to mention, but I believe I have provided everything needed.

    The truth is that we are expecting systems with far more records in the future, and we would like to be prepared to optimize them as much as possible. The queries on those systems will be similar.

    Thank you in advance for your answers

  • I don't have time to dig into this, but there is an error in the query.

    LEN('L') will always equal 1, because that's how long 'L' is.
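
    For what it's worth, that pattern usually comes from generated code with an optional parameter, where a literal was later substituted in. Something like this (the parameter name is hypothetical):

    DECLARE @bstyp NVARCHAR(4) = 'L'; -- hypothetical parameter; '' would mean "no filter"
    SELECT COUNT(*)
    FROM dbo.MM_EKKO_STAGING EKKO
    WHERE (EKKO.BSTYP = @bstyp OR LEN(@bstyp) = 0); -- always true when @bstyp = ''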

    You should also remove the double quotes, they make things look confusing and add no value.

    Here is a version which is easier on the eye; it also includes aliasing.

    SELECT cdp.OBJECTCLAS
    ,cdp.OBJECTID
    ,cdp.CHANGENR
    ,cdp.MANDANT
    FROM dbo.CDPOS cdp
    JOIN dbo.MM_EKKO_STAGING eko
    ON eko.MANDT = cdp.MANDANT
    AND eko.OBJECTCLAS = cdp.OBJECTCLAS
    AND eko.OBJECTID = cdp.OBJECTID
    WHERE eko.BSTYP = 'L'
    AND eko.MANDT = '400'
    AND eko.EKKO_YEAR < 2017
    AND cdp.OBJECTCLAS = 'EINKBELEG' -- This predicate matches about 1/4 of the table
    GROUP BY cdp.MANDANT
    ,cdp.OBJECTCLAS
    ,cdp.OBJECTID
    ,cdp.CHANGENR;


    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • You should consider heading over to https://www.brentozar.com/pastetheplan/instructions/ and adding your actual execution plan there.

    It would also be helpful if you would include the full DDL for the tables involved. Include PKs, FKs and index definitions in the DDL.


  • You've got the correct clustered index on CDPOS, except perhaps that it's not unique.  If adding CHANGENR to the clus index will make it UNIQUE, you should do that.

    Either way, SQL will prefer the clus index rather than a filtered index.  And that will be much more efficient when you do the DELETEs.
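
    If the three columns do turn out to be unique, the rebuild could look like this (a sketch; verify uniqueness first):

    -- DROP_EXISTING rebuilds the clustered index in place, so the
    -- nonclustered indexes are only rebuilt once, not twice.
    CREATE UNIQUE CLUSTERED INDEX [IX_CLSTR_OBJCLAS_OBJID]
    ON [dbo].[CDPOS] (OBJECTCLAS, OBJECTID, CHANGENR)
    WITH (DROP_EXISTING = ON);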

    With what you've shown, I don't see any other way to improve the structure of CDPOS or the query.

    You could look at adding a covering index on EKKO if it helps, assuming you don't also need to DELETE from EKKO.  I thought you were only DELETEing from CDPOS, but if also deleting from EKKO, you might consider the clustering index there also.
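
    For example, an index shaped to this query might look like this (a sketch; the index name is made up, and it assumes EKKO is not being deleted from):

    CREATE INDEX [IX_EKKO_MANDT_BSTYP] ON [dbo].[MM_EKKO_STAGING]
        (MANDT, BSTYP, OBJECTCLAS, OBJECTID)
    INCLUDE (EKKO_YEAR);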

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Phil Parkin wrote:

    I don't have time to dig into this, but there is an error in the query.

    LEN('L') will always equal 1, because that's how long 'L' is.

    You should also remove the double quotes, they make things look confusing and add no value.

    Thank you for the answer. The code is not mine; it came from an external firm, and I have already raised the question of why that is there. Until I get an answer, I won't change the LEN('L'), which is wrong, as you noticed as well.

    ScottPletcher wrote:

    You've got the correct clustered index on CDPOS, except perhaps that it's not unique.  If adding CHANGENR to the clus index will make it UNIQUE, you should do that.

    Either way, SQL will prefer the clus index rather than a filtered index.  And that will be much more efficient when you do the DELETEs.

    With what you've shown, I don't see any other way to improve the structure of CDPOS or the query.

    You could look at adding a covering index on EKKO if it helps, assuming you don't also need to DELETE from EKKO.  I thought you were only DELETEing from CDPOS, but if also deleting from EKKO, you might consider the clustering index there also.

    We have the clustered index, and in the past 2 more indexes were created which are supposed to help this and the other queries in this deletion run. Anyhow, those 2 indexes look like this:

    CREATE INDEX [Index_1] ON [dbo].[CDPOS] (MANDANT, OBJECTCLAS) INCLUDE (OBJECTID, CHANGENR, TABNAME, TABKEY, FNAME, CHNGIND);
    CREATE INDEX [Index_2] ON [dbo].[CDPOS] (MANDANT, OBJECTCLAS,OBJECTID) WITH (DATA_COMPRESSION = PAGE);

    The query always uses one of those 2 indexes: never any of the indexes I created, and never the clustered index.

    The first index makes no sense to me at all, because it includes fields that are never needed by the query.

    The second one is clear to me: it covers the JOIN. But MANDANT has just 4 distinct values, and one of them accounts for around 95% of the table... so in my opinion MANDANT should never go first, either in the query or in the index.

    I have created a clustered index and also a covering index for EKKO (MM_EKKO_STAGING), and there does not seem to be a problem there.

    Based on the tests it does not seem to be slow even without the filtered index, but I still believe this can be improved.

    Also, this is the first time I have thought of using filtered indexes, and now I am kind of disappointed that it didn't work out the way I was expecting. So I want to know why: what the reason is, whether there is a problem, or whether my logic is simply not correct.

    Thank you all for the answers.

  • ScottPletcher wrote:

    You've got the correct clustered index on CDPOS, except perhaps that it's not unique.  If adding CHANGENR to the clus index will make it UNIQUE, you should do that.

    About that: if by unique you mean that there would be a field with a unique value for each record, then no, CHANGENR is not unique.

     

    No; I mean check whether the combination of all three columns would always be unique once CHANGENR is added:

    dbo.CDPOS ( OBJECTCLAS, OBJECTID, CHANGENR )
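
    A quick way to check; any rows returned mean the three-column combination is not unique:

    SELECT OBJECTCLAS, OBJECTID, CHANGENR, COUNT(*) AS dup_count
    FROM dbo.CDPOS
    GROUP BY OBJECTCLAS, OBJECTID, CHANGENR
    HAVING COUNT(*) > 1;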

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Just to add, data manipulation operations (INSERT, UPDATE, and DELETE) on a table need to modify most (all?) indexes on the table. So adding indexes to improve performance of deletes MAY not be the best option, especially with large tables. Plus each index increases the space on disk used by the table/indexes.

    What MAY be better from a performance standpoint would be to archive "old" data if it is rarely looked at and have a view created that would join the archive with the live data. MAY not be an option as it means changes to the application side of things, but it may be beneficial to look at redesigning part of the system if you don't need to look at those 100 million rows and only look at a small percentage of those.
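
    A minimal sketch of that view idea (the archive schema and table names are hypothetical):

    -- Recent rows stay in dbo.CDPOS; older rows are moved to archive.CDPOS.
    -- Readers that need full history query the view instead of the base table.
    CREATE VIEW dbo.CDPOS_All
    AS
    SELECT OBJECTCLAS, OBJECTID, CHANGENR, MANDANT FROM dbo.CDPOS
    UNION ALL
    SELECT OBJECTCLAS, OBJECTID, CHANGENR, MANDANT FROM archive.CDPOS;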

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • ScottPletcher wrote:

    No, if the combination of all three columns would always be unique if CHANGENR were added:

    dbo.CDPOS ( OBJECTCLAS, OBJECTID, CHANGENR )

    I see, I didn't know that. I will check whether it will be unique.

    Mr. Brian Gale wrote:

    Just to add, data manipulation operations (INSERT, UPDATE, and DELETE) on a table need to modify most (all?) indexes on the table. So adding indexes to improve performance of deletes MAY not be the best option, especially with large tables. Plus each index increases the space on disk used by the table/indexes.

    What MAY be better from a performance standpoint would be to archive "old" data if it is rarely looked at and have a view created that would join the archive with the live data. MAY not be an option as it means changes to the application side of things, but it may be beneficial to look at redesigning part of the system if you don't need to look at those 100 million rows and only look at a small percentage of those.

    The whole application is based on reading old data, which will probably happen once or twice a year. For this specific database, the customer hasn't read the data even once since we archived it with us.

    Here we are applying Legal Hold: we have rules for how long the data must be kept in the database, and we use those rules to delete the rest of the data.

    All changes that will ever be made are deletions; the data is never modified and nothing new is added.

    Yes, the indexes will probably need maintenance, but the DBA colleagues have that handled. If I remember correctly, they do it by default: if an index gets fragmented more than 10%, it is immediately rebuilt. Otherwise, every 2 or every 4 weeks they run a script that rebuilds all indexes with, I believe, one rule: fragmentation of 1% or higher.

    So I have no idea how much fragmentation there will be when 10,000 rows are deleted from a 95,000,000-row table. Once we do more tests, we will know.
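
    Fragmentation after a test batch can be measured directly rather than guessed, for example:

    SELECT i.name, s.avg_fragmentation_in_percent, s.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.CDPOS'),
                                        NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i
      ON i.object_id = s.object_id
     AND i.index_id = s.index_id;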

    Also, apart from the clustered index, one big index for basic needs (the tables use the same fields for joins, so one index is usually enough for everything) and the filtered index mentioned in this post, there probably won't be any other indexes on this table. There are tables that need more indexes, but those are much, much smaller, so rebuilding their indexes won't have as big an impact.

    And lastly, the space: we are creating the indexes with compression, so at least some space is saved. Also, conveniently, reading the data and deleting the data connect the tables the same way, so the same indexes will often serve both, which is beneficial.

     

  • ScottPletcher wrote:

    You've got the correct clustered index on CDPOS, except perhaps that it's not unique.  If adding CHANGENR to the clus index will make it UNIQUE, you should do that.

    Either way, SQL will prefer the clus index rather than a filtered index.  And that will be much more efficient when you do the DELETEs.

    With what you've shown, I don't see any other way to improve the structure of CDPOS or the query.

    You could look at adding a covering index on EKKO if it helps, assuming you don't also need to DELETE from EKKO.  I thought you were only DELETEing from CDPOS, but if also deleting from EKKO, you might consider the clustering index there also.

    AFAIK there is no prerequisite for a clustered index key to be unique.

    SQL Server will make it unique in the background, if it is not declared unique, by adding a uniquifier.

    refs:

    "Effective Clustered Indexes" ( January 06, 2011)

    "SQL Server Clustered Indexes internals with examples" ( October 14, 2020 )

    As always: Tell your system what you know !

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

    Internally, the clus index must be unique, and it is a lot of overhead for SQL to force uniqueness, which it must do if you don't declare it. You will then often also get the overhead of SQL maintaining ghost rows.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
