Tunning or alternative to PIVOT

  • Hi everyone,

    we have a table containing kind of key-value pairs, which are the columns of some entities. This is how master data is ingested in the platform.

    Then a query does some joins, filtering and finally a PIVOT to only output a row by entity, with all their columns denormalized.

    The query plan looks like this:

    query plan pivot

    The Sort operator seems to consumed too much.

    We have the default MAXDOP value (8).

    I would like to get some suggestions on how to troubleshoot and tune this query. The database resources are most of the time underutilize. I guess the maxdop value is related to lower cpu consumption, if no other processes are using the database frequently. From the other side, the most common wait type I've seen is "parallelism". I need to investigate which type of wait and the total wait duration in an execution, but increasing maxdop could worsen this wait, right?

    Any guidance will be appreciated.

    Paul Hernández
  • Since you posted a graphic of the execution plan instead of an actual execution plan and you post no meta-data for the table and you posted no code, we're going to have a really tough time helping.  My suggest would be to never used the PIVOT operator and always use a good, ol' fashioned, CROSSTAB to do this.  Here's a link to that very old bit of arcane knowledge.

    https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-1-converting-rows-to-columns-1

    Second, read the article at the 2nd link in my signature line below for what to and how to post for performance issues.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • and adding to this - parallelism waits are NOT bad per se - while it may be a pointer for you to look at the queries that go parallel, in many cases that is the desired state, and changing them to go serial will just make them go slower for no good reason. Have a read of https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/ before actively trying to remove these.

    one reason to look for parallel queries is to identify cases where the query required different or new indexes to be added to the underlying tables - or even a rewrite of the code - in many many cases these 2 aspects, in isolation or together, reduce the need for parallel processing and speed up the overall execution of the queries.

     

    changing to a crosstab will likely still result in parallelism being used - may or not be faster (I've seen it go both ways).

  • Are you able to post the query itself?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi everyone,

    thanks for your support.

    I'm working with a customer and I requested the actual execution plan, I can share it once I get it.

    I have a query but not the DDLs of the tables, but I still have some related data

    Query:

    SELECT  "isin","WrapperID","LongName","ShortName","WKNBorder","TickerSymbolGermany","ProductGroup","QuotationUnit","SafeCustodyAlternatives","InstrumentTypeAdd1","LEI","UnderlyingIndexMultiplier","InstrumentTypeAdd3","InstrumentType","BaseExchange","SuperordinateIssuer","IssuerNumber","InstrStatusReason","RegCountryISO","UnitType","EvidenceNo","SubSegm","SubSegmOptionRight","IssueDate","CCPEligibilityFrom","CCPEligibilityTo","RedemptionDateAfterCalling","IntPaymentMonth","IntPaymentDate","InterestPaymentBegin","FirstCouponDate","FlagInterest","BaseInterestRate","InterestRateValidUntil","InterestPayDatePart","AssetEligibiltyEurosystem","InflatIndex","IntersetAccural","LastCouponDate","DetachmentDays","CurrentInterestPeriodFrom","CurrentInterestPeriodTo","IntPaymentmethod","SpecificsInterestsDividend","IntPaymentperiod","InterestRatePerCent","PeriodFirstCoupon","PeriodLastCoupon","PeriodSecondCoupon","LastRedemptionDate","GS039","PoolFactor","MasterDataAssetClassificationsWKNClassification","MasterDataCouponRelatedDataIrregularCoupons","TechnicalInterestPaymentDate","InstrumentTypeAdd2","InterestPaymentMethod","RedemptionType","RegCountryStatBA","UnderlIDCorpActISIN","IDCorpActISIN","esma_Liquidity_flag","esma_minimum_qualifying_size","esma_pre_trade_large_in_scale_threshold","esma_adnt" FROM (
    SELECT
    isin,
    [MasterData.Indices.WrapperID] WrapperID,
    [MasterData.InstrumentNames.LongName] LongName,
    [MasterData.InstrumentNames.ShortName] ShortName,
    [MasterData.InstrumentIdentifier.WKNBorder] WKNBorder,
    [MasterData.ExchangeRelatedInformation.TickerSymbolGermany] TickerSymbolGermany,
    [MasterData.AssetClassifications.ProductGroup] ProductGroup,
    [MasterData.CurrencyUnit.QuotationUnit] QuotationUnit,
    [MasterData.Custody.SafeCustodyAlternatives] SafeCustodyAlternatives,
    [MasterData.AssetClassifications.InstrumentTypeAdd1] InstrumentTypeAdd1,
    [MasterData.InstrumentIdentifier.LEI] LEI,
    [MasterData.Indices.UnderlyingIndexMultiplier] UnderlyingIndexMultiplier,
    [MasterData.AssetClassifications.InstrumentTypeAdd3] InstrumentTypeAdd3,
    [MasterData.AssetClassifications.InstrumentType] InstrumentType,
    [MasterData.ExchangeRelatedInformation.BaseExchange] BaseExchange,
    [MasterData.InstrumentIdentifier.SuperordinateIssuer] SuperordinateIssuer,
    [MasterData.InstrumentIdentifier.IssuerNumber] IssuerNumber,
    [MasterData.InstrumentStatus.InstrStatusReason] InstrStatusReason,
    [MasterData.IssuerRelatedInstrumentData.RegCountryISO] RegCountryISO,
    [MasterData.Custody.PrintedCertificateType] UnitType,
    [MasterData.Custody.BookEntrySecCertificationType] EvidenceNo,
    [MasterData.Indices.OptionRightType] SubSegm,
    [MasterData.Indices.OptionRightSubSegment] SubSegmOptionRight,
    [MasterData.IssuanceInformation.IssueDate] IssueDate,
    [MasterData.CCPEligitibility.CCPEligibilityFrom] CCPEligibilityFrom,
    [MasterData.CCPEligitibility.CCPEligibilityTo] CCPEligibilityTo,
    [MasterData.RepaymentMaturityResubmission.RedemptionDateAfterCalling] RedemptionDateAfterCalling,
    [MasterData.CouponRelatedData.InterestPaymMonthsProsp] IntPaymentMonth,
    [MasterData.CouponRelatedData.InterestPaymDayInMonthProsp] IntPaymentDate,
    [MasterData.KeyDatesBonds.InterestPaymentBegin] InterestPaymentBegin,
    [MasterData.KeyDatesBonds.FirstCouponDate] FirstCouponDate,
    [MasterData.BasicInformationInterest.InterestType] FlagInterest,
    [MasterData.ReferenceInterestRate.BaseInterestRate] BaseInterestRate,
    [MasterData.BasicInformationInterest.InterestRateValidUntil] InterestRateValidUntil,
    [MasterData.BasicInformationInterest.InterestPayDateParticularities] InterestPayDatePart,
    [MasterData.EligibilityCentralBank.AssetEligibiltyEurosystem] AssetEligibiltyEurosystem,
    [MasterData.InflationLinkedInformation.InflatIndexOrInterestCapital] InflatIndex,
    [MasterData.BasicInformationInterest.InterestAccrualParticularities] IntersetAccural ,
    [MasterData.KeyDatesBonds.LastCouponDate] LastCouponDate,
    [MasterData.BasicInformationInterest.DetachmentDays] DetachmentDays,
    [MasterData.BasicInformationInterest.CurrentInterestPeriodFrom] CurrentInterestPeriodFrom,
    [MasterData.BasicInformationInterest.CurrentInterestPeriodTo] CurrentInterestPeriodTo,
    [MasterData.BasicInformationInterest.DayCountConvention] IntPaymentmethod,
    [MasterData.AppropriationEarnings.SpecificsInterestsDividend] SpecificsInterestsDividend,
    [MasterData.CouponRelatedData.InterestPaymPeriod] IntPaymentperiod,
    [MasterData.BasicInformationInterest.InterestRatePerCent] InterestRatePerCent,
    [WM-GS035] PeriodFirstCoupon,
    [WM-GS001] PeriodLastCoupon,
    [WM-GS030] PeriodSecondCoupon,
    [MasterData.RepaymentMaturityResubmission.LastRedemptionDate] LastRedemptionDate,
    [WM-GS039] GS039,
    [MasterData.RepaymentBasicInformation.PoolFactor] PoolFactor,
    [MasterData.AssetClassifications.WKNClassification] MasterDataAssetClassificationsWKNClassification,
    [MasterData.CouponRelatedData.IrregularCoupons] MasterDataCouponRelatedDataIrregularCoupons,
    [MasterData.AdditionsG.TechnicalInterestPaymentDate] TechnicalInterestPaymentDate,
    [MasterData.AssetClassifications.InstrumentTypeAdd2] InstrumentTypeAdd2,
    [MasterData.BasicInformationInterest.InterestPaymentMethod] InterestPaymentMethod,
    [MasterData.RepaymentBasicInformation.RedemptionType] RedemptionType,
    [MasterData.IssuerRelatedInstrumentData.RegCountryStatBA] RegCountryStatBA,
    [MasterData.CorpActReferenceInstruments.UnderlyingInstrCorporateAction.UnderlIDCorpActISIN] UnderlIDCorpActISIN,
    [MasterData.CorpActReferenceInstruments.IDCorporateAction.IDCorpActISIN] IDCorpActISIN,
    [Liquidity_flag] esma_Liquidity_flag,
    [Minimum_qualifying_size] esma_minimum_qualifying_size,
    [Pre_trade_large_in_scale_threshold] esma_pre_trade_large_in_scale_threshold,
    [Relevant_market_average_daily_number_of_transactions] esma_adnt
    FROM
    (select
    a.isin,
    b.name,
    a.attribute_value
    from (select isin from td.isin_market_place where market_place = 'XFRA' and ABS(CHECKSUM(ISIN)) % 6 = 3 ) cr
    inner join (select isin,attribute_value,attribute_id from td.source_instrument where is_active = 1) a
    on cr.isin=a.isin
    inner join td.source_attribute b
    on a.attribute_id = b.id
    where (name in (
    'MasterData.InstrumentNames.LongName',
    'MasterData.InstrumentNames.ShortName',
    'MasterData.Indices.WrapperID',
    'MasterData.CurrencyUnit.QuotationUnit',
    'MasterData.Custody.SafeCustodyAlternatives',
    'MasterData.InstrumentIdentifier.WKNBorder',
    'MasterData.ExchangeRelatedInformation.TickerSymbolGermany',
    'MasterData.ExchangeRelatedInformation.BaseExchange',
    'MasterData.AssetClassifications.ProductGroup',
    'MasterData.AssetClassifications.InstrumentTypeAdd1',
    'MasterData.InstrumentIdentifier.LEI',
    'MasterData.Indices.UnderlyingIndexMultiplier',
    'MasterData.AssetClassifications.InstrumentTypeAdd3',
    'MasterData.AssetClassifications.InstrumentType',
    'MasterData.InstrumentIdentifier.SuperordinateIssuer',
    'MasterData.InstrumentIdentifier.IssuerNumber',
    'MasterData.InstrumentStatus.InstrStatusReason',
    'MasterData.IssuerRelatedInstrumentData.RegCountryISO',
    'MasterData.Custody.PrintedCertificateType',
    'MasterData.Custody.BookEntrySecCertificationType',
    'MasterData.Indices.OptionRightType',
    'MasterData.Indices.OptionRightSubSegment',
    'MasterData.IssuanceInformation.IssueDate',
    'MasterData.CCPEligitibility.CCPEligibilityFrom',
    'MasterData.CCPEligitibility.CCPEligibilityTo',
    'MasterData.RepaymentMaturityResubmission.RedemptionDateAfterCalling',
    'MasterData.CouponRelatedData.InterestPaymMonthsProsp',
    'MasterData.CouponRelatedData.InterestPaymDayInMonthProsp',
    'MasterData.KeyDatesBonds.InterestPaymentBegin',
    'MasterData.KeyDatesBonds.FirstCouponDate',
    'MasterData.BasicInformationInterest.InterestType',
    'MasterData.ReferenceInterestRate.BaseInterestRate',
    'MasterData.BasicInformationInterest.InterestRateValidUntil',
    'MasterData.BasicInformationInterest.InterestPayDateParticularities',
    'MasterData.EligibilityCentralBank.AssetEligibiltyEurosystem',
    'MasterData.AssetClassifications.ProductGroup',
    'MasterData.InflationLinkedInformation.InflatIndexOrInterestCapital',
    'MasterData.BasicInformationInterest.InterestAccrualParticularities',
    'MasterData.KeyDatesBonds.LastCouponDate',
    'MasterData.BasicInformationInterest.DetachmentDays',
    'MasterData.BasicInformationInterest.CurrentInterestPeriodFrom',
    'MasterData.BasicInformationInterest.CurrentInterestPeriodTo',
    'MasterData.BasicInformationInterest.DayCountConvention',
    'MasterData.AppropriationEarnings.SpecificsInterestsDividend',
    'MasterData.CouponRelatedData.InterestPaymPeriod',
    'MasterData.BasicInformationInterest.InterestRatePerCent',
    'WM-GS035',
    'WM-GS001',
    'WM-GS030',
    'MasterData.RepaymentMaturityResubmission.LastRedemptionDate',
    'WM-GS039',
    'MasterData.RepaymentBasicInformation.PoolFactor',
    'MasterData.AssetClassifications.WKNClassification',
    'MasterData.CouponRelatedData.IrregularCoupons',
    'MasterData.AdditionsG.TechnicalInterestPaymentDate',
    'MasterData.AssetClassifications.InstrumentTypeAdd2',
    'MasterData.BasicInformationInterest.InterestPaymentMethod',
    'MasterData.RepaymentBasicInformation.RedemptionType',
    'MasterData.IssuerRelatedInstrumentData.RegCountryStatBA',
    'MasterData.CorpActReferenceInstruments.UnderlyingInstrCorporateAction.UnderlIDCorpActISIN',
    'MasterData.CorpActReferenceInstruments.IDCorporateAction.IDCorpActISIN',
    'Liquidity_flag',
    'Minimum_qualifying_size',
    'Pre_trade_large_in_scale_threshold',
    'Relevant_market_average_daily_number_of_transactions')
    )
    and (source in ('WM-VF1','esma'))
    ) t
    PIVOT (
    max(attribute_value) FOR name IN(
    [MasterData.InstrumentNames.LongName],
    [MasterData.InstrumentNames.ShortName],
    [MasterData.Indices.WrapperID],
    [MasterData.ExchangeRelatedInformation.TickerSymbolGermany],
    [MasterData.AssetClassifications.ProductGroup],
    [MasterData.AssetClassifications.InstrumentTypeAdd1],
    [MasterData.ExchangeRelatedInformation.BaseExchange],
    [MasterData.CurrencyUnit.QuotationUnit],
    [MasterData.Custody.SafeCustodyAlternatives],
    [MasterData.AssetClassifications.InstrumentTypeAdd3],
    [MasterData.AssetClassifications.InstrumentType],
    [MasterData.InstrumentIdentifier.WKNBorder],
    [MasterData.InstrumentIdentifier.LEI],
    [MasterData.Indices.UnderlyingIndexMultiplier],
    [MasterData.InstrumentIdentifier.SuperordinateIssuer],
    [MasterData.InstrumentIdentifier.IssuerNumber],
    [MasterData.InstrumentStatus.InstrStatusReason],
    [MasterData.IssuerRelatedInstrumentData.RegCountryISO],
    [MasterData.Custody.PrintedCertificateType],
    [MasterData.Custody.BookEntrySecCertificationType],
    [MasterData.BasicInformationInterest.InterestType],
    [MasterData.Indices.OptionRightType],
    [MasterData.Indices.OptionRightSubSegment],
    [MasterData.IssuanceInformation.IssueDate],
    [MasterData.CCPEligitibility.CCPEligibilityFrom],
    [MasterData.CCPEligitibility.CCPEligibilityTo],
    [MasterData.RepaymentMaturityResubmission.RedemptionDateAfterCalling],
    [MasterData.CouponRelatedData.InterestPaymMonthsProsp],
    [MasterData.CouponRelatedData.InterestPaymDayInMonthProsp],
    [MasterData.KeyDatesBonds.InterestPaymentBegin],
    [MasterData.KeyDatesBonds.FirstCouponDate],
    [MasterData.ReferenceInterestRate.BaseInterestRate],
    [MasterData.BasicInformationInterest.InterestRateValidUntil],
    [MasterData.BasicInformationInterest.InterestPayDateParticularities],
    [MasterData.EligibilityCentralBank.AssetEligibiltyEurosystem],
    [MasterData.InflationLinkedInformation.InflatIndexOrInterestCapital],
    [MasterData.BasicInformationInterest.InterestAccrualParticularities],
    [MasterData.KeyDatesBonds.LastCouponDate],
    [MasterData.BasicInformationInterest.DetachmentDays],
    [MasterData.BasicInformationInterest.CurrentInterestPeriodFrom],
    [MasterData.BasicInformationInterest.CurrentInterestPeriodTo],
    [MasterData.BasicInformationInterest.DayCountConvention],
    [MasterData.AppropriationEarnings.SpecificsInterestsDividend],
    [MasterData.CouponRelatedData.InterestPaymPeriod],
    [MasterData.BasicInformationInterest.InterestRatePerCent],
    [WM-GS035],
    [WM-GS001],
    [WM-GS030],
    [MasterData.RepaymentMaturityResubmission.LastRedemptionDate],
    [WM-GS039],
    [MasterData.RepaymentBasicInformation.PoolFactor],
    [MasterData.AssetClassifications.WKNClassification],
    [MasterData.CouponRelatedData.IrregularCoupons],
    [MasterData.AdditionsG.TechnicalInterestPaymentDate],
    [MasterData.AssetClassifications.InstrumentTypeAdd2],
    [MasterData.BasicInformationInterest.InterestPaymentMethod],
    [MasterData.RepaymentBasicInformation.RedemptionType],
    [MasterData.IssuerRelatedInstrumentData.RegCountryStatBA],
    [MasterData.CorpActReferenceInstruments.UnderlyingInstrCorporateAction.UnderlIDCorpActISIN],
    [MasterData.CorpActReferenceInstruments.IDCorporateAction.IDCorpActISIN],
    [Liquidity_flag],
    [Minimum_qualifying_size],
    [Pre_trade_large_in_scale_threshold],
    [Relevant_market_average_daily_number_of_transactions])
    ) AS p) SPARK_GEN_SUBQ_8

    Tables:

    • The table isin_market_place is a small one, only a few records, and in this query returns 8 rows
    • The table td.source_instrument is the main table and contains all the instruments and their attributes. The table also has row versioning (valid_from valid_to and is_active flag)
    • The table source_attributes is where the instruments attributes are maintained

    source_instrument and attributes

    The main intention is to "denormalize" the the attributes stored as key-value pairs as columns.

    BR.

    Paul

    Paul Hernández
  • `back to basics - what problem are you trying to solve?

    is the query too slow? how long does it take and how often is it executed?

    does it affect other queries running on the server?

    does it exhaust the buffer pool when it runs?

    if you change to "maxdop 1" on the query it self does it run faster or slower?

     

  • Hi everyone,

    thanks again for your suggestions.

    Regarding the questions from @frederico_fonseca:

    • what problem are you trying to solve? The customer has a complex data processing process using Azure SQL DB and Databricks. They have a target SLA of 20 minutes to finish the whole process. We divided the process into different parts and one of the source queries is taken around 5 minutes to complete. I'm trying to see how to improve performance and I suggested to start with the top 5 queries by duration. One idea, and that is why pivot is used, is to reduce the number of rows to be loaded in Databricks.
    • is the query too slow? how long does it take and how often is it executed? I think it should run faster, we are talking about a result set of 1.5 million rows, which is the results of using these 1.5 millions instruments and pivoting them with an attributes table of about 64 attributes for instrument, in total around 96 millions records.
    • does it affect other queries running on the server? There are not too many queries running on the server at the same time, since most parts of the data processing process is sequential.
    • does it exhaust the buffer pool when it runs? I don't know, I have to check it.
    • if you change to "maxdop 1" on the query it self does it run faster or slower? Today I asked them to execute with maxdop = 1, 4, 8 and 16. I can share the results tomorrow.

    I also attached the actual execution plan for the current query using PIVOT.

    I followed the article cited by @jeff Moden, and use the CASE statements instead. The query now is quite cleaner but at the moment we have only gained 30 seconds, that's it, 4 minutes 30 seconds instead of 5 minutes.

    Database is Azure SQL Database, pricing tier = Premium P6 1000 DTUs

    Paul Hernández
  • plan doesn't attach on forums - you will need to use https://www.brentozar.com/pastetheplan/ and post link.

    another question - is the output of that query being directly inserted into a table or is it being retrieved to another process outside the DB for further processing - important even if the process is in azure and is inserting back to another table - point here is if it leaves the server (Azure ADF for example)

    if it is inserting into a table can you also post DDL for that table - and do give further details of this particular step.

  • Thanks again, I didn't know that.

    Here the plan: https://www.brentozar.com/pastetheplan/?id=H1PbpUEgA

    The query is issued by a Databricks notebook using JDBC.

    BR. Paul

    Paul Hernández
  • can you also get the explain plan using the CASE method from Jeff Moden article?

    curious to compare.

    regarding the process - as I don't know Databricks I can't say if this will work - but I would try out with first loading the data into a temp table (created with and without a columnstore index) to see if timings improve - this for both methods and both with and without the index.

    the CASE method is likely not using a sort step (which is being created as far as I can see by the PIVOT operation as your original query didn't have a order by) so it may be why its slightly faster.

    using the temp table will also kind of tell you if the duration is due to the transfer to data bricks or the processing of the data in the first place (easy to test by executing the select into a temp table directly on the database)

    another thing I would likely try (again if temp table work with Databricks)  is to create a temp table, with a clustered index on isin, where you would add the output of a select from source_instrument where is_valid = 1 - dont know what percentage of the source data would be filtered just by this though - but may or not be worth it.

     

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply