how to avoid casting for tables with indexes

  • hi, in my procedure i need to cast varchar datatype to match the int datatype in a table which is huge and have indexes.

    i think using casting for such tables would cause Indexing not work!! ??

    so, could you please suggest me a different way ?

    in below piece of code i tried to cast those two columns because one is int and other is varchar...

    my Example:

    select v.Vendor_Id from Vendor v

    inner join EInvoicing_Invoices ei on V.ps_export_vendor_id = EI.psvendor_id_padded

    inner join OrderHeader OH on ((cast(OH.OrderHeader_ID as varchar) = ei.po_num) or (cast(OH.OrderExternalSourceOrderID as varchar) = ei.po_num ))

    where ei.EInvoice_Id = @InvoiceId and v.Vendor_ID = OH.Vendor_ID

  • Are all OrderHeader_ID in the table convertable to INT?

    Why is it not an INT?

    Preferred way: alter the table so the data types match.

    Workaround: add a computed column that conditionally converts the column to an INT and index the computed column instead to support your queries.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • hi , po_num is the column which is varchar from other table that because that could names in the columns too. and OrderHeader_ID in OrderHeader table is INT only.

    so, about using the computed column should i take the temp table and dump the data(OrderHeader_ID as varchar in temp table) and then that would be easy to use Index on that temp table OrderHeader_ID ?

    please let me know.

  • whole (6/24/2011)


    hi , po_num is the column which is varchar from other table that because that could names in the columns too. and OrderHeader_ID in OrderHeader table is INT only.

    so, about using the computed column should i take the temp table and dump the data(OrderHeader_ID as varchar in temp table) and then that would be easy to use Index on that temp table OrderHeader_ID ?

    please let me know.

    OK, allow me to rephrase with the proper column names:

    Are all OrderHeader_ID po_num values in the table convertable to INT? Why is the po_num column not defined as INT in the base table?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • No not all po_num can be converted as varchar because some values are char's so those cannot be converted as INT.

    so, i am trying to convert Orderheader_ID into varchar and store in temp table?

  • correction :

    No not all po_num can be converted as varchar INTbecause some values are char's so those cannot be converted as INT.

    so, i am trying to convert Orderheader_ID into varchar and store in temp table?

  • This is what I mean about adding a computed column and having it participate in an index:

    ALTER TABLE OrderHeader ADD OrderHeader_ID2 AS (CAST(OrderHeader_ID AS VARCHAR(100))) ;

    CREATE INDEX [ix_OrderHeader.OrderHeader_ID2] ON OrderHeader (OrderHeader_ID2);

    As a side note, it is a good practice to explicitly specify the length of all character types in your code. The bolded text below should be something like varchar(n) where n matches the actual length of po_num:

    inner join OrderHeader OH on ((cast(OH.OrderHeader_ID as varchar) = ei.po_num) or (cast(OH.OrderExternalSourceOrderID as varchar) = ei.po_num ))

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • ok good. as a new developer in the team am not sure if i would be allowed to modify table to add extra column with my permission levels here... and also the table data is HUGE and heavily used so it is sensitive to play with it i guess.

    is there any other work around for this one ?

  • The computed column IS a workaround. You are screwed by poor database design. The proper way out of the mess is to redesign the schema so columns that naturally need to be joined (same entity type) are declared with the same data type.

    Both columns should be in the same family/category of data type (character, n-character, exact numeric, etc.) to give the engine the best chance of being able to use an index when joining...and unfortunately you have an INT and a VARCHAR.

    You could load the data that would participate in one side of the JOIN into a temp table with data types that matched the other side of the JOIN, doing the conversion on the INSERT, and then add indexes to the temp table to make your JOIN use an index...but all that work may take more time and use more resources than the JOIN the way it is.

    Maybe someone else will drop in and provide a nicer workaround.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • hi, i cannot do anything about the database design now ...

    now i understand the criteria that both columns datatypes needs to match...

    So, as a work around could i go ahead with taking temp table and applying index on it .. but is this going to solve my existing problem or cause more memory use as you said ?

    please let me know.

  • whole (6/27/2011)


    hi, i cannot do anything about the database design now ...

    now i understand the criteria that both columns datatypes needs to match...

    Excellent, remember this when it's your turn to design new schema!

    So, as a work around could i go ahead with taking temp table and applying index on it .. but is this going to solve my existing problem or cause more memory use as you said ?

    It will depend, you'll need to test it.

    You also have a major problem with your JOIN predicates. This:

    ((cast(OH.OrderHeader_ID as varchar) = ei.po_num) or (cast(OH.OrderExternalSourceOrderID as varchar) = ei.po_num ))

    is not good. Even if you were not applying a CAST to one side and your data types matched the optimizer will generally still get confused by this and will do a scan. Rewrite your query to do a UNION (or UNION ALL if your data guarantees the queries will return mutually exclusive sets), like this:

    select v.Vendor_Id from Vendor v

    inner join EInvoicing_Invoices ei on V.ps_export_vendor_id = EI.psvendor_id_padded

    inner join OrderHeader OH on ((cast(OH.OrderHeader_ID as varchar) = ei.po_num)

    where ei.EInvoice_Id = @InvoiceId and v.Vendor_ID = OH.Vendor_ID

    UNION

    select v.Vendor_Id from Vendor v

    inner join EInvoicing_Invoices ei on V.ps_export_vendor_id = EI.psvendor_id_padded

    inner join OrderHeader OH on (cast(OH.OrderExternalSourceOrderID as varchar) = ei.po_num ))

    where ei.EInvoice_Id = @InvoiceId and v.Vendor_ID = OH.Vendor_ID

    Regarding the CAST itself, which side of the JOIN involves less data? My initial attempt would be to:

    1) Create a temp table that can hold the results of the smaller comparison set to support the JOIN with the data type of the other column.

    2) Do an INSERT...SELECT to load the temp table, wrapping the column in the SELECT list in a CAST to change the data type on the way in.

    3) Join the temp table with the correct type to the base table to get your results.

    4) Evaluate performance. Add indexes to the temp table prior to step 2 to improve performance to an acceptable level.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply