Some query values change when query is run but not others. Why?

  • Hi everyone

    I have a 1000 line SQL query that is too long to put here so I will explain the basic logic and what the issue is.

    The query processes stock market data and produces a daily metric for each symbol.  There are close to 4000 symbols that get the metric created daily. I am testing the query and I am seeing some odd behaviour.  For most symbols, if the query is re-run using the same data the metrics for the day do not change.  This is expected.  However, for some symbols, each time the query is run the metric is completely different for the SAME data.  This is definitely not expected.  If all the values changed each time I ran the query for the SAME data then I understand that something is clearly wrong.  However, for some symbols the metric keeps changing each time I run the query on the SAME data but others do not change.  The ones that don't change are producing the correct value.  I am truly stumped.  There is no symbol-specific logic in the query (ie, no hard-coding logic that applies different logic to a particular symbol).  Instead, it is just "select symbol from .... where ... etc"  I have no clue where to even begin.  It is helpful to have the query to look at to provide helpful feedback so I apologize in advance for the inconvenience from not seeing the query.  I am hoping members on this forum can provide some suggestions on what to look for as part of the investigation.  I will investigate whatever feedback I get.

    Thank you

  • Is there an order by? You can have duplicates and sql may pick a random duplicate without order by

  • Do you have a TOP(n) without an ORDER BY?

    Is it always the same symbols that are static, and the same symbols that change?

    With such a large query, I would start by trying to focus on specific symbols, and also cutting out as much of the query as possible to return the absolute minimum.  Then slowly add back parts of the query until things start to change.

  • Thank you both for your feedback.  Issue has been fixed.  I will explain the fix so others can use it for inspiration.

    The query uses multiple physical tables to store data.  One of the fields is used for joins and calculations and it was storing data as INT.  The same field in other tables was NUMERIC(12,5).  This means that data such as 1.4 and 1.5 were being stored as 1 in the problem table hence there were duplicates so 1, 1.4, 1.5 were all being stored as 1.  This meant that if another table was doing a join on the field and was using 1 then the problem table would produce 3 rows (1, 1.4 convert to 1, 1.5 converted to 1) when it should have returned only 1 row.  The fix was to change the data type from INT to NUMERIC(12,5) and now the issue is gone.  If the calculation is repeated using the same data then the result should be static (as expected).

  • Thanks for the followup. I was thinking datatypes as I read the first post today, as this can sometimes be an issue.

     

    The other thing I had in a baseball database was showing Frank Thomas as the most career home runs. The reason? Name + seasons + HRs isn't good enough. There's a Frank Thomas from the 50s 😉 I didn't have a good enough separation in data, which is essentially what you have here.

    This is a good reason to have some testing in place with something like TSQLT, so you can setup a demo set of data that might reproduce this and ensure future refactors don't break this.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply