September 18, 2015 at 3:49 am
Just to add to my original suggestion: since you're using this for an ETL process, here is what I think would work:
- Create a date table for, I guess, the next 10 years.
- Take the data from that table and make it part of a cache transform for the SSIS package.
- Have a lookup transform join with the cache dataset; this way you get a table-like structure against which you can perform joins and filters (within the SSIS package) without violating any warranties.
I have tested this approach a number of times and it works great for tens of millions of rows, as long as there are no blocking transforms elsewhere in the package.
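A minimal sketch of that date-lookup idea, in Python rather than SSIS (the YYYYMMDD integer key and the date range are my assumptions for illustration):

```python
from datetime import date, timedelta

# Build a ten-year date lookup keyed by a YYYYMMDD integer, standing in
# for the suggested SQL date table fed into an SSIS cache transform.
start = date(2015, 1, 1)
date_lookup = {
    int((start + timedelta(days=i)).strftime("%Y%m%d")): start + timedelta(days=i)
    for i in range(3653)  # 10 years incl. leap days (2015-01-01 .. 2024-12-31)
}

# A lookup transform then becomes a plain dictionary join:
assert date_lookup[20150918] == date(2015, 9, 18)
```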
September 18, 2015 at 6:17 am
Eirikur Eiriksson (9/18/2015)
Kristen-173977 (9/18/2015)
The front runners were neck-and-neck in Luis's script?
I've found the method I use for timing more accurate than the time statistics
😎
And it would take everything into account. No timing problems from scalar functions, and no impact from having the timing turned on... just total execution time. I think I'm going to have to play with that on Monday. Thanks.
September 18, 2015 at 10:58 am
Eirikur Eiriksson (9/18/2015)
I've found the method I use for timing more accurate than the time statistics
Ah ... 🙂 Not to dis Luis or anything, but when I ran the STATISTICS example my repeat timings were all over the shop. I thought it might have been varying server load, but I was a bit surprised, at the time, that it could be that, as there was hardly anyone else using it.
I'll run your script here and see if the Min/Max/etc vary by much.
September 18, 2015 at 11:43 am
Kristen-173977 (9/18/2015)
Eirikur Eiriksson (9/18/2015)
I've found the method I use for timing more accurate than the time statistics
Ah ... 🙂 Not to dis Luis or anything, but when I ran the STATISTICS example my repeat timings were all over the shop. I thought it might have been varying server load, but I was a bit surprised, at the time, that it could be that, as there was hardly anyone else using it.
I'll run your script here and see if the Min/Max/etc vary by much.
There's no problem. Sometimes, different servers/computers might react differently with similar data.
I found similar results to Eirikur using his script, which might reflect a problem with measurements involving system functions (I learned the problem existed with UDFs from this article).
I'd be interested in seeing your results.
September 18, 2015 at 5:51 pm
Sean Lange (9/17/2015)
I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.
Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 19, 2015 at 2:24 am
WayneS (9/18/2015)
Sean Lange (9/17/2015)
I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.
Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.
Quick thought: if no objects can be created in the database, then an option is to create another database on the same server and create the objects there. There is really no penalty for querying objects across databases on the same server.
😎
September 19, 2015 at 4:41 am
Can you have an Indexed View in another database?
Sorry, being lazy I haven't checked, but my recollection is "no"
If that is the case, is there some other, cool :-), way to get data in a.n.other database to reflect changes in the main database? (A trigger on the main table being not allowed due to licence terms.)
September 19, 2015 at 6:33 am
Eirikur Eiriksson (9/17/2015)
Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.
What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2015 at 8:22 am
Jeff Moden (9/19/2015)
Eirikur Eiriksson (9/17/2015)
Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.
What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.
That sounds both slow and odd, especially when taking into consideration that my test boxes are mostly second-generation i5 CPUs, some dual and some quad core, with nominal memory of 8-12 GB. That last test was run on a Dell E6220 with an i5-2520M and 8 GB RAM. On the other hand, all of those boxes have SSD storage for the databases (or, in the case of the laptops, the system drive) and that is where I spare no expense.
😎
Do you see high IO or CPU wait times when running these tests? There is a nifty little tool called SQL Performance Monitor (http://sqltouch.blogspot.ca/2014/03/free-real-time-sql-server-performance.html) which I often use when troubleshooting one-off problems like this.
September 21, 2015 at 8:16 am
WayneS (9/18/2015)
Sean Lange (9/17/2015)
I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.
Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.
I would have to check and see if they allow that. I am guessing I probably could but they will say that it may break in the future if they change the table. Not a big deal really.
At this point the conversion in the query is working acceptably. This is an ETL process, so it isn't like somebody is actually going to notice if it takes 3 seconds instead of 1. And it only runs once an hour, so it isn't going to cause a large amount of waits. I have leveraged a CTE to filter out the large majority of rows initially, using simple math for the date as suggested by Eric Russell. Then I do the inline conversion against the CTE.
Here is the gist of what I am currently using.
WITH InitialPass AS
(
    SELECT *
    FROM #Something
    WHERE VHRGDT > CAST(REPLACE(CONVERT(varchar(10), @LastExtractDate, 102), '.', '') AS int)
)
SELECT *
FROM InitialPass
WHERE CAST(CAST(VHRGDT AS char(9)) + STUFF(STUFF(RIGHT(VHRGTM + 1000000, 6), 5, 0, ':'), 3, 0, ':') AS datetime) > @LastExtractDate;
This allows the index on the VHRGDT column to filter out the data up until the current day. Then I can add the costly datetime conversion to eliminate rows based on the time. Seems to be pretty efficient and simple. 😀
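Outside of T-SQL, the same two-stage filter can be sketched in Python; the YYYYMMDD/HHMMSS integer layout of VHRGDT and VHRGTM is my assumption from the query above, and the sample rows are made up:

```python
from datetime import datetime

def to_datetime(vhrgdt: int, vhrgtm: int) -> datetime:
    """Combine a YYYYMMDD date int and an HHMMSS time int, mirroring the
    RIGHT(VHRGTM + 1000000, 6) zero-padding plus STUFF(':') trick."""
    t = f"{vhrgtm:06d}"                                   # 93015 -> "093015"
    return datetime.strptime(f"{vhrgdt:08d} {t[:2]}:{t[2:4]}:{t[4:]}",
                             "%Y%m%d %H:%M:%S")

last_extract = datetime(2015, 9, 21, 8, 0, 0)
rows = [(20150921, 75959), (20150921, 93015)]

# Coarse, index-friendly pass on the date integer first (>= here so
# same-day rows survive), then the exact but costlier datetime comparison.
newer = [r for r in rows
         if r[0] >= int(last_extract.strftime("%Y%m%d"))
         and to_datetime(*r) > last_extract]
```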
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 21, 2015 at 1:59 pm
Eirikur Eiriksson (9/19/2015)
That sounds both slow and odd, especially when taking into consideration that my test boxes are mostly second-generation i5 CPUs, some dual and some quad core, with nominal memory of 8-12 GB. That last test was run on a Dell E6220 with an i5-2520M and 8 GB RAM. On the other hand, all of those boxes have SSD storage for the databases (or, in the case of the laptops, the system drive) and that is where I spare no expense.
😎
Do you see high IO or CPU wait times when running these tests? There is a nifty little tool called SQL Performance Monitor (http://sqltouch.blogspot.ca/2014/03/free-real-time-sql-server-performance.html) which I often use when troubleshooting one-off problems like this.
I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂 Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.
On the boxes you spoke of, are they Windows servers or laptops or desktops or ???
September 21, 2015 at 11:10 pm
Jeff Moden (9/21/2015)
I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂
Doesn't come as a surprise; thought it was worth mentioning though.
😎
Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.
Do you have other VMs on that host, and how do those perform?
The most plausible explanation would be the read performance, but of course I'm just guessing here.
On the boxes you spoke of, are they Windows servers or laptops or desktops or ???
The ones used for the tests are laptops and desktops running desktop OSs (Win7), each having multiple instances of SQL Server, and they are relatively busy boxes.
BTW, just ran the test on a tiny (smallest) Amazon RDS instance:
T_TEXT        DURATION
-----------   --------
DRY RUN         140395
Kristen         374388
Luis            842373
EE              842373
tripleAxe-1    1263560
September 25, 2015 at 10:41 am
Sean Lange (9/17/2015)
... I would rather peel M&Ms in the desert than work with this system.
Sounds like the ERP system used by a previous employer. I told my friends that the name of That Vendor would never appear on my resume or my LinkedIn profile, and it hasn't. I started a new job and found out that their accounting system had been bought out by That Vendor.
*sigh* Sometimes you just can't get away from certain people/things. I only hope that the division managing our system is different than their ERP division.
I found out that at that previous employer the vendor took over managing several other systems, which included disabling my DBCCs, log monitoring, etc. That Vendor didn't notice an error creep in, and lost a couple of months of data.
-----
Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson
September 25, 2015 at 10:55 am
Jeff Moden (9/21/2015)
I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂 Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.
On the boxes you spoke of, are they Windows servers or laptops or desktops or ???
Jeff, did you figure out why your dev box was/is three times slower than the smallest Amazon RDS instance?
😎
September 25, 2015 at 2:47 pm
Kristen-173977 (9/16/2015)
Lynn Pettis (9/16/2015)
Or this if you want to use a MS provided scalar function:
... but somewhere on my travels I saw this, which satisfies my perception that an arithmetic algorithm will perform better than a string one, but the formula is a bit Smoke & Mirrors to me
CONVERT(datetime, RTRIM(VHRGDT))
+ (VHRGTM * 9 + VHRGTM % 10000 * 6 + VHRGTM % 100 * 10)
/ 216e4
Heh, that is a funny bit of math there. In case it still seems like smoke and mirrors, I'll try to explain it as clearly as possible.
If you want to turn an integer representing HHMMSS into something that can be added to a date (let's say seconds for now) using just math, you have a couple choices.
First, you could rely on modulo and integer division to split up HH, MM, and SS, multiplying them each by the appropriate factor (3600, 60, 1), and sum them.
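That first approach, sketched in Python for concreteness (the function name is mine):

```python
def hhmmss_to_seconds(t: int) -> int:
    """Split an HHMMSS integer with integer division and modulo, then
    weight hours, minutes, and seconds into seconds past midnight."""
    hh = t // 10000       # hours:   93015 -> 9
    mm = t // 100 % 100   # minutes: 93015 -> 30
    ss = t % 100          # seconds: 93015 -> 15
    return 3600 * hh + 60 * mm + ss

assert hhmmss_to_seconds(93015) == 34215
```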
If you don't want to rely on integer division to cut off digits to the right, then you have to do something like this smoke and mirrors treatment.
Using just modulo, you'll be able to get the integer representations of HHMMSS, MMSS, and SS. Now you'll again want to multiply each of them by an appropriate factor and sum them, but now the integer representation of SS will get multiplied by three numbers before summing, and MM will get multiplied by two numbers before summing.
That means choosing the appropriate factors is a little trickier. Let's represent the problem like this:
(hoursFactor)*HHMMSS +
(minutesFactor)*MMSS +
(secondsFactor)*SS
Now, since those are just integers we're multiplying, that means minutes will be counted 100 times what seconds are. To bring this down to the appropriate 60, we'll need to make sure that the factors by which the minutes will be multiplied sum to 60% of the sum of the factors by which seconds will be multiplied. To put it another way, (hoursFactor)+(minutesFactor) must be 60% of (hoursFactor)+(minutesFactor)+(secondsFactor).
Similarly, in integer form, hours will count as 10,000 seconds, when they're really only 3600, so (hoursFactor) must be 36% of (hoursFactor)+(minutesFactor)+(secondsFactor).
In yet other words, we know that:
(hoursFactor)/[(hoursFactor)+(minutesFactor)+(secondsFactor)]=0.36
[(hoursFactor)+(minutesFactor)]/[(hoursFactor)+(minutesFactor)+(secondsFactor)]=0.6
From there we just have to assume a value for one of the factors, and solve the resulting system of two equations for one of the other factors and we will have three appropriate numbers.
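With hoursFactor assumed to be 9, as in the snippet, exact arithmetic recovers the other two factors:

```python
from fractions import Fraction

hours_factor = 9
# hoursFactor / total = 0.36  ->  total = 25
total = hours_factor / Fraction(36, 100)
# (hoursFactor + minutesFactor) / total = 0.6  ->  minutesFactor = 6
minutes_factor = Fraction(60, 100) * total - hours_factor
# the three factors sum to the total  ->  secondsFactor = 10
seconds_factor = total - hours_factor - minutes_factor

assert (total, minutes_factor, seconds_factor) == (25, 6, 10)
```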
If we end up with 9, 6, and 10 as in that snippet, that means we end up multiplying seconds by 25, giving the number of twenty-fifths of a second that the time represents. If we want to add that number directly to a datetime value, then we need to convert it to a fraction of a day, so we divide it by the number of twenty-fifths of a second in a day, which is 216e4 (25 x 86,400).
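The equivalence is easy to check mechanically; here is a brute-force comparison in Python standing in for the T-SQL expression:

```python
def smoke_and_mirrors(t: int) -> float:
    """The forum snippet's weighting: fraction of a day for an HHMMSS int.
    The numerator is 25x the seconds past midnight; 216e4 = 25 * 86400."""
    return (t * 9 + t % 10000 * 6 + t % 100 * 10) / 216e4

def straightforward(t: int) -> float:
    """Reference: split into H/M/S, total the seconds, divide by 86400."""
    hh, mm, ss = t // 10000, t // 100 % 100, t % 100
    return (3600 * hh + 60 * mm + ss) / 86400

# The two agree for every valid time of day.
assert all(smoke_and_mirrors(h * 10000 + m * 100 + s)
           == straightforward(h * 10000 + m * 100 + s)
           for h in range(24) for m in range(60) for s in range(60))
```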
Easy as pie yet clear as mud, and as Sean pointed out, not exactly something you want to be documenting in your SQL Server code 🙂
Cheers!