May 14, 2023 at 3:48 pm
Jeff Moden wrote:why they thought SQL couldn't be used to do any of those things.
He he, first guess is that they don't know how to! 😎
Thanks, Eirikur. I appreciate the response.
So, just to confirm, and regardless of the possible reasons why, your real first impression was the same as mine: they're saying that SQL can't do the things cited in their statement, correct?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2023 at 5:11 pm
"Learning SQL helps us to communicate with databases, but when it comes to cleaning, manipulating, analyzing, and visualizing data, you’ve got to know a bit of Python or R."
Okay let us simply dissect what this says and does not say:
1) Learning SQL helps us to communicate with databases
Okay, that is pretty straightforward and correct, no ambiguities there. However, that is followed by a very key word
2) but
This is a conjunction that connects two concepts that contrast with one another. Used in this sentence it strongly implies that the former is negated by the latter.
3) when it comes to cleaning, manipulating, analyzing, and visualizing data, you have got to know a bit of Python or R.
Okay there is no other way to interpret this other than it is saying that you "have got to know" or basically "it is required that you know" or "you must know" which then strongly implies that you must use these and going back to the "but" means you are required to use these instead of SQL for the items denoted.
Thus if the author intended to communicate something else, which I will not say is not the case, they utterly failed in doing so, as I cannot see how anyone who even half understands the English language would interpret that sentence in any other fashion.
That being said I would have nicely pointed out to the author that their claim was utterly false, and when they came back with that statement I would have suggested that they learn how to properly use English as that is not what their sentence communicates based on how it is phrased.
May 14, 2023 at 6:58 pm
Thanks, Dennis... that was my exact first impression and for the very reason you stated... The word "but".
How about the rest of you good folks... did anyone else have a FIRST impression that differed from mine?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2023 at 8:32 am
Thanks, Dennis... that was my exact first impression and for the very reason you stated... The word "but".
How about the rest of you good folks... did anyone else have a FIRST impression that differed from mine?
First, second and third impressions are the same: "It is impossible to clean, manipulate, analyse and visualise data without knowing some R or Python."
The corollary being that SQL cannot do all of those things.
At a stretch, it could be argued that SQL cannot be used to visualise data and therefore the above statement is true, because one of {clean, manipulate, analyse, visualise} is false, even though {clean, manipulate, analyse} would all be true.
May 15, 2023 at 10:36 am
SQL would be my first choice for cleaning data. While it's a powerful tool for querying and cleaning data, it may not be the best choice for performing complex language analysis tasks like sentiment analysis. I was reading about someone who was analysing data on Twitter; they had to make a decision on the positivity of comments, where a statement like "yes, it was great" within a comment could be either genuine or sarcastic. Sentiment analysis involves determining the sentiment or emotion expressed in a piece of text, and it often requires natural language processing (NLP) techniques to analyse the context, sentiment words, phrases, and other linguistic features. So for this case Python or R would be a good choice to integrate with SQL for cleaning data.
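As a rough illustration of the "SQL first for cleaning" point, here is a minimal sketch that runs the cleaning step as plain SQL. Assumptions: it uses SQLite through Python's standard sqlite3 module only so that it runs anywhere (T-SQL has TRIM, LOWER and NULLIF as well), and the raw_customers table, its columns and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [
        ("  Alice ", "ALICE@Example.com"),
        ("Alice", "alice@example.com "),
        ("Bob", ""),
        ("Carol", "carol@example.com"),
    ],
)

# Trim whitespace, normalise case, turn empty strings into NULL,
# and collapse the duplicates that appear once the values are normalised.
cleaned = conn.execute(
    """
    SELECT DISTINCT
           TRIM(name)                     AS name,
           NULLIF(LOWER(TRIM(email)), '') AS email
    FROM raw_customers
    ORDER BY name
    """
).fetchall()

for row in cleaned:
    print(row)
```

The sentiment step would still need an NLP library on the Python or R side; this only covers the part that SQL handles comfortably on its own.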
May 15, 2023 at 2:26 pm
Thanks, Dennis... that was my exact first impression and for the very reason you stated... The word "but".
How about the rest of you good folks... did anyone else have a FIRST impression that differed from mine?
My first impression was the same as yours, Jeff. I know nothing of Python or R and I can do everything I need with SQL. OK, a little SSRS for reports, but that's mainly SQL anyway. 100% agree with Phil's statement as well.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
May 15, 2023 at 7:16 pm
SQL would be my first choice for cleaning data. While it's a powerful tool for querying and cleaning data, it may not be the best choice for performing complex language analysis tasks like sentiment analysis. I was reading about someone who was analysing data on Twitter; they had to make a decision on the positivity of comments, where a statement like "yes, it was great" within a comment could be either genuine or sarcastic. Sentiment analysis involves determining the sentiment or emotion expressed in a piece of text, and it often requires natural language processing (NLP) techniques to analyse the context, sentiment words, phrases, and other linguistic features. So for this case Python or R would be a good choice to integrate with SQL for cleaning data.
Totally agreed on all of that.
Still, there's a shedload of analysis that can be done in T-SQL (of course, you already know that so apologies for stating the obvious).
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2023 at 7:17 pm
Jeff Moden wrote:Thanks, Dennis... that was my exact first impression and for the very reason you stated... The word "but".
How about the rest of you good folks... did anyone else have a FIRST impression that differed from mine?
My first impression was the same as yours, Jeff. I know nothing of Python or R and I can do everything I need with SQL. OK, a little SSRS for reports, but that's mainly SQL anyway. 100% agree with Phil's statement as well.
Aye... thank you for the feedback on that.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2023 at 7:27 pm
Jeff Moden wrote:Thanks, Dennis... that was my exact first impression and for the very reason you stated... The word "but".
How about the rest of you good folks... did anyone else have a FIRST impression that differed from mine?
First, second and third impressions are the same: "It is impossible to clean, manipulate, analyse and visualise data without knowing some R or Python."
The corollary being that SQL cannot do all of those things.
At a stretch, it could be argued that SQL cannot be used to visualise data and therefore the above statement is true, because one of {clean, manipulate, analyse, visualise} is false, even though {clean, manipulate, analyse} would all be true.
Aye. Thanks, Phil. I'll say that you can actually visualize data in SQL. Not all visualization needs to be in graphic form.
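To make "not all visualization needs to be in graphic form" concrete, here is a minimal sketch of a text bar chart built entirely inside a query. Assumptions: it is run through SQLite via Python's sqlite3 module for portability, the sales table and its rows are hypothetical, and the replace(hex(zeroblob(n)), '00', '*') expression is just SQLite's way of repeating a character (in T-SQL, REPLICATE('*', n) does the same job).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, purely for illustration.
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 12), ("South", 5), ("East", 8)],
)

# hex(zeroblob(n)) yields '00' repeated n times; replacing each '00' with '*'
# leaves a bar of n asterisks.
chart = conn.execute(
    """
    SELECT region, units, replace(hex(zeroblob(units)), '00', '*') AS bar
    FROM sales
    ORDER BY units DESC
    """
).fetchall()

for region, units, bar in chart:
    print(f"{region:<6}{units:>3} {bar}")
```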
I also say that there are hybrid opportunities. For example, I saw one fellow post an article about Python and his code was "chunking" data in 100,000 row-size iterations. I asked why and he said to "increase performance". One of the many articles I have on the proverbial back burner is one about how to do linear regressions on millions of rows in a nasty fast manner. It would be interesting to compare Python with SQL Server on things like that.
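For readers wondering how a linear regression can be done set-based at all, here is a minimal sketch using the textbook least-squares formulas over plain aggregates. This is not the method from the unwritten article mentioned above, which isn't shown in this thread. Assumptions: SQLite via Python's sqlite3 module stands in for SQL Server, and the points table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
# Hypothetical data lying exactly on y = 2x + 1.
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(x, 2 * x + 1) for x in range(1000)],
)

# One pass over the table collects the five sums; the slope and intercept
# then follow from the standard least-squares formulas:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx)
#   intercept = (Sy - slope*Sx) / n
slope, intercept = conn.execute(
    """
    WITH s AS (
        SELECT COUNT(*) AS n,
               SUM(x)   AS sx,
               SUM(y)   AS sy,
               SUM(x*x) AS sxx,
               SUM(x*y) AS sxy
        FROM points
    )
    SELECT (n*sxy - sx*sy) / (n*sxx - sx*sx) AS slope,
           (sy - ((n*sxy - sx*sy) / (n*sxx - sx*sx)) * sx) / n AS intercept
    FROM s
    """
).fetchone()

print(slope, intercept)
```

Because the work is a single aggregate scan, there is nothing to "chunk"; how that compares on millions of rows would need measuring on a real server.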
Another, which I'm actually going to teach this coming Saturday at the Ohio North SQL Saturday 2023, is how to do some pretty awesome, on-the-fly aggregates on 20 million rows out of 100 million rows (28 milliseconds on the first run and less on subsequent runs). It also doesn't take much longer on all 100 million rows.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2023 at 2:00 am
Okay, I know enough about Python to know that it is not nearly as fast to use as straight C, especially if you are doing it in MS Windows (ack hack cough puke); I need some relief, pass me a bottle of Linux or Unix. This is because MS Windows does not support multi-threading (which many Python programmers are not even aware of) -- it does support multi-processing, just not multi-threading. However, if you truly want to get high rates of data manipulation you use straight C algorithms running on a Unix server --- I know because I worked on the code that handles all the claims processing for Medicaid for the state of Florida, where they had to process nearly a billion claims every day and get them completed in a fraction of a day. The magnitude of that versus what most folks are doing is kind of mind-boggling to say the least.
Still, if I am going to use code to process data manipulation with a need for speed, I am going to use a compiled language (C, C++, Java, etc...), not an interpreted one (Python).
May 16, 2023 at 2:41 am
Still, if I am going to use code to process data manipulation with a need for speed, I am going to use a compiled language (C, C++, Java, etc...), not an interpreted one (Python).
That's another tic I have... I love it when I tell someone that their SQL will be slow and they retort with something (fairly stupid, IMHO) like it's only ever going to have a small number of rows or will only run once a day and so performance doesn't matter. People just can't look past their one piece of code and understand that their code is just one of thousands of pieces of code where someone made the same mistake in thinking. Now, you have a server that's running twice, tens, and, sometimes, hundreds of times slower and there's no place to easily fix it because the poison has permeated the meat of the entire server.
Eh? Just throw hardware at it, right?
Lordy.
I have seen it quite a few times where adding extra hardware actually slowed things down because more things want to get at the same data but it's all slow.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2023 at 7:56 am
Okay, I know enough about Python to know that it is not nearly as fast to use as straight C, especially if you are doing it in MS Windows (ack hack cough puke); I need some relief, pass me a bottle of Linux or Unix. This is because MS Windows does not support multi-threading (which many Python programmers are not even aware of) -- it does support multi-processing, just not multi-threading. However, if you truly want to get high rates of data manipulation you use straight C algorithms running on a Unix server --- I know because I worked on the code that handles all the claims processing for Medicaid for the state of Florida, where they had to process nearly a billion claims every day and get them completed in a fraction of a day. The magnitude of that versus what most folks are doing is kind of mind-boggling to say the least.
Still, if I am going to use code to process data manipulation with a need for speed, I am going to use a compiled language (C, C++, Java, etc...), not an interpreted one (Python).
To the best of my knowledge Windows does support multithreading, certainly in C# you can create multithreaded programs that only run on Windows.
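A minimal sketch of the same point from the Python side, using only the standard threading module, which works the same way on Windows, Linux and macOS. The worker function and the thread names are hypothetical. One caveat worth stating as background rather than as part of the original argument: in standard CPython builds the global interpreter lock keeps threads from running Python bytecode in parallel, which is a property of the interpreter and not of the operating system.

```python
import threading

results = {}

def worker(worker_id: int) -> None:
    # Each thread records the name it was started with.
    results[worker_id] = threading.current_thread().name

# Start four threads and wait for all of them to finish.
threads = [
    threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```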
Many popular Python libraries are actually built on top of optimised C or C++ code. These are much faster than interpreted Python libraries. So if most of your processing is in the libraries I think these are comparable with compiled C applications.
Nearly a billion claims every day seems a very high number for Medicaid Florida, considering the population of Florida is only about 22 million.
May 16, 2023 at 12:49 pm
Nearly a billion claims every day seems a very high number for Medicaid Florida, considering the population of Florida is only about 22 million.
That sounds about right. The number of Medicaid recipients in Florida is very high. Without looking, I think they have the highest percentage of Medicaid recipients per capita of all the states.
A regular trip to the doctor's office results in multiple claims.
Side note on Florida Medicaid. In 1999, they did an audit of our billings. They determined that we had underbilled them 700k over three years. So the company got fined 300k. The issue was the module that generated the bills. The programmer who wrote the code did not consider the time portion of the datetime columns, so whatever date was the last date of the billing report was never included.
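The original billing code isn't shown here, so the following is only a sketch of the usual way this kind of bug arises and the usual fix: an inclusive upper bound that lands on midnight of the last day versus a half-open range that stops just before the next day. All dates and variable names below are hypothetical.

```python
from datetime import date, datetime, timedelta

# Hypothetical claim timestamps; only the time portion matters here.
claims = [
    datetime(1999, 1, 30, 9, 15),
    datetime(1999, 1, 31, 0, 0),
    datetime(1999, 1, 31, 14, 30),  # afternoon of the last report day
    datetime(1999, 2, 1, 8, 0),
]

report_start = date(1999, 1, 1)
report_end = date(1999, 1, 31)  # last day the report is meant to cover

start_dt = datetime.combine(report_start, datetime.min.time())
end_dt = datetime.combine(report_end, datetime.min.time())

# Buggy: the end date is treated as midnight, so anything later on
# the last day falls outside the inclusive range.
buggy = [c for c in claims if start_dt <= c <= end_dt]

# Fix: half-open range, up to but not including the day after the end date.
# The T-SQL equivalent would be:
#   WHERE dt >= @start AND dt < DATEADD(day, 1, @end)
next_day = end_dt + timedelta(days=1)
fixed = [c for c in claims if start_dt <= c < next_day]

print(len(buggy), len(fixed))
```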
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
May 16, 2023 at 12:52 pm
Dennis Jensen wrote:Still, if I am going to use code to process data manipulation with a need for speed, I am going to use a compiled language (C, C++, Java, etc...), not an interpreted one (Python).
That's another tic I have... I love it when I tell someone that their SQL will be slow and they retort with something (fairly stupid, IMHO) like it's only ever going to have a small number of rows or will only run once a day and so performance doesn't matter. People just can't look past their one piece of code and understand that their code is just one of thousands of pieces of code where someone made the same mistake in thinking. Now, you have a server that's running twice, tens, and, sometimes, hundreds of times slower and there's no place to easily fix it because the poison has permeated the meat of the entire server.
Eh? Just throw hardware at it, right?
Lordy.
I have seen it quite a few times where adding extra hardware actually slowed things down because more things want to get at the same data but it's all slow.
To coin a cliché, if I had a penny for every time I have heard "It only runs once a week...", retirement would have occurred long ago.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/