Assigning categories to values 1

  • Hugo Kornelis (5/15/2013)


    There are lots of other products that I do not love. Some Microsoft, some not. Some regularly used by me, some not. Try to find evidence on the internet of me ranting about some weird behaviour in MS Excel, MS Visio, Oracle, MySQL, ERWin Data Modeler, or any other product. You won't find anything. That's not because I consider those products perfect (They are not. Far from!) It is because I don't care a flying hoot about those products. SQL Server is the only product that I care so much about that I take the time and energy to investigate its rough edges, to warn people about it, and to shout and scream to Microsoft in the hope of getting them smoothed out.

    I *love* SQL Server. I really do!

    Sometimes I disagree horribly with you, but on this I'm in total agreement. Well said! And keep up the connect rants - even on the rare occasions when you get it wrong they are a useful contribution because it helps make the MS people think.

    Tom

  • L' Eomot Inversé (5/17/2013)


    You are demonstrating a lack of historical knowledge here.

    SQL became "SQL" because someone other than IBM owned the rights to the original name the IBM developers who invented SQL chose, which was SEQUEL - for Structured English QUEry Language.

    And anyone who thinks English OR means the same as XOR needs a remedial English course.

    Conversational English, at least here in Michigan, treats OR as XOR, with "and/or" being the way one would write or say OR in any situation other than programming. Thinking on it some more, legal documentation seems to follow the programmatic usage of OR, but I am not sufficiently familiar with law to state that as a certainty.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • Ouch. I got this one wrong because I had forgotten how far SQL diverged from its original aim of providing queries that meant what they looked as if they meant. It's a good question from my point of view, because it's probably ensured that I'll remember this bit of weirdness in future.

    Although I agree with those who complained that English "between" (like French entre, and Spanish entre, and Gàidhlig eadar, and Irish idir, and German zwischen .... I could go on and on and on here) usually represents a symmetrical concept (last time I looked Oxford was between Bristol and London as well as being between London and Bristol, and if I can say things like "zwischen den beiden" or "between the two of them" and so on it is certainly symmetrical in those cases) I don't agree with those who say it shouldn't include the bounds. If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

    Anyway, I won't complain. I should know the language definition, it's not the fault of the language or of today's question that I don't, it's my fault. Besides, this is far from the only piece of poor language design in SQL, there are other things more important to complain about.

    Tom

  • srienstr (5/17/2013)


    L' Eomot Inversé (5/17/2013)


    You are demonstrating a lack of historical knowledge here.

    SQL became "SQL" because someone other than IBM owned the rights to the original name the IBM developers who invented SQL chose, which was SEQUEL - for Structured English QUEry Language.

    And anyone who thinks English OR means the same as XOR needs a remedial English course.

    Conversational English, at least here in Michigan, treats OR as XOR, with "and/or" being the way one would write or say OR in any situation other than programming. Thinking on it some more, legal documentation seems to follow the programmatic usage of OR, but I am not sufficiently familiar with law to state that as a certainty.

    I was never in Michigan, but I bet if you say someone to take whiskey OR tequilla you will not take it an error if they will take two bottles. 😎



    See, understand, learn, try, use efficient
    © Dr.Plch

  • honza.mf (5/17/2013)


    srienstr (5/17/2013)


    L' Eomot Inversé (5/17/2013)


    You are demonstrating a lack of historical knowledge here.

    SQL became "SQL" because someone other than IBM owned the rights to the original name the IBM developers who invented SQL chose, which was SEQUEL - for Structured English QUEry Language.

    And anyone who thinks English OR means the same as XOR needs a remedial English course.

    Conversational English, at least here in Michigan, treats OR as XOR, with "and/or" being the way one would write or say OR in any situation other than programming. Thinking on it some more, legal documentation seems to follow the programmatic usage of OR, but I am not sufficiently familiar with law to state that as a certainty.

    I was never in Michigan, but I bet if you say someone to take whiskey OR tequilla you will not take it an error if they will take two bottles. 😎

    Depends on the direction. If I offer them my whiskey or tequila, they had best take just one. If I ask them for whiskey or tequila, they're perfectly free to give me both. 😎


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

    Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer! 🙂

    If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.

    But my main point was to point out the ambiguity, not that one choice was necessarily more correct than the other.

    I totally agree with you that there are worse "features" in SQL (and many other languages) and with Hugo that complaining is a consequence of caring...

    (edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

  • This discussion reaffirms my conscious choice not to use BETWEEN in anything other than throwaway queries. It's a bit of syntactical sugar of dubious value - really, how much harder is it to write or read "x >= y AND x <= z"? - that admits a greater possibility of misunderstanding and unintended effects.

    Jason Wolfkill

  • Some Languages use the word "Range" which is probably more descriptive than "Between". I am happy with "Between" as it's usually what I'm looking for. I can't see a use for a Function that excludes the boundaries.

    David

  • Alex Fekken (5/17/2013)


    If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

    Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer! 🙂

    If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.

    I have frequently heard high school exam grades being refered to as numbers between 1 and 10. I have never heard anyone call them a number between 0 and 11.

    But then, I am not a native English speaker, so what do I know?

    However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.

    However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.

    (edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

    Agreeed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (5/18/2013)


    However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.

    However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.

    That's because "between now and May 25" in English treats May 25 as a day with duration, not as a point in time. If something is to happen between now and 25 May it can happen on 25 May, any time on 25 May. Same with months, years, centuries. But "between X and Y" when X and Y are times (with or without date attached) requires X and Y to be treated as points without duration. There are no things with duration in SQL, the SQL date type corresponds more to a time in (UK) English than to a day - its semantics is simply that it is the restriction of the datetime2 type to values which represent the early boundary of a day in the time-zone for which those values are correct.

    It could get quite amusing to try to work how between works in various domains in English: is it direction-free, does it include the endpoints, do the endpoints have extent, does the domain of the endpoints have an order and if so is it partial or total, how many dimensions does the domain have, what are its topological properties). But all (current) SQL types are one dimensional and contain only extent-free values, so there it isn't anything like as complicated as English.

    (edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

    Agreed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).

    Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

    Do you really think that end-point inclusiveness is counter-intuitive?

    edit: get quote tags right

    Tom

  • We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

    "Between Baltimore and Boston, which is your favorite city?"

    - you have two enumerated options, select one or the other

    - you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

    Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

    The beauty of _Structured_ Query Language is that there should be no ambiguity.

    here's another example of the difficulty of language from a discussion of AI:

    "I like to eat pizza with sausage"

    "I like to eat pizza with a fork"

    "I like to eat pizza with a friend"

    A meaning of 'with' that works for one of those sentences leads to confusion for the other two.

  • Mike Dougherty-384281 (5/18/2013)


    We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

    "Between Baltimore and Boston, which is your favorite city?"

    - you have two enumerated options, select one or the other

    - you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

    Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

    The beauty of _Structured_ Query Language is that there should be no ambiguity.

    here's another example of the difficulty of language from a discussion of AI:

    "I like to eat pizza with sausage"

    "I like to eat pizza with a fork"

    "I like to eat pizza with a friend"

    A meaning of 'with' that works for one of those sentences leads to confusion for the other two.

    .

    Using English as a reference is where SQL started from. It does lead to problems, and some of the decisions as to how to use some English words have resulted in pretty counter-intuitive stuff. It might be better to avoid words like "with" and "between" and "outer" and "merge" which tend to guide English speakers in a certain direction if that's not the direction intended.

    There's even a problem in using words like "reduce" and "map" and "function" in their strict mathematical sense, since (a) most English speakers haven't a clue what that strict mathematical sense is and (b) they are used differently in different parts of mathematics. "Relation" is a real beauty, too - I laugh whenever some ignoramus tells us that the relations of relational database theory are the mathematical relations of set theory (which they clearly are not - a relation may be a set of maps on a list of named domains, but it is certainly not a set of tuples taken from the cross product of an ordered list of domains), and then tells us in the next breath that we must not refer to column numbers.

    The trouble is though that many people need something that's almost familiar to hang onto in order to understand the language, and maybe words are all they've got. Not everyone can be a Turing and programme directly in machine code, a church and programme in recursive function notation, or a Curry and programme in lambda calculus notation (I've modified paper tape binaries with mask and hand punch, but that is NOT how to write programs). So programming language designers use words from natural language. It's not always English; but English is the most common choice, partly because English was the international language used for computational theory for a couple of decades before there were any computers and partly because the UK and the US, both mainly English-speaking nations, were the first and second nations to have computers, well before everyone else.

    Tom

  • L' Eomot Inversé (5/18/2013)


    [Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

    Do you really think that end-point inclusiveness is counter-intuitive?

    Nope, not at all. I was responding to the sub-discussion about whether or not "between" in the English language would generally be interpreted to be inclusive or exclusive. The biggest part of my message focused on what I believe to be the generic interpretation (inclusive) - and then I decided to show how dangerous it can be to try to find a parallel between a computer language and a human language by adding a contrasting example.

    I completely agree with your assessment. I am pretty sure that a big majority of people that speak English at the same level as I do (not native but, if I may say so, better than the average non-native English speaker) would normally interpret "between" to be inclusive of both begin and end point, and non-directional. I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (5/18/2013)


    I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)

    To me it does make sense. The begin point is the first mentioned, if I say "between 10 and 100" ten is the begin point and one hundred the end point while if I say "between 100 and 10" one hundred is the begin point and ten the end point. Since the whole thing is non-directional, between-ness doesn't take account of which is the begin point and which is the end point, unless it has different preferences for inclusion of the two points, and since you have a different degree of preference for the two it's clearly essential that you distinguish between the two. That certainly makes sense - at least, to me. Your distinction between the two points may not work the same way as mine, but to me, at some level, that doesn't matter - we both make a distinction; it may have different consequences for each of us, but the fact that there is a distinction allows each of us to assign consequences. Of course at a more detailed level we may disagree - or not.

    edit: change /quite to /quote

    Tom

  • But then, I am not a native English speaker, so what do I know?

    I probably should have confessed earlier that neither is mine. But judging by your last name Hugo, it is probably the same as yours (dutch).

    Do you really think that end-point inclusiveness is counter-intuitive?

    It was me who said that because in my opinion endpoint issues are often a mess that I feel ambivalent about: I don't think there is an "intuitive" way of dealing with them. When I said I would exclude the endpoints I was thinking of the continuous case, because it is the more general one (e.g. pick a number between 0 and 1).

    I think the best way of dealing with ranges (which is what I think BETWEEN is trying to do) would be in the way SQL partition functions are defined because there are often multiple contiguous ranges to deal with. The moment you try the single out a single one of them you've got an issue with (or at least decisions to make about) the endpoints. Hugo avoided this by using integers, but coming from a mathematical background I prefer approaches and solutions that also work in, and are (or can be seen as) intuitive from, a more general perspective.

    (I hope this discussion isn't undermining future questions that Hugo has planned; I see there is a '1' at the end of the question title)

Viewing 15 posts - 46 through 60 (of 71 total)

You must be logged in to reply to this topic. Login to reply