Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««45678»»»

Assigning categories to values 1 Expand / Collapse
Author
Message
Posted Friday, May 17, 2013 7:03 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 27, 2014 7:17 PM
Points: 197, Visits: 459
If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer!

If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.

But my main point was to point out the ambiguity, not that one choice was necessarily more correct than the other.

I totally agree with you that there are worse "features" in SQL (and many other languages) and with Hugo that complaining is a consequence of caring...

(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')
Post #1454220
Posted Friday, May 17, 2013 10:38 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 16, 2014 3:21 PM
Points: 1,210, Visits: 2,512
This discussion reaffirms my conscious choice not to use BETWEEN in anything other than throwaway queries. It's a bit of syntactical sugar of dubious value - really, how much harder is it to write or read "x >= y AND x <= z"? - that admits a greater possibility of misunderstanding and unintended effects.

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1454222
Posted Friday, May 17, 2013 11:44 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, July 10, 2014 3:25 PM
Points: 2,133, Visits: 867
Some Languages use the word "Range" which is probably more descriptive than "Between". I am happy with "Between" as it's usually what I'm looking for. I can't see a use for a Function that excludes the boundaries.

David
Post #1454234
Posted Saturday, May 18, 2013 2:10 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:40 AM
Points: 5,916, Visits: 8,168
Alex Fekken (5/17/2013)
If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer!

If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.


I have frequently heard high school exam grades being refered to as numbers between 1 and 10. I have never heard anyone call them a number between 0 and 11.
But then, I am not a native English speaker, so what do I know?

However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.
However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.


(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

Agreeed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1454239
Posted Saturday, May 18, 2013 7:36 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:40 AM
Points: 8,557, Visits: 9,050
Hugo Kornelis (5/18/2013)
However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.
However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.

That's because "between now and May 25" in English treats May 25 as a day with duration, not as a point in time. If something is to happen between now and 25 May it can happen on 25 May, any time on 25 May. Same with months, years, centuries. But "between X and Y" when X and Y are times (with or without date attached) requires X and Y to be treated as points without duration. There are no things with duration in SQL, the SQL date type corresponds more to a time in (UK) English than to a day - its semantics is simply that it is the restriction of the datetime2 type to values which represent the early boundary of a day in the time-zone for which those values are correct.

It could get quite amusing to try to work how between works in various domains in English: is it direction-free, does it include the endpoints, do the endpoints have extent, does the domain of the endpoints have an order and if so is it partial or total, how many dimensions does the domain have, what are its topological properties). But all (current) SQL types are one dimensional and contain only extent-free values, so there it isn't anything like as complicated as English.

(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

Agreed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).

Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

Do you really think that end-point inclusiveness is counter-intuitive?

edit: get quote tags right


Tom
Post #1454259
Posted Saturday, May 18, 2013 9:26 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, June 26, 2014 7:28 AM
Points: 257, Visits: 902
We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

"Between Baltimore and Boston, which is your favorite city?"
- you have two enumerated options, select one or the other
- you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

The beauty of _Structured_ Query Language is that there should be no ambiguity.

here's another example of the difficulty of language from a discussion of AI:
"I like to eat pizza with sausage"
"I like to eat pizza with a fork"
"I like to eat pizza with a friend"
A meaning of 'with' that works for one of those sentences leads to confusion for the other two.




Post #1454272
Posted Saturday, May 18, 2013 12:06 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:40 AM
Points: 8,557, Visits: 9,050
Mike Dougherty-384281 (5/18/2013)
We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

"Between Baltimore and Boston, which is your favorite city?"
- you have two enumerated options, select one or the other
- you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

The beauty of _Structured_ Query Language is that there should be no ambiguity.

here's another example of the difficulty of language from a discussion of AI:
"I like to eat pizza with sausage"
"I like to eat pizza with a fork"
"I like to eat pizza with a friend"
A meaning of 'with' that works for one of those sentences leads to confusion for the other two.
.

Using English as a reference is where SQL started from. It does lead to problems, and some of the decisions as to how to use some English words have resulted in pretty counter-intuitive stuff. It might be better to avoid words like "with" and "between" and "outer" and "merge" which tend to guide English speakers in a certain direction if that's not the direction intended.
There's even a problem in using words like "reduce" and "map" and "function" in their strict mathematical sense, since (a) most English speakers haven't a clue what that strict mathematical sense is and (b) they are used differently in different parts of mathematics. "Relation" is a real beauty, too - I laugh whenever some ignoramus tells us that the relations of relational database theory are the mathematical relations of set theory (which they clearly are not - a relation may be a set of maps on a list of named domains, but it is certainly not a set of tuples taken from the cross product of an ordered list of domains), and then tells us in the next breath that we must not refer to column numbers.

The trouble is though that many people need something that's almost familiar to hang onto in order to understand the language, and maybe words are all they've got. Not everyone can be a Turing and programme directly in machine code, a church and programme in recursive function notation, or a Curry and programme in lambda calculus notation (I've modified paper tape binaries with mask and hand punch, but that is NOT how to write programs). So programming language designers use words from natural language. It's not always English; but English is the most common choice, partly because English was the international language used for computational theory for a couple of decades before there were any computers and partly because the UK and the US, both mainly English-speaking nations, were the first and second nations to have computers, well before everyone else.


Tom
Post #1454285
Posted Saturday, May 18, 2013 12:24 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:40 AM
Points: 5,916, Visits: 8,168
L' Eomot Inversé (5/18/2013)
[Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

Do you really think that end-point inclusiveness is counter-intuitive?


Nope, not at all. I was responding to the sub-discussion about whether or not "between" in the English language would generally be interpreted to be inclusive or exclusive. The biggest part of my message focused on what I believe to be the generic interpretation (inclusive) - and then I decided to show how dangerous it can be to try to find a parallel between a computer language and a human language by adding a contrasting example.

I completely agree with your assessment. I am pretty sure that a big majority of people that speak English at the same level as I do (not native but, if I may say so, better than the average non-native English speaker) would normally interpret "between" to be inclusive of both begin and end point, and non-directional. I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1454286
Posted Saturday, May 18, 2013 6:54 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:40 AM
Points: 8,557, Visits: 9,050
Hugo Kornelis (5/18/2013)
I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)

To me it does make sense. The begin point is the first mentioned, if I say "between 10 and 100" ten is the begin point and one hundred the end point while if I say "between 100 and 10" one hundred is the begin point and ten the end point. Since the whole thing is non-directional, between-ness doesn't take account of which is the begin point and which is the end point, unless it has different preferences for inclusion of the two points, and since you have a different degree of preference for the two it's clearly essential that you distinguish between the two. That certainly makes sense - at least, to me. Your distinction between the two points may not work the same way as mine, but to me, at some level, that doesn't matter - we both make a distinction; it may have different consequences for each of us, but the fact that there is a distinction allows each of us to assign consequences. Of course at a more detailed level we may disagree - or not.

edit: change /quite to /quote


Tom
Post #1454298
Posted Saturday, May 18, 2013 7:27 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, May 27, 2014 7:17 PM
Points: 197, Visits: 459
But then, I am not a native English speaker, so what do I know?

I probably should have confessed earlier that neither is mine. But judging by your last name Hugo, it is probably the same as yours (dutch).

Do you really think that end-point inclusiveness is counter-intuitive?

It was me who said that because in my opinion endpoint issues are often a mess that I feel ambivalent about: I don't think there is an "intuitive" way of dealing with them. When I said I would exclude the endpoints I was thinking of the continuous case, because it is the more general one (e.g. pick a number between 0 and 1).

I think the best way of dealing with ranges (which is what I think BETWEEN is trying to do) would be in the way SQL partition functions are defined because there are often multiple contiguous ranges to deal with. The moment you try the single out a single one of them you've got an issue with (or at least decisions to make about) the endpoints. Hugo avoided this by using integers, but coming from a mathematical background I prefer approaches and solutions that also work in, and are (or can be seen as) intuitive from, a more general perspective.

(I hope this discussion isn't undermining future questions that Hugo has planned; I see there is a '1' at the end of the question title)
Post #1454300
« Prev Topic | Next Topic »

Add to briefcase «««45678»»»

Permissions Expand / Collapse