SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Assigning categories to values 1


Assigning categories to values 1

Author
Message
Alex Fekken
Alex Fekken
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 460
If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer! :-)

If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.

But my main point was to point out the ambiguity, not that one choice was necessarily more correct than the other.

I totally agree with you that there are worse "features" in SQL (and many other languages) and with Hugo that complaining is a consequence of caring...

(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')
wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2732 Visits: 2582
This discussion reaffirms my conscious choice not to use BETWEEN in anything other than throwaway queries. It's a bit of syntactical sugar of dubious value - really, how much harder is it to write or read "x >= y AND x <= z"? - that admits a greater possibility of misunderstanding and unintended effects.

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
David Conn
David Conn
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3524 Visits: 1183
Some Languages use the word "Range" which is probably more descriptive than "Between". I am happy with "Between" as it's usually what I'm looking for. I can't see a use for a Function that excludes the boundaries.

David
Hugo Kornelis
Hugo Kornelis
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18951 Visits: 12426
Alex Fekken (5/17/2013)
If someone asks me to pick a number between 1 and 3 I don't think I'm forced to choose '2'.

Nice attempt at a "proof by intimidation" but I would see it as an invitation to pick a non-integer! :-)

If you change the numbers to 1 and 30 (and explicitly ask for an integer) then I doubt that 1 and 30 would be as likely outcomes as the other integers in the range.


I have frequently heard high school exam grades being refered to as numbers between 1 and 10. I have never heard anyone call them a number between 0 and 11.
But then, I am not a native English speaker, so what do I know?

However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.
However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.


(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

Agreeed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26164 Visits: 12500
Hugo Kornelis (5/18/2013)
However, with other data types than integer values, there is indeed amibiguity in the English use of "between". On a numeric (non-integer) scale, when a value is between 11 and 35, I expect the values 11.000 and 35.000 to be included, just as (obviously) 11.001 and 34.999, but not 35.001. Just as the SQL operator BETWEEN is defined for numeric data types.
However, with points in time, when people say between now and May 25, I expect May 25, 4:28 PM to be still included. For SQL with datetime and similar data types, the cutoff point would be May 25, midnight - one second later would be outside the range.

That's because "between now and May 25" in English treats May 25 as a day with duration, not as a point in time. If something is to happen between now and 25 May it can happen on 25 May, any time on 25 May. Same with months, years, centuries. But "between X and Y" when X and Y are times (with or without date attached) requires X and Y to be treated as points without duration. There are no things with duration in SQL, the SQL date type corresponds more to a time in (UK) English than to a day - its semantics is simply that it is the restriction of the datetime2 type to values which represent the early boundary of a day in the time-zone for which those values are correct.

It could get quite amusing to try to work how between works in various domains in English: is it direction-free, does it include the endpoints, do the endpoints have extent, does the domain of the endpoints have an order and if so is it partial or total, how many dimensions does the domain have, what are its topological properties). But all (current) SQL types are one dimensional and contain only extent-free values, so there it isn't anything like as complicated as English.

(edit: I would not even calls this a 'bug', like I would call certain other "features", but simply a 'bad choice')

Agreed. This is definitely not a bug. I would call this a non-intuitive implementation choice. (And note that this choice is not made by the Microsoft team; they only followed what has been defined in the ISO/ANSI standard for SQL).

Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

Do you really think that end-point inclusiveness is counter-intuitive?

edit: get quote tags right

Tom

Mike Dougherty-384281
Mike Dougherty-384281
SSC Eights!
SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)SSC Eights! (846 reputation)

Group: General Forum Members
Points: 846 Visits: 944
We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

"Between Baltimore and Boston, which is your favorite city?"
- you have two enumerated options, select one or the other
- you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

The beauty of _Structured_ Query Language is that there should be no ambiguity.

here's another example of the difficulty of language from a discussion of AI:
"I like to eat pizza with sausage"
"I like to eat pizza with a fork"
"I like to eat pizza with a friend"
A meaning of 'with' that works for one of those sentences leads to confusion for the other two.
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26164 Visits: 12500
Mike Dougherty-384281 (5/18/2013)
We shouldn't use English as a reference for language implementation. Almost nothing is self-evident.

"Between Baltimore and Boston, which is your favorite city?"
- you have two enumerated options, select one or the other
- you have a geographical area bounded by only two non-exact endpoints. Many would agree that New York City is in this range. Is Philadelphia? Is Harrisburg? Is Chicago? How far east-west can be assumed from north-south endpoints?

Granted this is a stretch for the discussion of "between" since it isn't intended to work with spatial data types, but it does illustrate the context-dependency of English.

The beauty of _Structured_ Query Language is that there should be no ambiguity.

here's another example of the difficulty of language from a discussion of AI:
"I like to eat pizza with sausage"
"I like to eat pizza with a fork"
"I like to eat pizza with a friend"
A meaning of 'with' that works for one of those sentences leads to confusion for the other two.
.

Using English as a reference is where SQL started from. It does lead to problems, and some of the decisions as to how to use some English words have resulted in pretty counter-intuitive stuff. It might be better to avoid words like "with" and "between" and "outer" and "merge" which tend to guide English speakers in a certain direction if that's not the direction intended.
There's even a problem in using words like "reduce" and "map" and "function" in their strict mathematical sense, since (a) most English speakers haven't a clue what that strict mathematical sense is and (b) they are used differently in different parts of mathematics. "Relation" is a real beauty, too - I laugh whenever some ignoramus tells us that the relations of relational database theory are the mathematical relations of set theory (which they clearly are not - a relation may be a set of maps on a list of named domains, but it is certainly not a set of tuples taken from the cross product of an ordered list of domains), and then tells us in the next breath that we must not refer to column numbers.

The trouble is though that many people need something that's almost familiar to hang onto in order to understand the language, and maybe words are all they've got. Not everyone can be a Turing and programme directly in machine code, a church and programme in recursive function notation, or a Curry and programme in lambda calculus notation (I've modified paper tape binaries with mask and hand punch, but that is NOT how to write programs). So programming language designers use words from natural language. It's not always English; but English is the most common choice, partly because English was the international language used for computational theory for a couple of decades before there were any computers and partly because the UK and the US, both mainly English-speaking nations, were the first and second nations to have computers, well before everyone else.

Tom

Hugo Kornelis
Hugo Kornelis
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18951 Visits: 12426
L' Eomot Inversé (5/18/2013)
[Of course it's not a bug in SQL Server. It may or may not have been a bad choice in the standard or in IBM's original SEQUEL language definition or wherever the choice was originally made (directionality was a bad choice, end-point inclusiveness a good one, in my view), but conforming to the standard is certainly not a bug.

Do you really think that end-point inclusiveness is counter-intuitive?


Nope, not at all. I was responding to the sub-discussion about whether or not "between" in the English language would generally be interpreted to be inclusive or exclusive. The biggest part of my message focused on what I believe to be the generic interpretation (inclusive) - and then I decided to show how dangerous it can be to try to find a parallel between a computer language and a human language by adding a contrasting example.

I completely agree with your assessment. I am pretty sure that a big majority of people that speak English at the same level as I do (not native but, if I may say so, better than the average non-native English speaker) would normally interpret "between" to be inclusive of both begin and end point, and non-directional. I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26164 Visits: 12500
Hugo Kornelis (5/18/2013)
I would consider not including the end point to be counter-intuitive (and not including the begin point even more so - and please don't ask me how to define begin and end point for a concept I just defended to be non-directional, I know it makes no sense, and yet it does. To me.)

To me it does make sense. The begin point is the first mentioned, if I say "between 10 and 100" ten is the begin point and one hundred the end point while if I say "between 100 and 10" one hundred is the begin point and ten the end point. Since the whole thing is non-directional, between-ness doesn't take account of which is the begin point and which is the end point, unless it has different preferences for inclusion of the two points, and since you have a different degree of preference for the two it's clearly essential that you distinguish between the two. That certainly makes sense - at least, to me. Your distinction between the two points may not work the same way as mine, but to me, at some level, that doesn't matter - we both make a distinction; it may have different consequences for each of us, but the fact that there is a distinction allows each of us to assign consequences. Of course at a more detailed level we may disagree - or not.

edit: change /quite to /quote

Tom

Alex Fekken
Alex Fekken
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 460
But then, I am not a native English speaker, so what do I know?

I probably should have confessed earlier that neither is mine. But judging by your last name Hugo, it is probably the same as yours (dutch).

Do you really think that end-point inclusiveness is counter-intuitive?

It was me who said that because in my opinion endpoint issues are often a mess that I feel ambivalent about: I don't think there is an "intuitive" way of dealing with them. When I said I would exclude the endpoints I was thinking of the continuous case, because it is the more general one (e.g. pick a number between 0 and 1).

I think the best way of dealing with ranges (which is what I think BETWEEN is trying to do) would be in the way SQL partition functions are defined because there are often multiple contiguous ranges to deal with. The moment you try the single out a single one of them you've got an issue with (or at least decisions to make about) the endpoints. Hugo avoided this by using integers, but coming from a mathematical background I prefer approaches and solutions that also work in, and are (or can be seen as) intuitive from, a more general perspective.

(I hope this discussion isn't undermining future questions that Hugo has planned; I see there is a '1' at the end of the question title)
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search