How bad this in terms of performance; udf in WHERE

Question

How bad this in terms of performance; udf in WHERE

Viewing 7 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

CELKO (9/4/2014)
... The pattern for validation is '[12][0-9][0-9][0-9]-00-00' and '[12][0-9][0-9][0-9]-[01][0-9]-00'
Your mindset is still back in FORTRAN and BASIC with UDF and procedural programing. Aas you have been told, they have awful performance, do not port, do not support parallelism, etc.

And once again you post erroneous code The highlighted (bolded) pattern for validation will allow for INVALID MONTHS. Since when do we have a 13th, 14th, 15th, 16th, 17th, 18th, or 19th month in ANY year???

Not only that dates should be stored in date or datetime columns, not stored as character data. You are thinking in COBOL terms from many years ago (and yes, I know COBOL is still around today). This SQL Server, use the appropriate data types for the data being stored.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

CELKO (9/4/2014)
The highlighted (bolded) pattern for validation will allow for INVALID MONTHS. Since when do we have a 13th, 14th, 15th, 16th, 17th, 18th, or 19th month in ANY year
Agreed, This is part of the T-SQL weakness of LIKE and not having the ANSI/ISO predicate SIMILAR TO for a regular expression. I used this with T-SQL since it gives some protection and does not require CLR and extra overhead. I do the full check in DB2.
Not only that dates should be stored in date or datetime columns, not stored as character data.
No! these are names for intervals beyond the range of a DATE, which T-SQL cannot implement as a temporal data type.

DATE and DATETIME are data type that will hold a date (2014-09-05) or a date/time (2014-09-05 01:58:36.230). They aren't intervals, and no, MS SQL Server doesn't have a data type that supports temporal ranges. Get over it.

MS SQL Server didn't support some of the window functions that Oracle supported until SQL Server 2012 was released. The product is evolving, maybe not fast enough for you but they aren't try to please you. Give them time, they may finally provide support for intervals. Until then we'll have to make due with kludges to make those work. Oh, right, you hate kludges. Unfortunately, you have to work with what you have.

Again, use the proper data type for the data you are storing. You don't store dates as character strings. That's old school COBOL and you are old enough to know more than how to spell COBOL. By storing dates in a date data type you know you are going to get an invalid date, like 2013-02-29 or 2014-19-00.

TomThomson SSC Guru Points: 104768 More actions · Answer 3

CELKO (9/4/2014)
I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year.

Yes, I've noticed before that you like that particular violation of ISO 8061. It's amusing to see how when you like something the standard doesn't matter, but in all other cases you treat the standard as inviolable.

Tom

TomThomson SSC Guru Points: 104768 More actions · Answer 4

Lynn Pettis (9/4/2014)

CELKO (9/4/2014)
The highlighted (bolded) pattern for validation will allow for INVALID MONTHS. Since when do we have a 13th, 14th, 15th, 16th, 17th, 18th, or 19th month in ANY year
Agreed, This is part of the T-SQL weakness of LIKE and not having the ANSI/ISO predicate SIMILAR TO for a regular expression. I used this with T-SQL since it gives some protection and does not require CLR and extra overhead. I do the full check in DB2.

why dont you do the full check in standard SQL? are you telling us that standard SQL forbids anything more complicated than a simple expression using a single LIKE operator in a CHECK constraint? If so it's clearly impossible to do anything relational in it because it has no mechanism for handling domain constraints. I've already commented on this thoroughly inadequate CHECK constraint and provided most of the T-SQL CHECK constraint that delivers the required domain constraint in a different thread. I see no point in repeating it here but it does make clear that it is easy to write the required constraint in T-SQL. I guess your attributing the problem to "the weakness of LIKE" is just an attempt to divert us from noticing that (a) you are too lazy to write a constraint that does what is desired, since the weakness of like doesn't prevent you from writing the correct constraint and (b) you are creating the problem yourself in teh first place by insisting on using a presentation format inside the DB instead of using the perfectly good DATE type.

Besides, expressing an interval as some ghastly utterly non-standard concoction of strings rather than a pair of dates (beginning and end) is nothing better than prejudice on your part in the first place; expressing the dates as dates makes expressing the domain constraint trivial - it's just CHECK(startdate <= enddate) - of course additional constraints on startdate and on enddate individually can be used to specify that interval boundaries are month boundaries or year boundaries or any other sort of boundaries without the gratuitous violations of ISO 8061 that your method requires, without wasting a lot of storage (your 16 byte interval encoding requires more than twice the space of the sensible representation), and without holding presentation data in the database instead of leaving presentation to the application.

Tom

TomThomson SSC Guru Points: 104768 More actions · Answer 5

CELKO (9/4/2014)
The highlighted (bolded) pattern for validation will allow for INVALID MONTHS. Since when do we have a 13th, 14th, 15th, 16th, 17th, 18th, or 19th month in ANY year
Agreed, This is part of the T-SQL weakness of LIKE and not having the ANSI/ISO predicate SIMILAR TO for a regular expression. I used this with T-SQL since it gives some protection and does not require CLR and extra overhead. I do the full check in DB2.
Not only that dates should be stored in date or datetime columns, not stored as character data.
No! these are names for intervals beyond the range of a DATE, which T-SQL cannot implement as a temporal data type.

Why do we want names here? I can see no point to them.

Presumably the standard sql interval supports operators/functions that deliver its start and its finish and its length, and we certainly want that in however we represent the interval in T-SQL. Codd's atomicity principle requires that any operator or function delivering a component of an atom is a built in function of teh database system, not something built by the user; since the three functions I mentioned are not all built-in functons in the SQL Server database for your awful representation, you have chosen to violate the atomicity principle instead of sticking to relational principles, and your propsed table is not even in first normal form. To me that seems the wrong approach. Representing an interval by storing its start and end as two different attributes of a relation does not violate the atomicity principle, and does not prevent normalisation. Until T-SQL gains an interval type, that two part representation and the two alternative representations storing start or end and length will be the valid ways of representing an interval in T-SQL, and your representation will be an unrelational abomination.

Tom

TomThomson SSC Guru Points: 104768 More actions · Answer 6

CELKO (9/5/2014)
Why do we want names here? I can see no point to them.
To help preserve atomicity, of course. 😛

Clearly then you haven't read this passage:-

E.F.Codd (1990)

From a database perspective, data can be classified into two types: atomic and compound. Atomic data cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions). Compound data, consisting of structured combinations of atomic data, can be decomposed by the DBMS.

In the relational model there is only one type of compound data: the relation. The values in the domains on which each relation is defined are required to be atomic with respect to the DBMS.[/quote]

Quite clearly also you haven't read the debate between Codd and Date about what atomicity means. Or if you have read any of that, yo have failed to understand any of it.

Putting it simply; you are introducing a domain whose values essentially have multiple components. The functons which extract these essential components from values in the domain are not certain special function built into the DBMS. You clearly intend to be able in the DBMS to extract these functions, since you haven't suggested that every time we want to discover, for example, whether a particular date lies within a particular interval we have to exit the DBMS and let the application decide (and if you had made that suggestion you would have built such a horrible performance disaster that no sane person would be prepared to accept it). But the DBMS in which you are proposing to do this doesn't have those functions built in, you are inventing them anew, so they are not just "certain special functions" but functions added by a user to the DBMS. You are violating atomicity as described in the Codd quote above, and even as finally accepted by Date (who at one point appeared te want to do what you are proposing). Certainly Michael Stonebreaker decided to change the rules for Atomicity in Postgres, but he did it with his eyes open and freely admitting it, unlike you. Date and Darwen also decided to change the rules in their 3rd Manifesto but, although Date argued extensively with Codd about what atomicity should mean, the only fundamental difference in the end was that Data wanted Codd's compound data, tables, to be usable in within tables, with typing matching table signatures. Of course standard Sql has added the interval type to the database and added special functions to cary out the necessary decomposition so that it is an atomic type. I find it appalling that you, claiming to understand relational theory, have completely failed to understand this elementary logic.

Like a lot of noobs, you confuse "atomic" with "scalar";

And there, by the use of the word "noob", you make a complete fool of yourself and demonstrate that you are not sufficiently bright to check the public record before you put your foot in your mouth.

I presented the thesis for my first masters (by research) degree 15 years before 1982. In 1982 you were learning (or failing to to learn?) basic CS at Georgia Tech. I was in correspondance (paper, not electronic, of course) and telephone contact now and again with IBM UK from 1968 until the late 70s, was doing academic research in information retrieval in 1969-70 (having gone temporarily back to academe from industry where I had mainly been concerned with language matters), became a chartered engineer in computer and software engineering in the early 70s (10 years before your Georgia Tech time), and was doing things with relational theory, relational calculi, and normalisation while also running large data communications and interworking projects (interworking - think about isolation and distributed commitment in for example TP systems) from 1972 onwards.

I think it quite clear that you, not I, are the noob in respect of experience of working at, carrying out research in, or even just studying relational science, RDBMS, and related topics.

After 1982 (when you started semi-serious learning) I learnt to play with Oracle (awful!) and then had a few years in charge of the architecture and design of all software in a project which designed a relational calculus language which was completely declarative except at transaction boundaries and completely relational (so absolutely nothing like SQL) and built a working system which demonstrated linear speed up from 1 to 16 (the largest we built) machines with a declarative operating system connected on a network designed for graph reduction (with more pennies we would have built a larger network) but then the marketeers realised that we had to have an SQL variant rather than a relational language; we got into bed with industrial and academic partners throughout Europe, played with Ingres and PostGres and eventually Oracle, created patents on all sorts of RDB and TP and language issues, and generated useful products for the various industrial partners. Over those years I published a lot of Relational Database research mainly in the form of Manchester University Research Reports (because although I worked in industry I was connected through shared research and teaching with MU), managed for my sins to be named as a contributor on the first Haskell Report, was elected a Fellow of the Institute of Mathematics and its Applications for my contibution to applied mathematics research (all in queuing theory and relational theory) and elected a Fellow of the Institution of Electrical Engineers. After that I worked as in various senior SE roles (divisional chief designer and technology group leader, director and VP of R&D, chief architect and technical director), mainly with a heavy RDBMS component. So I think that my time after you started leaning the CS game at Georgia Tech was spent doing serious research in RDBMS related topics like relational theory and in other areas of CS and acquiring serious experience in using that stuff to satisfy real customers, whereas yours was spent working with standards groups who were scared to fix IBM's mistakes or to make any significant advance and couldn't understand Codd's model even if a good teacher explained it to them, writing pedestrian text books, and acquiring an outstanding reputation for obnoxiousness in online forums.

Now tell me who is the noob?

Tom

TomThomson SSC Guru Points: 104768 More actions · Answer 7

CELKO (9/6/2014)
You got my job history completely wrong. I have worked in more industries than most people, my academic work is still quoted in text books and my trade press titles have stayed in print and translations for decades.

That's embarrassing - I fell into exactly the trap I accused you of falling into, not checking the public record adequately. Apologies for the errors.

I still think your awful "interval" object for T-SQL is a hopeless violation of relational principles; of course the version in the standard isn't, because it makes it part of the RDBMS rather than a rickety tower balanced on top of it.

It's pretty clear that we agree on two things, anyway; the quality of Ada (or rather its lack) and the mistakeness of Chris Date's ideas (on atomicity, at least; and, I suspect, on NULLs as well, although that one didn't come up in the current discussion).

Where is the footnote?

I don't understand the question. The text I quoted is the beginning of section 1.2.1 of Codd's book "The Relational Model fo Database Management - Version 2"; there isn't any footnote. I regard this is being a much simpler and clearer statement of his ideas about atomicity than that in his 1979 paper "Extending the Database Relational Model to Capture More Meaning", but the ideas are indeed the same despite being expressed in plain English instead of in mathematical language and I thought it more useful to to quote this concise summary rather than to quote almost the whole of the earlier paper.

Tom