March 11, 2004 at 9:26 am
Anyone knows how to create a mathematical function for the following purpose:
Problem: A table contains rows of a single field. We wish to determine the number of combinations of these chosen 2 at a time.
examples:
if the table contained rows a, b and c the answer would be three rows: ab, ac, bc
if the table contained rows a, b, c and d the answer would be six rows: ab, ac, ad, bc, bd, cd
if the table contains n rows, the answer has (n!)/(2*(n-2)!)
if I have a table of 26 letters, the result set should contain 325 rows.
How to create a user-defined function to generate the combinations of any arbitrary populated table?
Thanks!
Frank
March 11, 2004 at 11:08 am
In general the formula for a combination of M things taken N at a time is:
C(M,N)=M!/(N!)(M-N)!
For example:
In numbers, C(5,3) = (5*4*3*2*1)/(3*2*1)(2*1) = 60/6 = 10. These numbers mean that there are 60 ways to order any three of five objects, and for each particular group of three there are six ways to order them; so the number of distinct groups of three out of five objects is 60 divided by 6.
Francis
March 11, 2004 at 11:18 am
Thanks. You are right about the formula. Any ideas how to write it in sql?
March 11, 2004 at 11:26 am
You will find it difficult to create a UDF for this as there are some limitations in SQL data types. For example, 21! is creater than the largest number a biginit in sql can hold ; 20! = 2432902008176640000. Also if you use recusion, you are limited to 32 calls so theoretically even if you could store a number bigger than 21! and you used recusion in the function you couldn't calculate anything bigger than 32! I can probably come up with a function with a bit of thought.. stay tuned.
Francis
March 11, 2004 at 1:39 pm
You don't need a function to do this, just a simple self-join.
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col
--Jonathan
March 11, 2004 at 1:48 pm
Thank you very much Jonathan!
March 11, 2004 at 2:03 pm
I missread the question. I thought you wanted the number, not the actual rows. That is I thought you wanted to supply the numbers 5, 2 and get the answer 10 and opposed to supply a table with5 rows and get a result set with 10 rows. This was like one of the QOD that you just don't get even after reading it a couple of times. Of course now its obvious.![]()
Francis
March 11, 2004 at 3:08 pm
Thanks fhanlon.
Johnathan, There is still a problem with simple self-join. If the table has duplicates, the total number of return will not match the binomial coefficient sequence by self-join. In my case, aa is legitimate for two rows a and a. Any ideas to count dups and combine them?
Thanks.
Frank
March 11, 2004 at 3:13 pm
![]()
The "problem," then is not with the code but instead with your examples. How about:
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col <= b.Col
--Jonathan
March 11, 2004 at 3:34 pm
You will miscount unique rows if you use this script. It can't distinguish unique rows from dups. For example, you have five rows with values a, b, c, d and d. The return should be ten rows as:
------
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
Using SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col <= b.Col, the return is
------
aa
ab
bb
ac
bc
cc
ad
bd
cd
dd
dd
ad
bd
cd
dd
dd
(16 row(s) affected)
Using SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col, the return is
------
ab
ac
bc
ad
bd
cd
ad
bd
cd
(9 row(s) affected)
March 11, 2004 at 7:18 pm
Okay, now I finally know what you're trying to do here.
In SQL, it's a good idea to have a primary key... ![]()
create table tab(
col char)
insert tab select 'a' union all select 'b' union all select 'c' union all select 'd' union all select 'd'
SELECT IDENTITY(int) Id, Col
INTO #Tab
FROM Tab
SELECT a.Col + b.Col
FROM #Tab a JOIN #Tab b ON a.Col <= b.Col and a.Id < b.Id
--Jonathan
March 12, 2004 at 8:10 am
Thanks Johnathan.
In general, the issue is related to a binomial coefficient theory. The return rows should be nCm, read as n choose m. Here is the list of values:
m n nCm
------- --------- -------
2 2 1
3 2 3
4 2 6
5 2 10
6 2 15
7 2 21
Using you script can’t get correct returns. For example if a table has 6 rows and two pairs of dups, then return rows should be 15 as
col
----
a
b
c
d
d
a
(6 row(s) affected)
----
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
aa
ab
ac
ad
ad
(15 row(s) affected)
However, your script only returns 11 rows as follows
----
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
aa
(11 row(s) affected)
March 12, 2004 at 9:13 am
Okay, how about something like this?
SELECT a.Col + b.Col
FROM #Tab a JOIN #Tab b ON a.Id <> b.Id AND a.Col <= b.Col AND NOT (a.Col = b.Col AND a.Id > b.Id)
--Jonathan
March 12, 2004 at 9:29 am
Thank you very much! You got all right. Appreciate ...![]()
March 12, 2004 at 11:20 pm
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col
UNION
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col = b.Col
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply