Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Puzzle: vertex covers in SQL Expand / Collapse
Author
Message
Posted Monday, January 7, 2013 8:29 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 3:28 AM
Points: 536, Visits: 2,099
Celko,
How about me proposing this as a problem in Beyond Relational's TSQL Challenges? There are some very sharp minds there that could come up with something interesting.
Post #1403653
Posted Monday, January 7, 2013 12:51 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 2:28 PM
Points: 1,945, Visits: 2,900
I've never seen this problem, so I'm not even sure if I am solving it correctly. This code is probably quite inefficient, but I THINK it solves the Min Cover problem?


It is an NP-Complete problem. NOBODY has an efficient answer. I just thought it might be fun. I would add a bit more to my posting to get use some tools:

CREATE TABLE Grid
(v1 SMALLINT NOT NULL CHECK (v1 > 0),
v2 SMALLINT NOT NULL CHECK (v2 > 0),
v_cover SMALLINT DEFAULT 0 NOT NULL,
PRIMARY KEY (v1, v2),
CHECK (v1 < v2));

INSERT INTO Grid (v1, v2)
VALUES (1, 2), (1, 4), (2, 3), (2, 5), (2, 6);

CREATE VIEW Vertexes (v)
AS
SELECT v1, v_cover
FROM Grid
UNION
SELECT v2, v_cover
FROM Grid;

CREATE VIEW Vertex_Degrees (v, degree)
AS
SELECT V.v, COUNT(*) AS degree
FROM Grid AS G, Vertexes AS V
WHERE V.v IN (G.v1, G.v2)
GROUP BY V.v;

If you make the vertex cover flag a SMALLINT, then we can play games with it. We probably use the degree for a greedy algorithms.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1403829
Posted Monday, January 7, 2013 5:38 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 4:14 AM
Points: 3,618, Visits: 5,254
Joe,

When I was playing around with adding the additional column into the table, I chose to make all the vertices be a CHAR data type instead of SMALLINT. The reason being is that you're not likely to perform math (or would you?) on the identifier of a vertex.

This also eliminates the need for the CASTs in the solution I did provide (presumably making it just a tad faster).

Not sure why I couldn't get my version with the additional column to work. Well, actually it did work but it wasn't providing "as optimal" a solution as the solution I have posted using the TABLE VARIABLE. I'll probably try again at some point when I have some time, because even though it wasn't as optimal it was faster as I expected it might be.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1403932
Posted Tuesday, January 8, 2013 2:37 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 3:28 AM
Points: 536, Visits: 2,099
I see two approaches:
1. going from all vertices selected to a solution (top-down)
2. going from no vertices selected to a solution (bottom-up)

In both cases you will eventually run into a number of vertices where you will not arrive at a solution in a reasonable amount of time, i.e. you cannot traverse the full search tree. This is where you would need to apply an algorithm that uses a random approach and provides the best solution found within a given amount of time. It will not necessarily be the theoretically best solution.

The random approach is divided into two (given N vertices):
1. starting with K = N-1 generate all combinations K of N vertices and check for coverage; once a solution is found for given combination stop and decrease K by one and repeat; stop this when the number of combinations exceeds some threshold;
2. generate K random vertices and check if it is a solution; if not then repeat; if yes then decrease K by 1 and repeat;
Stop when exceeding a given time threshold.

Now let me think about how to map this into an SQL approach...
Post #1404073
Posted Tuesday, January 8, 2013 2:56 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 4:14 AM
Points: 3,618, Visits: 5,254
Michael Meierruth (1/7/2013)
Celko,
How about me proposing this as a problem in Beyond Relational's TSQL Challenges? There are some very sharp minds there that could come up with something interesting.


I would be interested to know how you would judge such a competition. Speed is important but so is a "nearer optimal" solution.

I've seen this in bin packing solutions, where you usually have to trade off speed for better bin utilization.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1404084
Posted Tuesday, January 8, 2013 3:43 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 3:28 AM
Points: 536, Visits: 2,099
For a challenge like this the requirement would be to find the optimal solution(s).

For the perfomance test a sufficiently large number of vertices would be used such that the cpu time would not exceed 60 seconds for the best solution. Any solution exceeding 5 minutes would be eliminated. To prevent 'cheating' there would be two performance tests, one for which the data is made public before hand and one where the data is not made public until the challenge has been closed.
Post #1404102
Posted Tuesday, January 8, 2013 6:48 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 3:28 AM
Points: 536, Visits: 2,099
diamondgm (1/7/2013)
I've never seen this problem, so I'm not even sure if I am solving it correctly.
This code is probably quite inefficient, but I THINK it solves the Min Cover problem?
Scuze the messiness, I'm at work and did it in a rush

diamondgm,
For the arbitrary data below your code produces a solution with 11 vertices. Dwain's code produces a solution with 6 vertices (and which is correct). Thus, by definition, yours is not a min cover solution. Dwain's solution looks to be min cover (but still checking that).

values (1,2),(2,3),(3,4),(4,15),(14,15),(4,9),(9,13),(9,10),(10,12),(10,11),(7,11),(7,8),(8,9),(6,7),(5,6),(5,8),(4,5)

Just looking at this sample data leads me to believe that a good approach is to select vertices with the most edges first.
Post #1404214
Posted Tuesday, January 8, 2013 10:24 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 2:28 PM
Points: 1,945, Visits: 2,900
When I was playing around with adding the additional column into the table, I chose to make all the vertices be a CHAR data type instead of SMALLINT. The reason being is that you're not likely to perform math (or would you?) on the identifier of a vertex.,


Agreed; I did numeric because the book I was using numbered the nodes, and fell into that notation. I cannot imagine any math on them.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1404369
Posted Tuesday, January 8, 2013 5:32 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 4:14 AM
Points: 3,618, Visits: 5,254
Michael Meierruth (1/8/2013)

Just looking at this sample data leads me to believe that a good approach is to select vertices with the most edges first.


Obviously I agree with this as that's the approach I used.

There will probably be cases though, where this only leads to a good but suboptimal solution. It would be a factor of a "look ahead" to see if choosing the maximum coverage on this pass is worse than the final result of chossing a less-than-maximum coverage now, so as to arrive at a better solution down the line. Kinda like a chess game. The optimal next move is determined by the depth of look ahead. What looks like a great move at 2 levels of lookahead could lead to getting checkmated if you look ahead 4 moves.

BTW. Thanks for validating diamondgm's solution (and mine). I was meaning to but didn't get around to it.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1404507
Posted Tuesday, January 8, 2013 5:50 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 4:14 AM
Points: 3,618, Visits: 5,254
Here's another network you can validate it on, assuming you want to manually calculate the optimal cover (from the 3rd article in my signature line).



Here's the setup data (note where the CHECK constraint has been removed because originally this was designed as a directed graph so the vertex traversal can go from larger to smaller).

CREATE TABLE #Grid
(v1 CHAR(1) NOT NULL, -- CHECK (v1 > 0),
v2 CHAR(1) NOT NULL, -- CHECK (v2 > 0),
PRIMARY KEY (v1, v2)--,
--CHECK (v1 < v2)
);

INSERT INTO #Grid
SELECT 'A', 'B'
UNION ALL SELECT 'A', 'C'
UNION ALL SELECT 'A', 'D'
UNION ALL SELECT 'B', 'E'
UNION ALL SELECT 'B', 'F'
UNION ALL SELECT 'C', 'G'
UNION ALL SELECT 'C', 'H'
UNION ALL SELECT 'D', 'I'
UNION ALL SELECT 'D', 'J'
UNION ALL SELECT 'E', 'K'
UNION ALL SELECT 'F', 'L'
UNION ALL SELECT 'G', 'K'
UNION ALL SELECT 'G', 'M'
UNION ALL SELECT 'H', 'L'
UNION ALL SELECT 'I', 'M'
UNION ALL SELECT 'J', 'M'
UNION ALL SELECT 'K', 'N'
UNION ALL SELECT 'K', 'O'
UNION ALL SELECT 'L', 'O'
UNION ALL SELECT 'M', 'O'
UNION ALL SELECT 'O', 'Q'
UNION ALL SELECT 'Q', 'R'
UNION ALL SELECT 'J', 'P'
UNION ALL SELECT 'N', 'Q'
UNION ALL SELECT 'N', 'R'
UNION ALL SELECT 'P', 'S'
UNION ALL SELECT 'P', 'T'
UNION ALL SELECT 'Q', 'W'
UNION ALL SELECT 'Q', 'U'
UNION ALL SELECT 'R', 'U'
UNION ALL SELECT 'R', 'V'
UNION ALL SELECT 'S', 'V'
UNION ALL SELECT 'S', 'R'
UNION ALL SELECT 'T', 'V'
UNION ALL SELECT 'T', 'Y'
UNION ALL SELECT 'U', 'W'
UNION ALL SELECT 'U', 'X'
UNION ALL SELECT 'V', 'X'
UNION ALL SELECT 'V', 'Y'
UNION ALL SELECT 'X', 'W'
UNION ALL SELECT 'W', 'Z'
UNION ALL SELECT 'Y', 'Z'


My solution should return these covers:

V   V,R,S,T,X,Y
Q Q,N,O,R,U,W
M M,G,I,J,O
K K,E,G,N,O
P P,J,S,T
L L,F,H,O
D D,A,I,J
C C,A,G,H
B B,A,E,F
Z Z,W,Y




My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1404508
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse