Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Exploring Recursive CTEs by Example

By Dwain Camps, (first published: 2012/07/17)

While I have been often chided by many SQL folks with much more experience than me for my fascination with recursive CTEs, I have nonetheless continued to explore them.  I have cast my net far and wide across the Net for examples that are relevant for study.  Alas, I have found that there seems to be a dearth of good examples out there to guide the beginner.

For those of us that are recursively challenged, the Books On Line description of the process is technically accurate but leaves me wanting.  Furthermore, I have found only around 3 examples in the BOL article Using WITH common_table_expression (Transact-SQL), and all of those seem to revolve around hierarchies of one sort or another (e.g., managers/employees or products).

Many of the other sites I’ve found rehash these same hierarchy examples.  Honestly, if I run across another self-proclaimed SQL guru regurgitating the same hierarchies in their blogs, without adding any additional value, I think I may retch.  C’mon guys (and gals), how about a little originality… please!

Besides the recursively challenged or beginners that need some guidance in developing recursive CTEs, the audience for this article is also more experienced professionals that may wish to explore some less orthodox examples, to see how SQL may be applied in ways that may not have previously been attempted.  This article is not intended to address performance of recursive CTEs, even though the performance of one of our examples is explored to a certain degree.

There are a few well-hidden, interesting examples that I’d like to bring to light here and well-deserved kudos should go to their authors:

If you know me, you may know that I’ve also had a script and an article published by SSC with a few more examples:

Finally, here are some additional links that, while they are based on the same examples I complained about above, provide some very insightful information on how recursive CTEs operate behind the scenes:

Many thanks are owed to Paul White (SQL Kiwi) and Chris Morris (ChrisM@Work) of this site for pointing me in the right direction on these articles and forum posts.

To avoid being accused of lack of originality, we shall attempt to explore the application of recursive CTEs to problems in the fields of number theory, numerical analysis, financial applications and operations research.  Before that description scares off those that have trouble balancing their checkbooks, let me most adamantly proclaim that I am no mathematician!  These examples will be simplistic and hopefully direct to the point where there is something that can be learned from them.

So without further ado, let us begin our exploration of recursive CTEs by example.

Number Theory

In high school, most of us first become aware of elementary number theory by being introduced to the Fibonacci sequence.  To refresh your memory, this is the sequence of numbers that progresses as follows:

1, 1, 2, 3, 5, 8, 13, 21, …

Start with the numbers 1 and 1, and the next number in the sequence is calculated as the sum of the prior two numbers.  The recursion formula for this is (FN=Fibonacci Number):

FNn = FNn-1 + FNn-2                            for n > 2

-- Generate Fibonacci numbers
DECLARE @fn1 BIGINT, @fn2 BIGINT
SELECT @fn1 = 1, @fn2 = 1

;WITH Sequence AS (
    -- Put both initial numbers into the anchor leg
    SELECT
        n      = 1,
        fn     = @fn1,
        [n-1]  = CAST(NULL AS BIGINT),
        Ratio  = CAST(NULL AS FLOAT)
    UNION ALL
    SELECT
        n      = 2,
        fn     = @fn2,
        [n-1]  = @fn1,
        -- Multiply * 1. to avoid integer division
        Ratio  = 1. * @fn2 / @fn1 
    UNION ALL
    -- Recursive leg
    SELECT
        n      = n + 1,
        fn     = fn + [n-1],
        [n-1]  = fn,
       Ratio  = (fn + [n-1]) / CAST(fn AS FLOAT)
    FROM Sequence
    WHERE n BETWEEN 2 AND 90)
SELECT n, fn, Ratio
FROM Sequence

Fibonacci numbers grow quite large very quickly, hence the reason for CASTing them to the BIGINT datatype.  Even so, you can see that we must stop prior to reaching the default recursion limit of SQL.

n    fn                                 Ratio

1    1                                  NULL

2    1                                  1

3    2                                  1

4    3                                  2

5    5                                  1.5

6    8                                  1.66666666666667

7    13                                1.6

8    21                                1.625

9    34                                1.61538461538462

<snip>

90   2880067194370816120  1.61803398874989

91   4660046610375530309   1.61803398874989

92   7540113804746346429  1.61803398874989

The significance of the ratio column is to show how the ratio of successive Fibonacci numbers converges on the Golden Ratio (1.6180339887).  So how’s that for a history lesson?

Lesser known of course is the Lucas sequence of numbers which, like Fibonacci’s sequence, shares the same recursion formula but starts with a different pair of initial numbers.  Hence the reason for calling my recursive CTE simply Sequence, since n1 and n2 may be set to any numbers upon initialization.  The Lucas sequence is typically started with initial numbers of 2 and 1.

Factorials, while not strictly speaking number theory, primarily manifest themselves in statistics but also in many other areas of mathematics.  These are quite simple to generate with a recursive CTE.  Recall that:

n! (n factorial) = 1 * 2 * 3 * …  * (n-1) * n 

Once again, the rate of growth requires stopping the CTE at 20 recursions even at BIGINT precision.

-- Generate first 20 Factorials
;WITH Factorial (n, factorial) AS (
    SELECT 1, CAST(1 AS BIGINT)
    UNION ALL
    SELECT n + 1, (n + 1) * factorial
    FROM Factorial WHERE n<20)
SELECT n, factorial FROM Factorial

Results:

n             factorial

1              1

2              2

3              6

4              24

5              120

6              720

7              5040

<snip>

17            355687428096000

18            6402373705728000

19            121645100408832000

20            2432902008176640000

Notice in both of these examples, the need to CAST within the anchor and recursive legs of the CTE.  SQL requires that all columns generated by the anchor and recursive legs must match in datatype.

Geometric and Arithmetic Sequences and Progressions

Arithmetic, geometric and power series typically all share the fact that they can be generated by a recursion formula.  I studied statistics, calculus, numerical analysis and some other esoteric branches of mathematics in university and I can assure you they’re everywhere.  We shall provide only two examples here, of the many thousands that exist in mathematical treatise, for the sake of brevity.  Once you grasp the method and constraints, any alternative sequence or progression is easily implemented using this approach.

Consider the power series: 11 + 22 + 33 + 44

This power series does not converge; in fact it grows large quite quickly.  Hence we will store the result as a BIGINT

-- Generate first 15 of Power Series: 1+2^2+3^3+4^4 etc.
;WITH Power (n, result) AS (
    SELECT CAST(1 AS BIGINT), CAST(1 AS BIGINT)
    UNION ALL
    SELECT n + 1, result + POWER(n + 1, n + 1)
    FROM Power
    WHERE n < 15)
SELECT n, result
FROM Power

Results:

n             result

1             1

2             5

3             32

4             288

5             3413

6             50069

7             873612

8             17650828

9             405071317

10           10405071317

11           295716741928

12           9211817190184

13           312086923782437

14           11424093749340453

15           449317984130199845

More than 14 invocations of the CTE’s recursive leg yields an arithmetic overflow error.

Another example: 11 + 1/22 + 1/33 + 1/44:

-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc.
;WITH Power (n, result) AS (
    SELECT 1, CAST(1 AS FLOAT)
    UNION ALL
    SELECT n + 1, result + 1. / SQUARE(n + 1)
    FROM Power
    WHERE n < 99)
SELECT n, result
 FROM Power

Results:

n             result

1              1

2              1.25

3              1.36111111111111

4              1.42361111111111

5              1.46361111111111

6              1.49138888888889

<snip>

97            1.63467774649787

98            1.63478186977983

99            1.63488390018489

In this case, this series converges on a number near 1.635 so we could continue the computation as far as we like, however we stopped at 99 due to the default setting for MAXRECURSION.

Numerical Analysis

One of the first topics we studied in numerical analysis is the Newton-Raphson approach to solving for the root of a function.  A root is the point on the graphical representation of the function that passes through zero of the y-axis.  Here is where we must delve briefly into the calculus, but we’ll try to keep this as simple as possible.

The recursive formula for Newton-Raphson is:

Xn+1 = Xn – F(Xn)/F’(Xn)

F’(X) is the derivative of F(X) and is quite simple to determine for polynomial functions.  Here’s a quick example and the briefest of possible explanations.

F(X) = 3X3 + 2X2 + 5X + 4

The derivative is:

F’(X) = 9X2 + 4X + 5          where 9 = 3*3, 4 = 2*2 and 5=5*1

To differentiate the function, we take the exponent times the coefficient and reduce the exponent by 1 for each factor in the polynomial.  The constant is understood to be a coefficient multiplied by X0 (=1), so it drops out upon differentiation.

To kick start the process, you must provide an initial guess (X0) for the root and then iterate through the recursive formula until F(X) is zero (or sufficiently close).  We’ll define a table that contains coefficients of our functions (the first record is equivalent to our example above) along with an ID and the initial guess (x0).

DECLARE @Eqns TABLE
    (ID INT         -- Polynomial equation to solve for root
    ,Coefx3 FLOAT   -- Coefficient of x^3
    ,Coefx2 FLOAT   -- Coefficient of x^2
    ,CoefX FLOAT    -- Coefficient of x
    ,C FLOAT        -- Constant
    ,x0 FLOAT)      -- Initial guess

INSERT INTO @Eqns
 SELECT 1, 3, 2, 5, 4, 0        -- f(x) = 3*x^3 + 2*x^2 + 5*x + 4
 UNION ALL SELECT 2, 1, 0, -2, 2, -10  -- 0 -- f(x) = x^3 - 2*x + 2

;WITH NewtonRaphson (ID, [Polynomial Function], [Try], x, [f(x)]
    ,Coefx3, Coefx2, CoefX, C, xn)
    AS (                       -- Passthru variables for iterations
    SELECT ID, Equation
        ,1 , x0, [f(x)]        -- Try, current x, value of f at x
        ,Coefx3, Coefx2, CoefX, C    -- Coefficients of f(x)
        ,x0 - [f(x)] / [f'(x)]  -- Newton's formula
       FROM @Eqns
       CROSS APPLY (
        SELECT [f(x)] = Coefx3*POWER(x0, 3) + Coefx2*SQUARE(x0) + CoefX*x0 + C
            , [f'(x)] = Coefx3*3*SQUARE(x0) + Coefx2*2*x0 + CoefX
            ,Equation = CAST(Coefx3 AS VARCHAR)+'*x^3 + '+CAST(Coefx2 AS VARCHAR)+
                '*x^2 + ' + CAST(Coefx AS VARCHAR) + '*x + ' + CAST(C AS VARCHAR)) fcn
       UNION ALL
       SELECT ID, [Polynomial Function]
        ,[Try]+1, xn, fcn.[f(x)] -- Try, current x, value of f at x
        ,Coefx3, Coefx2, CoefX, C -- Coefficients of f(x)
        ,xn - fcn.[f(x)] / [f'(x)] -- Newton's formula
       FROM NewtonRaphson n
       CROSS APPLY (
              SELECT [f(x)] = Coefx3*POWER(xn, 3) + Coefx2*SQUARE(xn) + CoefX*xn + C
                     , [f'(x)] = Coefx3*3*SQUARE(xn) + Coefx2*2*xn + CoefX) fcn
       WHERE [Try] < 99 and n.[f(x)]  <> 0)
SELECT [Polynomial Function], [Try], x, [f(x)] 
 FROM NewtonRaphson
 --WHERE [f(x)]  = 0     -- Uncomment to see only the roots
 ORDER BY ID, [Try]

The results show that convergence occurs after 7 iterations for ID=1 and 10 iterations for ID=2.

Polynomial Function       Try   x                              f(x)

3*x^3 + 2*x^2 + 5*x + 4   1     0                              4

3*x^3 + 2*x^2 + 5*x + 4   2     -0.8                          -0.256

3*x^3 + 2*x^2 + 5*x + 4   3     -0.766137566137566  -0.00584616847588215

3*x^3 + 2*x^2 + 5*x + 4   4     -0.76532764005746    -3.20958564792306E-06

3*x^3 + 2*x^2 + 5*x + 4   5     -0.765327194913966   -9.69002655892837E-13

3*x^3 + 2*x^2 + 5*x + 4   6     -0.765327194913831   8.88178419700125E-16

3*x^3 + 2*x^2 + 5*x + 4   7     -0.765327194913831   0

1*x^3 + 0*x^2 + -2*x + 2  1     -10                             -978

1*x^3 + 0*x^2 + -2*x + 2  2     -6.71812080536913     -287.773693306638

1*x^3 + 0*x^2 + -2*x + 2  3     -4.56088768547868     -83.7524259396923

1*x^3 + 0*x^2 + -2*x + 2  4     -3.17437494759505     -23.638335444194

1*x^3 + 0*x^2 + -2*x + 2  5     -2.33702597608057     -6.09006041603878

1*x^3 + 0*x^2 + -2*x + 2  6     -1.91366620851223     -1.18073973548998

1*x^3 + 0*x^2 + -2*x + 2  7     -1.7822736999897       -0.0968441911772411

1*x^3 + 0*x^2 + -2*x + 2  8     -1.76941172922642     -0.000882398415809504

1*x^3 + 0*x^2 + -2*x + 2  9     -1.76929236447105     -7.56297415804852E-08

1*x^3 + 0*x^2 + -2*x + 2  10    -1.76929235423863    0

Newton-Raphson brings with it the delight that if it converges, it will likely do so quite quickly.  Unfortunately, it also has the drawback that in some cases, either due to an ill-behaved function or a poor initial guess, it may end up in a cyclical loop or otherwise not converge on any root.  For the two cases demonstrated, you can see that it converged quite quickly, however try the second case with an initial guess of 0 (instead of -10) and see what happens!  You should also be aware that it isn’t designed to find all roots (some functions can have many) but will find the one (that is probably) closest to the initial guess.

If you’ve gotten this far, you’re probably realizing that all of these examples are deeply rooted in mathematics and you must be wondering what possible use this could be for more traditional business applications.  I would have to agree, that if you are not an engineer, these examples are probably seem pretty exotic.  Their value is in demonstrating how to convert a simple recursion formula into a recursive CTE.  For some more practical business applications, please read on.

Financial Applications

I worked supporting financial applications for an embarrassingly long period of time so I am able to draw upon that experience to deliver the following two examples.  I must confess that the second comes with a vague recollection of someone else applying this technique; however I encountered that web page early in my search and unfortunately I did not bookmark it at the time.  Rest assured that this is my solution to the problem, particularly when it comes to certain wrinkles related to rounding.

Depreciation Schedules

The company that I worked for was in the business of buying and leasing vehicles.  Along with any capital purchase (with the exception of land) comes depreciating that asset for tax purposes.  A recursive CTE can be used to generate the depreciation schedule for an asset.

Book Value0 = Purchase Price [less salvage value if you want to avoid gain/loss at the end of the period]

Accumulated Depreciation0 = 0

The recursion formulas for asset depreciation are:

Book Valuen = Book Valuen-1 – Depreciation (for current period)

Accumulated Depreciationn = Accumulated Depreciationn-1 + Depreciation (for current period)

This first example applies the straight line depreciation method with a little fudging at the end to make sure the final asset balance is driven to zero.  Check with your tax department on how they want to handle this, and if rounding should be to dollars, before you apply this technique.

DECLARE @Assets TABLE (ID INT, PurchaseCost MONEY, Period INT)

INSERT INTO @Assets

SELECT 1, 20000, 48

UNION ALL SELECT 2, 30000, 60


;WITH SLDepSched (AssetID, [Month], Period -- Asset base

    ,SLDepAmt, SLBookValue, SLCumDep       -- Straight Line Depreciation Method

    ) AS (

    SELECT ID, 0, Period

        ,ROUND(PurchaseCost/Period, 2)     -- Straight Line Depreciation Amount

        ,PurchaseCost, CAST(0 AS MONEY)

    FROM @Assets

    UNION ALL

    SELECT AssetID, [Month]+1, Period

        ,CASE [Month]+1 WHEN Period THEN SLBookValue ELSE SLDepAmt END

        ,CASE [Month]+1 WHEN Period THEN CAST(0 AS MONEY) ELSE SLBookValue - SLDepAmt END

        ,CASE [Month]+1 WHEN Period THEN SLCumDep + SLBookValue ELSE SLCumDep + SLDepAmt END

    FROM SLDepSched

    WHERE [Month] < Period)

SELECT AssetID, [Month], SLDepAmt, SLBookValue, SLCumDep                  

FROM SLDepSched

ORDER BY AssetID, [Month]

AssetID    Month  SLDepAmt  SLBookValue  SLCumDep

1              0         416.67         20000.00        0.00

1              1         416.67         19583.33        416.67

1              2         416.67         19166.66        833.34

1              3         416.67         18749.99        1250.01

<snip>

1              46       416.67          833.18           19166.82

1              47       416.67          416.51           19583.49

1              48       416.51          0.00               20000.00

2              0         500.00          30000.00        0.00

2              1         500.00          29500.00        500.00

2              2         500.00          29000.00        1000.00

2              3         500.00          28500.00        1500.00

<snip>

2              58       500.00         1000.00           29000.00

2              59       500.00          500.00            29500.00

2              60       500.00          0.00               30000.00

Any depreciation method can be used of course, and the next example replaces the straight line depreciation method with the double-declining balance method of depreciation, with turnover mid-life to straight line.

DECLARE @Assets TABLE (ID INT, PurchaseCost MONEY, Period INT)
DECLARE @DBFactor INT

SET @DBFactor = 2                 -- 2=Double Declining Balance

INSERT INTO @Assets
 SELECT 1, 20000, 48
 UNION ALL 
 SELECT 2, 30000, 60

;WITH DBDepSched (AssetID, [Month], Period -- Asset base
    ,DBDepAmt, DBBookValue, DBCumDep       -- Declining Balance Depreciation Method
    ) AS (
    SELECT ID, 0, Period
        ,ROUND(2*PurchaseCost/Period, 2)     -- Declining Balance Depreciation Amount
        ,PurchaseCost, CAST(0 AS MONEY)
    FROM @Assets
    UNION ALL
    SELECT AssetID, NextMo, Period
        ,CASE WHEN [Month] = MidPeriod THEN ROUND(DBBookValue/MidPeriod, 2)
            WHEN NextMo    = Period    THEN DBBookValue
            WHEN [Month]   > MidPeriod THEN DBDepAmt
            WHEN YE        = 1         THEN ROUND(@DBFactor*DBBookValue/Period, 2)
            ELSE DBDepAmt END
        ,CASE WHEN [Month] = MidPeriod THEN DBBookValue - ROUND(DBBookValue/MidPeriod, 2)
            WHEN NextMo    = Period    THEN CAST(0 AS MONEY)
            WHEN [Month]   > MidPeriod THEN DBBookValue - DBDepAmt
            WHEN YE        = 1         THEN DBBookValue - ROUND(@DBFactor*DBBookValue/Period, 2)
            ELSE DBBookValue - DBDepAmt END
        ,CASE WHEN [Month] = MidPeriod THEN DBCumDep + ROUND(DBBookValue/MidPeriod, 2)
            WHEN NextMo    = Period    THEN DBCumDep + DBBookValue
            WHEN [Month]   > MidPeriod THEN DBCumDep + DBDepAmt
            WHEN YE        = 1         THEN DBCumDep + ROUND(@DBFactor*DBBookValue/Period, 2)
            ELSE DBCumDep + DBDepAmt END
    FROM DBDepSched
       CROSS APPLY (SELECT NextMo=[Month]+1, MidPeriod=Period/2, YE=([Month]+1)/12) x
    WHERE [Month] < Period
)
SELECT AssetID, [Month], DBDepAmt, DBBookValue, DBCumDep
 FROM DBDepSched 
 ORDER BY AssetID, [Month]

AssetID    Month  DBDepAmt  DBBookValue  DBCumDep

1              0         833.33         20000.00         0.00

1              1         833.33         19166.67         833.33

1              2         833.33         18333.34         1666.66

1              3         833.33         17500.01         2499.99

<snip>

1              46       208.33          416.74           19583.26

1              47       208.33          208.41           19791.59

1              48       208.41          0.00               20000.00

2              0         1000.00        30000.00        0.00

2              1         1000.00        29000.00        1000.00

2              2         1000.00        28000.00        2000.00

2              3         1000.00        27000.00        3000.00

2              4         1000.00        26000.00        4000.00

<snip>

2              58        288.00         576.00            29424.00

2              59        288.00         288.00            29712.00

2              60        288.00         0.00               30000.00

Various depreciation methods with explanations can be found in Wikipedia.

Loan Amortization

Our second and final example in the realm of financial applications will be familiar to anyone that has purchased a home.  Somewhere around closing time, your proud moment of becoming a first time home buyer is shattered when you are handed the loan amortization schedule and humbled by the realization of how much interest you’ll have to pay on this debt!

Let’s first look at the basic components of the loan formulization:

P             = Amount of Principal, i.e., the amount of money borrowed

i               = Interest rate and for our case we’ll express this as Annual Percentage Rate (APR)

n             = Total number of (monthly) payments

From these factors, we know that the annuity or monthly payment (A) can be calculated from this formula:

A             = P * (R * (1 + R)n / ((1+R)n – 1), where R = (i/100)/12 [i.e., the interest paid per period]

The recursion formulas for the loan amortization schedule are:

Loan Balancen                    = Loan Balancen-1 – (Principal Paidn + Interest Due on Balancen)

Cumulative Interestn         = Cumulative Interestn-1 + Interest Due on Balancen

We can now construct a recursive CTE that performs the annuity calculation:


-- Payment schedule for a loan
DECLARE @Loans TABLE (
   ID INT, 
   LoanAmount MONEY, 
   Period INT, 
   InterestAPR FLOAT
 )

INSERT INTO @Loans
 SELECT 1, 20000, 48, 12
 UNION ALL SELECT 2, 30000, 60, 11.5
 UNION ALL SELECT 3, 126000, 360, 8.5

;WITH Payments AS (
    SELECT LoanID=ID, PaymentNo=0, Balance=LoanAmount
        ,Payment   = CAST(NULL AS MONEY)
        ,Principal = CAST(NULL AS MONEY)
        ,Interest  = CAST(NULL AS MONEY)
        ,CumInt    = CAST(0 AS MONEY)      -- Cumulative Interest
        ,R                                 -- APR converted to a monthly rate
        -- Calculate the monthly payment amount only once (in the anchor of the rCTE)
        ,Pmt       = ROUND(R*((POWER(1+R, Period)*LoanAmount)/(POWER(1+R, Period)-1)), 2)
        ,Period
    FROM @Loans
    CROSS APPLY (SELECT R=CAST((InterestAPR/100.)/12. AS MONEY)) x
    UNION ALL
    SELECT LoanID, PaymentNo + 1
        ,Balance   = CASE PaymentNo + 1
                     WHEN Period THEN CAST(0 AS MONEY)
                     ELSE Balance - (Pmt - ROUND(R * Balance, 2)) END
        ,Payment   = Pmt
        ,Principal = CASE PaymentNo + 1 WHEN Period THEN Balance
                     ELSE Pmt - ROUND(R * Balance, 2) END
        ,Interest  = CASE PaymentNo + 1 WHEN Period THEN Pmt - Balance
                     ELSE ROUND(R * Balance, 2) END
        ,CumInt    = CASE PaymentNo + 1 WHEN Period THEN CumInt + Pmt - Balance
                     ELSE ROUND(CumInt + R * Balance, 2) END
        ,R
        ,Pmt
        ,Period
    FROM Payments
    WHERE PaymentNo < Period)
SELECT LoanID, PaymentNo As No, Balance, Payment, Principal, Interest, CumInt
 FROM Payments
 ORDER BY LoanID, PaymentNo
 OPTION(MAXRECURSION 360)

The results for our two example loans are:

LoanID No   Balance    Payment  Principal   Interest    CumInt

1         0     20000.00   NULL       NULL        NULL      0.00

1         1     19673.31   526.69     326.69      200.00     200.00

1         2     19343.35   526.69     329.96      196.73     396.73

1         3     19010.09   526.69     333.26      193.43     590.16

1         4     18673.50   526.69     336.59      190.10     780.26

<snip>

1         45    1548.18    526.69     506.15      20.54       5249.23

1         46    1036.97    526.69     511.21      15.48       5264.71

1         47    520.65      526.69     516.32      10.37       5275.08

1         48    0.00          526.69     520.65      6.04        5281.12

2         0      30000.00   NULL       NULL        NULL      0.00

2         1      29627.91   660.09     372.09      288.00     288.00

2         2      29252.25   660.09     375.66      284.43     572.43

2         3      28872.98   660.09   379.27   280.82   853.25

2         4      28490.07   660.09   382.91   277.18   1130.43

<snip>

2         57    1942.00     660.09    635.35   24.74     9567.13

2         58    1300.55     660.09    641.45   18.64     9585.77

2         59    652.95       660.09    647.60   12.49     9598.26

2         60    0.00          660.09     652.95   7.14      9605.40

3         0      126000.00  NULL      NULL     NULL    0.00

3         1      125923.98  970.62    76.02     894.60   894.60

3         2      125847.42  970.62    76.56     894.06   1788.66

3         3      125770.32  970.62    77.10     893.52   2682.18

3         4      125692.67  970.62    77.65     892.97   3575.15

3         5      125614.47  970.62    78.20     892.42   4467.57

<snip>

3         357   2867.35     970.62    943.56   27.06     223378.69

3         358   1917.09     970.62    950.26   20.36     223399.05

3         359   960.08       970.62    957.01  13.61      223412.66

3         360   0.00           970.62    960.08  10.54     223423.20

Once again, fudging at the end is essential to drive the principal balance to zero.  For this case, we needed to set our MAXRECURSION limit to handle the case of a 30 year mortgage (30x12).

Operations Research

Many of you are probably asking yourself what the heck is operations research anyway?  It is typically not something that you encounter in most undergraduate university programs unless you were enrolled in certain highly specialized engineering or quantitative business disciplines.  I happen to be one of the unfortunate few that were.  Operations research can basically be thought of as methods to solve a wide variety of problems associated with the allocation of limited resources.

In the discussion thread for my article Generating n-Tuples with SQL, I proposed two business problems that originate in operations research, that of the grocer and the thief.  On page 4 of the discussion thread, I proposed a solution to the grocer’s problem, so for the sake of originality, we shall not rehash it here.

The Knapsack Problem

I left the problem of the thief open-ended but it is quite easily solved in our first operations research example.  This is better known as the knapsack problem and it is one of my personal favorites.   We shall rehash that problem statement now, mainly because it is quite simply a lot of fun:

“I am an expert thief and I've just diabolically entered a vault filled with priceless antiquities. My cunning, larcenous brain can easily establish the intrinsic value of each of these items (or at least the amount that I know my niggardly fence will pony up) and my adroit fingers can instantaneously determine the valuables’ weight but alas, my knapsack (and my old weary bones) can carry no more than a certain weight of items. My objective is to abscond with the highest value of booty that I can safely carry out of there, without breaking my knapsack (or my back) in the process.  The maximum weight that I can carry in my knapsack is 50.”

First we shall initialize our valuable baubles:

DECLARE @t TABLE (item VARCHAR(40), value MONEY, weight DECIMAL(6,2))
DECLARE @MaxWeight DECIMAL(6,2)
SET @MaxWeight = 50

INSERT INTO @t (item, value, weight)
SELECT 'Jade Buddha', 40000, 15
UNION ALL SELECT 'Gilded Lyre', 20000, 18
UNION ALL SELECT 'Pearl-handled Scimitar', 19000, 17
UNION ALL SELECT 'Ruby-ensconced Scepter', 24000, 21
UNION ALL SELECT 'Arc of the Covenant', 53000, 25
UNION ALL SELECT 'Diamond Tiara of Cleopatra', 27000, 17
UNION ALL SELECT 'Bust of Nefertiti', 14000, 12
UNION ALL SELECT 'The Necronomicon 1st Edition', 18000, 14
UNION ALL SELECT 'Katana of the 1st Emperor', 22000, 13
UNION ALL SELECT 'Sword of Damocles', 25000, 19
UNION ALL SELECT 'Spear of Destiny', 21000, 15
UNION ALL SELECT 'The Holy Grail', 32000, 20

This is a brute-force approach and not a dynamic programming one, although the algorithm for the latter is also recursive in nature.  The idea is to generate all the possible combinations (n-Tuples) of items, without concern for ordering and without duplication, and calculate the total weight and value of each combination, satisfying our objective when we determine the highest value within our weight limit.  We shall draw on our approach to the n-Tuples problem to generate this solution to the knapsack problem:

;WITH UNIQUEnTuplesKP (n, Tuples, [Value], Weight) AS (
        SELECT DISTINCT 1, CAST(item AS VARCHAR(MAX)), [Value], Weight
        FROM @t
        UNION ALL
        SELECT 1 + n.n, t.item + ', ' + n.Tuples, n.[Value] + t.[Value]
            ,CAST(n.Weight + t.Weight AS DECIMAL(6,2))
        FROM @t t JOIN UNIQUEnTuplesKP n ON t.item < n.Tuples
        WHERE CHARINDEX(t.item, n.Tuples) = 0),
    Ranking AS (
        SELECT n, value, Weight, Tuples
            ,RANK() OVER (ORDER BY [Value] DESC) AS Rank
        FROM UNIQUEnTuplesKP
        WHERE Weight <= @MaxWeight)
SELECT n As NoItems, [Value], Weight, Rank, Tuples AS ListofItems
 FROM Ranking
 WHERE Rank <= 5

If you're sure you have no duplications in the items to be placed in the knapsack, you can remove the DISTINCT on the SELECT of the anchor leg for a slight improvement.

Our result set includes the top 5 ranked combinations (7 records) because there are a couple of ties.  Note our check that weight of the Tuple doesn’t exceed the total allowed weight is in the recursive leg of the CTE because once a combination is over the weight limit, no future combinations including it will meet the constraint.

NoItems Value          Weight  Rank  ListofItems

3            94000.00     48.00    1       Jade Buddha, Katana of the 1st Emperor, The Holy Grail

3            93000.00     50.00    2       Jade Buddha, Spear of Destiny, The Holy Grail

2            93000.00     40.00    2       Arc of the Covenant, Jade Buddha

3            90000.00     49.00    4       Jade Buddha, The Holy Grail, The Necronomicon 1st Edition

3            89000.00     45.00    5       Diamond Tiara of Cleopatra, Jade Buddha, Katana of the 1st Emperor

3            89000.00     50.00    5       Arc of the Covenant, Bust of Nefertiti, Katana of the 1st Emperor

Rats!  I really hated leaving the Arc of the Covenant behind!

The Transportation Problem

The next problem is a classic of operations research, namely the transportation problem.  This is similar to the hierarchy traversals proposed by Microsoft, et al, however we believe that it is sufficiently different (and since I don’t claim to be a SQL guru anyway), we hope that the complaint I voiced in my opening paragraphs does not apply.

The traditional solution approach is the Simplex method; however we shall use a recursive CTE to solve this, once again by brute force.

Given the directed graph represented below, where nodes can be considered locations and edges (directed arrows) show traversals between locations and the associated cost, our objective is to traverse the graph from A to Z at the lowest total cost.

Our implementation of this solution is shown below and runs quite quickly for our limited network of 26 nodes.  The idea is to generate a solution set based on all possible traversals of the grid, along with the total cost of doing so.  Note that your network must be carefully pruned of potential feedback loops.

First the setup data, where we have provided costs consistent with the above graphic but also the ability to randomly generate costs of 25-125 for each segment.  Note that we found the INDEX we added to be slightly helpful.

CREATE TABLE #Edges (
  Origin CHAR(3), 
  Destination CHAR(3), 
  cost MONEY
 PRIMARY KEY (Origin, Destination))

CREATE UNIQUE INDEX t1 ON #Edges (Destination, Origin)

INSERT INTO #Edges
SELECT 'A', 'B', 21 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'A', 'C', 23 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'A', 'D', 42 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'B', 'E', 24 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'B', 'F', 82 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'C', 'G', 5 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'C', 'H', 17 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'D', 'I', 34 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'D', 'J', 27 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'E', 'K', 22 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'F', 'L', 32 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'G', 'K', 67 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'G', 'M', 52 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'H', 'L', 41 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'I', 'M', 67 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'J', 'M', 16 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'K', 'N', 34 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'K', 'O', 18 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'L', 'O', 23 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'M', 'O', 22 -- ABS(CHECKSUM(NEWID())) % 100 + 25 
UNION ALL SELECT 'O', 'Q', 25 -- ABS(CHECKSUM(NEWID())) % 100 + 25 
UNION ALL SELECT 'Q', 'R', 41 -- ABS(CHECKSUM(NEWID())) % 100 + 25 
UNION ALL SELECT 'J', 'P', 54 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'N', 'Q', 28 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'N', 'R', 18 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'P', 'S', 23 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'P', 'T', 29 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'Q', 'W', 33 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'Q', 'U', 22 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'R', 'U', 24 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'R', 'V', 27 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'S', 'V', 36 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'S', 'R', 42 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'T', 'V', 41 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'T', 'Y', 61 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'U', 'W', 28 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'U', 'X', 26 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'V', 'X', 18 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'V', 'Y', 27 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'X', 'W', 43 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'W', 'Z', 55 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'Y', 'Z', 22 -- ABS(CHECKSUM(NEWID())) % 100 + 25

And now the solution, where you can uncomment the CostCalculation column if you’d like to check the results:

;WITH Graph AS (
    SELECT 1 As n, Origin, Destination, Cost
        ,CAST(RTRIM(Origin) + '->' + RTRIM(Destination) AS VARCHAR(MAX)) AS [Path]
        ,CAST(Cost AS VARCHAR(MAX)) AS CostCalculation
    FROM #Edges WHERE Origin = 'A'
    UNION ALL
    SELECT n + 1, g.Origin, t.Destination, g.Cost + t.Cost
        ,[Path] + '->' + RTRIM(t.Destination)
        ,CostCalculation + '+' + CAST(t.Cost AS VARCHAR(10))
    FROM Graph g
    INNER JOIN #Edges t ON g.Destination = t.Origin
)
SELECT n AS Segments, Cost, [Path] AS [Path From A-Z]
 FROM Graph
 WHERE Destination = 'Z'
 ORDER BY Cost

The TOP 10 results below, where if only the lowest cost is needed add TOP 1 to the SELECT. 

Segments   Cost      Path From A-Z

8                195.00   A->B->E->K->N->R->V->Y->Z

7                198.00   A->B->E->K->O->Q->W->Z

8                215.00   A->B->E->K->O->Q->U->W->Z

7                215.00   A->C->G->M->O->Q->W->Z

7                217.00   A->C->H->L->O->Q->W->Z

7                217.00   A->B->E->K->N->Q->W->Z

7                220.00   A->D->J->M->O->Q->W->Z

8                223.00   A->C->G->K->N->R->V->Y->Z

7                226.00   A->C->G->K->O->Q->W->Z

8                226.00   A->B->E->K->N->R->U->W->Z

Your recursion limit must be set to the largest number of nodes that must be traversed to get from A to Z.  In our case, that number is quite small. 

Now we will make the following modifications to generalize the solution somewhat:

  1. Modify the hard-coded WHERE clauses for Origin and Destination to use a table we’ll create (@Terminus) to identify origin and destination nodes.
  2. Modify the output to display results for the two top ranked solutions for each of the unique origin to final destination paths in the network.

We will also set up two additional nodes that enter and exit the network, resulting in multiple origins and destinations.

-- Add an additional origin and destination node
INSERT INTO #Edges
UNION ALL SELECT '0', 'N', 15 -- ABS(CHECKSUM(NEWID())) % 100 + 25
UNION ALL SELECT 'Y', '9', 22 -- ABS(CHECKSUM(NEWID())) % 100 + 25

DECLARE @Terminus TABLE (OD INT, node CHAR(3))

INSERT INTO @Terminus
 SELECT 1, origin FROM (SELECT origin 
                         FROM #Edges 
                        EXCEPT 
                        SELECT destination 
                         FROM #Edges) x
 UNION ALL
 SELECT 2, destination FROM (SELECT destination 
                              FROM #Edges 
                             EXCEPT 
                             SELECT origin 
                              FROM #Edges) x

Here is the solution modified based on the two points above:

;WITH Graph
    AS (
        SELECT 1 As n, Origin, Destination, Cost
            ,CAST(RTRIM(Origin) + '->' + RTRIM(Destination) AS VARCHAR(MAX)) AS [Path]
        FROM #Edges WHERE Origin IN (SELECT node FROM @Terminus WHERE OD = 1)
        UNION ALL
        SELECT n + 1, g.Origin, t.Destination, g.Cost + t.Cost
            ,[Path] + '->' + RTRIM(t.Destination)
        FROM Graph g
        INNER JOIN #Edges t ON g.Destination = t.Origin
    ),
    CostByPath ([Start/End], Segments, Cost, [Path From Origin-Terminus], CostRanking)
    AS (
        SELECT Origin + '/' + Destination, n, Cost, [Path]
            ,RANK() OVER (PARTITION BY Origin, Destination ORDER BY Cost)
        FROM Graph
        WHERE Destination IN (SELECT node FROM @Terminus WHERE OD = 2)
    )

SELECT CostRanking, 
       [Start/End], 
       Segments, 
       Cost, 
       [Path From Origin-Terminus]
 FROM CostByPath
 WHERE CostRanking < 3
 ORDER BY [Start/End], Cost

In the results we see 8 records, which are the two top ranked (optimal) solutions for each of the paired origin/destination combinations.

CostRanking  Start/End  Segments   Cost     Path From Origin-Terminus

1                   0  /9          5               109.00  0->N->R->V->Y->9

2                   0  /9          6               160.00  0->N->Q->R->V->Y->9

1                   0  /Z          5               109.00  0->N->R->V->Y->Z

2                   0  /Z          4               131.00  0->N->Q->W->Z

1                   A  /9          8               195.00  A->B->E->K->N->R->V->Y->9

2                   A  /9          8               223.00  A->C->G->K->N->R->V->Y->9

1                   A  /Z          8               195.00  A->B->E->K->N->R->V->Y->Z

2                   A  /Z          7               198.00  A->B->E->K->O->Q->W->Z

Let’s explore this example a bit more by setting up a test harness to generate a network for us.

;WITH Tally (n) AS (
        SELECT TOP 26 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns),
    Letters (l, n) AS (
        SELECT SUBSTRING ('ABCDEFGHIJKLMNOPQRSTUVWXYZ', n, 1), n FROM Tally),
    Nodes (nodes) AS (
        SELECT l1.l + l2.l FROM Letters l1 CROSS APPLY (SELECT l, n FROM Letters) l2
        WHERE l1.n < 15 AND l2.n < 5),
    Network (Origin, Destination, Cost, r) AS (
        SELECT n1.nodes, n2.nodes, ABS(CHECKSUM(NEWID())) % 100 + 25
            ,ROW_NUMBER() OVER (PARTITION BY n1.nodes ORDER BY (SELECT NULL))
        FROM Nodes n1 INNER JOIN Nodes n2 ON n1.nodes < n2.nodes)

INSERT INTO #Edges
SELECT Origin, Destination, Cost
 FROM Network
 WHERE r < 4 -- ABS(CHECKSUM(NEWID())) % 9 + 1

Explanation of the test harness:

  1. The Nodes CTE combines Letters traversed by the Tally table to generate 15x5 (or 75) nodes.
  2. The Network CTE does an INNER JOIN of Nodes on itself, where the left node must be less than the right node, to ensure that there are no feedback loops.  It also assigns a random cost to each Edge from 25 to 125.
  3. The WHERE r < 4 can be called the “bushiness factor,” which determines precisely how many routes out from a node (in this case 3) are allowed.

You can run with this test harness but you should be aware that it generates multiple origin nodes and one destination node (the network has a generally triangular shape).  For our tests though, we will introduce some additional code which adds one node on each end of the graph, to consolidate multiple origins and multiple destinations into a single one (of each).

DECLARE @Terminus TABLE (OD INT, node CHAR(3))

INSERT INTO @Terminus
-- Identify origin nodes
 SELECT 1, origin FROM (SELECT origin FROM #Edges EXCEPT SELECT destination FROM #Edges) x
 UNION ALL
-- Identify termination nodes
 SELECT 2, destination FROM (SELECT destination FROM #Edges EXCEPT  SELECT origin FROM #Edges) x

DECLARE @Origins INT, @Destinations INT
SELECT @Origins = SUM(CASE WHEN OD = 1 THEN 1 ELSE 0 END)
       ,@Destinations = SUM(CASE WHEN OD = 2 THEN 1 ELSE 0 END)
FROM @Terminus

-- Tie all origins to one origin and all destinations to one destination
INSERT INTO #Edges
 SELECT CASE OD WHEN 1 THEN '00' ELSE node END
    ,CASE OD WHEN 1 THEN node ELSE '99' END
    ,ABS(CHECKSUM(NEWID())) % 100 + 25
 FROM @Terminus

DELETE FROM @Terminus

-- Recreate table of origin and termination nodes
INSERT INTO @Terminus
 SELECT 1, origin FROM (SELECT origin FROM #Edges EXCEPT SELECT destination FROM #Edges) x
 UNION ALL
 SELECT 2, destination FROM (SELECT destination FROM #Edges EXCEPT SELECT origin FROM #Edges) x

We’re also going to consider two additional modifications that may make the brute force approach more efficient:

  1. Since it is pretty senseless to be analyzing a graph that consists of AàB, AàC, etc. only, we will make the assumption that the minimum number of segments in the network is 2, meaning these paths look like AàB àC, etc.  We will do this by modifying the anchor leg of the recursive CTE.
  2. Taking a look at the first network we explored (our 26 nodes AàZ example) we see:

          1: A(23)C(17)H(41)L – Cost = 81

          2: A(21)B(82)F(32)L – Cost =  135

Since any path from A to Z can be reached at a cost of 81 + path from L to Z, we can eliminate the second path from further consideration.  In other words, eliminate all partial paths between points where a similar partial path (same origin and destination) is available at a lower cost.  We will do this by modifying the recursive leg of the CTE.

These two additional revisions (revision 2 does not contain revision 1) are shown below.



-- Revision 1: Modify anchor leg to include first 2 edges in the path

;WITH Graph (n, Origin, Destination, Cost, [Path])

    AS (

        SELECT 2 As n, e1.Origin, e2.Destination, e1.Cost + e2.Cost

            ,CAST(RTRIM(e1.Origin) + '->' + RTRIM(e1.Destination) + '->' +

                RTRIM(e2.Destination) AS VARCHAR(MAX)) AS [Path]

        FROM #Edges e1 INNER JOIN #Edges e2 ON e1.Destination = e2.Origin

        WHERE e1.Origin IN (SELECT node FROM @Terminus WHERE OD = 1)

        UNION ALL

        SELECT n + 1, g.Origin, t.Destination, g.Cost + t.Cost

            ,[Path] + '->' + RTRIM(t.Destination)

        FROM Graph g

        INNER JOIN #Edges t ON g.Destination = t.Origin

       ),

    CostByPath ([Start/End], Segments, Cost, [Path From Origin-Terminus], CostRanking)

    AS (

        SELECT Origin + '/' + Destination, n, Cost, [Path]

            ,RANK() OVER (PARTITION BY Origin, Destination ORDER BY Cost)

        FROM Graph

        WHERE Destination IN (SELECT node FROM @Terminus WHERE OD = 2)

       )

SELECT CostRanking, [Start/End], Segments, Cost, [Path From Origin-Terminus]

FROM CostByPath

WHERE CostRanking < 3

ORDER BY [Start/End], Cost


-- Revision 2: Prune equivalent paths of higher cost

;WITH Graph (n, Origin, Destination, Cost, [Path])

    AS (

        SELECT 1 As n, Origin, Destination, Cost

            ,CAST(RTRIM(Origin) + '->' + RTRIM(Destination) AS VARCHAR(MAX)) AS [Path]

        FROM #Edges WHERE Origin IN (SELECT node FROM @Terminus WHERE OD = 1)

        UNION ALL

        SELECT n + 1, g.Origin, t.Destination, g.Cost + t.Cost

            ,[Path] + '->' + RTRIM(t.Destination)

        FROM (

            SELECT n, Origin, Destination, Cost, [Path]

                ,RANK() OVER (PARTITION BY Origin, Destination ORDER BY Cost)

            FROM Graph    

        ) g (n, Origin, Destination, Cost, [Path], Rank)

        INNER JOIN #Edges t ON g.Destination = t.Origin

        WHERE Rank = 1

       ),

    CostByPath ([Start/End], Segments, Cost, [Path From Origin-Terminus], CostRanking)

    AS (

        SELECT Origin + '/' + Destination, n, Cost, [Path]

            ,RANK() OVER (PARTITION BY Origin, Destination ORDER BY Cost)

        FROM Graph

        WHERE Destination IN (SELECT node FROM @Terminus WHERE OD = 2)

       )

SELECT CostRanking, [Start/End], Segments, Cost, [Path From Origin-Terminus]

FROM CostByPath

WHERE CostRanking < 3

ORDER BY [Start/End], Cost

We will now do some test runs on the following network set up to see if our revisions have helped to improve performance:

Before adjustment: Nodes: 75, Edges: 149, Origins: 36, Destinations: 1

After adjustment: Nodes: 77, Edges: 186, Origins: 1 (00), Destinations: 1 (99)

Our run results were as follows (averages):

Rev        CPU      Elapsed

0            13236    14468

1            12862    14236

2            14360    15759

Surprisingly, revision 1 seems to make a slight improvement, while revision 2 (which we had high hopes for) seemed to make things worse, possibly because the computational overhead of RANK() was significantly more than the reduction achieved in analyzed paths.  Furthermore, we tried it with ROW_NUMBER(), which would also work but possibly sacrifice some optimal solutions, and got similar results.

Feel free to try this on your own networks; however you should be careful using the test harness if you increase the “bushiness factor.”  This can have the effect of exploding the size of the network quite quickly, with a corresponding increase in run time of the solution.  In our SQL, we provided the ability to create “random bushiness.”

In practice, the transportation problem is likely to contain thousands of nodes and thousands of connecting edges.  Setting up the problem is nearly as time-consuming as generating the result.  Another constraint may also be applied – time of traversal.  By associating a duration (i.e., time) with each edge, we can modify our objective to be that the traversal should be of lowest cost and within a specified time interval.  We shall leave this modification to our intrepid readers as an exercise.

Additional Remarks and Conclusions

Should a recursive CTE be used to solve your specific problem?  The answer is always “it depends” and often it is “probably not.”  As Jeff Moden has pointed out to me on numerous occasions, they contain hidden RBAR and his article clearly illustrates this sobering fact.

We have intentionally avoided the issue of performance of our recursive CTE examples (well mostly), and we have not explored alternative solutions to these examples.  The purpose of this article was again, to simply explore recursive CTEs so that those of us who are recursively challenged may learn from examples.

“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.”

-- Sun Tzu, The Art of War, Special Edition

While recursive CTEs may not necessarily be our enemies, they are most definitely not always our friends.  To know them is to understand their weaknesses and their potential strengths.  To avoid them is to close the door on opportunity.  More significantly though, it is useful to explore recursive CTEs, to understand recursive algorithms that have significant importance in business and industry.  This understanding of the recursion model may ultimately lead you to a better performing solution to your particular problem.

Recursion algorithms are remarkably prevalent across number theory, sequences and numerical analysis and while seemingly theoretical in nature, these help us to understand the world around us.  They are equally if not more important in financial and operations research, for the simple reason that these are the domains that typically generate wealth.

Our brute force approach to the operations research examples, while interesting in limited cases, certainly does not provide a convenient method for solving such problems on a large scale.

Our exploration has taken us on an unexpected path back to our roots.  We hope you have found this article stimulating and sincerely hope that it will continue to spawn future discussion and debate on areas previously unexplored adequately in the SQL arena.

Special thanks to Chris Morris (ChrisM@Work) for his valuable feedback based on review of the draft of this article.

Until next time my valued reader, I thank you for accompanying us on this exploration.

Total article views: 20634 | Views in the last 30 days: 280
 
Related Articles
FORUM

checksums and unicode data

Same checksum value for two different values.

FORUM

CHECKSUM

When I execute below SQL statement in SQL server 2000, I get the same value of 128414903 for the CHE...

FORUM

How to Use BINARY_CHECKSUM operation.

BINARY_CHECKSUM

FORUM

working with union all or union

union all vs union

Tags
cte    
recursion    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones