Grabbing sets less than an arbitrary number; not quite a bin packing problem

  • I'm in a SQL2016 environment.
    Help me find the logic mistake in my code, please!

    I have an issue where i need to grab all related records from multiple groups, where the sum of complete groups is less than or equal to an arbitrary value.
    I must be sure to grab whole groups, and not partials; a perfect example is tables and columns....a table is not complete without it's columns.

    So if i group tables up and get some column counts, and order them based on some order by criteria, i can visually see what i might be limiting.
    the query below seems to almost work, however, but I'm seeing there is a problem when the # of columns of two tables are the same...you can see that the "RollingSum" does not increment the way i expected it to.

    The same issue is appearing in my real world problem.

    run this query on any database with a  decent number of tables: at some point, the RollingSum does not increment, and it's not because a table has no columns, it's because two tables have the exact same number of columns and the sort order puts them under each other.

    objz.name     AS TableName,
    objz.type_desc   AS TypeDescription,
    COUNT(colz.OBJECT_ID) AS TheCount
    FROM sys.objects objz
    INNER JOIN sys.columns colz
      ON objz.OBJECT_ID = colz.OBJECT_ID
      WHERE objz.type_desc IN('USER_TABLE','VIEW')
    GROUP BY objz.name,objz.type_desc
    --ORDER BY objz.type_desc,TheCount DESC

    This is basically my complete example; by coincidence only, it works sometimes because my first bunch of tables have differnet # of columns.


    DECLARE @MaxRecords int = 500;
    ;WITH MyLimits
    AS
    (
    SELECT SUM(COUNT(colz.OBJECT_ID)) OVER( ORDER BY objz.type_desc,COUNT(colz.OBJECT_ID) DESC) AS RollingSum,
    objz.name     AS TableName,
    objz.type_desc   AS TypeDescription,
    COUNT(colz.OBJECT_ID) AS TheCount
    FROM sys.objects objz
    INNER JOIN sys.columns colz
      ON objz.OBJECT_ID = colz.OBJECT_ID
      WHERE objz.type_desc IN('USER_TABLE','VIEW')
    GROUP BY objz.name,objz.type_desc
    --ORDER BY objz.type_desc,TheCount DESC
    )

    SELECT row_number() OVER (order by objz.name) As RW,
    row_number() OVER (PARTITION BY objz.name order by objz.name) As PTRW,
    objz.name,
    colz.name,
    column_id
    FROM sys.objects objz
    INNER JOIN sys.columns colz
      ON objz.OBJECT_ID = colz.OBJECT_ID
    INNER JOIN MyLimits ON objz.name = MyLimits.TableName
     WHERE objz.type_desc IN('USER_TABLE','VIEW')
    AND MyLimits.RollingSum < = @MaxRecords
    ORDER BY objz.type_desc

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You need to include your table name in the ORDER BY clause in windowed function in your CTE.  I'm not sure where to get the results you need.

    Also, your windowed function is using the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.  As far as I can tell, ROWS UNBOUNDED PRECEDING would give the same results and would be more efficient.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Could you be missing the ROWS clause in the OVER()?

    DECLARE @MaxRecords int = 500;

    WITH MyLimits
    AS
    (
    SELECT SUM(COUNT(colz.OBJECT_ID)) OVER( ORDER BY objz.type_desc,COUNT(colz.OBJECT_ID) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RollingSum,
    objz.name  AS TableName,
    objz.type_desc  AS TypeDescription,
    COUNT(colz.OBJECT_ID) AS TheCount
    FROM sys.objects objz
    INNER JOIN sys.columns colz
    ON objz.OBJECT_ID = colz.OBJECT_ID
    WHERE objz.type_desc IN('USER_TABLE','VIEW')
    GROUP BY objz.name,objz.type_desc
    --ORDER BY objz.type_desc,TheCount DESC
    )

    SELECT row_number() OVER (order by objz.name) As RW,
    row_number() OVER (PARTITION BY objz.name order by objz.name) As PTRW,
    objz.name,
    colz.name,
    column_id
    FROM sys.objects objz
    INNER JOIN sys.columns colz
    ON objz.OBJECT_ID = colz.OBJECT_ID
    INNER JOIN MyLimits ON objz.name = MyLimits.TableName
    WHERE objz.type_desc IN('USER_TABLE','VIEW')
    AND MyLimits.RollingSum < = @MaxRecords
    ORDER BY objz.type_desc

    Or maybe I misunderstood the problem.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • drew.allen - Thursday, May 4, 2017 11:44 AM

    You need to include your table name in the ORDER BY clause in windowed function in your CTE.  I'm not sure where to get the results you need.

    Also, your windowed function is using the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.  As far as I can tell, ROWS UNBOUNDED PRECEDING would give the same results and would be more efficient.

    Drew

    It won't give the same results. I'm guessing that's the problem, but I'm not sure.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • OK walking away and doing a mental reset let me see my issue, or at least come up with a working solution
    i *think* i had two things going on. 
    I needed to pre-render my counts before i could use them in the SUM() OVER() clause.
    second, my ORDER BY int he SUM() OVER() needed additional granularity to avoid dupes; i just added another column name, and poof problem looks better now.

    DECLARE @MaxRecords int = 650;
    ;WITH PreRender
    AS
    (
    SELECT
    objz.name  AS TableName,
    objz.type_desc  AS TypeDescription,
    COUNT(colz.OBJECT_ID) AS TheCount
    FROM sys.objects objz
    INNER JOIN sys.columns colz
    ON objz.OBJECT_ID = colz.OBJECT_ID
    WHERE objz.type_desc IN('USER_TABLE','VIEW')
    GROUP BY objz.name,objz.type_desc
    )
    ,MyLimits
    AS
    (
    SELECT SUM(TheCount) OVER( ORDER BY TheCount DESC,TableName) AS RollingSum,
    * FROM PreRender

    )

    SELECT * FROM sys.objects objz
    INNER JOIN sys.columns colz
    ON objz.OBJECT_ID = colz.OBJECT_ID
    INNER JOIN MyLimits ON objz.name = MyLimits.TableName
    WHERE objz.type_desc IN('USER_TABLE','VIEW')
    AND MyLimits.RollingSum < = @MaxRecords

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Luis you posted your solution between the time i looked again; your's works as well, thank you!

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply