• saravanatn - Wednesday, December 6, 2017 7:28 PM

    For the test data created by you I want the following output updated in inline commands.

    INSERT INTO details 
    VALUES (1, 'sara', 'chennai', 't.nagar'), 
           (1, NULL, NULL, NULL), 
           (1, 'sara', 'chennai', 't.nagar'), 
           (2, 'sara', 'chennai', 't.nagar'), 
           (2, NULL, NULL, NULL), 
           (3, 'sara', 'chennai', 't.nagar'), 
           (4, NULL, NULL, NULL), 
           (5, 'raj', 'vellore', 'kodambakkam'), 
           (5, 'raj', 'vellore', NULL), 
           (6, NULL, 'London','Kensington'), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
           (6,'Steve', 'London',NULL), /* not required as it is having multiple values for ID(Count=2) and it is having Null value/*
           (7,'Steve', 'London','Kensington'), /* Required as it is not having null values/*
           (7,'Steve', 'London','Barnet') /* Required as it is not having null values/*;

    Output Results:

    idnamecityplace
    1sarachennait.nagar
    1sarachennait.nagar
    2sarachennait.nagar
    3sarachennait.nagar
    4NULLNULLNULL
    5rajvellorekodambakkam
    7SteveLondonKensington
    7SteveLondonBarnet

    Kindly let me know if you requires any additional details.

    Regards,
    Saravanan

    This returns the above results:
    WITH CountsandNulls AS (
        SELECT
            id
        ,    name
        ,    city
        ,    place
        ,    COUNT(*) OVER (PARTITION BY id) AS RowsPerID
        ,    CASE
                WHEN name + city + place IS NULL THEN 1
                ELSE 0
            END AS HasNulls
        FROM details
        )
    SELECT
        id
    ,    name
    ,    city
    ,    place
    FROM CountsandNulls
    WHERE RowsPerID = 1
    OR HasNulls = 0;

    jcelko212 32090 - Thursday, December 7, 2017 11:51 AM
    Can you please buy read and study a book on RDBMS and SQL? There is no such thing as a generic "id" in RDBMS. It has to be the identifier of something in particular to be valid. Then on top of that, and identifier can never be a numeric because you don't do math on it.

    Can you please buy read and study a book on logic?  The (correct) premise "if we need to do arithmetic on a column then the column must have a numeric data type" does not imply "if we do not need to do arithmetic on a column then the column must not have a numeric data type", any more than "if I need to drive a car it must have wheels" implies "if nobody needs to drive a car it must not have wheels".

    saravanatn - Thursday, December 7, 2017 10:57 PM

    jcelko212 32090 - Thursday, December 7, 2017 11:51 AM

    Thanks Celko. I agree with you. We are moving the records from Oracle to Hive(Hadoop), which is similar to SQL. In hive there is no update concept. In hive(for old version) only insert is available .Every time a record came it is inserted (not updated) in hive and  I am fetching the records in hive  using ID(primary key) and Max(run_time) for that also  we are getting multiple records for the particular ID.we are some how getting null values as well  in Hive to eliminate the null record I posted the sample data.
    What Thorn suggested almost worked.

    Despite Joe's dogma and shaky logic, I actually agree with most of what he says.  In particular, why are you keeping two identical rows for ID 1, and why are you keeping a row of NULLs for ID 4 that doesn't actually tell you anything?

    John