• leehbi - Friday, February 17, 2017 9:19 AM

    I have a sub-set of customers :

    CREATE TABLE #temptable ( [ID] int, [Mobile] nvarchar(20), [DOB] date, [Email] nvarchar(50), [PostCode] nvarchar(10) )
    INSERT INTO #temptable
    VALUES
    ( 1, N'078', N'1970-01-01T00:00:00', N'bob@bob.com', N'L1 3RT' ),
    ( 2, N'079', N'1981-01-02T00:00:00', N'bob@bob.com', N'L1 3RT' ),
    ( 3, N'034', N'1986-05-03T00:00:00', N'bert@aol.com', N'CH1 3ER' ),
    ( 4, N'078', N'1970-01-01T00:00:00', N'bob2@bob.com', N'L1 3RT1' )

    I need to detect associations between customers - this is done by matching customers where any 2 fields of the 4 (Mobile, DOB, Email, PostCode) match.

    The record size is approx 2.5m.

    Everything I've tried to date is too slow, for instance self-joins on pairs of fields.   I've not tried cursors as think this will be a no no too. 

    The result I'm after is a table with 2 columns :   ID, RelatedID   (Each ID can have 1 or many relatedID's)

    For instance the result on the sample data would be :


    ID         RelatedID
    1           1
    1           2
    1           4
    2           1
    3           3
    4           1
    4           4

        
    I wondered what the optimal way of doing this is.  To be honest I'm beginning to think it's not even up SQL's street.

    I wondered if there's anything in TSQL2016 that comes in handy in this respect?

    Beforeyou post again, please read a book on basic RDBMS, so you want postgarbage. A table has to have a key, but there is no such thing as ageneric “id†in RDBMS; it has to be the identifier of somethingin particular. The ANSI ISO standard SQL does not recognize the “Tâ€separator in temporal data (frankly, I think that is a flaw and Ishould of voted against it). You do know that identifier can never   numeric? You do not do any math on identifiers. I hope 🙁
     CREATETABLE Customers
    (mobile_somethingCHAR(3) NOT NULL,
    birth_dateDATE NOT NULL,
    Customers_emailCHAR(256) NOT NULL PRIMARY KEY,
    post_codeVARCHAR(10)NOT NULL);

    the first week of your database class? The definition of a primary key?It has to be made up of attributes in the table and not a physicalcount of things in the storage of the table. We also have a DATE datatype in SQL that you failed to use in the insertion statement. Hereis an attempt correct what you did

    INSERT INTO Customers
    VALUES
    ('078','1970-01-01', 'bob@bob.com', 'L1 3RT'),
    ('079','1981-01-02', 'bob@bob.com', 'L1 3RT'), – error
    ('034','1986-05-03', 'bert@aol.com', 'CH1 3ER'),

    ('078','1970-01-01', 'bob2@bob.com', 'L1 3RT1');

    the two lines of the text of the insertion statement are wrong;they have a duplicate primary key, which is the email address. This should have been handled in an update of some kind to could change the birthdate. We really cannot fix the fact that you have a bad system with the improper performance.
    >>I need to detect associations between customers - this is done bymatching customers where any 2 fields of [sic] the 4 (Mobile,birth_date, Email, PostCode) match. <<
    do you understand the fundamental concept of RDBMS? Rows in the tablehave a key which makes them unique. You should have been scrubbing your data before you put it in the table to guarantee that you have actual real records. What you are doing is a way we used to do with Mag tapes and punchcards; throw a lot of bad data in a file sortedand then scan for payers a possible duplicates. We do not do that anymore.

    >>The record [sic: rows are not records!] size is approx 2.5m. <<
    please learn the difference between rows and records is one of your major problemsalso, what do you mean the size of a row is 2.5 m?
    >>Everything I've tried to date is too slow, for instance self-joins on pairs of fields [sic]. <<

    you do you understand the problem is that you are trying to clean up the mess you make after you make it? Do not let the bad data get in in thefirst place.

    >>I've not tried cursors as think this will be a no no too. <<
    oh yes! This like walking on to a vegan website and saying that you want to eat fried babies. :crying::crying:

    Please post DDL and follow ANSI/ISO standards when asking for help.