Hopefully an intriguing 'Update' question, for a data cleanse

  • HI all,

    I'm cleaning a table with about 2k rows of false data.  Basically I have 3 columns as follows:
    CREATE TABLE [dbo].[logs](
     [id] [int] NOT NULL,
     [profileId] [int] NULL,
     [logId] [int] NULL,
     [date] [date] NOT NULL,
     [time] [datetime] NULL,
     [notes] [nvarchar](max) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    The 'Notes' contains details of the class a student attended, the Date column should reflect when he attended.  However in about 2.8% of columns there is a mismatch between the day stated in 'Notes', and the actual date.  I've used the following code to highlight them..

    Select datename(dw,  Date) As  'Day This Data Was Entered', Date, (notes) --, *
    from Logs
    Where Notes not like '%' + datename(dw,  Date) + '%'
     --And (Notes like '%Attended - %' Or Notes like '%Came to %' or Notes like '%Came on%' or Notes like '%Attedned%')
     --And Notes Not like '%Make-up class%'
    --Ensure Weekdays are included in the Notes
     And (Notes like '%Monday%' or Notes like '%Tuesday%' or Notes like '%Wednesday%' or Notes like '%Thursday%' or Notes like '%Friday%' or Notes like '%Saturday%'
      or Notes like '%Sunday%')

    (There are quite a few extra filters but i have  excluded them for brevity)
    Sample of the (faulty) data:
    Day This Data Was Entered Date notes
    Sunday 2016-11-27 Attended - Tuesday Beginners
    Saturday 2009-03-07 Attended 12:00 this past Thursday.
    Tuesday 2016-06-14 Attended Class last Friday

    This particular chain of schools isn't open on Sundays 🙂

    So, what I want to do is update the Date column to reflect the previous weekday,  mentioned in 'Notes', the corrected (sample) data should thus be :
    Tuesday 2016-11-22 Attended - Tuesday Beginners
    Thursday 2009-03-05 Attended 12:00 this past Thursday.
    Friday 2016-06-10 Attended Class last Friday

    Where I'm flummoxed though is how to set the date accordingly, I imagine this would need Dateadd and possibly a variable, and I imagine it would have to be RBAR (ouch!!)

    TIA

    JB

  • Don't have any usable test data, but this should be close at least.  [Btw, overlaying the day name was trivial, so I didn't include that.]


    UPDATE l
    SET /*back up date to specified weekday*/
        date = DATEADD(DAY, -DATEDIFF(DAY, day_number, date) % 7, date)
    FROM dbo.logs l
    CROSS APPLY (
        SELECT CASE
            WHEN notes LIKE '%Monday%' THEN 0
            WHEN notes LIKE '%Tuesday%' THEN 1
            WHEN notes LIKE '%Wednesday%' THEN 2
            WHEN notes LIKE '%Thursday%' THEN 3
            WHEN notes LIKE '%Friday%' THEN 4
            WHEN notes LIKE '%Saturday%' THEN 5
            WHEN notes LIKE '%Sunday%' THEN 6
            ELSE 99 END AS day_number
    ) AS check_for_day_in_notes
    WHERE ...

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • That hit the spot, though I was surprised that: 

    A) You concocted that advanced code in what...15 mins flat??!?!?
    B) The code affected 86,808 rows of data, which was initially worrisome as I only wanted 2095 updated, but I compared that to the primary copy where there were only 84713 rows that had both a day specified and the RIGHT date, hey presto, that too is a difference of 2095.
    C) The update ran in 22 secs flat, where the select above ran at the fastest at 2:30 !!

    Thanks!!

    BTW by 'usable' test data, does that mean comma-separated around here??

  • A) Yeah ... I've worked with date shifting a lot 🙂
    C) Yes, we strive for performance here quite a bit.  The "DATEADD" technique is only math, not text, so it's extremely fast.

    This shows "usable" data; i.e., code that execs correctly to create sample data:

    CREATE TABLE #logs ( day char(9), date date, notes nvarchar(4000) )
    INSERT INTO #logs VALUES
        ('Tuesday', '2016-11-22', 'Attended - Tuesday Beginners'),
        ('Thursday', '2009-03-05', 'Attended 12:00 this past Thursday'),
        ('Friday', '2016-06-10', 'Attended Class last Friday')
    SELECT * FROM #logs

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply