Automatic Text File Import Challenge

  • I am trying to import a text file with 37 fields delimited by "/". The file comes from a live GPS feed from several devices, so it is constantly being updated. Each device is identified by a unique ID. The timestamp for each entry is in the format DDHHMMSS, so 2:10:45 PM on the 10th would be 10141045--no month or year is given.

    Importing the data is easy. However, I need to parse some of the information, like the timestamp field, into something usable (a real timestamp that SQL can understand). In addition, I need this set up so that it runs automatically every couple of minutes and imports only the most recent data. Just to make things interesting, the script would need to make sure each line is complete, because it could attempt to import data while a new record is being written, in which case not all of the 37 fields would be there.

    Finally, I need to identify the most current record for each device ID and indicate it somehow in the database (flag it in some way). We will be using the data in a GIS mapping program, so by identifying the most current record, I can also show the breadcrumbs (previous locations). I only need to be able to see the previous 15 records or so, but I need to store all of the records long term. One idea I had was to use a field within the database that stores a number 1-16 with 1 being the most current record and 16 being the oldest. However, I have no idea how to do this.

    I am at a complete loss on how to do this. Any help would be greatly appreciated. Thanks.

    >>I need to parse some of the information, like the timestamp field, into something usable (a real timestamp that SQL can understand).

    I would think you'd want a timestamp field defaulted to getdate() in your destination table. You could combine this with your DDHHMMSS format to generate a real datetime field with which you could make time- and date-based calculations. Since there is no month or year in the incoming data, you will need to ensure that records created in a particular month are imported in the same month. This is a problem with your source data which I would consider serious, because it creates integrity/timing problems. What if you import records in the first minute of January? You'll have to have some sort of logic to say those are really December records.
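    To illustrate, that month-rollover logic could be sketched like this in whatever script does the import (Python here purely for illustration; the function name and the "borrow month/year from the import time" rule are my assumptions, not anything from the feed spec):

```python
from datetime import datetime

def parse_gps_timestamp(raw, now=None):
    """Convert a DDHHMMSS string (e.g. '10141045') to a full datetime.

    The feed carries no month or year, so both are borrowed from the
    import time ('now').  If the record's day-of-month is greater than
    today's day, the record is assumed to have rolled over from the
    previous month -- e.g. a day-31 record imported on January 1st is
    really a December 31st record.
    """
    if now is None:
        now = datetime.now()
    day = int(raw[0:2])
    hour = int(raw[2:4])
    minute = int(raw[4:6])
    second = int(raw[6:8])
    year, month = now.year, now.month
    if day > now.day:               # rolled over from last month
        month -= 1
        if month == 0:              # ...which may also mean last year
            month, year = 12, year - 1
    return datetime(year, month, day, hour, minute, second)
```

    So a '31235959' record imported in the first minute of January comes out as December 31 of the previous year, which is exactly the case described above.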

    >>I need to have this setup so that it automatically runs every couple of minutes

    Not a problem; just schedule a job at the desired frequency.

    >> imports only the most recent data

    Typically a series of files would be imported, but it sounds like you have one file that is constantly being written to. Correct? Do you have a unique way to identify each record? If so, you could import all the data into a staging table and then compare it against your destination table using NOT IN to identify the 'recent data'. Then import from the staging table only the records which don't already exist in the destination table.
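    The same "only rows not already imported" test could also be applied script-side before the staging table is even loaded. A minimal Python sketch of the idea (the field positions -- device ID first, timestamp second -- are my assumption; adjust to the real 37-field layout):

```python
def new_records(lines, seen_keys):
    """Yield only records not already imported.

    Script-side analogue of the staging-table / NOT IN comparison:
    'seen_keys' plays the role of the destination table.  Assumes
    field 0 is the device ID and field 1 the DDHHMMSS timestamp,
    and that the pair identifies a record uniquely.
    """
    for line in lines:
        fields = line.rstrip('\n').split('/')
        key = (fields[0], fields[1])
        if key not in seen_keys:
            seen_keys.add(key)
            yield fields
```

    On each scheduled run the whole file can be passed through this filter; rows already seen on an earlier run are skipped, just as NOT IN would skip them.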

    >>the script would need to make sure each line is complete because it could attempt to import data

    Typically you can't import data from an open file, for exactly this reason.

    >>Finally, I need to identify the most current record for each device ID and indicate it somehow in the database

    Select the max timestamp for each ID (assuming you can generate a valid datetime).

    >>show the breadcrumbs (previous locations). I only need to be able to see the previous 15 records or so, but I need to store all of the records long term. One idea I had was to use a field within the database that stores a number 1-16 with 1 being the most current record and 16 being the oldest

    Bad idea. If you have valid data you shouldn't have to do this; use the TOP clause instead.
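    The "rank at runtime instead of storing it" idea can be sketched like this (Python purely for illustration; in T-SQL it would be TOP n ... ORDER BY, or ROW_NUMBER() OVER (PARTITION BY ...), as shown later in this thread):

```python
def breadcrumbs(records, limit=16):
    """For each device, return the 'limit' most recent timestamps,
    newest first -- position 0 is the current fix, positions 1..15
    are the breadcrumb trail.  Nothing is stored in the data itself;
    the ranking is recomputed from the timestamps on every call.
    """
    by_device = {}
    for device, ts in records:
        by_device.setdefault(device, []).append(ts)
    return {dev: sorted(ts_list, reverse=True)[:limit]
            for dev, ts_list in by_device.items()}
```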

  • can you post some sample data you would be importing as well as the CREATE TABLE of the destination table?

    you said the first two digits of 10141045 are the day, so it assumes the 10th of the current month, is that correct? you explained the rest of the field just fine.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • also, here is an example of converting that unusual date format into a datetime field:

    --results:

    BeginOfMonth            DaMonth                 DaFinalDate
    ----------------------- ----------------------- -----------------------
    2009-12-01 00:00:00.000 2009-12-10 00:00:00.000 2009-12-10 14:10:45.000
    2009-12-01 00:00:00.000 2009-12-10 00:00:00.000 2009-12-10 14:11:37.000
    2009-12-01 00:00:00.000 2009-12-10 00:00:00.000 2009-12-10 15:09:22.000

    create table #import (ImportDateAsUnusualFormat char(10))

    insert into #import
    SELECT '10141045' UNION ALL
    SELECT '10141137' UNION ALL
    SELECT '10150922'

    ;With MyCTE As (
    --break the string into its component pieces
    select
      CONVERT(int,substring(ImportDateAsUnusualFormat,1,2)) As theDayOfMonth,
      CONVERT(int,substring(ImportDateAsUnusualFormat,3,2)) As theHours,
      CONVERT(int,substring(ImportDateAsUnusualFormat,5,2)) As theMinutes,
      CONVERT(int,substring(ImportDateAsUnusualFormat,7,2)) As theSeconds
    from #import )
    SELECT
      --first day of the current month (Dec 1, for example)
      DATEADD(mm, DATEDIFF(mm,0,getdate()), 0) As BeginOfMonth,
      --add theDayOfMonth - 1 to the beginning of the month
      DATEADD(dd, theDayOfMonth - 1, DATEADD(mm, DATEDIFF(mm,0,getdate()), 0)) As DaMonth,
      --be lazy: convert hours/minutes/seconds to seconds and add them to the date we created
      dateadd(ss,
        (theHours * 3600) + (theMinutes * 60) + theSeconds,
        DATEADD(dd, theDayOfMonth - 1, DATEADD(mm, DATEDIFF(mm,0,getdate()), 0))
      ) As DaFinalDate
    From MyCTE

    Lowell



  • As far as turning the text file timestamp into a real timestamp, take a look at this, see if it'll do what you need:

    declare @TS char(8);
    select @TS = '10141045';

    select
      dateadd(month, datediff(month, 0, getdate()), 0),
      substring(@TS, 1, 2) - 1,
      substring(@TS, 3, 2),
      substring(@TS, 5, 2),
      substring(@TS, 7, 2),
      dateadd(second,
        cast(substring(@TS, 7, 2) as int),
        dateadd(minute,
          cast(substring(@TS, 5, 2) as int),
          dateadd(hour,
            cast(substring(@TS, 3, 2) as int),
            dateadd(day,
              cast(substring(@TS, 1, 2) as int) - 1,
              dateadd(month, datediff(month, 0, getdate()), 0)))));

    The first part just creates a variable for testing. The query is split in two: the first half is just the pieces of the string, included to show you what I'm doing, and you'd get rid of it in the real query. The second half uses nested DateAdd functions on each piece of the "timestamp".

    Instead of a variable, you'd use a column name, of course.

    For the import, you'd want to import to a staging table. Then you can format the data the way you want, validate that you're only getting the new rows, and then add it to the target table.

    For the numbering thing, I'd do that at runtime, not in the table.

    Might look something like this:

    -- Set up test/proof of concept
    if object_id(N'tempdb..#T') is not null
      drop table #T;
    if object_id(N'tempdb..#GPSes') is not null
      drop table #GPSes;

    create table #T (
      ID int identity primary key,
      GPSID uniqueidentifier,
      LogTime datetime,
      Lat float,
      Long float);

    create table #GPSes (
      ID uniqueidentifier);

    insert into #GPSes (ID)
    select newid()
    from dbo.Numbers
    where Number between 1 and 10;

    with Times (DT) as
      (select dateadd(second, Number-1, '12/14/2009')
       from dbo.Numbers
       where Number between 1 and 3600)
    insert into #T (GPSID, LogTime, Lat, Long)
    select ID, DT, checksum(newid())%180, checksum(newid())%180
    from #GPSes
    cross join Times
    order by ID, DT;

    -- Query data (use your real columns and table name here)
    ;with CTE as
      (select
         *,
         row_number() over (partition by GPSID order by LogTime desc) as Row
       from #T)
    select *
    from CTE
    where Row <= 16
    order by GPSID, Row desc;

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
