How to insert data from localdb to amazon redshift?

  • Hi,

    I have a populated table on my local machine, I have created the linked server and connection is working fine.

    I have also created the same table on redshift cluster and i want to insert all the data from localdb to redshift.

    this is my script:

    EXECUTE('INSERT INTO viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM testschema.testoperation') AT REDSHIFT64

    SELECT viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM rdata.dbo.testoperation

    and throw this error:

    OLE DB provider "MSDASQL" for linked server "REDSHIFT64" returned message "[Amazon][RedShift ODBC] (30) Error occurred while trying to execute a query: ERROR: syntax error at or near ","

    LINE 2: ,subscriptionrowdate

    where am i going wrong?

  • I don't know much about Redshift syntax, but wouldn't it be

    insert into table(col1, col2)

    ...

    ie, with the columns in brackets?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • it would be but since we are inserting from local to remote it has to be either openquery or execute command. i tried both but always returns an error.

  • According to the documentation (here), the INSERT INTO ... FROM construction does not appear to be valid.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • sqlbidev (7/7/2015)


    Hi,

    I have a populated table on my local machine, I have created the linked server and connection is working fine.

    I have also created the same table on redshift cluster and i want to insert all the data from localdb to redshift.

    this is my script:

    EXECUTE('INSERT INTO viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM testschema.testoperation') AT REDSHIFT64

    SELECT viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM rdata.dbo.testoperation

    and throw this error:

    OLE DB provider "MSDASQL" for linked server "REDSHIFT64" returned message "[Amazon][RedShift ODBC] (30) Error occurred while trying to execute a query: ERROR: syntax error at or near ","

    LINE 2: ,subscriptionrowdate

    where am i going wrong?

    Just remember that whatever text you include in that EXECUTE is going to run at the destination server, so every reference you make has to be valid in that environment, so you probably can't refer to things local to you unless you have a linked server set up on their end that points back to your local server. I'm no suggesting that you set that up... I'm just calling attention to the need for the SQL you send to that REDSHIFT64 server to be valid from that server's point of view. That server isn't going to know anything about tables or functions or procs in your local database.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Also, why wouldn't you use the following?

    INSERT INTO REDSHIFT64.databasename.testschema.testoperation

    (viasatsubscriptionID, subscriptionrowdate,

    phonenumberday, viasatcustomerid)

    SELECT viasatsubscriptionID, subscriptionrowdate,

    phonenumberday, viasatcustomerid

    FROM rdata.dbo.testoperation

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • There are various way to load data from database to amazon redshift[/url], But Here I am sharing very simple way to do same. Simply You have to follow some steps that I will mention below. Kindly have a look.

    1.Export local RDBMS data to flat files (Make sure you remove invalid characters, apply escape sequence during export)

    2.Split files into 10-15 MB each to get optimal performance during upload and final Data load

    3.Compress files to *.gz format so you don’t end up with $1000 surprise bill 🙂 .. In my case Text files were compressed 10-20 times

    4.List all file names to manifest file so when you issue COPY command to Redshift its treated as one unit of load

    5.Upload manifest file to Amazon S3 bucket

    6.Upload local *.gz files to Amazon S3 bucket

    7.Issue Redshift COPY command with different options

    8.Schedule file archiving from on-premises and S3 Staging area on AWS

    9.Capturing Errors, setting up restart ability if something fails

    Doing it easy way

    Hope It will be helpful.:-)

    If you are getting trouble, Please follow this links.There are various alternate here and video is shared with demo work.

    http://zappysys.com/posts/sql-server-to-redshift-data-load-using-ssis/

    SSIS Json source[/url] : The Json file soure is used to to read data from Json file. We have mentioned source code with example.

  • sqlbidev (7/7/2015)


    Hi,

    I have a populated table on my local machine, I have created the linked server and connection is working fine.

    I have also created the same table on redshift cluster and i want to insert all the data from localdb to redshift.

    this is my script:

    EXECUTE('INSERT INTO viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM testschema.testoperation') AT REDSHIFT64

    SELECT viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM rdata.dbo.testoperation

    and throw this error:

    OLE DB provider "MSDASQL" for linked server "REDSHIFT64" returned message "[Amazon][RedShift ODBC] (30) Error occurred while trying to execute a query: ERROR: syntax error at or near ","

    LINE 2: ,subscriptionrowdate

    where am i going wrong?

    You're assuming that what you've written is a single statement when it's two:

    EXECUTE('INSERT INTO viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM testschema.testoperation') AT REDSHIFT64

    SELECT viasatsubscriptionID

    ,subscriptionrowdate

    ,phonenumberday

    ,viasatcustomerid

    FROM rdata.dbo.testoperation

    and of course the first statement has several syntax errors.

    Steve's solution has syntax correct for what you are attempting to do.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • [font="Arial Black"]{EDIT}[/font] I see that the WebMasters have restored the post I cited below. I left my original post, though.

    A member known as "SQL Daddy" posted a very useful post on this thread and it was caught by the despamming software of this site and hidden. I've requested that they unhide the post but it appears to have immediately useful data in it so I'm reposting the hidden data here...

    There are various way to load data from database to amazon redshift, But Here I am sharing very simple way to do same. Simply You have to follow some steps that I will mention below. Kindly have a look.

    1.Export local RDBMS data to flat files (Make sure you remove invalid characters, apply escape sequence during export)

    2.Split files into 10-15 MB each to get optimal performance during upload and final Data load

    3.Compress files to *.gz format so you don’t end up with $1000 surprise bill 🙂 .. In my case Text files were compressed 10-20 times

    4.List all file names to manifest file so when you issue COPY command to Redshift its treated as one unit of load

    5.Upload manifest file to Amazon S3 bucket

    6.Upload local *.gz files to Amazon S3 bucket

    7.Issue Redshift COPY command with different options

    8.Schedule file archiving from on-premises and S3 Staging area on AWS

    9.Capturing Errors, setting up restart ability if something fails

    Doing it easy way

    Hope It will be helpful.

    If you are getting trouble, Please follow this links.There are various alternate here and video is shared with demo work.

    http://zappysys.com/posts/sql-server-to-redshift-data-load-using-ssis/

    SSIS Json source : The Json file soure is used to to read data from Json file. We have mentioned source code with example.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • When loading data into any column-orientated database it is important to know how the underlying architecture works.

    Both SQL Server and Redshift have much the same architecture for column-oriented tables. There are minor differences in details, but the same principals apply to both systems. I will refer to both SQL Server and Redshift as C-O databases below.

    A key feature of these C-O databases is that space is managed in units of 1 extent. A given extent is only ever written to, read, or deleted. No updates are ever done to an extent after it has been written. An index of unique values for each column and which rows contain a given value is held in memory. The index can be updated.

    A key aspect of making C-O databases perform is optimising the placement of data into the extents. In a worst case, each row insert is followed by a Commit. This will give you one row per extent, resulting in a lot of disk space being used and a lot of I-O and very little in the way of performance.

    A good scenario for inserts is to use a naturally ascending key value for the Cluster Key (SQL) or Sort Key (Redshift), and to insert data in batch sizes that are multiples of the extent size. This way you get the minimum number of extents used, but remember that the last extent for the last batch will only ever be partially filled. Both SQL and Redshift allow multiple batches to be inserted in parallel which reduces overall load time. As with traditional SQL Server, the order of extents on a disk is not important for performance.

    The best type of insert key to use is a datetime value. This is always ascending, and will always be unique (subject to timer resolution). Both C-O databases allow you to have a Primary Key that is different to the insert key. For many tables, the datetime a row was last changed can be an ideal insert key leaving you free to use whatever primary key you need.

    When you want to update data, pre-sort the updates using the same key as the inserts, and insert the data in batches. This again will use the minimum number of extents. The in-memory index will get updated to reflect which extent now holds the row with a given primary key, and performance should remain good.

    For both C-O databases you should plan regular maintenance to reclaim space (which reduces IO and therefore helps performance), but if you use a datetime as the insert key you will minimise the amount of maintenance needed. At my old place we had multiple update runs from SQL to Redshift during each business day and only needed to do a Vacuum once a week.

    When you know how the architecture works, you can plan the optimum data load and update strategy. The optimum way to load data into a C-O database is far more rigid than loading data into a traditional page-orientated relational database.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Apologies, Ed. I'm out of my element. What does "C-O" mean in this case?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • C-O: column-orientated. The main purpose of my post above was to highlight the internal structure of column-orientated storage is different to row-orientated storage, and that you need to work with the strengths of C-O storage if you are going to get optimal performance out of them.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply