Download content ie page like txt file

  • Hi, I have problem, how can I download content ie page like txt file. Page is : http://www.cnb.cz/cs/financni_trhy/devizovy_trh/kurzy_devizoveho_trhu/denni_kurz.txt?

    I need this page save like txt file on drive, e.g. c:/denni_kurz.txt

    Thanks for help.

  • much easier to do that from an application.

    from sql server 2000, it's difficult to do in TSQL because you must use an object with sp_oacreate to get the text of the web page, and you are limited to 4000 bytes at a time...so if the page is 25K, you need to get 7 "slices" of the web page, one at a time, to get it into sql server.

    I did that for fun in a different thread here on SSC, but it required a separate program i wrote to download the web page selected.

    In my case, I was building a "scraper", so it got a web page and put it in a table in SQL server.

    In order to do this, you'll need to grab the dll from this project and put it in your path or .bin directory for SQL server: the source code, vb6 project, and compiled dll are all in this zip file:

    http://www.stormrage.com/blogpix/WebReader.zip

    [

    this procedure does all the work:

    CREATE FUNCTION dbo.GetWebPage (@webpage varchar(1000)='' )

    RETURNS @webcontents table

    (

    rowid smallint IDENTITY(1,1), --Array index

    pagetext varchar(4000) --Array element contents

    )

    AS

    BEGIN

    DECLARE @hr int,

    @webreader int,

    @Num4K int,

    @i int,

    @contents varchar(4000)

    IF LTRIM(RTRIM(ISNULL(@webpage,''))) =''

    SET @webpage = 'http://www.yahoo.com'

    EXEC @hr = sp_OACreate 'WebReader.Reader', @webreader OUT

    EXEC @hr = sp_OAMethod @webreader, 'GetWebPageContentsAsString',null,@webpage

    EXEC @hr = sp_OAGetProperty @webreader, 'Num4KLengths', @Num4K OUT

    SET @i = 1

    WHILE @i <=@Num4K

    BEGIN

    EXEC @hr = sp_OAGetProperty @webreader, 'PageSlice', @contents OUT, @i

    INSERT INTO @webcontents(pagetext) VALUES(@contents)

    SET @i=@i + 1

    End

    EXEC @hr = sp_OADestroy @webreader

    Return

    End

    select * from dbo.GetWebPage(default)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You can do it in an activex script in a DTS package like this:

    Function Main()

    Set socket = CreateObject("MSXML2.ServerXMLHTTP.4.0")

    socket.setTimeouts 300000, 300000, 350000, 350000

    socket.Open "GET", "http://www.cnb.cz/cs/financni_trhy/devizovy_trh/kurzy_devizoveho_trhu/denni_kurz.txt?", False

    socket.send

    Set fso = CreateObject("scripting.filesystemobject")

    Set myfile = fso.createtextfile("c:\denni_kurz.txt", True)

    myfile.write socket.responsetext

    Main = DTSTaskExecResult_Success

    End Function


  • Thanks to all. I use version of mrpolecat, and that's works fine.

  • Please I have one problem yet. I save txt file onto disk, but when I try read this txt file I will get error.

    When I try DTS package and bulk insert with row terminator comma and column terminator TAB, i get error Unexpected end of file.

    When I try stored procedure and bulk insert :

    declare @SQL varchar(8000)

    SET @SQL= 'BULK INSERT #WRK FROM ''c:\denni_kurz.txt' + ''' WITH (CODEPAGE = ''1250'' , FIELDTERMINATOR = ''|'',ROWTERMINATOR ='''') '

    CREATE TABLE #WRK (TEXT_STRING text)

    EXEC(@SQL)

    SELECT * FROM #WRK

    I get only first few rows, but not all.

    When I try vb.net project with StreamReader, then I read all text.

    Where is the problem ?

    I don't see anything wrong in this text.

    Thanks for help.

  • Come on - we can't do all the work for you. You need to understand the format of your incoming data - first, the text file available currently at that URL is in UTF8 format, so you need to set your codepage to 65001. And I'm not sure why you would be trying a column seperator of TAB and rowterminator of comma when the data is pipe-delimited with standard CRLF row terminators...

    Here's a really quick test using the text file available now at the URL - I've made some assumptions as to the column names as I can't read Czech:

    -------------------------------------------------------

    create table #Test

    (Country nvarchar(256),

    CurrencyName nvarchar(256),

    Units int,

    ISOCode char(3),

    CurrentValue varchar(256))

    bulk insert #Test

    from 'c:\denni_kurz.txt'

    with

    (

    FIELDTERMINATOR ='|',

    ROWTERMINATOR ='',

    CODEPAGE = '65001',

    FIRSTROW = 3

    )

    select

    *

    from #Test

    drop table #Test

    -------------------------------------------------------

    It imports fine for me. Note however I've used varchar for the CurrentValue (last column) as the amounts are presumably in European decimal style with commas as the decimal seperator. You'd need to do some further manipulation on that column to get it into decimal format - there may be a neat way of doing that with a proper format file.

    Regards,

    Jacob

  • The url I am accessing requires a user name and the password. How can I pass the user name and the password to down load the file from a url??

  • socket.Open "GET", "http://www.website.com/page.html", False, "username", "password"


  • Hi Lowell.

    Do you still have the file http://www.stormrage.com/blogpix/WebReader.zip available somewhere?

    I did not find it on the stormrage website.

    Thanks.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply