
Chad Miller

Chad Miller is a Senior Manager of Database Administration at Raymond James Financial. Chad has worked with Microsoft SQL Server since 1999 and has been automating administration tasks using Windows PowerShell since 2007. Chad is the Project Coordinator/Developer of the PowerShell-based Codeplex project SQL Server PowerShell Extensions (SQLPSX). Chad leads the Tampa PowerShell User Group and is a frequent speaker at user groups, SQL Saturdays and Code Camps.

Using Invoke-WebRequest

The website PowershellCommunity.org is moving to a new site, powershell.org, and I wasn’t sure whether the old forum posts would be brought over. Well, I kind of like some of the answers I provided in the SQL Server forum and wanted to download them into a text file in case I ever need to refer to them. It looks like I finally found a use case for the PowerShell V3 cmdlet Invoke-WebRequest, which sends web requests and lets you parse the response.

The following script is purpose-built. Of course, any parsing of HTML is going to be very specific to the problem at hand. PowerShell and the Invoke-WebRequest cmdlet provide an excellent way to explore the details of a web page so you can quickly craft a custom web-scraping script. Here’s the script I came up with. Again, it won’t be applicable to your particular task, but it shows an example of using Invoke-WebRequest to parse links and download specific text from various pages.

$outputfile = 'C:\Users\Public\bin\pscommunitysql.txt'
 
1..14 | foreach {
    invoke-webrequest -Uri "http://www.powershellcommunity.org/Forums/tabid/54/aff/24/afpg/$_/Default.aspx" |
        foreach { $_.Links | where { $_.title -and $_.title -notlike "*Page" -and $_.title -notlike "PowerShellCommunity*" } |
            foreach { invoke-webrequest -Uri $_.href } |
                foreach { $_.Links | where { $_.href -like "*printmode*" } |
                    foreach { invoke-webrequest -Uri $_.href } |
                        foreach { $_.ParsedHtml.forms |
                            foreach { "####################`n$($_.innerText)" | out-file $outputfile -append -encoding 'utf8' }
                        }
                }
        }
}

Explanation

  1. On line 3, I loop through 1 to 14 because the SQL Server forum has 14 pages of questions, at roughly 30 questions per page, which I determined by looking at the forum in the browser.
  2. On lines 4-5, I request each page and grab the link for each question by filtering out the links that don’t apply. I worked out the filter through trial and error, looking at the object invoke-webrequest returned on the previous line.
  3. On lines 6-7, I get the web page for each question. Since the answers to a question can span multiple pages, I look for the link to the question’s print preview mode, which shows the whole question as a single page. I figured this out by looking at the links returned by the invoke-webrequest call and noticing a little printer icon when viewing the same page in the browser.
  4. Finally, on lines 8-10, I use invoke-webrequest to get the print preview page for the question, get the parsed forms data, and append the inner text of each form to a text file.
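
The trial-and-error step in item 2 can be tried offline before pointing anything at a live site. The sketch below is not part of the original script: the $fakeResponse object, its titles and hrefs, and the Select-QuestionLink function name are all made up for illustration. It only assumes that a response exposes a Links collection whose items have title and href properties, as the script above relies on.

```powershell
# Hypothetical stand-in for the object invoke-webrequest returns.
# Every title and href below is invented for illustration.
$fakeResponse = [pscustomobject]@{
    Links = @(
        [pscustomobject]@{ title = $null; href = '/Forums/logo.aspx' }
        [pscustomobject]@{ title = 'Next Page'; href = '/Forums/page2.aspx' }
        [pscustomobject]@{ title = 'PowerShellCommunity Home'; href = '/Default.aspx' }
        [pscustomobject]@{ title = 'Example SQL Server question'; href = '/Forums/question1.aspx' }
    )
}

# Step 1: list every link's title and href to see what there is to filter on.
$allLinks = @($fakeResponse.Links | Select-Object -Property title, href)

# Step 2: the same three conditions as line 5 of the script, wrapped in a
# hypothetical pipeline function so they can be tested on their own.
function Select-QuestionLink {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Link
    )
    process {
        if ($Link.title -and
            $Link.title -notlike '*Page' -and
            $Link.title -notlike 'PowerShellCommunity*') {
            $Link   # pass the link through only when all three conditions hold
        }
    }
}

# Links with no title, paging links, and site-wide links are dropped;
# only the example question link remains.
$questionLinks = @($fakeResponse.Links | Select-QuestionLink)
$questionLinks.href
```

Against a real page, the same function could be fed from (invoke-webrequest -Uri $uri).Links, where $uri is a placeholder for the page being explored. The filter conditions would almost certainly need adjusting for any other site.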
