To celebrate the announcement of the planned demise of Google Reader, I’ve written a PowerShell script that lists the items from every feed in an OPML file, the collection of feeds that you import or export between your feed readers. Basically, you create your own primitive feed reader. I’m afraid it isn’t as good as Google Reader.
So what is involved? RSS/Atom is rather loosely defined, in that the only elements a feed item actually needs are the link and the content. The spec has been liberally interpreted too, so there isn’t much you can rely on, and you can’t really guarantee being able to read every RSS file…
Reading a well-constructed RSS feed is trivial. In PowerShell v3, it is a one-liner. The problem is in getting resilience. To get every feed to work is a struggle, and so I apologise for giving up at a certain point.
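For a well-behaved feed, the one-liner amounts to something like the sketch below. The function name `Get-FeedItem` is hypothetical, and the property names assume an RSS 2.0 feed whose items carry `title`, `link` and `pubDate`; the feed URL in the usage comment is the Simple-Talk one from the sample OPML at the end of this post.

```powershell
# Hypothetical wrapper around the one-liner: Invoke-RestMethod (PowerShell v3+)
# fetches the feed and emits one XML element per <item>.
# Assumes RSS 2.0 items; Atom entries expose differently shaped properties.
function Get-FeedItem {
    param([Parameter(Mandatory)][string]$Url)
    Invoke-RestMethod $Url | Select-Object title, link, pubDate
}

# Usage (needs network access):
# Get-FeedItem 'https://www.simple-talk.com/feed/'
```

Nothing here copes with a missing title, an odd date format or a dead link, which is where the rest of the work goes.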
Because I can throw lists of links at this routine instead of an OPML, or use it in a function with several OPML files, I use this type of PowerShell routine for specific tasks such as checking to see if particular groups of sites have had postings. It is very easy to set up an alert if a particular site gets a posting.
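As a minimal sketch of that kind of check, assume the items have already been fetched and each carries a `PubDate` (a DateTime), a `Title` and a `link`. The function name `Get-NewPosting`, its `$Since` parameter and the sample items are all made up for the illustration.

```powershell
# Hypothetical filter: passes through only the items published after $Since
function Get-NewPosting {
    param(
        [Parameter(ValueFromPipeline = $true)]$Item,  # object with a PubDate property
        [datetime]$Since = (Get-Date).AddDays(-1)     # anything newer than this counts as new
    )
    process {
        if ($Item.PubDate -gt $Since) { $Item }
    }
}

# Invented sample data standing in for fetched feed items
$items = @(
    [pscustomobject]@{ Title = 'Old post'; PubDate = (Get-Date).AddDays(-10); link = 'https://example.com/old' }
    [pscustomobject]@{ Title = 'New post'; PubDate = (Get-Date).AddHours(-2); link = 'https://example.com/new' }
)

# Raise a warning if anything new turned up
$new = @($items | Get-NewPosting)
if ($new.Count -gt 0) {
    Write-Warning ("{0} new posting(s): {1}" -f $new.Count, ($new.Title -join ', '))
}
```

In practice you would swap the `Write-Warning` for whatever alerting you prefer, such as an email or a log entry.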
I’ve added things to the script to take out all the HTML tags from the description and just view the first five hundred characters. I’ve limited it to the first hundred feeds just to test it, and I’ve limited it to report just the current day’s articles. You’ll want to change all that, I expect.
You’ll need to fill in the path to your OPML file (basically an XML list of links) and the number of days back you want to read items from, and either change or delete the ‘Select -first 100 |’ bit, which just takes the first hundred feeds. You’ll want to change the (truncate ($_.xxx -replace "<.*?>") 500) expression, which takes out all the HTML tags and truncates to 500 characters or fewer, to suit your tastes.
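To see what the tag-stripping and truncation do in isolation, here is a small sketch on an invented HTML fragment. The variable names and the length of 20 are chosen only for the illustration.

```powershell
# Invented sample; a real description comes from the feed item
$html = '<p>PowerShell makes <b>short work</b> of reading feeds.</p>'

# -replace with no replacement string deletes every match;
# <.*?> is a non-greedy match for anything that looks like a tag
$text = $html -replace '<.*?>'

# Keep at most $MaxLength characters
$MaxLength = 20
$short = if ($text.Length -gt $MaxLength) { $text.Substring(0, $MaxLength) } else { $text }
$short
```

This crude pattern leaves HTML entities such as `&amp;` untouched. If that matters to you, `[System.Net.WebUtility]::HtmlDecode` is one option, assuming your .NET version provides it.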
At the end of the pipeline you can, of course, save the results to a database or file, or maybe send it as an email, or format it into an HTML file: but there is no sense in adding all that stuff because you know it already!
```powershell
$MyOPMLFile = '.\AllMyFeeds.opml' # change this to the name of your OPML file
$RestError = [xml]'<broken></broken>'
$DaysBack = [int]-1 # the number of days back you want articles from

function truncate([string]$value, [int]$MaxLength)
{ # can you believe there is no powershell built-in way of doing this?
    if ($value.Length -gt $MaxLength) { $value.Substring(0, $MaxLength) }
    else { $value }
}

[xml]$opml = Get-Content $MyOPMLFile # grab the OPML file of feeds
$opml.opml.body.outline.outline.xmlurl |
    Select -first 100 | # only the first few for testing
    foreach { try { Invoke-RestMethod $_ } catch { $RestError } } | # flag if an error happened
    # try/catch is a statement, so it needs $( ) to be used as a value in the comparison
    where { $null -ne $(try { $_.SelectSingleNode('link') } catch { $null }) } | # filter out 404s, malformed items and bad links
<# Each <item> within a feed represents an article. The <item> must include at least the following elements:
   <link>: The canonical URL for the article.
   <content:encoded>: The full HTML content of the article.
   But you are also likely to find ..
   <title>: The article's headline. If it isn't there, you'd need to find it in the content
   <pubDate>: The date of the article's publication, in RFC822 format.
   <description>: A short summary or abstract of the article.
   <dc:creator>: Name of the person who wrote the article.
   <media:content> and <media:group>: URLs and metadata for image, video, and audio assets.
#>
    Select @{name = "Title"; Expression = { try { $_.title } catch { 'Unknown title' } } },
        @{name = "Description"; # this isn't mandatory, but you can get the content
          Expression = {
              try {
                  if ($_.SelectSingleNode('description') -eq $null)
                      { truncate ($_.encoded.'#cdata-section' -replace "<.*?>") 500 }
                  elseif ($_.description.ToString() -eq 'description')
                      { truncate ($_.description.'#cdata-section' -replace "<.*?>") 500 }
                  else
                      { truncate ($_.description -replace "<.*?>") 500 }
              }
              catch { 'error' }
          }
        },
        @{name = "PubDate";
          Expression = {
              try { get-date ($_.PubDate -replace "UT") } # force it into a PS date
              catch { Get-Date '01 January 2006 00:00:00' }
          }
        },
        @{name = "author";
          Expression = {
              try { if ($_.author.length -eq 0) { $_.creator } else { $_.author } }
              catch { 'Unknown Author' }
          }
        },
        link | # we already checked for a link!
    where-object { $_.Pubdate -gt (Get-Date).AddDays($DaysBack) } # we only get the fresh news from the last couple of days.
```
If you don’t already have an OPML file to practice on, here is one you can use that I’ve put together to give you exciting articles and blogs from Simple-Talk. Just save it to a file, extend it with your favourite blogs and sites, and you’ll soon be wondering why you ever felt that Google Reader was essential! Of course, you can still use the routine above with a simple list of RSS feeds, but then you wouldn’t have something that could be stitched into your news feed reader OPML file.
```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<opml version="1.1">
  <head>
    <title>SimpleTalk Subscriptions</title>
    <dateModified>Wed, 20 Mar 2013 07:21:56 GMT</dateModified>
  </head>
  <body>
    <outline text="simple-talk">
      <outline text="Home Page" title="Simple Talk Home Page" type="rss" xmlUrl="https://www.simple-talk.com/feed/" htmlUrl="https://www.simple-talk.com/"/>
      <outline text="SQL Articles" title="SQL Home" type="rss" xmlUrl="https://www.simple-talk.com/sql/rss.aspx" htmlUrl="https://www.simple-talk.com/sql/"/>
      <outline text=".NET Articles" title=".NET Articles" type="rss" xmlUrl="https://www.simple-talk.com/dotnet/rss.aspx" htmlUrl="https://www.simple-talk.com/dotnet/"/>
      <outline text="SysAdmin Articles" title="SysAdmin Articles" type="rss" xmlUrl="https://www.simple-talk.com/sysadmin/rss.aspx" htmlUrl="https://www.simple-talk.com/sysadmin/"/>
      <outline text="Opinion and Geeks" title="Opinion and Geeks" type="rss" xmlUrl="https://www.simple-talk.com/opinion/rss.aspx" htmlUrl="https://www.simple-talk.com/opinion/"/>
      <outline text="Books and Book Reviews" title="Books and Book Reviews" type="rss" xmlUrl="https://www.simple-talk.com/books/rss.aspx" htmlUrl="https://www.simple-talk.com/books/"/>
      <outline text="Cloud" title=".NET Articles" type="rss" xmlUrl="https://www.simple-talk.com/cloud/rss.aspx" htmlUrl="https://www.simple-talk.com/cloud/"/>
      <outline text="Blogs" title=".NET Articles" type="rss" xmlUrl="https://www.simple-talk.com/blogs/feed/" htmlUrl="https://www.simple-talk.com/blogs/"/>
    </outline>
    <outline text="SQL Server Central">
      <outline title="Main Articles" text="www.sqlservercentral.com/Xml/Rss/articles" type="rss" xmlUrl="http://www.sqlservercentral.com/Xml/Rss/articles"/>
      <outline title="SQL Server Central Blogs" text="www.sqlservercentral.com/blogs/feed/" type="rss" xmlUrl="http://www.sqlservercentral.com/blogs/feed/"/>
      <outline title="Ask Sqlservercentral Questions" text="ask.sqlservercentral.com/feed/questions.rss" type="rss" xmlUrl="http://ask.sqlservercentral.com/feed/questions.rss"/>
    </outline>
  </body>
</opml>
```
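The script above reads the feed URLs with `$opml.opml.body.outline.outline.xmlurl`, which assumes that the feeds sit exactly two levels deep, as they do in this sample. If your reader exports a flatter or deeper OPML, one option is an XPath query that finds every outline with an `xmlUrl` attribute, whatever its depth. A minimal sketch, using a cut-down inline OPML in which the example.com entry is made up:

```powershell
# Cut-down OPML with feeds at two different depths (the first entry is invented)
[xml]$opml = @'
<opml version="1.1">
  <body>
    <outline text="Top-level feed" type="rss" xmlUrl="https://example.com/feed/"/>
    <outline text="simple-talk">
      <outline text="Home Page" type="rss" xmlUrl="https://www.simple-talk.com/feed/"/>
    </outline>
  </body>
</opml>
'@

# //outline[@xmlUrl] matches any outline element carrying an xmlUrl attribute,
# regardless of nesting. XPath is case-sensitive, unlike PowerShell property access.
$urls = $opml.SelectNodes('//outline[@xmlUrl]') | ForEach-Object { $_.GetAttribute('xmlUrl') }
$urls
```

The resulting list of URLs can be piped into the rest of the routine in place of the original property path.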