Blog Post

Reading a SharePoint 2007 List from SSIS 2016


It's been a while, but since I now get to play with SQL 2016 full time, I have new things to talk about!  One of major difficulties that any technology pro has to deal with is integrating different systems.  Vendors really do try to make this as easy as they can - but the main pain points they usually focus on is intra-generational versus inter-generational.  And sometimes when the vendor falls down (looking at you Microsoft), some enterprising people craft solutions - but even then, it's usually intra-, not inter-generational.

You Can't Get There From Here

Specifically, you can connect to SharePoint 2007 with SQL Integration Services 2008R2 using Kevin Idzi's SharePoint List Adapter components provided by Microsoft as "samples".  I've used that many times, and even though it's rough around the edges, what wasn't about 2008R2?  It got the job done - and at the end of the day, that's what we're looking for, right?  (And I'm pre-excusing the roughness of what's coming... :))
Microsoft improved things with the OData support in SharePoint 2010+, and built-in OData source in SSIS 2016.  But again... that's intra-generational.  How does one connect to a long-in-the-tooth SharePoint instance that just hasn't yet been migrated off of with a brand-spanking-new BI stack?  Tough nut to crack, as the SharePoint List Adapter is in beta for 2014, and my hacks at getting it shoehorned into a 2016 environment didn't get me far.

Scripts... To The Rescue Again

Well, if you can't use something pre-packaged, you turn to the script component, right?  It's easy to "resort" to the script, because the script also solves for a few problems that custom objects don't: allowing 3rd party (black box) code to run inside your systems.  At least you can code review (hah!) this stuff.
What I've got for you today is a semi-generic script - you will have some editing and setup to do - that connects (I think?) any version of SSIS to any version of SharePoint (until they deprecate WSDL).  Of course, all those claims are unwarranted and untested - please do you own validation in your environment.  There is a fair bit of code behind the scenes - unrepentantly cribbed from the aforementioned SharePoint List Adapter code on CodePlex, and other internet sources.
Please read through the commentary at the top of the script - you will have to set up the script to work for your server and list - adding some code and service references, and setting up the columns you want to retrieve.
Find the code here.  All you'll need to do is:

  • Drop a Script Component in your Data Flow, and say it's a "source"
  • Set up some output columns
  • Identify an HTML connection manager
  • Alter the "Setup" part of the script as described in the script comments.

Issues Overcome

There were several issues that presented themselves that required a little searching of the internet, and trial-and-error hacking to overcome:

Understanding "Name" in SharePoint

Which isn't a "name", it's a GUID - of course.  There are some hoops to jump through to translate the human-readable list and view name into the GUIDs that the rest of the SharePoint web service needs as arguments.

Reflection - Accessing Script Columns Generically

When you have a script you want to make generic, you don't want to have the "user" of the script have to dig into your code and edit a bunch of assignment code.  (This means assigning a script column called "ID" the value from the SharePoint column "ID".)  It would be nicer if you could just have them define the column mappings, and have the code figure out what buffer properties to set.  That takes use of the System.Reflection namespace in .Net to dynamically find the properties of the BufferManager class that represent the columns you want to set.

Seeing ALL of the SharePoint Columns

Argh.  So frustrating.  I was calling GetListItems with the correct view and (allegedly good) defaults of "null" for query modifiers.  But I wasn't seeing values from all of the columns... and there seemed to be no reason for why not.  Until I stumbled over someone else's description of a similar problem, where they suggested passing an empty XML node instead of null as the "list of columns to get".  Go figure.

Using Blobs for Really Wide Columns

I swear SharePoint is a massive key-value-pair store because of how it abuses SQL :).  Anyway - you can stuff quite a few characters into SharePoint strings, more than SSIS likes to handle (comfortably), and so you may need to use DT_NTEXT (in SSIS) and NVARCHAR(MAX) (in SQL) to handle those.  This issue wasn't too difficult to overcome - there was a little more work to do in reflecting the BufferWrapper properties to see if a column was DT_NTEXT (in C#, it's a BlobColumn), then in assigning the value to the buffer.

Issues Left

I tried (a little) to make the script completely copy-paste-able... but gave up because of...

Dynamic Reference to Web Services

The script, as written, requires you - the script "user" - to add two "Service References" and two "References" to the project in VSTA in order for it to work.  From what I can see, it is possible to do this only from code, but didn't make time to go this far.  If you can extend the script to do so and wish to share it, please add a comment to where you publish the extension!

Original post (opens in new tab)


You rated this post out of 5. Change rating




You rated this post out of 5. Change rating