SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Parsing SQL Saturday Data – Getting Titles from the XML document

I’m continuing on with my project to grab SQL Saturday data and automatically insert it into a SQL Server database. In this piece, I’m picking up from the last one where I had a loop to load all XML documents in a folder based on a pattern.

This time I want to query the XML and get out specific elements and capture them.

The Source

The XML source looks like this for the sessions:

</event>
<event>
  <importID>2102</importID>
  <speakers>
    <speaker>
      <id>2102</id>
      <name>Jason Strate</name>
    </speaker>
  </speakers>
  <track>Track 3</track>
  <location>
    <name>2520C (Conference room)</name>
  </location>
  <title>Using XML to Query Execution Plans </title>
  <description>SQL Server stores its execution plans as XML in dynamic management views. The execution plans are a gold mine of information. From the whether or not the execution plan will rely on parallelism to what columns are requiring a key lookup after a non-clustered index seek. Through a the use of XML this information can be available at your fingertips to help determine the value and impact of an index and guide you in improving the performance of your SQL Server databases. In this session we’ll look at how you can begin to understand and query the structure of the execution plans in the procedure cache. Also, we’ll review how to uncover some potential performance issues that may be lurking in your SQL Server.</description>
  <startTime>9/18/2010 12:15:00 PM</startTime>
  <endTime>9/18/2010 1:30:00 PM</endTime>
</event>
<event>
  <importID>2109</importID>
  <speakers>
    <speaker>
      <id>2109</id>
      <name>Jason Strate</name>
    </speaker>
  </speakers>
  <track>Track 4</track>
  <location>
    <name>2520D (Seminar room)</name>

I’m showing the end of one element, one whole one, and the start of a third. There is a lot of extraneous information in the document that I don’t want (for now). As a result, it’s not as simple to query this as I’d thought before. Especially as I’ll want to capture each session title and insert it into a database.

I decided to use a SelectNodes to get to the <event> nodes and then loop through them. The code looks like this:

# get the event node
$sessions = $doc.SelectNodes("//event")

# loop through the various //event nodes
foreach ($session in $sessions) {

Note that this is inside of the code from the previous post.

Inside of this loop, I decided to create another loop. Initially I didn’t, but that made it more difficult to determine the end of the event node and capture the values, especially the speakers. As a result, I have a sub loop at well:

# probably a better way, but I wanted to loop through the various elements and only pick out certain ones
foreach ($detail in $session.ChildNodes) {

If anyone has a better way, let me know. I’ll have all the code below, but this technique allows me to look for specific nodes. I know I could query for them, but since I’m looking for a few specific items, I thought I’d do this rather than multiple queries later.

Get the Title

I actually need the title and the speaker child node, but I’m doing titles only here. Here’s the whole node loop code:

foreach ($detail in $session.ChildNodes) {

  # If we’re on the title node, get the value
  if ($detail.Name -eq "title") {
    $title = $detail.’#text’
   }

  if ($detail.Name -eq "speakers") {
    #placeholder
   }
#end foreach for $detail
}

Here if I have the title element in the foreach loop, I capture it. This allows me to use this variable later. I’ll go into the speaker code later, but for now, I left a placeholder.

That’s really it. At the end of the outer foreach, I write out the $event and $title variables. This gives me a nice output to the screen. From here I can easily substitute some ADO code to send this to SQL Server instead of the write-host, but that’s a good programming technique for me to see if I’ve got the data I want.

sqlsatloop_d

As you can see, there are sessions that I don’t want, but there’s nothing in the data for me to tag them as non-educational sessions. I’m not sure I care, since the speakers associated with these won’t impact my results for reports, so I’ll leave them.

Next Steps

From here I need to extract the speakers before I insert data into SQL Server. That will be the next step before I create the database and then insert data.

The Code

Here’s the entire code:

#ViewXML_Basic
# View XML file data from a website

$debug = 0;
# counter for events
$i = 1

#when do we stop?
$loopend = 400
$baseURL = "E:\SQLSatData\SQLSat"
$loop = 1
$doc = New-Object System.Xml.XmlDocument

do {
#start large loop

  # get the filename
  $sourceURL = $baseURL + $i + ".xml"

  # debug information
  if ($debug -eq 2) {
    Write-Host $sourceURL
    }

  #test the path first. If it exists, load the XML
  if (Test-Path $sourceURL) {
    $doc.Load($sourceURL)

    #trap the event number. This will be the ID I use in the database table.
    $event = "SQL Saturday #" + $i

    # get the event node
    $sessions = $doc.SelectNodes("//event")

    # loop through the various //event nodes
    foreach ($session in $sessions) {
 
    # probably a better way, but I wanted to loop through the various elements and only pick out certain ones
    foreach ($detail in $session.ChildNodes) {

      # If we’re on the title node, get the value
      if ($detail.Name -eq "title") {
        $title = $detail.’#text’
       }

      if ($detail.Name -eq "speakers") {
        #placeholder
       }
     #end foreach for $detail
     }

    write-host $event ": " $title
   
    # placeholder – insert into table here. $i, $title

    $title = ""
    $speakers = ""

   #end foreach for $sessions   
   }

   # end test path
   }
  # increment loop
  $i++

#end outer loop
} while ($i -lt $loopend)

write-host "end"


Filed under: Blog Tagged: powershell, SQL Saturday, syndicated, xml

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

Comments

Leave a comment on the original post [voiceofthedba.wordpress.com, opens in a new window]

Loading comments...