SQLServerCentral Article

HTTP Requests Using SQLCLR

,

Introduction

There is no shortage of a need for connecting to data in various systems. These integrations are often done by pulling data from web APIs, specifically over HTTP. To get this data available in another database, usually a tool such as SSIS or another programming language is involved to utilize their HTTP libraries to connect to the API and load its data into a database.

This article aims to remove the need for other programming languages or tools by utilizing SQL Server's CLR integration to create a function that allows sending HTTP requests and receiving the responses directly in the database. It still utilizes another programming language to do the HTTP request, but it's a single object that is created once and can then be used to query the internet (literally!) from SQL. This has a lot of potential to cut back on development time and also simplify architecture.

If the idea of this being a native function in SQL Server that doesn’t need CLR integration is appealing, feel free to vote for this feature suggestion: https://feedback.azure.com/forums/908035-sql-server/suggestions/34429699-http-request-function

APIs typically return data in either JSON or XML formats and SQL Server already has rich features for working with data in these formats. Sometimes things can get a little more challenging if the response is CSV data, but another CLR function to convert CSV to JSON or XML could alleviate that.

Also, to simplify working with CSV in SQL Server this would be another great feature suggestion to vote for: https://feedback.azure.com/forums/908035-sql-server/suggestions/34429636-opencsv-function

Use Cases/Examples

More detail on the CLR function is found later in this article, but it might be useful to understand usage before getting into how it works. Here is an example to retrieve Stack Overflow questions from their API.

-- Query the Stack Overflow API and get a response
DECLARE @response XML = 
    [dbo].[clr_http_request]
        (
            'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', 
            NULL, NULL, 300000, 1, 0
        );
-- Extract just the body of the response (expecting JSON)
DECLARE @response_json NVARCHAR(MAX) = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)');
-- Parse the JSON into a tabular format
SELECT 
    B.[question_id],
    B.[title],
    B.[tags],
    B.[is_answered],
    B.[view_count],
    B.[answer_count],
    B.[score]
FROM OPENJSON(@response_json) WITH ([items] NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.[items]) WITH 
    (
        [question_id] INT,
        [title] NVARCHAR(MAX),
        [tags] NVARCHAR(MAX) AS JSON,
        [is_answered] BIT,
        [view_count] INT,
        [answer_count] INT,
        [score] INT
    ) B;

The results look like this:

question_id title tags is_answered view_count answer_count score
52088338 QuickCheck - Haskell (generate a random string-date) ["haskell","quickcheck"] 0 1 0 0
52086832 Cross Database Join Collation Issue ["sql-server","union","collation"] 0 12 0 0
52088335 Ubuntu / NoMachine ["ubuntu","rdp","nomachine"] 0 2 0 0
52088005 Exotic GROUP BY In MySQL ["mysql","indexing"] 1 20 1 2
52048980 Apache modrewrite htaccess redirect [".htaccess","mod-rewrite"] 0 16 1 0

To better understand what is happening, here is a breakdown of each step.

This first part is calling the CLR HTTP Request function, sending a “GET” request to Stack Overflow’s “questions” API endpoint. The function returns XML describing the response. Detail on each of the parameters and the returned XML are described in more detail later in this article.

-- Query the Stack Overflow API and get a response
DECLARE @response XML = 
    [dbo].[clr_http_request]
        (
            'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', 
            NULL, NULL, 300000, 1, 0
        );

The second part is querying the @response XML to retrieve the body of the response. For more information on querying XML data, specifically the “.value()” function, check out https://docs.microsoft.com/en-us/sql/t-sql/xml/value-method-xml-data-type

-- Extract just the body of the response (expecting JSON)
DECLARE @response_json NVARCHAR(MAX) = @response.value('Response[1]/Body[1]', 'NVARCHAR(MAX)');

In this example, the response looks something like this:

{
   "items":[
      {
         "tags":["haskell", "quickcheck"],
         "is_answered":false,
         "view_count":1,
         "answer_count":0,
         "score":0,
         "question_id":52088338,
         "title":"QuickCheck - Haskell (generate a random string-date)"
      },
      {
         "tags":["sql-server", "union", "collation"],
         "is_answered":false,
         "view_count":12,
         "answer_count":0,
         "score":0,
         "question_id":52086832,
         "title":"Cross Database Join Collation Issue"
      }
   ]
}

The last step is to examine the format of the JSON and use SQL Server’s OPENJSON functionality to parse it into a tabular format. For more information on OPENJSON, see https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-2017

-- Parse the JSON into a tabular format
SELECT 
    B.[question_id],
    B.[title],
    B.[tags],
    B.[is_answered],
    B.[view_count],
    B.[answer_count],
    B.[score]
FROM OPENJSON(@response_json) WITH ([items] NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.[items]) WITH 
    (
        [question_id] INT,
        [title] NVARCHAR(MAX),
        [tags] NVARCHAR(MAX) AS JSON,
        [is_answered] BIT,
        [view_count] INT,
        [answer_count] INT,
        [score] INT
    ) B;

Note, this could all be shortened down to a single SELECT statement if desired, which would look like this:

SELECT 
    B.*
FROM OPENJSON
    (
        [dbo].[clr_http_request]
            (
                'GET', 'http://api.stackexchange.com/2.2/questions?site=stackoverflow', 
                NULL, NULL, 300000, 1, 0
            ).value('Response[1]/Body[1]', 'NVARCHAR(MAX)')
    ) WITH ([items] NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.[items]) WITH 
    (
        [question_id] INT,
        [title] NVARCHAR(MAX),
        [tags] NVARCHAR(MAX) AS JSON,
        [is_answered] BIT,
        [view_count] INT,
        [answer_count] INT,
        [score] INT
    ) B;

Okay, Stack Overflow was rather easy since it's a basic REST endpoint with no authentication. What if we're trying to reach an API that requires authentication and most of the documentation describes using an SDK/code library that obviously isn't available here?

That does a decent job describing Google AdWords' API. This example will get an access token and use that to pull a performance report from Google AdWords.

-- Authentication variables
DECLARE @refresh_token VARCHAR(500) = '...';
DECLARE @client_id VARCHAR(500) = '...';
DECLARE @client_secret VARCHAR(500) = '...';
DECLARE @client_customer_id VARCHAR(500) = '...';
DECLARE @developer_token VARCHAR(500) = '...';
-- Use the AdWords Query Language to define a query for the Keywords Performance Report and specify desired format
DECLARE @awql VARCHAR(MAX) = '
    SELECT CampaignId, CampaignName, AdGroupId, AdGroupName, Id, Criteria, Device, Date, Impressions, Clicks, Cost, AveragePosition 
    FROM KEYWORDS_PERFORMANCE_REPORT WHERE Impressions > 0 DURING 20180801,20180805
';
DECLARE @fmt VARCHAR(50) = 'XML';
-- Get access token
DECLARE @access_token VARCHAR(500) = JSON_VALUE(
    [dbo].[clr_http_request]
        (
            'POST',
            'https://www.googleapis.com/oauth2/v4/token',
            CONCAT('grant_type=refresh_token&refresh_token=', @refresh_token, '&client_id=', @client_id, '&client_secret=', @client_secret),
            NULL,
            300000,
            0,
            0
        ).value('/Response[1]/Body[1]', 'NVARCHAR(MAX)'),
    '$.access_token'
);
-- Get report
DECLARE @report_xml XML =
    CAST(REPLACE(
        [dbo].[clr_http_request]
            (
                'POST',
                'https://adwords.google.com/api/adwords/reportdownload/v201802',
                CONCAT('__fmt=', @fmt, '&__rdquery=', @awql),
                CONCAT('
                    <Headers>
                        <Header Name="Authorization">Bearer ', @access_token, '</Header>
                        <Header Name="developerToken">', @developer_token, '</Header>
                        <Header Name="clientCustomerId">', @client_customer_id, '</Header>
                    </Headers>
                '),
                300000,
                1,
                0
            ).value('/Response[1]/Body[1]', 'NVARCHAR(MAX)'),
        '<?xml version=''1.0'' encoding=''UTF-8'' standalone=''yes''?>',
        ''
    ) AS XML);
-- Parse report XML
SELECT 
    A.[row].value('@campaignID', 'BIGINT') [campaign_id],
    A.[row].value('@campaign', 'VARCHAR(500)') [campaign_name],
    A.[row].value('@adGroupID', 'BIGINT') [ad_group_id],
    A.[row].value('@adGroup', 'VARCHAR(500)') [ad_group_name],
    A.[row].value('@keywordID', 'BIGINT') [keyword_id],
    A.[row].value('@keyword', 'VARCHAR(500)') [keyword],
    A.[row].value('@device', 'VARCHAR(50)') [device],
    A.[row].value('@day', 'DATE') [date],
    A.[row].value('@impressions', 'INT') [impressions],
    A.[row].value('@clicks', 'INT') [clicks],
    A.[row].value('@cost', 'BIGINT') [cost],
    A.[row].value('@avgPosition', 'FLOAT') [average_position]
FROM @report_xml.nodes('/report/table/row') A ([row]);

Results look like this:

campaign_id campaign_name ad_group_id ad_group_name keyword_id keyword device date impressions clicks cost average_position
1234 Campaign A 51423 Cool Stuff Ad Group 6211415 cool stuff Computers 8/5/2018 1430 229 3210.631846 2.4
1234 Campaign A 51423 Cool Stuff Ad Group 6211415 cool stuff Computers 8/3/2018 588 162 680.4804449 1.7
1234 Campaign A 51423 Cool Stuff Ad Group 6211415 cool stuff Computers 8/4/2018 368 29 742.1357105 3.3
1540 Campaign B 62130 Awesome Stuff Ad Group 7133217 awesome stuff Mobile devices with full browsers 8/5/2018 1563 563 1094.902162 1.8
1401 Campaign C 61050 Mediocre Stuff Ad Group 6321354 mediocre stuff Computers 8/1/2018 1236 190 643.7945118 2.2

The next steps could be to take this code, wrap up the two calls to [dbo].[clr_http_request]() in a scalar function to retrieve the report XML, and create a table-valued function to do the parsing. Then one could easily query the API with a query like this:

SELECT * FROM [dbo].[ufn_parse_adwords_xml]([dbo].[ufn_get_adwords_report](@awql));

Note: This example is only intended to give an idea of what is possible; explaining in detail would be outside of the scope of this article.

There are many APIs out there to query. Here are several other examples:

  • Other Ad performance reports (Bing/Yahoo!/Facebook/etc.)
  • Affiliate networks (commissions, etc.)
  • Currencies
  • Time zones
  • User agent parsers
  • SaaS applications, like ticketing systems such as Jira

Or maybe one could use this for testing automation or monitoring by querying a website to simply check for availability. Here's a basic example just getting a response from Google’s home page:

SELECT [dbo].[clr_http_request]('GET', 'https://www.google.com/', NULL, NULL, 10000, 0, 0);

It would certainly be interesting to see what other use cases are out there. Please share ideas!

The CLR Function

This is a SQL Server CLR function that calls an assembly written in C#, utilizing its HttpWebRequest class to make an HTTP Request and return the response.

Input Parameters

These are the parameters that can be passed into the function:

  • requestMethod (string)

    Most often "GET" or "POST", but there are several others used for various purposes.

  • url (string)

    The URL attempting to connect to, such as an API endpoint

  • parameters (string)

    If a GET request, these will just get added into the query string. In that case you could just include them in the url parameter and pass NULL for parameters.

    Otherwise, these parameters will be converted to a byte array and added to the content of the HTTP request.

    Format of this parameter matches that of a URL query string where you have key=value pairs separated by "&":

        param1=A&param2=B

  • headers (string, in XML format)

    This allows you to set headers for the HTTP request. They are passed as XML following this format:

                    <Headers>

                                    <Header Name="MyHeader">My Header's Value</Header>

                                    <Header Name="…">…</Header>

                                    <Header Name="…">…</Header>

                    </Headers>

  • timeout - (integer)

    Allows a timeout in milliseconds to be set.

  • autoDecompress (boolean)

    Some APIs may compress their results (Stack Overflow is an example) and passing true here will decompress the response (if it is compressed).

  • convertResponseToBase64 (boolean)

    If the resource delivers binary data, things will break down in saving that data to the XML response for this function. This parameter may be used to resolve that issue by converting the binary data to a Base64 string. Once the Base64 string is in SQL Server it can be decoded back to binary using some magic from the XML datatype. Here is an example:

CAST(@string AS XML).value('.', 'VARBINARY(MAX)')

Results

The result from this function is an XML document generated from the properties available in the HttpWebResponse class. This is the structure of that XML. For more information on these, see the documentation on HttpWebResponse (https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebresponse)

  • Response - this is the root element

    • CharacterSet

    • ContentEncoding

    • ContentLength

    • ContentType

    • CookiesCount

    • HeadersCount

    • IsFromCache

    • IsMutuallyAuthenticated

    • LastModified

    • Method

    • ProtocolVersionResponseUri

    • StatusCode

    • Server

    • StatusNumber

    • StatusDescription

    • SupportsHeaders

    • Headers

      • Header - each header will get its own node here

        • Name

        • Values - a header can have multiple values in C#'s HttpWebResponse

          • Value

    • Body - this will contain the content from the response

Overview of C# code flow

It would be good to look at the source code to get a deeper understanding of what is happening, but here is a general overview of the flow:

  • Creates an HttpWebRequest object from the url

  • Parses the provided Headers XML to set the HTTP Request's headers

  • Sets the request method, timeout, and automatic decompression configurations based on inputs

  • Adds in parameters (for non-GET requests) by converting the string to a byte array and writing it to the request stream

  • Makes request and retrieves response

  • Converts response into format described above and returns it

Compiling the C# is a straight-forward process. With the solution open in Visual Studio (attached in ClrHttpRequest.zip), right-click on the "ClrHttpRequest" project and choose “Build”. This will create the "ClrHttpRequest.dll" file in bin\Debug, which can then be used with clr_http_request.sql.

A couple notes:

  1. Visual Studio 2017 Community Edition with SQL Server Data Tools was used to develop this. You can get Visual Studio at https://visualstudio.microsoft.com/downloads/ and while installing you may choose to install SQL Server Data Tools under “Data storage and processing”. For more information see https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt

  2. This is currently compiled with .NET 4.5 as the target framework. That will work with SQL Server 2012 and later (but examples in this article using JSON functionality will only work in SQL Server 2016 and later). It should compile to .NET 3.5 just fine if you need to install on an earlier version of SQL Server and changing out some things like XElement with XmlDocument could bring it all the way down to .NET 2.0 if needed in SQL Server 2005.

Creating the function

To create the function, follow these steps using files found in the attached ClrHttpRequest.zip:

  1. Copy ClrHttpRequest.dll to C:\ on the machine running the SQL Server instance

  2. Run code in clr_http_request.sql to create the function (tweaking as necessary to adjust which database should be used).

    Note: This script includes code to change configuration of the SQL Server instance by enabling the CLR integration.

That’s it! Should be ready to use now.

Conclusion

I've found this to be an invaluable tool in quickly getting to data and exploring APIs. Depending on needs, something like SSIS will still be a superior tool, but for many needs this may provide a simpler alternative by reducing the tools and skillsets needed to create data integrations.

Resources

Rate

4.58 (12)

You rated this post out of 5. Change rating

Share

Share

Rate

4.58 (12)

You rated this post out of 5. Change rating