Connecting ADF to pull information from SharePoint and Confluence

  • Dear all,

    I hope this message finds you well.

    I am using Azure Data Factory and have two questions.

    Issue 1:

    I want to connect to Confluence, extract the content of Confluence pages in HTML format, and place it in a folder in my landing zone (Azure Data Lake).

    For that I am considering the Confluence Cloud REST API via the ADF HTTP or REST connector.

    What do you think is the best approach and why?
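
    For reference, this is roughly the kind of call I expect the HTTP/REST connector would be wrapping; it is only a sketch with a placeholder site URL, space key and credentials, nothing I have running yet:

        # Sketch of the Confluence Cloud REST call behind the ADF REST/HTTP approach.
        # The site URL, space key and credentials are placeholders.
        import requests

        BASE_URL = "https://yourcompany.atlassian.net/wiki"   # placeholder Confluence Cloud site
        AUTH = ("you@yourcompany.com", "your-api-token")       # Confluence Cloud uses email + API token

        # List pages in one space; the export_view expansion returns the rendered HTML body.
        resp = requests.get(
            f"{BASE_URL}/rest/api/content",
            params={"spaceKey": "DOCS", "type": "page", "limit": 25,
                    "expand": "body.export_view"},
            auth=AUTH,
        )
        resp.raise_for_status()

        for page in resp.json()["results"]:
            html = page["body"]["export_view"]["value"]        # page body as HTML
            # In the pipeline this would land in the ADLS folder; writing locally only to illustrate.
            with open(f"{page['id']}.html", "w", encoding="utf-8") as f:
                f.write(html)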

    Issue 2:

    I also want to connect to SharePoint, extract binary files (Word and PDF files), and place them, in their original format, in a folder in my landing zone (Azure Data Lake).

    For that I am thinking of using either

    Azure Data Factory + HTTP Linked Service to SharePoint REST

    or

    Microsoft Graph API via ADF Web Activity

    What do you think is the best approach and why?
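
    And for the Graph option, this is the sequence of calls I believe it boils down to; again only a sketch, with placeholder tenant, app registration, site and drive IDs:

        # Sketch of the Microsoft Graph calls for pulling SharePoint files as binaries.
        # Tenant, app registration and site/drive IDs are placeholders; the app registration
        # would need an application permission such as Sites.Read.All.
        import msal
        import requests

        TENANT_ID = "<tenant-guid>"
        CLIENT_ID = "<app-client-id>"
        CLIENT_SECRET = "<client-secret>"

        app = msal.ConfidentialClientApplication(
            CLIENT_ID,
            authority=f"https://login.microsoftonline.com/{TENANT_ID}",
            client_credential=CLIENT_SECRET,
        )
        token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
        headers = {"Authorization": f"Bearer {token['access_token']}"}

        GRAPH = "https://graph.microsoft.com/v1.0"
        SITE_ID, DRIVE_ID = "<site-id>", "<drive-id>"   # looked up once via /sites and /drives

        # List the document library root, then download each file unchanged.
        items = requests.get(
            f"{GRAPH}/sites/{SITE_ID}/drives/{DRIVE_ID}/root/children",
            headers=headers,
        ).json()["value"]

        for item in items:
            if "file" in item:                           # skip folders
                data = requests.get(
                    f"{GRAPH}/sites/{SITE_ID}/drives/{DRIVE_ID}/items/{item['id']}/content",
                    headers=headers,                     # Graph redirects to the binary download URL
                )
                with open(item["name"], "wb") as f:      # the pipeline would write to ADLS instead
                    f.write(data.content)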

    Thank you very much,

    Pedro

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

    First, I'll say I've not done this before, but I wanted to give my opinion on how I'd approach this problem.

    I'd say test it out. For issue 1, I'd use the built-in Confluence connector if one exists. Generally, when a connector is built for a specific application, it is optimized for that application, so using it will usually give you better options and sometimes even better performance. I also find that built-in connectors often simplify things, as they expose options for the different REST calls without you needing to look up the API documentation. That said, if you have advanced requirements, you may need to fall back to the REST API if some options are missing from the built-in connector.

    BUT for issue 1 - I would say pick a representative subsection of your Confluence install, try both methods, and see which you prefer and which works better for your specific use case.

    For issue 2, in my personal opinion, this is a bad idea unless it is required for some business need. The reason is that you now have two copies of the files. If they get out of sync, I expect the SharePoint one would be the master, but if you aren't syncing the data constantly, someone will be looking at stale data and be surprised it's not real time. But if this is a requirement for what you are doing, then as with issue 1, I would test both methods. Try them out in your environment with a representative subset of data.

     

    With both issues, I think it's a bad idea to duplicate data like this. It can lead to a few different problems. Sync errors will leave the data lake copy stale. Mind you, the copy in the data lake will always be somewhat stale, since the "live" data may be constantly changing. Plus you can get issues if the data is modified while the sync is running.

    One thing that would be helpful to know: what is the end goal for this? Is it a backup solution for SharePoint and Confluence? If so, I would advise against it. Each update to SharePoint and Confluence will require revisiting and retesting your ADF work, as they MAY have made changes to the API calls that could break it.

    I would also recommend doing this as two separate projects so you can ensure that everything works as expected and you know when one is complete and can move on to the next.

    Also - one thing to watch out for with Confluence is links. Any pages that link to a Confluence page MAY (likely will) point back to your Confluence site, NOT your data lake. And any file references attached to a Confluence page, or add-ons in Confluence (links to Jira tickets, for example, or some graphics and diagrams), MAY (likely will) break in the data lake copy. Of the two, I think SharePoint will be easier, but it is still a lot of work. Plus, with the SharePoint files, you will lose access control when you move them to the data lake, so there is a business risk there too...

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I am not sure Azure Data Factory is the best approach for something like this. I would look at Logic Apps or Azure Functions to implement this instead. I know Logic Apps has a connector for SharePoint; I'm not sure about Confluence, but you can always call its API.
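
    If you went the Functions route, I imagine the rough shape would be a timer-triggered function along these lines (Python v2 programming model); the schedule and names here are made up and the actual fetch/land steps are just placeholders:

        # Rough shape of a timer-triggered Azure Function that could host the same
        # extraction logic. Names and schedule are illustrative only.
        import azure.functions as func

        app = func.FunctionApp()

        @app.timer_trigger(schedule="0 0 2 * * *", arg_name="timer", run_on_startup=False)
        def sync_sources_to_lake(timer: func.TimerRequest) -> None:
            # 1. Call Microsoft Graph / the Confluence REST API (see the sketches above).
            # 2. Write each file or page into the ADLS landing zone, e.g. with
            #    azure.storage.filedatalake.DataLakeServiceClient.
            pass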

