Troubleshooting a pipeline error

  • I am creating a pipeline in a data factory per this page's instructions, https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-data-capture-feature-portal. I am a beginner with this and am trying to follow the instructions as closely as possible. The difference, in my case, is that I am using a SQL Server installation on an Azure VM, not a managed instance. I have successfully "previewed" the data from the Lookup object, and it returns a value of 3 for the number of rows that have changed.

    I am at step 8 of the "Create a pipeline to copy the changed data" section. When I run the pipeline in debug mode, I get a message that the pipeline failed, with no useful troubleshooting information at all. As I was gathering info for this post, I viewed the code of the pipeline, cancelled out of that, and then attempted to run the pipeline again in debug mode. Now I get: "HasChangedRows

    The output of activity 'GetChangeCount' can't be referenced since it is either not an ancestor to the current activity or does not exist"

    Where do I go from here?

    My code:

    {
        "name": "IncrementalCopyPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetChangeCount",
                    "type": "Lookup",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "7.00:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "source": {
                            "type": "SqlServerSource",
                            "sqlReaderQuery": "DECLARE @from_lsn binary(10), @to_lsn binary(10); \nSET @from_lsn =sys.fn_cdc_get_min_lsn('dbo_Acknowledgement_Timeliness'); \nSET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE());\nSELECT count(1) changecount FROM cdc.fn_cdc_get_net_changes_dbo_Acknowledgement_Timeliness(@from_lsn, @to_lsn, 'all')",
                            "queryTimeout": "02:00:00",
                            "partitionOption": "None"
                        },
                        "dataset": {
                            "referenceName": "ZTWSHRDSQL08C_TestReplicationToDataLake",
                            "type": "DatasetReference"
                        }
                    }
                },
                {
                    "name": "HasChangedRows",
                    "type": "IfCondition",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "expression": {
                            "value": "@greater(int(activity('GetChangeCount').output.firstRow.changecount),0)",
                            "type": "Expression"
                        },
                        "ifTrueActivities": [
                            {
                                "name": "Wait1",
                                "type": "Wait",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "waitTimeInSeconds": 1
                                }
                            }
                        ]
                    }
                }
            ],
            "annotations": []
        }
    }

    Thanks,

    Chris

    Learning something new on every visit to SSC. Hoping to pass it on to someone else.

  • Finally stumbled across the solution. You have to connect the objects with a workflow arrow, something the documentation on the MS page did not mention. Now to work through the rest of it.

    Microsoft: Halfway on every solution.
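    For anyone who hits the same error: dragging the green tab from one activity to the next populates the downstream activity's "dependsOn" array in the JSON. A sketch of what the HasChangedRows activity should look like once it is connected (activity names taken from the pipeline above; the rest of the activity is unchanged):

    ```json
    {
        "name": "HasChangedRows",
        "type": "IfCondition",
        "dependsOn": [
            {
                "activity": "GetChangeCount",
                "dependencyConditions": [ "Succeeded" ]
            }
        ]
    }
    ```

    With the dependency in place, GetChangeCount becomes an ancestor of HasChangedRows, so the activity('GetChangeCount') reference in the If Condition expression can resolve.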


    Chris


  • Thanks, how did you couple it with a workflow?

  • Amazing... you had to write all of that for a pipeline?  Guess I'm glad I don't have to work with such a thing.

    Since I don't know about pipelines, would it have been possible to have your code call a stored procedure instead of inlining the code like you did?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • No, not much of it had to be written; just parameter values had to be entered. This is the code behind the GUI, kind of like an SSIS GUI that generates JSON code for you. I am a beginner at this and do not know how it will act when returning record sets from procedures, so I'm keeping it basic for now. A better design may come to mind as I get more experience. The way this behaves, which is nothing like SSIS, is difficult enough to deal with right now. It's taking a lot of googling to find out how to make the pieces work.

    I connected the objects by clicking on the little green tab of a given object and dragging it onto the next object.


    Thanks,

    Chris



  • Jeff Moden wrote:

    Amazing... you had to write all of that for a pipeline?  Guess I'm glad I don't have to work with such a thing.

    Since I don't know about pipelines, would it have been possible to have your code call a stored procedure instead of inlining the code like you did?


    Yes, you can call stored procedures to get or save data. Data Factory pipelines are typically built in the web GUI, so a lot of it is drag and drop, after which you specify the parameters. You can get into the JSON behind it if you want, but you need to be careful editing it. There are cases where I have found it quicker to edit the JSON. Fortunately, we have all of ours connected to a git repo, so if you err badly you can get back to where you were.
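    As a sketch of what that would look like for the Lookup activity above: if the inline CDC query were moved into a stored procedure (the name dbo.GetNetChangeCount and its parameter are hypothetical, not from the tutorial), the activity's source block would change to something along these lines:

    ```json
    "source": {
        "type": "SqlServerSource",
        "sqlReaderStoredProcedureName": "dbo.GetNetChangeCount",
        "storedProcedureParameters": {
            "captureInstance": {
                "type": "String",
                "value": "dbo_Acknowledgement_Timeliness"
            }
        },
        "queryTimeout": "02:00:00",
        "partitionOption": "None"
    }
    ```

    The procedure would need to return a single-row result set (the changecount column) for the Lookup's firstRow reference to keep working.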
