• Sachin Dedhia (1/5/2009)


    Not sure why the approach is an RBAR approach? In almost all cases, ETL stuff would involve transformations that work on row-level data.

    Being an RBAR process is a question entirely separate from whether this is the fastest execution path for your given needs. I was not speaking to speed, but rather to flexibility and reusability. Your code component is fast, but not dynamic.

    An ETL plan doesn't have to process via SSIS one row at a time, nor one column at a time, especially if you are dealing with something that can and may change (such as the number of course columns). It is usually more prudent to seek an approach where one does not have to modify code and re-deploy to the package server just to process the next file, or, worse yet, end up in the nasty scenario of maintaining a different package for each source when all are doing the same type of task.

    If your incoming format will remain stable (a rare thing indeed), then it likely doesn't matter. However, if it can change, then loading into a database and using the SQL engine to process the matches, by having SSIS call the appropriate procedures, is far more flexible, and it still stays within the realm of SSIS ETL.
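
    To illustrate what I mean (all names here are hypothetical, since I don't know your actual schema): bulk-load the raw file into a staging table, then have an Execute SQL Task call a procedure that does the match set-based. If the matching rules change, you alter the procedure; the package itself never needs redeploying.

        -- Hypothetical staging and reference tables; adjust to taste.
        CREATE TABLE dbo.StagingLead
        (
            LeadId     INT           NOT NULL,
            CourseCode VARCHAR(20)   NOT NULL,
            Notes      VARCHAR(4000) NULL
        );

        CREATE TABLE dbo.Course
        (
            CourseCode    VARCHAR(20)   NOT NULL PRIMARY KEY,
            StandardNotes VARCHAR(4000) NOT NULL
        );
        GO

        -- One set-based join replaces the per-row lookup in a script
        -- component; an Execute SQL Task calls this after the bulk load.
        CREATE PROCEDURE dbo.usp_MatchStagedLeads
        AS
        BEGIN
            SET NOCOUNT ON;

            UPDATE s
            SET    s.Notes = c.StandardNotes
            FROM   dbo.StagingLead AS s
            JOIN   dbo.Course      AS c
                   ON c.CourseCode = s.CourseCode;
        END;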

    I have also learned (the hard way) that the data is better kept for other purposes. Rare indeed are the projects I've come across that serve an "enrichment"-only or "counts"-only purpose (one example of enrichment is data cleansing against a third-party tool while importing and normalizing incoming lead data).

    Your case may be just such, however, so your mileage may vary. But in principle, the approach as it stands relies on custom code, meaning it can flex only through brute-force code changes, even for simple variations such as more (or fewer) columns of the same type, which gives it a high probability of failure over time.
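
    For instance (again with hypothetical names), a wide file with a varying number of Course columns can be unpivoted with dynamic SQL that reads the column list from the catalog, so an extra column in next month's file costs no code change at all:

        -- Hypothetical wide staging table; imagine Course4 arriving next month.
        CREATE TABLE dbo.StagingLeadWide
        (
            LeadId  INT         NOT NULL,
            Course1 VARCHAR(50) NULL,
            Course2 VARCHAR(50) NULL,
            Course3 VARCHAR(50) NULL
        );
        GO

        DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

        -- Build the column list from the catalog instead of hard-coding it.
        SELECT @cols = STUFF((
            SELECT N', ' + QUOTENAME(name)
            FROM   sys.columns
            WHERE  object_id = OBJECT_ID(N'dbo.StagingLeadWide')
            AND    name LIKE N'Course%'
            ORDER BY column_id
            FOR XML PATH('')), 1, 2, N'');

        -- UNPIVOT drops NULL course cells automatically.
        SET @sql = N'
            SELECT LeadId, CourseValue
            FROM   dbo.StagingLeadWide
            UNPIVOT (CourseValue FOR CourseColumn IN (' + @cols + N')) AS u;';

        EXEC sys.sp_executesql @sql;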

    Output (usually the fastest step in the process) doesn't preclude appending to the current file, writing a new one in-process, or even using an SSIS task that pushes data from the database to a file.
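
    As one (admittedly blunt) illustration of that last option, the database side can push the finished rows to a flat file itself via bcp; the paths, server, and table names below are placeholders, and xp_cmdshell has to be enabled. If you would rather stay inside the package, a data flow with an OLE DB source and a Flat File destination does the same job.

        -- Hypothetical export of the enriched staging rows to a flat file.
        EXEC master.dbo.xp_cmdshell
            'bcp "SELECT LeadId, Notes FROM MyDb.dbo.StagingLead" queryout "C:\out\enriched_leads.txt" -c -T -S MYSERVER';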

    I have to admit that dumping notes about all the courses en masse into the row makes me shudder and question the process as a whole.

    I do think the article is useful on several points, however, as it was about the use of the component and not necessarily the particular data process.