Unwanted parameter truncation

  • pdanes (12/19/2012)


    Grant Fritchey (12/19/2012)


    Whatever your data structures are, so should your parameters be, both in data type and length. What do the structures look like?

    Well, the data structure may be a varchar(1000) containing, for instance, the locality where a particular fossil specimen was found. I may want to find all specimens that were found somewhere in Germany. My parameter then might look like 'Germany|Deutschland|GDR|Nemecko'. I would parse the parameter to generate a WHERE clause like

    WHERE Locality Like '%Germany%' OR Locality Like '%Deutschland%' OR Locality Like '%GDR%' OR Locality Like '%Nemecko%'

    The particular parameter that gave me my latest ulcer was:

    '1E,D,E,Es,F,Fs,G,Gs,H,Hs,J,K,Ks,L,Ls,Lc,M,Me,N,Ns,O,Os,Ob,Oc,P,Ps,Pa,Pb,Pc,R,Ra,S,Ss,Sb,Sc,T,Ts,Ta,Tb,Tc,Ua,Ub,Uc,Va,Z,Zs'

    It gets parsed into a table variable that then gets matched against another table. This combination of letters is exhaustive - there are 45, and extremely unlikely to be any more, at least any time soon. They're title codes of historical catalogs, and not one has been added in the fifteen years I've been working with this institution. So, 45 * 2 = 90, and since some are only one letter instead of two, varchar(100) should be safe. Except for the commas, which push the string well past 100 characters... Varchar(max), or at least varchar(4000), would have completely prevented a whole bunch of frustration. So would not overlooking the commas in the first place - if you know some trick for preventing stupid mistakes, I'd certainly welcome that.
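
    Just to make the trap concrete, here's a minimal sketch (the procedure name is made up): SQL Server gives no warning when the argument is longer than the parameter's declared length - it just silently cuts the string off.

    CREATE PROCEDURE dbo.SearchByTitleCode    -- hypothetical name
        @TitleCodes varchar(100)              -- too short once the commas are counted
    AS
    BEGIN
        -- Show what actually arrived.
        SELECT DATALENGTH(@TitleCodes) AS ReceivedLength, @TitleCodes AS ReceivedValue;
    END
    GO

    DECLARE @codes varchar(200) = '1E,D,E,Es,F,Fs,G,Gs,H,Hs,J,K,Ks,L,Ls,Lc,M,Me,N,Ns,O,Os,Ob,Oc,P,Ps,Pa,Pb,Pc,R,Ra,S,Ss,Sb,Sc,T,Ts,Ta,Tb,Tc,Ua,Ub,Uc,Va,Z,Zs';
    EXEC dbo.SearchByTitleCode @TitleCodes = @codes;
    -- Returns 100 and a string chopped off mid-list; no error, no warning.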

    Yikes! Performance is going to be a horror doing stuff like that. That's exactly what a normalized structure is for. But if you have to deal with that stuff, then I'm back to the same place, if the structure is a VARCHAR(4000), so should the parameters be. But man, you really are going to be looking at HUGE performance issues parsing lots of strings like that, especially for searches.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • pdanes (12/19/2012)


    GilaMonster (12/19/2012)


    I'm going to be unpopular here but...

    Dependency analysis?

    Regression testing?

    Unit tests?

    Sure, making all parameters and variables varchar(max) would work. Shall we make all table columns varchar max too (if not, then you might get silent truncation inserting into a table)?

    Come on, Gail, I'm serious.

    Err.. So am I.

    If I'd ever submitted code into production that broke because I didn't change the length of variables in a dependent procedure, I'd have spent a good portion of the next day explaining to the change control manager and my manager why I managed to miss a dependency in my analysis, why my testing hadn't caught the error and what I was going to do to ensure that such a mistake didn't happen again.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/19/2012)


    pdanes (12/19/2012)


    GilaMonster (12/19/2012)


    I'm going to be unpopular here but...

    Dependency analysis?

    Regression testing?

    Unit tests?

    Sure, making all parameters and variables varchar(max) would work. Shall we make all table columns varchar max too (if not, then you might get silent truncation inserting into a table)?

    Come on, Gail, I'm serious.

    Err.. So am I.

    If I'd ever submitted code into production that broke because I didn't change the length of variables in a dependent procedure, I'd have spent a good portion of the next day explaining to the change control manager and my manager why I managed to miss a dependency in my analysis, why my testing hadn't caught the error and what I was going to do to ensure that such a mistake didn't happen again.

    This isn't a production system, it's my development system, and my testing IS what caught the mistake. I was hoping for some advice from more experienced people on how to keep from making such mistakes in the first place, or to detect them sooner, especially mistakes like this, where the software doesn't warn you about a possible problem.

    And not all of us are fortunate enough to have such a structured management system and megabucks software to hold our hands. I am the entire IT department for this show, developer, DBA, production, hardware selection and maintenance - everything having anything to do with computers lands in my lap.

    And please don't tell me that "Shall we make all table columns varchar max too" was a serious suggestion.

  • Grant Fritchey (12/19/2012)


    pdanes (12/19/2012)


    Grant Fritchey (12/19/2012)


    Whatever your data structures are, so should your parameters be, both in data type and length. What do the structures look like?

    Well, the data structure may be a varchar(1000) containing, for instance, the locality where a particular fossil specimen was found. I may want to find all specimens that were found somewhere in Germany. My parameter then might look like 'Germany|Deutschland|GDR|Nemecko'. I would parse the parameter to generate a WHERE clause like

    WHERE Locality Like '%Germany%' OR Locality Like '%Deutschland%' OR Locality Like '%GDR%' OR Locality Like '%Nemecko%'

    The particular parameter that gave me my latest ulcer was:

    '1E,D,E,Es,F,Fs,G,Gs,H,Hs,J,K,Ks,L,Ls,Lc,M,Me,N,Ns,O,Os,Ob,Oc,P,Ps,Pa,Pb,Pc,R,Ra,S,Ss,Sb,Sc,T,Ts,Ta,Tb,Tc,Ua,Ub,Uc,Va,Z,Zs'

    It gets parsed into a table variable that then gets matched against another table. This combination of letters is exhaustive - there are 45, and extremely unlikely to be any more, at least any time soon. They're title codes of historical catalogs, and not one has been added in the fifteen years I've been working with this institution. So, 45 * 2 = 90, and since some are only one letter instead of two, varchar(100) should be safe. Except for the commas, which push the string well past 100 characters... Varchar(max), or at least varchar(4000), would have completely prevented a whole bunch of frustration. So would not overlooking the commas in the first place - if you know some trick for preventing stupid mistakes, I'd certainly welcome that.

    Yikes! Performance is going to be a horror doing stuff like that. That's exactly what a normalized structure is for. But if you have to deal with that stuff, then I'm back to the same place, if the structure is a VARCHAR(4000), so should the parameters be. But man, you really are going to be looking at HUGE performance issues parsing lots of strings like that, especially for searches.

    The Germany example is a little extreme - most of the stuff I work with is much tidier than that. But what about the second example? I'm matching against a varchar(2) field, but my parameter string is 123 characters long. And that works very nicely, BTW. I use a table function to break up the string, then join to the resulting table variable. Response time is pretty much zero.
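
    Roughly, the pattern looks like this - the table and column names are made up, and the splitter is shown inline as a simple WHILE loop rather than my actual table function:

    DECLARE @TitleCodes varchar(4000) = '1E,D,E,Es,F,Fs,G,Gs,H,Hs,J,K,Ks,L,Ls,Lc,M,Me,N,Ns,O,Os,Ob,Oc,P,Ps,Pa,Pb,Pc,R,Ra,S,Ss,Sb,Sc,T,Ts,Ta,Tb,Tc,Ua,Ub,Uc,Va,Z,Zs';

    DECLARE @Codes TABLE (Code varchar(2) NOT NULL);
    DECLARE @pos int = CHARINDEX(',', @TitleCodes);

    -- Peel off one comma-delimited piece at a time.
    WHILE @pos > 0
    BEGIN
        INSERT INTO @Codes (Code) VALUES (LEFT(@TitleCodes, @pos - 1));
        SET @TitleCodes = SUBSTRING(@TitleCodes, @pos + 1, LEN(@TitleCodes));
        SET @pos = CHARINDEX(',', @TitleCodes);
    END;
    INSERT INTO @Codes (Code) VALUES (@TitleCodes);   -- last piece has no trailing comma

    -- Join against the varchar(2) column (table and column names hypothetical).
    SELECT s.*
    FROM dbo.Specimens s
    JOIN @Codes c ON c.Code = s.CatalogTitleCode;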

  • I would say that in the case where a parameter is passed to the stored proc that does not match the size of a particular column (i.e. a delimited string), I would set it to something very large or have the app validate it and set the size accordingly. Basically, make the app aware of the limitation on the parameter. Personally, I may just use a VARCHAR(MAX) because this is not getting stored as a column in a table. However, as a matter of best design I would have the app parse this and pass it to the DB as a table with a VARCHAR(2) column.

    Jared
    CE - Microsoft

  • SQLKnowItAll (12/19/2012)


    I would say that in the case where a parameter is passed to the stored proc that does not match the size of a particular column (i.e. a delimited string), I would set it to something very large or have the app validate it and set the size accordingly. Basically, make the app aware of the limitation on the parameter. Personally, I may just use a VARCHAR(MAX) because this is not getting stored as a column in a table.

    Certainly, I would never use varchar(max) in a table design. (Well, maybe never - I can't think of anything where I would want to do something like that, but you never know.) But doesn't the argument against max apply to variables as well? I don't remember all the details of the discussion, but the gist was that max is evil pretty much anywhere. If not, I may just go back to using it for text parameters, at least in the initial design stages. I can always go back and shorten it later, when I know roughly how much data to expect.

    However, as a matter of best design I would have the app parse this and pass it to the DB as a table with a VARCHAR(2) column.

    Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

  • It really depends on what you are doing with it. When the parameter is passed in and is 9000 characters, where does it get stored? It is not like it is going into a column of VARCHAR(MAX) that will now be forced (if it hasn't already) to store this column on a separate page for the table. This is a string, a variable, a piece of information stored on its own. I can't find anything offhand, but I'm sure Gail would be able to explain where this initial string gets stored and how SQL treats it internally (i.e. does it create a table of 1 column to store it in, does it simply reside in the buffer pool spanning multiple buffers, outside the buffer pool, etc.). In this case it is being changed to a form that is set based. I don't think the MAX general guidelines apply here. Of course, I have been wrong before... Again, I would not pass this to a stored proc if I was in the initial design phase. I would parse it with the app.

    Jared
    CE - Microsoft

  • Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

    The amount of data going through the tunnel is essentially the same, if not less (no more commas). SQL Server can be very efficient at parsing delimited data in some cases, but I can't say I have ever seen it be faster than a simple C# function call.

    Jared
    CE - Microsoft

  • pdanes (12/19/2012)


    Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

    The server may be faster than the user's machine, but then the server is serving multiple users, the user's machine just one.

    I'd also recommend parsing at the client and using a table-valued parameter, if at all possible. Strongly typed, not going to run over any particular string size, not requiring any additional processing on the server (it's a table variable on the server)
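
    Roughly (the type, procedure and column names here are purely for illustration):

    -- Table type defined once on the server.
    CREATE TYPE dbo.TitleCodeList AS TABLE (Code varchar(2) NOT NULL);
    GO

    CREATE PROCEDURE dbo.SearchByTitleCodes
        @Codes dbo.TitleCodeList READONLY    -- arrives on the server as a ready-made table variable
    AS
    BEGIN
        SELECT s.*
        FROM dbo.Specimens s
        JOIN @Codes c ON c.Code = s.CatalogTitleCode;
    END;
    GO

    The client fills the table-valued parameter directly (from .NET, for instance, via a parameter of SqlDbType.Structured), so there is no delimited string to parse and nothing to silently truncate.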

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You could send a fixed terminating character on the end of your variable parameters and check to see if it is actually the last character in the parameter. If all is well, just replace() it with '' and proceed.

    Or you could check the datalength() of the parameter and if it is exactly the max length of the variable, that could send up a red flag (given that the variable should be oversized to start with).

    Or pass in the parameter length as a separate parameter.

    All of these will slow your proc, of course, or cause some additional work on the front end.
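
    Something along these lines, with both checks shown together (the procedure name and the declared size are just examples):

    CREATE PROCEDURE dbo.SearchByTitleCode
        @TitleCodes varchar(500)    -- deliberately oversized
    AS
    BEGIN
        -- Red flag: a value exactly as long as the declaration probably got cut off on the way in.
        IF DATALENGTH(@TitleCodes) = 500
        BEGIN
            RAISERROR('Parameter @TitleCodes may have been truncated.', 16, 1);
            RETURN;
        END;

        -- Sentinel variant: the caller appends '~', and we strip it here.
        IF RIGHT(@TitleCodes, 1) <> '~'
        BEGIN
            RAISERROR('Terminating character missing - was the parameter truncated?', 16, 1);
            RETURN;
        END;
        SET @TitleCodes = REPLACE(@TitleCodes, '~', '');

        -- ... carry on with the normal parsing ...
    END;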


    And then again, I might be wrong ...
    David Webb

  • SQLKnowItAll (12/19/2012)


    Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

    The amount of data going through the tunnel is essentially the same, if not less (no more commas). SQL Server can be very efficient at parsing delimited data in some cases, but I can't say I have ever seen it be faster than a simple C# function call.

    True, no more commas, but doesn't a table carry some overhead? Probably not much, though, and a VBA Split call is pretty quick. Shoveling the array into a table would be a little more work, but it shouldn't be much. I've been trying to offload as much work onto the server as possible, not only for speed, but also with the idea of getting more practice at T-SQL coding. But maybe I should look more at what fits well, even if it might not be the most efficient way possible. Certainly, spending days chasing stupid coding mistakes is not a model of peak efficiency.

  • GilaMonster (12/19/2012)


    pdanes (12/19/2012)


    Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

    The server may be faster than the user's machine, but then the server is serving multiple users, the user's machine just one.

    It's extremely lightly loaded now, but I do expect we'll be putting more stuff on it soon - should probably be thinking more about that.

    I'd also recommend parsing at the client and using a table-valued parameter, if at all possible. Strongly typed, not going to run over any particular string size, not requiring any additional processing on the server (it's a table variable on the server)

    That's probably a better way to go for stuff that ends up in table form anyway. It won't solve the problem of string parameters, but it will at least reduce the number of cases when they get used, which just may preserve a little of my sanity. Thanks.

  • pdanes (12/19/2012)


    GilaMonster (12/19/2012)


    pdanes (12/19/2012)


    Don't you get a performance hit from passing a table over the network, rather than a single string? And I think the server would parse it a lot faster than the user's machine.

    The server may be faster than the user's machine, but then the server is serving multiple users, the user's machine just one.

    It's extremely lightly loaded now, but I do expect we'll be putting more stuff on it soon - should probably be thinking more about that.

    I'd also recommend parsing at the client and using a table-valued parameter, if at all possible. Strongly typed, not going to run over any particular string size, not requiring any additional processing on the server (it's a table variable on the server)

    That's probably a better way to go for stuff that ends up in table form anyway. It won't solve the problem of string parameters, but it will at least reduce the number of cases when they get used, which just may preserve a little of my sanity. Thanks.

    Of course... You could use Jeff's splitter function http://www.sqlservercentral.com/articles/Tally+Table/72993/
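
    Assuming the splitter from that article is installed under its usual name, dbo.DelimitedSplit8K, and returns ItemNumber/Item columns (check the article for the exact definition), the usage would look something like this - table and column names hypothetical:

    DECLARE @TitleCodes varchar(8000) = '1E,D,E,Es,F,Fs,G,Gs,H,Hs,J,K,Ks,L,Ls,Lc,M,Me,N,Ns,O,Os,Ob,Oc,P,Ps,Pa,Pb,Pc,R,Ra,S,Ss,Sb,Sc,T,Ts,Ta,Tb,Tc,Ua,Ub,Uc,Va,Z,Zs';

    SELECT s.*
    FROM dbo.DelimitedSplit8K(@TitleCodes, ',') AS split
    JOIN dbo.Specimens s ON s.CatalogTitleCode = split.Item;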

    Jared
    CE - Microsoft

  • David Webb-200187 (12/19/2012)


    You could send a fixed terminating character on the end of your variable parameters and check to see if it is actually the last character in the parameter. If all is well, just replace() it with '' and proceed.

    Or you could check the datalength() of the parameter and if it is exactly the max length of the variable, that could send up a red flag (given that the variable should be oversized to start with).

    Or pass in the parameter length as a separate parameter.

    All of these will slow your proc, of course, or cause some additional work on the front end.

    I've thought of some of those, but not the one about checking for an exact length as a warning. That might be the simplest to check in code.

    But I just thought of another way to pass parameters. What about a separate, one-record table? That could be defined in SQL Server and made read-write accessible to the application. The application could load values into strongly typed fields, it WOULD get warnings about length overruns and the procedure could pass values back the same way (maybe using another such table, to keep the confusion down). It sort of feels like a global variable, which is often discouraged in procedural programming, but even there it can be a very useful tool, if used carefully. The table could even incorporate business logic, to keep the application from loading parameters that may be legitimate separately, but nonsense in certain combinations. I don't recall ever seeing that suggested anywhere, but this must have occurred to other people.

  • pdanes (12/19/2012)


    But I just thought of another way to pass parameters. What about a separate, one-record table?

    Multiple users.

    Maintenance (removing unneeded rows, what if the proc fails before it does that?)

    Been there, done that, not quite as simple as it sounds.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
