Multipe letters into one

  • All,

    I have read and asked lot of questionss about "Multiple space into one" here. I just thought the below scenario which came suddenly in my mind.

    Incase if the user entered wrongly or if the file contain wrong data as below

    kaarthi

    jefff moden

    steevve

    gillla

    barrry youuung

    mattt milller

    Joeee celkko

    lowweel

    maaark

    biiit buucket

    graant fritchey

    phhhill factor

    ChriiisM

    We all know their names. 🙂

    Expected Output:

    karthi

    jef moden

    steve

    gila

    rbary young

    mat miler

    Joe celko

    lowel

    mark

    bit bucket

    grant fritchey

    phil factor

    ChrisM

    For no, I just removed all the multiple letters into one. But this may also cause some issue if some one has really two letters in their name.

    I think we have to handle this one too.

    karthik

  • Actually, I don't recommend handling this at all. It will break properly spelled names.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/1/2012)


    Actually, I don't recommend handling this at all. It will break properly spelled names.

    +1


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Jeff Moden (10/1/2012)


    Actually, I don't recommend handling this at all. It will break properly spelled names.

    +1

    Just because you can do something doesn't mean you should. Doing this sort of data validation within the database will only result in a mess. Even if you have a valid list of names to check against and write the code to "figure out" what they really meant, the chance would remain that two distinct names could be identical after all duplicate letters are removed.

    Furthermore, it's not a good architectural philosophy. The application that is the transit between the human who can't spell and the database is the appropriate location for the validation. That is where the 'did you really mean X?' conversation can reliably occur. It is then the database's job to store that data and retrieve it on demand.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • +1

    GIGO:w00t:

  • While I agree its not a practical solution, as there are names (Yvonne, Pattinson, Lee, off the top of my head) where consecutive characters can be the same and so you will be destroying potentially valid names

    However, I can see its use as a test question in a technical interview for an SQL developer just to see how they think and solve a complex SQL problem.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (10/2/2012)


    While I agree its not a practical solution, as there are names (Yvonne, Pattinson, Lee, off the top of my head) where consecutive characters can be the same and so you will be destroying potentially valid names

    However, I can see its use as a test question in a technical interview for an SQL developer just to see how they think and solve a complex SQL problem.

    I try not to ask "oolies" during an interview. Asking improbable questions just ticks off the good ones and makes them think the interviewer is just showing off. Ask practical questions. There are plenty of them.

    If you want to break the ice on "the next level" of T-SQL programming, explain how important counting is in T-SQL and then ask them to write a script that will count from 1 to 100. You'll be amazed at how many people still resort to a While Loop or a (gasp!) Recursive CTE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As always a Fair point Jeff.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • I do agree all your points. I just thought this one while reading/replying to my another post. As all of you mentioned, it will break the original name into some useless one. I wanted to know the different opinion on this topic from the sql masters. Thats all the intention 🙂

    Jeff,

    Coming back to your question, count 1 to 100,

    select sum(n) from tally where n < 101

    am i right 🙂

    karthik

  • karthik M (10/2/2012)


    Jeff,

    Coming back to your question, count 1 to 100,

    select sum(n) from tally where n < 101

    am i right 🙂

    How about:

    SELECT 50*101

    instead?

    Seriously, I don't think that was the answer to Jeff's question, but if it was mine's probably faster. 😀


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Karthick,

    Jeff was making valid point about about how to create the Tally table in the first place, most people will write a while loop, or a recursive CTE rather than using a cross join between two of the sys tables in the db.

    I suspect Jeff wants to see the developer do the cross join rather than the CTE or a While loop. 😀

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • yeah..n(n+1)/2..

    karthik

  • karthik M (10/2/2012)


    Coming back to your question, count 1 to 100,

    select sum(n) from tally where n < 101

    am i right 🙂

    Apparently you didn't run your own code to test to see if it does the job, Karthik.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason-299789 (10/3/2012)


    Karthick,

    Jeff was making valid point about about how to create the Tally table in the first place, most people will write a while loop, or a recursive CTE rather than using a cross join between two of the sys tables in the db.

    I suspect Jeff wants to see the developer do the cross join rather than the CTE or a While loop. 😀

    Or a cascading cross-joined CTE Itzik Ben-Gan style. Or a SELECT from a Tally table.

    Bonus points if the Developer says, "It Depends" and then demonstrates and explains all 3 without further prompting. Even more points for the tangent of using IDENTITY vs ISNULL(ROW_NUMBER()) on the creation of a physical Tally Table.

    If you know this stuff, you should be able to talk about it spontaneously including some pros and cons. If you've only memorized it, you might only come up with the cross join and offer no extra information without me having to ask a thousand questions to find out what you know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've answered the counting question before. I didn't get the job in the end, but I suspect it was due to experience rather than interview technique (when I asked them if there was anything specific that had stopped them from hiring me, I didn't get an actual answer). Anyway, this is how I answered the counting question: -


    That depends on whether or not you have a numbers or tally table already defined. If you do then: -

    --== TALLY ==--

    SELECT N

    FROM Tally

    WHERE N > 0 AND N <= 100;

    Which would do an index seek on what would hopefully be an index with a fill factor of 100 since the data in the table should be static.

    If there isn't already a numbers or tally table defined, I'd prefer to add one as they can be extremely useful in solving lots of different issues.

    If, for one reason or another, adding a tally or numbers table is not possible then we could do it in a few ways.

    The most common answer is probably to do a loop. This is also the worst performing solution, as SQL is designed to be set-based. However, in the interests of completeness: -

    --== WHILE LOOP ==--

    DECLARE @HOLDER TABLE (N INT);

    DECLARE @COUNTER INT = 0;

    WHILE @COUNTER < 100

    BEGIN

    SET @COUNTER = @COUNTER + 1;

    INSERT INTO @HOLDER

    SELECT @COUNTER;

    END

    SELECT N

    FROM @HOLDER;

    The next way to do it would be a recursive CTE. I've found these to be as bad as loops for performance scaling, so I wouldn't generally advise that they be used. Again, in the interest of completeness: -

    --== RECURSIVE CTE ==--

    WITH CTE(N) AS (

    SELECT 1

    UNION ALL

    SELECT N+1

    FROM CTE

    WHERE N+1 <= 100)

    SELECT N

    FROM CTE;

    Finally, we could use a cross join method to get the cartesian product of a set of 10, then use the row number function. Either we could use the method that has been made popular by Ben-Gan: -

    --== CASCADING CROSS JOIN ==--

    WITH CTE(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))a(N)),

    CTE2(N) AS (SELECT 1 FROM CTE X, CTE Y)

    SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))

    FROM CTE2;

    Or, since we know it's only 100, we could just do a cross join on two sets of 10: -

    --== CROSS JOIN ==--

    SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))

    FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))a(N)

    CROSS JOIN (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))b(N);

    So, my preference would be a numbers or tally table, but if I can't use one then I'd go with a cross join method.


    Looking back, perhaps I should've gone into the creation of a tally table. Never mind, if I'd got that job then I wouldn't have my current position which I'm enjoying a lot 😀


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply