extract number from string

  • I am able to extract a number from a string but there is a case where it is not pulling the data correct. I want to pull the entire number with or without dashes.

    It is extracting the number correct from the string below when there is a dash in the number, but when there is no dash it does not pull the entire number:

    Contrato Promocional 2010-241772 Descrico CDR TD

    Contrato Promocional 2010 241408 Perlodo de Compr

    results:

    2010-241772

    2010

    code:

    SELECT *, Substring(FullText,PATINDEX('%Contrato%',FullText), 50) as Contract2,

    Left(SubString(Substring(FullText,PATINDEX('%Contrato%',FullText), 50), PatIndex('%[0-9.-]%', Substring(FullText,PATINDEX('%Contrato%',FullText), 50)), 50), PatIndex('%[^0-9.-]%', SubString(Substring(FullText,PATINDEX('%Contrato%',FullText), 50), PatIndex('%[0-9.-]%', Substring(FullText,PATINDEX('%Contrato%',FullText), 50)), 8000) + 'X')-1) as Contract3 ,

    Substring(FullText,PATINDEX('%Fornecedor%',FullText), 100) as VendorNumber1

    FROM [Sonae].[dbo].[Images_Local]

    where [FULLTEXT] is not null

    AND [FULLTEXT] Like '%Contrato%'

    I also want to pull the numbers out and just have the vendor name from the string below into Vendor1 (see code above)

    The data in that column looks like this:

    Fornecedor 14105 - SASISUNG ELECTRONICA PORTUGUESA SA Promocao Moeda EUR - EURO Negociador 1-!ENR

    Fornecedor PromocAo MODELO CONTINENTE HIPERMERCADOS, SA 14105 - SAMSUNG ELECTRONICA PORTUGUESA SA

    Fornecedor 14628 - LG ELECTRONICS PORTUGAL,SA Moeda EUR - EURO Presença MODELO CONTINENTE HIPERME

    Fornecedor PANASONIC IBERIA S.A. - SUCURSAL PORTUGAL Mo contempla financeiro nem rappel. Ambos os

    I want to these results into 2 columns:

    VendorNumber1 VendorName1

    14105 SASISUNG ELECTRONICA PORTUGUESA SA

    14105 SAMSUNG ELECTRONICA PORTUGUESA SA

    14628 LG ELECTRONICS PORTUGAL,SA

    N/A PANASONIC IBERIA S.A.

  • SELECT Fulltext, Contract2,

    Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3

    FROM (

    SELECT 'Contrato Promocional 2010-241772 Descrico CDR TD' AS Fulltext UNION ALL

    SELECT 'Contrato Promocional 2010 241408 Perlodo de Compr'

    ) AS test_data --[Sonae].[dbo].[Images_Local]

    cross apply (

    SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2

    ) as assign_alias_names_1

    cross apply (

    SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start

    ) as assign_alias_names_2

    cross apply (

    SELECT PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 as Contract3_Length

    ) as assign_alias_names_3

    WHERE [FULLTEXT] is not null

    AND [FULLTEXT] Like '%Contrato%'

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Your too quick for me... I just edited my original post. Please see new requirement.

    thanks

  • I did part of the job for you. You should be able to complete the rest.

    CREATE TABLE table1(

    Col1 varchar(8000))

    INSERT INTO table1

    VALUES

    ('Fornecedor 14105 - SASISUNG ELECTRONICA PORTUGUESA SA Promocao Moeda EUR - EURO Negociador 1-!ENR'),

    ('Fornecedor PromocAo MODELO CONTINENTE HIPERMERCADOS, SA 14105 - SAMSUNG ELECTRONICA PORTUGUESA SA '),

    ('Fornecedor 14628 - LG ELECTRONICS PORTUGAL,SA Moeda EUR - EURO Presença MODELO CONTINENTE HIPERME'),

    ('Fornecedor PANASONIC IBERIA S.A. - SUCURSAL PORTUGAL Mo contempla financeiro nem rappel. Ambos os')

    SELECT Vendor

    FROM table1

    CROSS APPLY ( SELECT REPLACE( STUFF( Col1, 1, PATINDEX( '%[0-9]%', Col1), ''), 'Fornecedor ', '') String) AS something

    CROSS APPLY ( SELECT LEFT( String, MIN(Position)) Vendor

    FROM (VALUES(NULLIF( PATINDEX( '%[^a-zA-Z]SA[^a-zA-Z]%', String + ' '), 0) + 2),

    (NULLIF( PATINDEX( '%[^a-zA-Z]S.A.[^a-zA-Z]%', String + ' '), 0) + 4))x(Position) ) s1

    GO

    DROP TABLE table1

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks, this is very helpful

  • When I changed the fourth line to point to the Table/column it gives me an error msg. I forgot to mention, FullText has a lot more text in the column. I ran OCR on multipage tif's and put the text in the FullText column.

    Msg 537, Level 16, State 5, Line 2

    Invalid length parameter passed to the LEFT or SUBSTRING function.

    SELECT Fulltext, Contract2,

    Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3

    FROM (

    SELECT Fulltext From [Sonae].[dbo].[Images_Local]

    ) AS test_data

    cross apply (

    SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2

    ) as assign_alias_names_1

    cross apply (

    SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start

    ) as assign_alias_names_2

    cross apply (

    SELECT PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 as Contract3_Length

    ) as assign_alias_names_3

    WHERE [FULLTEXT] is not null

    AND [FULLTEXT] Like '%Contrato%'

  • SELECT Fulltext, Contract2,

    Substring(Contract2, Contract3_Start, Contract3_Length) AS Contract3

    FROM (

    SELECT Fulltext From [Sonae].[dbo].[Images_Local]

    ) AS test_data

    cross apply (

    SELECT Substring(FullText, PATINDEX('%Contrato%',FullText), 50) as Contract2

    ) as assign_alias_names_1

    cross apply (

    SELECT PatIndex('%[0-9.-]%', Contract2) as Contract3_Start

    ) as assign_alias_names_2

    cross apply (

    SELECT CASE WHEN Contract3_Start <= 2 THEN 0 ELSE

    PatIndex('%[^ 0-9.-]%', Substring(Contract2 + 'X', Contract3_Start, 8000)) - 2 END as Contract3_Length

    ) as assign_alias_names_3

    WHERE [FULLTEXT] is not null

    AND [FULLTEXT] Like '%Contrato%'

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • It'll be interesting when an actual vendor name also contains dashes. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/14/2014)


    It'll be interesting when an actual vendor name also contains dashes. 😛

    In that case, presumably we would search first for a digit only, "%0-9%", then do the search for digits with dashes, etc. after that ;-).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • This is data I OCR'd and The contract number can show up multiple times in the file. It is a multi-page tif image. Sometimes the contract number is crossed out or not readable on the first page, so in my data it is not pulling the entire contract. I've seen tif images where the contract is readable on the second or third page and it is coming in fine in the FullText column, but the Patindex is searching and finding the first "%Contrato%". If the first contract number is not readable and only brings in say 4 chars or less, how do I make it search for the next contract to pull in the number from the next position in the text?

  • Do you have some type of id or other unique/key value on the row?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • there is a column called ID, which is a unique column.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply