Deduplication of Photo and Finger Print Data

  • Hi friends

    In one of our projects there is a requirement to find duplicates among enrolled photo and biometric (fingerprint) data. Both the photos and the fingerprints are available as SQL Server tables.

    The volume is pretty large (around 1000000*4 = 4000000).

    Could some of you help me find a solution to this deduplication issue?

    Thanking you

    regards

    john

  • are they real duplicates, as in the same image is in the database twice, or two different images of the same print?

    if they are the same image twice, then you could use binary_checksum to check for possible duplicates;

    then if you group by the binary_checksum having count > 1, those that appear in the SELECT might be duplicates...but it's not guaranteed, because two different images can produce the same checksum.

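    -- option 1: number the rows inside each checksum group; RW > 1 marks the possible extra copies
    select * from (
        select row_number() over(partition by binary_checksum(yourimage)
                                 order by (select null)) AS RW, *
        from yourtable) MyAlias
    where RW > 1;

    -- option 2: list every row whose checksum appears more than once
    SELECT * FROM yourtable
    WHERE binary_checksum(yourimage) in
        (select binary_checksum(yourimage)
         from yourtable
         group by binary_checksum(yourimage)
         having count(*) > 1);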

    I've written an outside program to get the file size and the CRC of any file/image on disk, and then put that info into a database in order to find duplicates...that has worked for me, and has been 100% accurate in the past, as it seems the combo of size and CRC is a very strong indicator of duplication.

    maybe the datalength() of the image and the checksum would be an excellent indicator for IMAGE datatypes
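
    A rough sketch of the size-plus-CRC idea in Python, assuming the image bytes have already been read out of the table into memory. The function names and the sample rows are made up for illustration; this is not the actual program mentioned above.

```python
import zlib
from collections import defaultdict


def size_and_crc(data: bytes) -> tuple[int, int]:
    """Return (length in bytes, CRC-32) for one image blob."""
    return len(data), zlib.crc32(data) & 0xFFFFFFFF


def candidate_duplicates(images: dict[str, bytes]) -> list[list[str]]:
    """Group image ids that share the same (size, CRC-32) pair.

    Only groups with more than one member are returned. These are
    candidates, not proven duplicates.
    """
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for image_id, data in images.items():
        groups[size_and_crc(data)].append(image_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data standing in for rows read from the table.
sample = {
    "row1": b"\x89PNG-fake-image-A",
    "row2": b"\x89PNG-fake-image-B",
    "row3": b"\x89PNG-fake-image-A",
}
print(candidate_duplicates(sample))
```

    With the sample above, row1 and row3 come back as one candidate group, since they hold the same bytes.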

    if it is two different images that might be of the same print, then you'd have to use some kind of algorithm that identifies the significant points of each print, store that result in the same row as each image, and then group by it, similar to the checksum method above...not 100% accurate, but a good indicator.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I think Lowell has some good ideas. I'd use the checksum to get an initial list, but then I'd have some program run through and compare the binary data to be sure it's really the same.
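
    A minimal sketch of that second pass in Python, assuming the candidate groups from the checksum query and the raw image bytes are already loaded. The names `confirm_duplicates`, `images` and `candidates` are hypothetical.

```python
from itertools import combinations


def confirm_duplicates(candidates: list[list[str]],
                       images: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return the pairs inside each candidate group whose bytes are identical."""
    confirmed = []
    for group in candidates:
        for a, b in combinations(group, 2):
            if images[a] == images[b]:  # full byte-for-byte comparison
                confirmed.append((a, b))
    return confirmed


# Hypothetical data: pretend all three rows came back with the same checksum.
images = {"row1": b"AAAA", "row3": b"AAAA", "row7": b"AAAB"}
candidates = [["row1", "row3", "row7"]]
print(confirm_duplicates(candidates, images))
```

    Only the pairs that survive the byte comparison are real duplicates; row7 drops out here even though it was in the candidate group.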

  • Hi

    Friends, the checksum comparison is a good idea,

    but my problem is different, which is why it is posted under the Not In SQL section.

    The data is coming in through an enrollment program,

    which captures a photo as well as fingerprints.

    My requirement is to identify accidental or intentional duplication in enrollment. I think there are some third-party engines that are supposed to match facial and fingerprint images.

    I am a pure database guy, so I am not familiar with those ideas.

    regards

    john

  • I'm guessing you are just coming on board with a shop that does this? I would say they must have that software in their shop already; I would lean more towards calling sister or similar shops and asking what they used...learning from others' experience.

    Both types of recognition software are going to do this: analyze the image, and come up with a value, or number of values, which can be placed in the database.

    Once that value is in the database, that is what is used for checking for best matches...

    I googled "fingerprint recognition software" and found lots of software, including one that claims to scan 40K fingerprints a second. You could google similarly for facial recognition software.

    Here's one of the first links to free versions:

    http://www.freedownloadmanager.org/downloads/fingerprint_recognition_software/

    So for example, say some software produces a hash for my fingerprint, along with some other high-level data (arch vs. whorl type fingerprints, for example). Saving that alongside a digital image of the print, and repeating for all the prints in your database, would give you the raw data you could use to check for possible dupes whenever a new record is inserted.
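
    To make the idea concrete, here is a purely illustrative Python sketch. Everything in it is an assumption: the `PrintRecord` fields, the numeric `features` list and the distance threshold are invented, and a real matching engine will have its own template format and scoring.

```python
import math
from dataclasses import dataclass


@dataclass
class PrintRecord:
    person_id: str
    pattern: str           # coarse class, e.g. "arch", "loop", "whorl"
    features: list[float]  # hypothetical numeric template from the engine


def possible_dupes(new: PrintRecord, enrolled: list[PrintRecord],
                   max_distance: float = 0.5) -> list[tuple[str, float]]:
    """Return (person_id, distance) for enrolled prints close to the new one."""
    matches = []
    for rec in enrolled:
        if rec.pattern != new.pattern:
            continue  # cheap filter on the coarse class first
        distance = math.dist(rec.features, new.features)
        if distance <= max_distance:
            matches.append((rec.person_id, distance))
    return sorted(matches, key=lambda m: m[1])  # closest match first


# Hypothetical enrolled rows and one incoming enrollment.
enrolled = [
    PrintRecord("p1", "arch", [0.1, 0.2, 0.3]),
    PrintRecord("p2", "whorl", [0.1, 0.2, 0.3]),
    PrintRecord("p3", "arch", [0.9, 0.9, 0.9]),
]
incoming = PrintRecord("new", "arch", [0.1, 0.2, 0.4])
result = possible_dupes(incoming, enrolled)
print(result)
```

    The coarse class narrows the search cheaply, and only the remaining rows get the more expensive comparison. The threshold would need tuning against whatever score the chosen engine actually returns.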

    Lowell


