• Robert.Sterbal (8/27/2012)


    It seems like gene sequences often run into the thousands in length, and any change could be a mutation. 90% of all genes are, um... not used in encoding.

    The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production).

    I have a basic understanding of DNA/RNA, expression, and protein synthesis. I know the difference between myosis and mitosis (though I'm not actually sure I spelled either one correctly, I know what they are). But that's not enough to offer any advice on coding and database options with regard to them.

    I'd probably use a breadcrumb hierarchy of one sort or another, just based on the idea of what-spawns-what. HierarchyID is essentially a breadcrumb, but encoded for use by some specific methods. If its depth limitations will work for the kind of data you're talking about, then it should be fine. But sequencing thousands of items, assumed to be AT/GC bonds, would mean a hierarchy thousands of levels deep, and the size limit on the SQL Server HierarchyID datatype wouldn't allow that. Doesn't chromosome pair 1 in humans have hundreds of millions of base pairs, all by itself?

    But, again, I'm not even sure that mapping AT/GC sequences is what "sequencing" means. I assume so, and that it would take 1 hierarchy node per bond-pair, and those are assumptions I'm not qualified to make.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon