A Neural Network in SQL Server

  • sherifffruitfly

    Ten Centuries

    Points: 1198

    sknox (10/27/2009):

    On a serious note, sometimes programming isn't about performance, or even about successfully accomplishing a task, but about learning.

    "Learning" also encompasses choosing a more appropriate platform/language to implement desired functionality over a lesser platform/language for that task.

    Asking what benefits/detriments a particular choice of platform/language has for a given task is perfectly appropriate, even in a learning context.

  • Jonathan AC Roberts

    SSCoach

    Points: 17269

    sknox (10/27/2009)


    jacroberts (10/27/2009)


    I can see that you can do this using SQL Server, but I fail to see any benefit to using SQL Server rather than writing some code in a normal computer language like C.

    Surely it is much slower using SQL Server?

    First, I take offense at your characterization of SQL as not a "normal computer language" -- unless, of course, you mean that it's above normal. Second, I laugh at your characterization of C as normal in any way. 😛

    On a serious note, sometimes programming isn't about performance, or even about successfully accomplishing a task, but about learning. From the introduction to the article, it's clear that Silvia wasn't trying to create the most efficient or useful neural network, but more interested in how two programming concepts (a relational database and a neural network) might be brought together.

    Also, bear in mind that many organizations can justify a SQL programmer where they can't justify a .NET or C programmer -- and they may benefit greatly from a neural network in SQL.

    Does anyone think SQL is a normal programming language??? It's non-procedural, for a start. You seem to be very odd, being able to be offended by statements about SQL! And 'normal' doesn't mean good; it was used to mean standard, or more commonly used for this type of application.

    C and its variants, C++ and C#, are probably the most widely used languages in the world, with everything from operating systems to applications written in them, and even, dare I say, SQL Server, which is probably written in C++.

  • cs_troyk

    SSCertifiable

    Points: 5381

    Nicely done and interesting article. I had actually considered writing a similar article, but you have beaten me to the punch 🙂

    I like your use of Analysis Services as a feature selection tool. This is a good way for someone interested in the concepts to get a jump start. I had a little difficulty interpreting your ER diagram as some of the lines are crossing. Would it be possible to see a picture of a cleaned-up version? David Hay has some excellent guidelines for diagramming conventions here: http://www.essentialstrategies.com/publications/modeling/makingrd.htm

    For those asking about performance/appropriateness of the implementation, my experience is that there are some types of predictive analytics processing that benefit greatly (both from performance and code maintenance standpoints) via implementation within SQL. Without seeing the full implementation, I'm not sure if this is such a case, but it's a good idea to think in this direction.

    TroyK

  • g.ciccotti

    SSC Enthusiast

    Points: 118

    Interesting post.

    Just a few suggestions:

    NNs work better, as someone said, with real numbers than with integers. An NN performs non-linear interpolation, so the result will be spread across the range of the variable you want to predict.

    If you force the result to an integer, or worse, to binary, you lose the richness of the prediction and get ambiguous data. In the case you've presented, the NN can give you back the PoP value in the range 0% to 100%, exactly as a professional weather forecasting system would.

    So NNs, due to their intrinsic nature, are not well suited to discrete systems.
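A minimal sketch of the point above, assuming the network ends in a single logistic (sigmoid) output unit; the pre-activation values and the 0.5 cutoff are made-up illustrations, not numbers from the article. The continuous output reads directly as a PoP percentage, while forcing it to binary gives the same label to quite different forecasts.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pop_percent(z: float) -> float:
    """Read the output unit's activation as a probability of precipitation (%)."""
    return 100.0 * sigmoid(z)

def to_binary(p: float, cutoff: float = 0.5) -> int:
    """Collapse a continuous output into a rain / no-rain label (hypothetical cutoff)."""
    return 1 if p >= cutoff else 0

# Hypothetical pre-activations for three different days.
for z in (0.2, 3.0, -2.0):
    print(f"z={z:+.1f}  PoP={pop_percent(z):5.1f}%  binary={to_binary(sigmoid(z))}")
```

Here a day at roughly 55% PoP and a day at roughly 95% PoP both end up labelled 1, which is the loss of "richness" described above.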

    The input data are the most important thing: NNs learn only from them, and if the data are not representative enough of the system, you will never obtain good results.

    In other words, the input data have to be broad enough to cover all the situations in which the system could operate.

    If your NN gives you bad results, look at the input data first: ask whether the data give the NN all the information it needs, in terms of the number of different "situations" covered, and whether some input was lost.

    When you try to model a real system as you did, the input problem is often the challenging part; it is "the problem".

    For example, I think the data for the region alone are not enough; data from neighboring regions could influence the next day's PoP (because clouds move).

    Or again, 5 years of data is too small a set.

    But your approach is interesting, and SQL Server could be a useful way to prototype an NN prediction system; then, once it is well tested, you could develop external software to achieve better performance.

    Giuseppe

  • Silvia Cobialca

    Old Hand

    Points: 390

    Giuseppe-

    You make very good points, and that's exactly what I experienced while working with the program. I didn't use an integer for PoP, then. What I did after the program returned the value was group the results into ranges and assign each a 0 or a 1 according to its value.

    Silvia

  • g.ciccotti

    SSC Enthusiast

    Points: 118

    Silvia,

    perhaps it would be simpler to predict the actual mm of rain that fall the next day, and set PoP = 1 if the amount is greater than 0.

    I look forward to reading more of your work.

    Giuseppe

  • dalewilbanks

    Valued Member

    Points: 62

    I love C and C#, but I often write more than 50% of application code in SQL, because it is the fastest way to manipulate data, period. If the data can be calculated within the database using SQL, rather than retrieved first, you have an order of magnitude less network I/O, which is typically a large bottleneck for an application.

    'If' you are going to persist your neural net state in the database with every calculation, using SQL might be the fastest neural net implementation, due to minimal network I/O. I'm not saying that is the best approach, just that it should be considered.

  • Phil Paxton

    SSC Veteran

    Points: 234

    Excellent article. It's a nice break from "standard" languages, even though that's how I get the most money.

    There's a quote from Celko which, paraphrased, says that to be good with SQL you have to forget how to program.

    I figure that's almost like what XSLT 1.0 (which I miss because it would separate the men from the boys) was for a lot of people: you can't look at it procedurally.

    Is there a better version of the "Analysis Services" diagram?

    I cranked the resolution to practically nothing and used my reading glasses to no joy.

  • timo-947179

    SSC Journeyman

    Points: 98

    Very good article, thank you!

    About the classification matrix: am I reading it wrong or is there an error in the explanation?

    Also, there were 243 cases when the network predicted that it was going to rain and it didn't rain (a false negative)

    Shouldn't it be:

    Also, there were 243 cases when the network predicted that it was NOT going to rain and it DID rain (a false negative)
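To make the terminology in the correction above concrete, here is a small sketch that treats "rain" as the positive class (an assumption about how the article's matrix is oriented). A false positive is predicted rain that didn't happen; a false negative is rain the network failed to predict. The sample pairs are hypothetical, not the article's data.

```python
from collections import Counter

def outcome(predicted: int, actual: int) -> str:
    """Label one (prediction, observation) pair, with rain (1) as the positive class."""
    if predicted == 1 and actual == 1:
        return "true positive"    # predicted rain, and it rained
    if predicted == 1 and actual == 0:
        return "false positive"   # predicted rain, but it stayed dry
    if predicted == 0 and actual == 1:
        return "false negative"   # predicted no rain, but it rained
    return "true negative"        # predicted no rain, and it stayed dry

def classification_matrix(pairs):
    """Count each outcome over an iterable of (predicted, actual) pairs."""
    return Counter(outcome(p, a) for p, a in pairs)

# Hypothetical predictions vs. observations.
sample = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 1)]
print(classification_matrix(sample))
```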

  • dray

    SSC Enthusiast

    Points: 107

    I've lurked too much: I've gleaned much from SQLCentral but pretty much kept quiet. I was searching for neural nets (NN) that use a SQL system for the main processing, and was pleasantly surprised.

    Loved the article. Sorry to resurrect a thread here, but I've been working on a neural net in SQL, using a simple feed-forward design where layer 0 is the input and layer n+1 is the output. The whole network can be expressed in 2 tables: one for the neurons, the other for the links between neurons. The Neuron table keeps the state from the last feed-forward iteration, a sigmoid shape bias, and a threshold. The NeuralLink table links the neurons and holds a weight for each link.

    I was trying to think in sets when constructing the model. Each neuron knows the layer it is in, and by anchoring on the neurons of each layer from N1 to Nx, each set of connections is pulled in, and the weighted values from the connected neurons are summed and saved per neuron.

    The problem is that I've so far been unable to unroll the loop into a recursive CTE, because a summation has to occur before moving on to the next layer. I have to use a WHILE loop to iterate through the layers (from 1 to x, in forward fashion). Essentially, sequential processing is required for the time being.
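A rough sketch of the two-table idea described above, with in-memory structures standing in for the Neuron and NeuralLink tables. The column names (layer, bias, state, src, dst, weight) and all values are hypothetical; treating the bias as an additive offset before a standard sigmoid is an assumption (the post's "sigmoid shape bias" may mean something else), and the threshold column is left out. Each pass of the outer loop plays the role of one WHILE-loop iteration: join the links feeding the current layer to the previous layer's states, sum per target neuron (the GROUP BY step), and only then update that layer.

```python
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stand-in for the Neuron table: neuron_id -> row (hypothetical columns).
neurons = {
    1: {"layer": 0, "bias": 0.0, "state": 0.0},
    2: {"layer": 0, "bias": 0.0, "state": 0.0},
    3: {"layer": 1, "bias": 0.1, "state": 0.0},
    4: {"layer": 1, "bias": -0.2, "state": 0.0},
    5: {"layer": 2, "bias": 0.05, "state": 0.0},
}

# Stand-in for the NeuralLink table: one row per connection, each with its own weight.
links = [
    {"src": 1, "dst": 3, "weight": 0.4},
    {"src": 2, "dst": 3, "weight": -0.6},
    {"src": 1, "dst": 4, "weight": 0.3},
    {"src": 2, "dst": 4, "weight": 0.8},
    {"src": 3, "dst": 5, "weight": 1.2},
    {"src": 4, "dst": 5, "weight": -0.7},
]

def feed_forward(neurons, links, inputs):
    """Populate layer 0, then process one layer at a time (the WHILE loop)."""
    for nid, value in inputs.items():
        neurons[nid]["state"] = value
    last_layer = max(n["layer"] for n in neurons.values())
    for layer in range(1, last_layer + 1):
        # Set-based step, roughly SUM(src.state * link.weight) GROUP BY link.dst
        sums = defaultdict(float)
        for link in links:
            if neurons[link["dst"]]["layer"] == layer:
                sums[link["dst"]] += neurons[link["src"]]["state"] * link["weight"]
        # Update the layer only once every sum for it is known.
        for nid, total in sums.items():
            neurons[nid]["state"] = sigmoid(total + neurons[nid]["bias"])
    # "Read output from the last layer."
    return {nid: n["state"] for nid, n in neurons.items() if n["layer"] == last_layer}

print(feed_forward(neurons, links, {1: 1.0, 2: 0.0}))
```

The per-layer sum has to finish before the next layer can start, which is why this sketch keeps an explicit loop over layers rather than expressing the whole pass as one recursive query.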

    The back-propagation routine for learning is just done in reverse, from Nx to N1.

    The training data and the corresponding expected responses can be stored in 2 tables, and the relational design immediately provides a path to easy, set-based training.
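Continuing in the same spirit, here is a sketch of training with textbook back-propagation that walks the layers in reverse, from the output layer back to layer 1. Squared error, sigmoid units and per-sample gradient descent are assumptions, not necessarily what was built in SQL; the two training "tables" are dictionaries keyed by a hypothetical sample id, and the 2-2-1 layout, the AND data, the learning rate and the epoch count are illustrative choices.

```python
import math
import random
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)

# Hypothetical Neuron rows (id -> layer) and NeuralLink rows ((src, dst) -> weight).
layer_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
bias = {n: 0.0 for n in layer_of if layer_of[n] > 0}
weight = {(s, d): random.uniform(-1, 1)
          for s in layer_of for d in layer_of
          if layer_of[d] == layer_of[s] + 1}
LAST = max(layer_of.values())

# Two training "tables": inputs and expected outputs, keyed by sample id.
train_inputs = {1: {1: 0, 2: 0}, 2: {1: 0, 2: 1}, 3: {1: 1, 2: 0}, 4: {1: 1, 2: 1}}
train_expected = {1: {5: 0}, 2: {5: 0}, 3: {5: 0}, 4: {5: 1}}  # logical AND

def forward(inputs):
    state = dict(inputs)                          # layer 0 = input values
    for layer in range(1, LAST + 1):              # layers 1..LAST, in order
        sums = defaultdict(float)
        for (s, d), w in weight.items():
            if layer_of[d] == layer:
                sums[d] += state[s] * w
        for d, total in sums.items():
            state[d] = sigmoid(total + bias[d])
    return state

def backward(state, expected, lr):
    delta = {}
    for layer in range(LAST, 0, -1):              # layers LAST..1, in reverse
        for n in [m for m in layer_of if layer_of[m] == layer]:
            out = state[n]
            if layer == LAST:
                err = expected[n] - out           # error at the output layer
            else:                                 # error sent back over outgoing links
                err = sum(w * delta[d] for (s, d), w in weight.items() if s == n)
            delta[n] = err * out * (1 - out)      # times the sigmoid derivative
    # Gradient-descent step on every link weight and bias, after all deltas exist.
    for (s, d) in weight:
        weight[(s, d)] += lr * delta[d] * state[s]
    for n in bias:
        bias[n] += lr * delta[n]

def total_error():
    return sum((train_expected[i][5] - forward(x)[5]) ** 2
               for i, x in train_inputs.items())

before = total_error()
for epoch in range(2000):
    for i, x in train_inputs.items():
        backward(forward(x), train_expected[i], lr=0.5)
after = total_error()
print(f"squared error before: {before:.3f}, after: {after:.3f}")
```

All deltas are computed before any weight changes, so the reverse pass uses the same weights the forward pass did.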

    This system may work surprisingly well. Usually the layers aren't the issue; it's the sheer number of interconnections. If I can take advantage of proper indexing and set-based procedures, the SQL solution may end up working with large numbers of neurons, because SQL is better tuned to working with large amounts of relational data than my C# code is. The network and the processing code look a lot less complex, since most of the programming effort otherwise goes into rebuilding referential integrity (RI) into the objects needed to support the NN. I've attempted to make the neural net a set-based data processing system.

    To use it, populate the input layer, then call the stored procedure, telling it which neural net you want to process. Read the answers from the last layer of the network.

    Due to the relational layout, several neural networks could be created that share the same sample inputs and outputs. Storing the output data per network, tagged with an epoch, would allow several networks to process the same input data, potentially enabling some interesting parallelism.

    A neural net in SQL would still be quite intensive, albeit predictably spiky. It's been a fun project. Getting the back-prop to work without a functional calculus background has been tricky.

    I work for a soil and water testing lab, where I develop databases, web services, internal business programs (LIMS, analysis, reporting, etc.) and other software. For the company I've built a multi-probe pH system that collects the acquired data, determines when the probes are stable, and controls other aspects of auto-processing.

    I have collected 120 million or so data points (pH behaviors) and would like to unleash a neural net on this data to see what comes up.

    Some ideas: see whether there is a minimal number of samples needed to predict the endpoint pH, or determine when something has been sampled incorrectly, etc.

    Mainly it's to expand my useless knowledge 😀

    Interesting to see others messing around with NN in SQL.

    Thanks,

    Dave

  • Silvia Cobialca

    Old Hand

    Points: 390

    Thanks, Dave, for elaborating on your experiences with a subject similar to mine. I also did it just for fun, and let me tell you, it was certainly a lot of fun and good coding experience.

    I still have some pending work on this one: a generalization of the model I built. It's on standby due to time constraints, but I will revisit it in the near future.
