This article describes a generic stored procedure for SQL Server that accepts two tables inputs and returns an R script to create an AT&T network diagram. The inputs are a set of 'Node' names and labels as a table input, and 'Edges' (nodes that refer to nodes) that are also also a table input.
There are two great things about the R packages used. First, they support network hierarchies with multiple edges for each node, something many similar R packages (Data Tree) and SQL Server features (recursive queries/hierarchy data types) don't do well. The diagrams also scale well, when there are thousands of nodes the graph is still laid out nicely in most cases.
The diagram below shows a machine generated network topology consisting of 36 nodes, each node is referenced by the nodes immediately left, above left and below left.
The diagram below shows a subset of the same network, 'I19' and 'H24' were selected as the anchors then only nodes referred to or referred to by them were included. Labels can be applied to each node, there are various attributes that can be set for both nodes and edges, just the anchor node fill colour was set below.
Simple Example Execution
Execute the script below, after running the 01-sp_GraphNELGene setup:
-- Simple hierarchy USE tempdb; GO DECLARE @Node AS GraphNELNodeTableType; DECLARE @NodeReferedTo AS GraphNELNodeRefersToTableType; DECLARE @AnchorNodes GraphNELAnchorNodeTableType; INSERT INTO @Node (NodeName, NodeLabel) SELECT '1','Kirk'; INSERT INTO @Node (NodeName, NodeLabel) SELECT '2','Spock'; INSERT INTO @Node (NodeName, NodeLabel) SELECT '3','Scotty'; INSERT INTO @NodeReferedTo SELECT '1','2'; INSERT INTO @NodeReferedTo SELECT '2','3'; INSERT INTO @NodeReferedTo SELECT '3','1'; INSERT INTO @AnchorNodes SELECT '1'; EXECUTE dbo.sp_GraphNELGene @Node ,@NodeReferedTo ,@AnchorNodes ,@IncludeLabels = 1 GO
The stored procedure generates and returns the R script below. Run the #commented out lines, SetRepositories(), once on a server before this one time install of packages
#SetRepositories() #source("https://bioconductor.org/biocLite.R") #biocLite() #install.packages("graph") #install.packages("graphNEL") #install.packages("Rgraphviz") rm(list = ls()) library(graph) library(Rgraphviz) NodeName <- c( "n1", "n2", "n3" ) EdgeName = list( n1 = list(edges = c(2)), n2 = list(edges = c(3)), n3 = list(edges = c(1)) ) NodeLabel <- c( "Kirk", "Spock", "Scotty" ) gR <- graphNEL(nodes = NodeName, edgeL = EdgeName, edgemode = "directed") nAttrs <- list() NodeLabel <- NodeLabel[1:numNodes(gR)] names(NodeLabel) = nodes(gR) nAttrs$label <- NodeLabel nAttrs$fillcolor <- c(n1 = "grey") eAttrs <- list() attrs <- list(graph=list(rankdir="LR"),node=list(shape="ellipse", fixedsize=FALSE, color="grey"),edge=list(color="grey")) plot(gR, nodeAttrs = nAttrs, edgeAttrs=eAttrs, attrs=attrs)
When this is run in R or Visual Studio it generates the graph below:
Example of Attribute Settings, Node Selection and Scale
This script is quick to produce very large graphs and PDF documents even on a small laptop, but the Bioconductor Organisation and SQLServerCentral both have 150 KB image size limits, so the sample diagrams are small. The small example diagram below shows a subset of a 200 node computer generated network anchored at node ‘I110’, Labels have been included, ‘Referred To By’ nodes have been excluded so image size < 150 KB:
The Bioconductor packages were developed for 'Gnome mapping in wet lab environments' but they might be useful for other purposes. The level of complexity, scaling and the different network types and relationships they support is really impressive.
The R script worked in CRAN R Studio once setRepositories() was correct, and in Visual Studio with the R tools installed, the main limitations were:
- The labelling of nodes when the diagrams scale, there’s not much space for text descriptions inside each node, the shorter the label the better .
- R and Visual studio can be slow parsing large R scripts, they execute quickly though.
- Much like recursive queries and hierarchy data types not working well with these network topolgied, the R Server installed with SQL Server doesn't easily lend itself to the execution of the R scripts created by this procedure. To automate the process and publish the graphs, it's easier to save the R script to a text file then execute it in a windows batch file or Power Shell.
- Generates a 1,000 node, 3,000 edge diagram with multiple anchor nodes, using R Studio, SQL Express on a Microsoft Surface Pro 4 in less than 5 minutes. – PDF file here, try using the PDF ‘Find’ and ‘Zoom’ features to navigate the document – https://paulbrewer.files.wordpress.com/2018/08/bioconductor_example5.pdf
Source Code and Example Execution
- 01 – Creates the GRAPHNEL table types and stored procedure in the tempdb database
- 02 – Creates the R scripts to produce the first two graphs in this post
The stored procedure (sp_GraphNELGene) accepts the following parameters:
- Nodes – Table variable with a list of Node Names and Labels
- Node Refers To – Table variable with a list of Node Names and the name of the Referred to Node
- Anchor Nodes – Table variable with a list of the Anchor Node Names
- Include Labels – Defaults to No, Displays Node Label rather than Node Name in the graph
- Include Refers To – Defaults to Yes
- Include Referred To By – Defaults to Yes
- PDF File – Path and file name for generated graph