Is this a use case for Graph Database?

  • Hi SSC,

    I have been asked a few days ago if tracking shipped items is something for a Graph Database and my initial response was something along the lines of "oh, yeah! Definitely, you can visualize shortest paths and things..."

    After having a look at the available data I'm not entirely sure anymore but first things first:

    Sample data is at the end, a few records which are fine and one sample record with let's say issues, the record with issues would be 0857127151 but let me explain why it is. Imagine you are manufacturing and shipping from and to different sites because you might have certain tools or skills only available in one location. So when the product arrives at a site it's considered Type "EI", local processing is considered "VF" and departure for the next manufacturing step or delivery (outgoing) to the customer is "AU". You would expect that at each site you would generate one of each those events if everything goes according to plan - I would assume according to plan is somewhere between one and three days but I have no concrete answer to this.

    Now when things go wrong the current status might be changed to locked until the issues are resolved, this is most likely happen to processing "VF" when for example you miss a certain part because there is a delay somewhere in your supply chain the record will change "Locked" to true. Once your issue is resolved a new entry is generated when the issue is resolved and processing continues at the same site. So if for example there is an issue with outgoing delivery (AU) I would not expect as next entry by date to see arrival (EI) as next event but rather another outgoing delivery which is not locked before seeing an arrival event. An example record with such issue is 0057100297, the first locked event makes sense, the second one does not.

    We have about 10% of all records with such issues (anyone who wants to argue get better data - it's human entry errors …) and we would like to be able to detect OrderNumbers with issues and instead of loading into the DWH, send these back for review. Yes I was told at which amount of rows we're looking and all I could say to that was "so you're going to get yourself A LOT of students for this, huh?"

    I was thinking of Graph Database because you could eventually be able to determine where most delays (Locked) happened rather easy in a PowerBI Report, I might still be wrong and not see issues on the horizon with this.

    Enough storytelling for now here's some code!

     

    USE [TestDB]
    GO


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[GT_Internal_Segments](
    [OrderNumber] [nvarchar](10) NULL,
    [ShippingPath] [nvarchar](500) NULL,
    [Import_ID] [int] NULL,
    [UnloadKey] [nvarchar](9) NULL,
    [UnloadKey_Sequence] [int] NULL,
    [isFinal] [bit] NULL
    ) ON [PRIMARY]
    GO


    USE [TestDB]
    GO


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[GT_Internal_Segments_current](
    [OrderNumber] [nvarchar](10) NULL,
    [Location_From] [nvarchar](9) NULL,
    [Location_To] [nvarchar](9) NULL,
    [Date_EI] [datetime] NULL,
    [Date_VF] [datetime] NULL,
    [Date_AU] [datetime] NULL,
    [isLastReportedEvent] [bit] NULL,
    [isMovement] [bit] NULL,
    [isFinal] [bit] NULL,
    [Locked] [nvarchar](1) NULL,
    [Sequence] [int] NULL
    ) ON [PRIMARY]
    GO

    USE [TestDB]
    GO


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[GT_Internal_Segments_passed](
    [OrderNumber] [nvarchar](10) NULL,
    [Location_From] [nvarchar](9) NULL,
    [Location_To] [nvarchar](9) NULL,
    [Date_EI] [datetime] NULL,
    [Date_VF] [datetime] NULL,
    [Date_AU] [datetime] NULL,
    [isLastReportedEvent] [bit] NULL,
    [isMovement] [bit] NULL,
    [Locked] [nvarchar](1) NULL,
    [Sequence] [int] NULL
    ) ON [PRIMARY]
    GO

    USE [TestDB]
    GO


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[GT_FZGTERMINE](
    [OrderNumber] [nvarchar](10) NULL,
    [UnloadKey] [nvarchar](9) NULL,
    [HubName] [nvarchar](50) NULL,
    [Type] [nvarchar](2) NULL,
    [Locked] [nvarchar](1) NULL,
    [Deadline] [datetime] NULL,
    [Import_ID] [int] NULL
    ) ON [PRIMARY]
    GO
    USE [TestDB]
    GO


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[GT_VERSANDWEG](
    [OrderNumber] [nvarchar](10) NULL,
    [ShippingPath] [nvarchar](500) NULL,
    [Import_ID] [int] NULL
    ) ON [PRIMARY]
    GO


    INSERT INTO dbo.GT_FZGTERMINE (OrderNumber, UnloadKey, HubName, [Type], Locked, Deadline, Import_ID)
    Values
    ('0857127151','200006700','Wvsd Brm', 'EI','N','2019-08-26 01:05:00.000','60'),
    ('0857127151','200000067','Hub Nord ATB Brm', 'EI','N','2019-08-26 01:27:00.000','60'),
    ('857127151','200000067','Hub Nord ATB Brm' , 'VF','J','2019-08-26 06:35:00.000','60'),
    ('0857127151','200000067','Hub Nord ATB Brm', 'AU','N','2019-08-27 22:03:00.000','60'),
    ('0857127151','200006700','Wvsd Brm' , 'EI','N','2019-08-27 22:07:00.000','60'),
    ('0857127151','200005000','Wvsd Sdf', 'EI','N','2019-08-29 09:12:00.000','60'),
    ('0857127151','200005000','Wvsd Sdf', 'VF','J','2019-08-29 09:39:00.000','60'),
    ('0857127151','200005000','Wvsd Sdf', 'AU','N','2019-08-30 06:29:00.000','60'),
    ('0857127151','200006700','Wvsd Brm', 'EI','N','2019-08-30 06:29:00.000','60'),
    ('0857127151','571UH9000','FLC Ahf', 'EI','N','2019-08-30 10:50:00.000','60'),
    ('0857127151','200006700','Wvsd Brm', 'EI','N','2019-08-30 11:01:00.000','60'),
    ('0857127151','571UH9000','FLC Ahf', 'VF','J','2019-09-03 08:42:00.000','60'),
    ('0857127151','571UH9000','FLC Ahf', 'AU','N','2019-09-03 11:00:00.000','60'),
    ('0857127151','571000055','Customer1', 'EI','N','2019-09-03 15:11:00.000','60'),
    ('0857127151','200006700','Wvsd Brm', 'EI','N','2019-09-04 02:06:00.000','60'),
    ('0857127151','200006700','Wvsd Brm', 'VF','N','2019-10-18 11:44:00.000','60'),
    ('0857127151','200006700','Wvsd Brm', 'AU','N','2019-10-18 11:50:00.000','60'),
    ('0857145974','531011700','Wvsd Hbh', 'EI','N','2008-08-07 08:06:00.000','45'),
    ('0857145974','531011700','Wvsd Hbh', 'VF','N','2008-08-12 07:13:00.000','45'),
    ('0857145974','531011704','Storage Hbh', 'VF','N','2008-08-12 07:17:00.000','45'),
    ('0857145974','531011700','Wvsd Hbh', 'AU','N','2008-08-12 09:02:00.000','45'),
    ('0857145974','531011704','Storage Hbh', 'EI','N','2008-08-12 09:03:00.000','45'),
    ('0857145974','531011704','Storage Hbh', 'AU','N','2008-08-12 09:06:00.000','45'),
    ('0857145974','571UH9000','FLC Ahf', 'EI','N','2008-08-12 13:29:00.000','45'),
    ('0857145974','571UH9000','FLC Ahf', 'VF','N','2008-08-14 06:27:00.000','45'),
    ('0857145974','571UH9000','FLC Ahf', 'AU','N','2008-08-14 08:00:00.000','45'),
    ('0857145974','571000D09','Customer2', 'EI','N','2008-08-14 11:24:00.000','45'),
    ('0057101105','705013800','Wvsd Tcl', 'EI','N','2010-04-15 00:11:00.000','45'),
    ('0057101105','705013800','Wvsd Tcl', 'VF','N', '2010-04-16 14:05:00.000','45'),
    ('0057101105','705013800','Wvsd Tcl', 'AU','N','2010-04-16 14:05:00.000','45'),
    ('0057101105','705A91000','HFN Bwk', 'EI','N','2010-04-21 12:00:00.000','45'),
    ('0057101105','705A91000','HFN Bwk', 'VF','N','2010-04-24 12:00:00.000','45'),
    ('0057101105','705A91000','HFN Bwk', 'AU','N','2010-04-26 16:30:00.000','45'),
    ('0057101105','200A12000','HFN Bhv', 'EI','N','2010-05-06 00:00:00.000','45'),
    ('0057101105','200A12000','HFN Bhv', 'VF','N','2010-05-19 12:39:00.000','45'),
    ('0057101105','200A12000','HFN Bhv', 'AU','N','2010-05-21 09:45:00.000','45'),
    ('0057101105','571UH9000','FLC Ahf', 'EI','N','2010-05-25 13:55:00.000','45'),
    ('0057101105','571UH9000','FLC Ahf', 'VF','N','2010-06-01 10:14:00.000','45'),
    ('0057101105','571UH9000','FLC Ahf', 'AU','N','2010-06-01 10:29:00.000','45'),
    ('0057101105','571000055','Customer1', 'EI','N','2010-06-01 13:16:00.000','45'),
    ('0057100297','200005000','Wvsd Sdf', 'EI','N','2010-11-18 08:56:00.000','45'),
    ('0057100297','200005000','Wvsd Sdf', 'VF','J','2010-11-18 21:21:00.000','45'),
    ('0057100297','200005000','Wvsd Sdf', 'VF','N','2010-11-19 05:57:00.000','45'),
    ('0057100297','200005000','Wvsd Sdf', 'AU','N','2010-11-19 11:27:00.000','45'),
    ('0057100297','200000028','FLC Hbg (XYX/XZ)','EI','N','2010-11-19 14:03:00.000','45'),
    ('0057100297','200000028','FLC Hbg (XYX/XZ)','VF','N','2010-11-22 12:59:00.000','45'),
    ('0057100297','200000028','FLC Hbg (XYX/XZ)','AU','N','2010-11-23 11:05:00.000','45'),
    ('0057100297','200000028','FLC Hbg (XYX/XZ)','AU','J','2010-11-23 11:45:00.000','45'),
    ('0057100297','571UH9000','FLC Ahf', 'EI','N','2010-11-24 06:34:00.000','45'),
    ('0057100297','571UH9000','FLC Ahf', 'VF','N','2010-11-24 15:32:00.000','45'),
    ('0057100297','571UH9000','FLC Ahf', 'AU','N','2010-11-25 05:00:00.000','45'),
    ('0057100297','571000160','Customer3','EI','N','2010-11-25 09:00:00.000','45')
    ;
    INSERT INTO dbo.GT_Internal_Segments (OrderNumber, ShippingPath, Import_ID, UnloadKey, UnloadKey_Sequence, isFinal)
    VALUES
    ('0057100297','200005000->200000028->571UH9000->571000160','45','200005000','0','0'),
    ('0057100297','200005000->200000028->571UH9000->571000160','45','200000028','1', '0'),
    ('0057100297','200005000->200000028->571UH9000->571000160','45','571UH9000','2','0'),
    ('0057100297','200005000->200000028->571UH9000->571000160','45','571000160','3','1'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45','705013800','0','0'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45','705A91000','1','0'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45','200A12000','2','0'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45','571UH9000','3','0'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45','571000055','4','1'),
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64','200006700','0','0'),
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64','200000067','1','0'),
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64','200005000','2','0'),
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64','571UH9000','3','0'),
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64','571000055','4','1'),
    ('0857145974','531011700->531011704->571UH9000->571000D09','45','531011700','0','0'),
    ('0857145974','531011700->531011704->571UH9000->571000D09','45','531011704','1','0'),
    ('0857145974','531011700->531011704->571UH9000->571000D09','45','571UH9000','2','0'),
    ('0857145974','531011700->531011704->571UH9000->571000D09','45','571000D09','3','1');
    INSERT INTO [dbo].[GT_Internal_Segments_current] (OrderNumber, Location_From, Location_To, Date_EI, Date_VF, Date_AU, isLastReportedEvent, isMovement, isFinal, Locked, Sequence)
    VALUES
    ('0057100297','571000160','571000160','2010-11-25 09:00:00.000','','','1','0','1','','6'),
    ('0057101105','571000055','571000055','2010-06-01 13:16:00.000','','','1','0','1','','8'),
    ('0857127151','200006700','','2019-10-18 11:50:00.000','','','0','1','0','','19'),
    ('0857145974','571000D09','571000D09','2008-08-14 11:24:00.000','','','1','0','1','','10');
    INSERT INTO [dbo].[GT_Internal_Segments_passed] (OrderNumber, Location_From, Location_To, Date_EI, Date_VF, Date_AU, isLastReportedEvent, isMovement, Locked, Sequence)
    VALUES
    ('0057100297','200005000','200005000','2010-11-18 08:56:00.000','2010-11-19 05:57:00.000','2010-11-19 11:27:00.000','0','0','','0'),
    ('0057100297','200005000','200000028','2010-11-19 11:27:00.000','','2010-11-19 14:03:00.000','0','1','','1'),
    ('0057100297','200000028','200000028','2010-11-19 14:03:00.000','2010-11-22 12:59:00.000','2010-11-23 11:45:00.000','0','0','','2'),
    ('0057100297','200000028','571UH9000','2010-11-23 11:45:00.000','','2010-11-24 06:34:00.000','0','1','','3'),
    ('0057100297','571UH9000','571UH9000','2010-11-24 06:34:00.000','2010-11-24 15:32:00.000','2010-11-25 05:00:00.000','0','0','','4'),
    ('0057100297','571UH9000','571000160','2010-11-25 05:00:00.000','','2010-11-25 09:00:00.000','0','1','','5'),
    ('0057101105','705013800','705013800','2010-04-15 00:11:00.000','2010-04-16 14:05:00.000','2010-04-16 14:05:00.000','0','0','','0'),
    ('0057101105','705013800','705A91000','2010-04-16 14:05:00.000','','2010-04-21 12:00:00.000','0','1','','1'),
    ('0057101105','705A91000','705A91000','2010-04-21 12:00:00.000','2010-04-24 12:00:00.000','2010-04-26 16:30:00.000','0','0','','2'),
    ('0057101105','705A91000','200A12000','2010-04-26 16:30:00.000','','2010-05-06 00:00:00.000','0','1','','3'),
    ('0057101105','200A12000','200A12000','2010-05-06 00:00:00.000','2010-05-19 12:39:00.000','2010-05-21 09:45:00.000','0','0','','4'),
    ('0057101105','200A12000','571UH9000','2010-05-21 09:45:00.000','','2010-05-25 13:55:00.000','0','1','','5'),
    ('0057101105','571UH9000','571UH9000','2010-05-25 13:55:00.000','2010-06-01 10:14:00.000','2010-06-01 10:29:00.000','0','0','','6'),
    ('0057101105','571UH9000','571000055','2010-06-01 10:29:00.000','','2010-06-01 13:16:00.000','0','1','','7'),
    ('0857127151','200006700','200006700','2019-08-26 01:05:00.000','','','0','0','','0'),
    ('0857127151','200006700','200000067','','','2019-08-26 01:27:00.000','0','1','','1'),
    ('0857127151','200000067','200000067','2019-08-26 01:27:00.000','2019-08-26 06:35:00.000','2019-08-27 22:03:00.000','0','0','','2'),
    ('0857127151','200000067','200006700','2019-08-27 22:03:00.000','','2019-08-27 22:07:00.000','0','1','','3'),
    ('0857127151','200006700','200006700','2019-08-27 22:07:00.000','','','0','0','','4'),
    ('0857127151','200006700','200005000','','','2019-08-29 09:12:00.000','0','1','','5'),
    ('0857127151','200005000','200005000','2019-08-29 09:12:00.000','2019-08-29 09:39:00.000','2019-08-30 06:29:00.000','0','0','','6'),
    ('0857127151','200005000','200006700','2019-08-30 06:29:00.000','','2019-08-30 06:29:00.000','0','1','','7'),
    ('0857127151','200006700','200006700','2019-08-30 06:29:00.000','','','0','0','','8'),
    ('0857127151','200006700','571UH9000','','','2019-08-30 10:50:00.000','0','1','','9'),
    ('0857127151','571UH9000','571UH9000','2019-08-30 10:50:00.000','','','0','0','','10'),
    ('0857127151','571UH9000','200006700','','','2019-08-30 11:01:00.000','0','1','','11'),
    ('0857127151','200006700','200006700','2019-08-30 11:01:00.000','','','0','0','','12'),
    ('0857127151','200006700','571UH9000','','','2019-09-03 08:42:00.000','0','1','','13'),
    ('0857127151','571UH9000','571UH9000','','2019-09-03 08:42:00.000','2019-09-03 11:00:00.000','0','0','','14'),
    ('0857127151','571UH9000','571000055','2019-09-03 11:00:00.000','','2019-09-03 15:11:00.000','0','1','','15'),
    ('0857127151','571000055','571000055','2019-09-03 15:11:00.000','','','0','0','','16'),
    ('0857127151','571000055','200006700','','','2019-09-04 02:06:00.000','0','1','','17'),
    ('0857145974','531011700','531011700','2008-08-07 08:06:00.000','2008-08-12 07:13:00.000','','0','0','','0'),
    ('0857145974','531011700','531011704','','','2008-08-12 07:17:00.000','0','1','','1'),
    ('0857145974','531011704','531011704','','2008-08-12 07:17:00.000','','0','0','','2'),
    ('0857145974','531011704','531011700','','','2008-08-12 09:02:00.000','0','1','','3'),
    ('0857145974','531011700','531011700','','','2008-08-12 09:02:00.000','0','0','','4'),
    ('0857145974','531011700','531011704','2008-08-12 09:02:00.000','','2008-08-12 09:03:00.000','0','1','','5'),
    ('0857145974','531011704','531011704','2008-08-12 09:03:00.000','','2008-08-12 09:06:00.000','0','0','','6'),
    ('0857145974','531011704','571UH9000','2008-08-12 09:06:00.000','','2008-08-12 13:29:00.000','0','1','','7'),
    ('0857145974','571UH9000','571UH9000','2008-08-12 13:29:00.000','2008-08-14 06:27:00.000','2008-08-14 08:00:00.000','0','0','','8'),
    ('0857145974','571UH9000','571000D09','2008-08-14 08:00:00.000','','2008-08-14 11:24:00.000','0','1','','9'),
    ('0857127151','200006700','200006700','2019-09-04 02:06:00.000','2019-10-18 11:44:00.000','2019-10-18 11:50:00.000','1','0','','18');
    INSERT INTO dbo.GT_Versandweg (OrderNumber, ShippingPath, Import_ID)
    VALUES
    ('0857127151','200006700->200000067->200005000->571UH9000->571000055','64'),
    ('0057100297','200005000->200000028->571UH9000->571000160','45'),
    ('0057101105','705013800->705A91000->200A12000->571UH9000->571000055','45'),
    ('0857145974','531011700->531011704->571UH9000->571000D09','45');

    How would I approach checking the OrderNumbers for validity within the steps so an item in "AU" and locked cannot appear at a different site as next record, same as (by date) for example two arrival Events (EI) at different sites cannot happen subsequently right away as this would definitely be having a look how Schroedingers' Cat feels like today at the edge of the box.

    Do you think going with a Graph Database / Tables would be feasible for something like that or rather stick to regular MSSQL database and tables?

    Cheers

  • Many years ago, I was asked by a graph database company to solve the "Kevin Bacon problem" in SQ to compare to their products solution. If you don't know this classic problem, the ideas that we have a database of movies with all of the actors, staff, and other credited professionals that worked on a film. Your task is to prove that nobody in Hollywood is more than six degrees away from Kevin Bacon (i.e. if you're Kevin Bacon, then your degree is zero; if you costarred with Kevin Bacon in your degree is one; if you costarred with a costar of Kevin Bacon, then at your degree is two; etc.) This same game is played by geeks, with the mathematician Paul Erdos.

    My program ran several orders of magnitude slower than their program. And it was less flexible. One of their customers was a law enforcement agency and looks of the terrorist attacks. They were able to construct a graph that tied together to recently released convicts, one of whom had stolen a rental truck and the other of whom had bought a lot of ammonium nitrate fertilizer. The traffic tickets showed the truck was parked, a National Park with a damn. All of this fell out from data collected from a dozen different sources. Just standardizing the encodings in SQL would have been a major project.

    Graph databases like any other tool. If it's the right tool for the job, then use it!

    I will look at your DDL and see if it is good SQL or not later.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Well in my case you could potentially argue that certain products would only be produced at certain sites and depending on the location of the customer there should be a standard route between order and delivery. Usually the route is not the issue but we would certainly like to understand why on day x there was a big delay while on day y we did not face any delays. This might be extended in the future to pushing something like traffic information on route back to whoever issues the next outgoing (AU) event but right now we would be more than happy to be able to find those 10% incorrect order entrys and send those back for review. As we only run a DWH I unfortunately see very little use (but I might be wrong) for DQS here as we only get new data once a week. Thank you for having a look at this.

  • DinoRS wrote:

    How would I approach checking the OrderNumbers for validity within the steps so an item in "AU" and locked cannot appear at a different site as next record, same as (by date) for example two arrival Events (EI) at different sites cannot happen subsequently right away as this would definitely be having a look how Schroedingers' Cat feels like today at the edge of the box.

    Generally imo, looking for validity issues with queries in oltp-type situations indicates the data model is not sufficiently detailed and/or constrained.  It shouldn't be possible to show up in two places at once.  Perhaps a product arrival at a WH could be modelled as a separate entity and the GT_Internal_Segments_passed table normalized?   Maybe you could follow a particular product through the SQL steps all the way?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • the funny part is: If you look at the OLTP DB and their flow it would appear valid (as in: there seems to be no real checks) and when we look at it from a DWH side it doesn't make sense. I was considering either modelling an arrival or a WH but I guess makes sense looking at both options.

    Can and how would I check actual dates from either status against what's expected? For example I would expect once the order arrives at a WH (EI), a change to processing (VF) would be expected within an hour, anything above is an anomaly and should be highlighted later on in reporting. I wouldn't expect departure from one WH (AU) and arrival at the next WH to take more than 24 hours except for parts of the route which have to be shipped, what would be the best approach to find such anomalies?

  • DinoRS wrote:

    the funny part is: If you look at the OLTP DB and their flow it would appear valid (as in: there seems to be no real checks) and when we look at it from a DWH side it doesn't make sense. I was considering either modelling an arrival or a WH but I guess makes sense looking at both options.

    Can and how would I check actual dates from either status against what's expected? For example I would expect once the order arrives at a WH (EI), a change to processing (VF) would be expected within an hour, anything above is an anomaly and should be highlighted later on in reporting. I wouldn't expect departure from one WH (AU) and arrival at the next WH to take more than 24 hours except for parts of the route which have to be shipped, what would be the best approach to find such anomalies?

    When you refer to the oltp db it's as "their flow" and when you refer to the DWH "side" it's as "we look at it...".  So is it correct it isn't feasible to make changes to the oltp?  I'm sure there's significant existing infrastructure.  It seems you're suggesting to create the operational reporting purely from the existing dwh tables?  Is all of the data from the examples from the dwh?  Has there been any transformation already applied to the data sent to the dwh?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Yes, unfortunately it's absolutely not feasible to make changes to the OLTP System where data is inserted initially, it's an 3rd party ISV. All we get is some CSV Exports and what I've posted before is our first draft at how the staging tables might look like, in fact even the CSVs we get don't have all the necessary Information.

    Yes, all the data from the examples is from the DWH, asides some changes to Hubnames the data is real data. There has not been any transformation applied to the data sent to the DWH but the data we receive is only a moving delta in which we don't see unlock events for previously locked orders.

    We're willing to not process wrongful data for the moment (or well, rather highlight it as mentioned before) and we might even deal with time related anomalies later BUT  if you look at dbo.GT_VERSANDWEG.ShippingPath that's just estimated when the order was placed. While what is in GT_Internal_Segments_current and GT_Internal_Segments_passed is the actual shipping path.

    For example this is the estimated shipping path for an order:

    705013800->705A91000->200A12000->571UH9000->571000055

    We would like to understand if and how the following questions can be anwered with Graph Tables

    1. The actual path differs, for example 705013800->705A91000->300A12000->571UH9000->571000055, can and how would we query for such difference? Given we have a table in which we insert the actual path.
    2. The Order is not finished, let's say for a set of OrderNumbers we have the actual shipping path looking like this 705013800->705A91000, can and how would we query the following question: for which OrderNumbers do we expect the next ShippingPath to be 200A12000 and for which orders do we expect to have a remaining shipping path of 200A12000->571UH9000->571000055 or 200A12000->571UH9000 as the last ShippingPath is delivery to the customer?
  • DinoRS wrote:

    Yes, unfortunately it's absolutely not feasible to make changes to the OLTP System where data is inserted initially, it's an 3rd party ISV. All we get is some CSV Exports and what I've posted before is our first draft at how the staging tables might look like, in fact even the CSVs we get don't have all the necessary Information.

    Yes, all the data from the examples is from the DWH, asides some changes to Hubnames the data is real data. There has not been any transformation applied to the data sent to the DWH but the data we receive is only a moving delta in which we don't see unlock events for previously locked orders.

    We're willing to not process wrongful data for the moment (or well, rather highlight it as mentioned before) and we might even deal with time related anomalies later BUT  if you look at dbo.GT_VERSANDWEG.ShippingPath that's just estimated when the order was placed. While what is in GT_Internal_Segments_current and GT_Internal_Segments_passed is the actual shipping path.

    For example this is the estimated shipping path for an order:

    705013800->705A91000->200A12000->571UH9000->571000055

    Ok I think you're generally taking a good approach.  Receiving a moving delta isn't necessarily bad if you can get the historical files going back all way to the beginning.  Do you have a plan for updating the dw incrementally?  Besides the sample tables posted are there (or are you planning) dwh base tables with primary keys for orders, locations, imports, hub names?

    DinoRS wrote:

    We would like to understand if and how the following questions can be anwered with Graph Tables

      <li style="list-style-type: none;">

    1. The actual path differs, for example 705013800->705A91000->300A12000->571UH9000->571000055, can and how would we query for such difference? Given we have a table in which we insert the actual path.
      <li style="list-style-type: none;">

    1. The Order is not finished, let's say for a set of OrderNumbers we have the actual shipping path looking like this 705013800->705A91000, can and how would we query the following question: for which OrderNumbers do we expect the next ShippingPath to be 200A12000 and for which orders do we expect to have a remaining shipping path of 200A12000->571UH9000->571000055 or 200A12000->571UH9000 as the last ShippingPath is delivery to the customer?

    Afaik, these are both potentially good applications for a graph db.  Above more than a handful of nodes the number of pathways gets really large fast so a graph database could be very useful.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Yes there are / will be additional tables where locations and other things will be stored and yes Primary Keys will be introduced aswell.

    We have literally no choice but to update the DWH incrementally as we need to add some events on our own to make things actually look like in the sample data above.

    I essentially am interrupting (yes, evil me!) what is being developed by asking "Why not Graph tables?" and the reply back was "well, how would our use case look like?"

    Our main issue is how would we would query for above mentioned questions if we have let's say another table

    CREATE TABLE [dbo].[GT_VERSANDWEG_actual](

    [OrderNumber] [nvarchar](10) NULL,

    [ShippingPath] [nvarchar](500) NULL,

    [Import_ID] [int] NULL

    ) ON [PRIMARY]

    Which contains what is to us known as the most recent, actual ShippingPath based on what we see in the data we Import. How could a Graph Table based query look like for differences and not yet finished orders? Not looking for a complete solution but rather understanding how we would build our queries with Graph Tables to answer these questions.

  • DinoRS wrote:

    Yes there are / will be additional tables where locations and other things will be stored and yes Primary Keys will be introduced as well.

    We have literally no choice but to update the DWH incrementally as we need to add some events on our own to make things actually look like in the sample data above.

    I essentially am interrupting (yes, evil me!) what is being developed by asking "Why not Graph tables?" and the reply back was "well, how would our use case look like?"

    Our main issue is how would we would query for above mentioned questions if we have let's say another table

    CREATE TABLE [dbo].[GT_VERSANDWEG_actual](

    [OrderNumber] [nvarchar](10) NULL,

    [ShippingPath] [nvarchar](500) NULL,

    [Import_ID] [int] NULL

    ) ON [PRIMARY]

    Which contains what is to us known as the most recent, actual ShippingPath based on what we see in the data we Import. How could a Graph Table based query look like for differences and not yet finished orders? Not looking for a complete solution but rather understanding how we would build our queries with Graph Tables to answer these questions.

    Ha, don't mess with the plan!  Again afaik (I've only read a few articles) it's not either/or having normalized base tables.  Having the base tables imo is a necessary step to make loading Graph tables go smoothly.  It sounds like there's a plan in place to get this part done.

    The versandweg table is created by an external system?  You mentioned previously " we don't see unlock events for previously locked orders".  Does this negatively impact the accuracy of reporting the actual pathways?   Is this an area where it's possible to get more granular information from the oltp system?

    Specifically regarding Graph queries and set up I'm just not familiar enough with it.  Hopefully someone else has the insight to help out.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • It sounds like a simpler query in Graph than in SQL. From the little I've done with graph, you can model these issues in SQL or in a graph system, but the query structure is much simpler in graph, especially for unknown path lengths. When it's a known path, the SQL is straightforward, but can be long as you join to nest the number of paths needed.

    Likely there are other tricks, but you might need to model it each way and see how complex it is. Inserting into graph structures seems to be the hardest and most frustrating part in SQL Server. It seems easier in things like Neo4J.

  • No,  the GT_Versandweg table is not created externally but we're essentially limited to being provided the data you see in the first post - except for the unlock events. Data provided "externally" is only via CSV file, no human interface anywhere. How the data actually looks like is provided via data in the OP - with the difference it's spread across a few tables since I don't have access to the raw CSV. Answering your question if any transformation of the here presented data has been done was answered "no, except adding some unlock Events we are able to catch via our own code".

    We don't see unlock events for previously locked orders refers to

    Now when things go wrong the current status might be changed to locked until the issues are resolved, this is most likely happen to processing "VF" when for example you miss a certain part because there is a delay somewhere in your supply chain the record will change "Locked" to true. Once your issue is resolved a new entry is generated when the issue is resolved and processing continues at the same site. So if for example there is an issue with outgoing delivery (AU) I would not expect as next entry by date to see arrival (EI) as next event but rather another outgoing delivery which is not locked before seeing an arrival event. An example record with such issue is 0057100297, the first locked event makes sense, the second one does not.

    That is part of our 10% issue, we can catch VF but nothing in AU because simplified said: VF Events happen within a plant so the previous and next entry must be within that plant, an AU Event could have happened at the outgoing plant, the next incoming plant and in transit so we cannot exacty expect another AU 1 Event (1 being the plant here) after an AU 1 'isLocked' Event, it might aswell be a EI 2 Event or an EI 1 Event, depending on what the truck driver decides to do: Return to previous plant, go to the next planned plant …

    Simply said: Even with the best possible normalization you will not be able to catch that 10%, believe me or not.

     

    Likely there are other tricks, but you might need to model it each way and see how complex it is.

    You do realize that we don't want to model for Graph Tables because we don't understand how to build comparison queries against it, right? Because I don't see how modelling the whole structure would tell us "and after all the hard journey, your compare query looks like this!". We are very willing to go the Graph way but without understanding how relevant queries would have to be written, it's easier to stay with what is known working.

    Simplest query comparing 2 paths in 2 different tables is what we need to understand or actually the differences in 2 paths in 2 different tables so we can report against differences between planned and actual path and how many and which orders are stuck at plant XY because the order isn't finished and how the planned path looks for the remaining process.

    IF we could check the path for sanity (right order of processing steps) using Graph Tables, too it would be great but that part can be solved differently.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply