Looking for a better script

Question

Looking for a better script

adams.squared

SSCrazy

Points: 2675
More actions
June 22, 2015 at 7:48 am

#301785

Hello All,
I'm looking for a better way to write this. The client has a similar script and basically for each part, they will cycle through all available customers with their geographic values and come up with a route based off of those values. So the initial set of customers is compared to the same set up customers and all the values are compared. Basically Point A to Point B. Then that data set is compared to the initial set of customer again to get all the possible values for Point A to Point B to Point C. And then it is done a fourth time. The problem is that when you have a lot of customers, the result grow exponentially. Below is a record set of 40. So you have 40*39*38*37 or something similar and you wind up with over 2 million records. If we put a 300 record set into the mix, it's huge and time consuming.
My initial thoughts are that this needs to go into SSAS, but I'm not sure that is an option. So I thought I'd see if it is possible to script a better way. Thoughts?
Thanks
IF EXISTS(SELECT * FROM TEMPDB.DBO.SYSOBJECTS WHERE ID = OBJECT_ID(N'TEMPDB..#CUST')) DROP TABLE #CUST;
CREATE TABLE #CUST
(Part varchar(10), Customer varchar(10), VAL Float)
insert into #CUST values
('Part1','0001',6.1),
('Part1','0002',3.5),
('Part1','0003',9.3),
('Part1','0004',11.9),
('Part1','0005',3.0),
('Part1','0006',7.4),
('Part1','0007',4.4),
('Part1','0008',5.7),
('Part1','0009',2.3),
('Part1','0010',8.8),
('Part1','0011',6.7),
('Part1','0012',4.4),
('Part1','0013',10.9),
('Part1','0014',12.1),
('Part1','0015',2.1),
('Part1','0016',6.7),
('Part1','0017',8.5),
('Part1','0018',4.9),
('Part1','0019',9.0),
('Part1','0020',3.4),
('Part1','0021',11.5),
('Part1','0022',12.2),
('Part1','0023',5.1),
('Part1','0024',15.3),
('Part1','0025',1.9),
('Part1','0026',6.7),
('Part1','0027',5.5),
('Part1','0028',2.5),
('Part1','0029',9.4),
('Part1','0030',9.5),
('Part1','0031',6.0),
('Part1','0032',11.3),
('Part1','0033',10.2),
('Part1','0034',7.9),
('Part1','0035',3.4),
('Part1','0036',6.6),
('Part1','0037',2.4),
('Part1','0038',12.5),
('Part1','0039',9.7),
('Part1','0040',8.3)
IF EXISTS(SELECT * FROM TEMPDB.DBO.SYSOBJECTS WHERE ID = OBJECT_ID(N'TEMPDB..#DATA')) DROP TABLE #DATA;
Create Table #Data (Part varchar(10), Stop1 varchar(10),Stop1Range float, Stop2 varchar(10), Stop2Range float, Stop3 varchar(10), Stop3Range float, Stop4 varchar(10), Stop4Range float, TotRange float)
Insert Into #DATA (Part, Stop1, Stop1Range)
Select Part, Customer, Val
From #Cust
Insert Into #Data (Part, Stop1, Stop1Range, Stop2, Stop2Range)
Select D.Part, D.Stop1, D.Stop1Range, C.Customer, C.Val
From #Data D
Inner Join #Cust C on D.Part=C.Part and D.Stop1<>C.Customer
Insert Into #Data (Part, Stop1, Stop1Range, Stop2, Stop2Range, Stop3, Stop3Range)
Select D.Part, D.Stop1, Stop1Range, D.Stop2, Stop2Range, C.Customer, C.Val
From #Data D
Inner Join #Cust C on D.Part=C.Part and D.Stop1<>C.Customer and D.Stop2<>C.Customer
Insert Into #Data (Part, Stop1, Stop1Range, Stop2, Stop2Range, Stop3, Stop3Range, Stop4, Stop4Range)
Select D.Part, D.Stop1, D.Stop1Range, D.Stop2, D.Stop2Range, D.Stop3, D.Stop3Range, C.Customer, C.Val
From #Data D
Inner Join #Cust C on D.Part=C.Part and D.Stop1<>C.Customer and D.Stop2<>C.Customer and D.Stop3<>C.Customer
Update #Data
Set TotRange=isnull(Stop1Range,0)+(isnull(Stop2Range,0)*1)+(isnull(Stop3Range,0)*10)+(isnull(Stop4Range,4)*100)
select * from #data
Order by TotRange desc

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110639 More actions · Answer 1

I don't have a clear understanding of the data, nor am I sure exactly what the end goal is. You appear to be indicating that for a given "part" (I have to assume this means a part number for something the company manufactures, but that might be incorrect or even completely not what you meant), you look at all possible customers geographic location data to get all possible routes between them. I don't quite see how it matters what part is involved, as the problem of routing between customers would be completely independent of what part has to travel that route. I suspect that the written word is part of the problem, so please explain in more detail as to what is meant by "part" and whether or not it has any real relationship to the route generated.

In general, solving the "most efficient route" problem is an inherently difficult task, precisely because the only way to know for sure that you have the best solution is to compute the cost of ALL possible routes. Just assigning a cost to every route is also a rather sizable challenge. Just getting good map data on usable roads is a challenge, and then there's the upkeep of such data, as new roads get created, old roads go away, and bridges get temporarily closed, causing route deviations. I'm not entirely sure that the computing power and information necessary is anywhere near the necessary level to allow for anything but a guesstimate, so please connect the dots for me on what you're trying to accomplish, and we can start looking at ways to at least process the data you have in as efficient a way as may be possible.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

adams.squared SSCrazy Points: 2675 More actions · Answer 2

The part is the part #. The initial data set is all the customers with that part. What the part is doesn't really matter either except that they do not mix the parts on the route. Don't worry about bridges and such. The route is picked based off their cost value. It's more complex than what I have represented. The question is, is there a better way to write this in SSMS so that all possible routes are examined, or is this something that SSAS should be handling?

*edit Changed SSRS to SSMS.

sgmunson SSC Guru Points: 110639 More actions · Answer 3

adams.squared (6/22/2015)
The part is the part #. The initial data set is all the customers with that part. What the part is doesn't really matter either except that they do not mix the parts on the route. Don't worry about bridges and such. The route is picked based off their cost value. It's more complex than what I have represented. The question is, is there a better way to write this in SSMS so that all possible routes are examined, or is this something that SSAS should be handling?
*edit Changed SSRS to SSMS.

The problem here is one of scale. As you noticed, ramp up the number of customers and the performance goes into the ditch. The question that not mixing the parts on the route makes me ask is why? If someone thinks that merely limiting the # of customers for the routing challenge to those for a given part number are going to be severely disappointed when some part number has several thousand customers. That's why I was looking for considerably more detail. You can't really separate this part of an overall solution and treat it as if it exists on its own. There's the question of what kind of transportation will be covering this route, and what the frequency of traveling any route for a given part number is. Analysis Services might be able to help, but I think the problem here needs a lot more detail before anything I recommend is likely to help. Even with geo-spatial data types and some of the associated functions, getting a useful solution could easily be dicey at best. That's not to say that with the proper focus, and a clear logical path to a common sense solution, that something good can't happen. It's just that there's nowhere near enough detail here to provide opportunity for someone to find a "creative spark" that might suggest a new path forward.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

drew.allen SSC Guru Points: 76944 More actions · Answer 4

If I understand what you're trying to calculate, your data is incomplete. If you want to plan a route on a 2-dimensional surface, e.g. a plane or the surface of a sphere, you need to provide two coordinates to uniquely identify a position. You have only provided one. If this represents a distance from you, two clients with a distance from you of 5KM can be anywhere from 0KM to 10KM from each other. It's impossible to calculate the best path with that degree of uncertainty.

Furthermore, the calculation of the TotRange seems to depend on the direction of the path. So if you take the following path 0002 - 0001 - 0003 - 0004 the TotRange is 1289.1, but if you take the same path in the opposite direction, the TotRange is 420.3.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

adams.squared SSCrazy Points: 2675 More actions · Answer 5

You guys are making this more complicated. Forget the values. They are random and meaningless. Forget about trying to make an efficient route, I do not care about that. The question is more about the multiple joins on the same table. Is there a better way to handle that volume? Is there a better join? Cross Apply? Outter joins?

sgmunson SSC Guru Points: 110639 More actions · Answer 6

adams.squared (6/22/2015)
You guys are making this more complicated. Forget the values. They are random and meaningless. Forget about trying to make an efficient route, I do not care about that. The question is more about the multiple joins on the same table. Is there a better way to handle that volume? Is there a better join? Cross Apply? Outter joins?

Generally, you get better help when the people trying to help you have complete information. If you have to insist on hiding the details because you happen to be sure that we don't need them, then the likelihood of you getting useful information is probably zero. I for one, don't just parrot out answers based on incomplete information. I ask questions until I have a sufficiently detailed picture of what's needed so that I can be confident that what I provide will actually work. Many of us here operate on similar principles. As we are volunteers, who see incomplete pictures far more often than complete ones, you'll just have to forgive us our questions and provide enough details for us to be reasonably certain we aren't providing someone with the proverbial "just enough rope to hang themselves".

If you don't really care about the scenario you presented, then I'm going to suggest that maybe you're asking the wrong question. The question you now appear to be asking has little relevance if the scenario it applies to is irrelevant, so any answer will probably do. How about 42 ?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

adams.squared SSCrazy Points: 2675 More actions · Answer 7

The question is about the joins. Not the actual data. If I presented you with all the scripts involved, it'd take you a couple of days to sort through it. I'm not trying to rewrite their formula for coming up with a cost of the route. I'm not trying to figure out a better route. The question is, is there a better way to join a table onto itself multiple times than the way I am doing it.

sgmunson SSC Guru Points: 110639 More actions · Answer 8

adams.squared (6/22/2015)
The question is about the joins. Not the actual data. If I presented you with all the scripts involved, it'd take you a couple of days to sort through it. I'm not trying to rewrite their formula for coming up with a cost of the route. I'm not trying to figure out a better route. The question is, is there a better way to join a table onto itself multiple times than the way I am doing it.

And therein lies the problem. A "better" solution is quite likely to be seriously data dependent. How can we know what's going to perform better with little or no knowledge about the table structure or the indexes, or in this case, having a representative set of sample data to work with? What you're asking is more of an architecture question, and requires architecture type questions, which you so far haven't answered.

Some questions you may want to ask yourself:

1.) What's the largest number of customers for a given part number?

2.) What does the execution plan say about your existing query? Are there covering indexes?

3.) What is the current execution time, and how might it relate to larger numbers of customers?

4.) Can you model this query with a large, randomly generated set of sample data and use that to help answer question 3 ?

5.) Can you post an execution plan? There may well be parts of the entire script that are particularly responsible for performance, but we're just not seeing that part.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

adams.squared SSCrazy Points: 2675 More actions · Answer 9

Do you answer posts just for the sake of building up your points or something? What you ask for is irrelevant. The question is in regards to the given data, not the data I have not provided you with. As far as I am concerned, the data is abstract. The question, which I will repeat again in the hopes that you might actually read it, Is there a better way to join a table onto itself instead of the inner joins that I have used?

g.britton SSChampion Points: 13863 More actions · Answer 10

Isn't this the Travelling Salesman problem or a variation thereof? That's an NP-Complete problem. No known polynomial-time solution exists and IMHO none will ever be found. For anything over a tiny problem space, you need to look at approximations. There are plenty of good ones out there. Google is your friend.

Gerald Britton, Pluralsight courses

adams.squared SSCrazy Points: 2675 More actions · Answer 11

Thanks g.britton for the useful response. It'll give me something to read up on.