• EDIT: Removed this approach as performance was terrible

    I did a quick-and-dirty scale-out of the test data (adding copies of the test set under other masterid values), and the first solution I posted looks quicker than the original and far quicker than the attempt I deleted here.

    None of them scale all that well, though. At 100x the size of the test sample, my query took 3 seconds and the original took 8 on my home laptop. At 1000x, my query took 1:36 versus 2:31 for the original.