The only thing I might add is to try the new "APPLY" operator (CROSS APPLY or OUTER APPLY) when joining to your Table function, to see whether it speeds things up even more. I have had some cases where this worked much better, but it all depends on the query and the data.
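As a rough sketch of what that looks like (the table dbo.Orders, the function dbo.GetItems, and the Xml shape are made up for illustration, not taken from your code), CROSS APPLY lets the function take a column from the outer row as its argument:

```sql
-- Hypothetical table holding one Xml document per row
CREATE TABLE dbo.Orders (OrderID int PRIMARY KEY, ItemXml xml NOT NULL);
GO
-- Hypothetical inline table-valued function that shreds the Xml into rows
CREATE FUNCTION dbo.GetItems (@x xml)
RETURNS TABLE
AS
RETURN
(
    SELECT n.value('@sku', 'varchar(20)') AS Sku
    FROM @x.nodes('/items/item') AS t(n)
);
GO
-- CROSS APPLY evaluates the function once per outer row;
-- rows for which the function returns nothing are dropped
SELECT o.OrderID, i.Sku
FROM dbo.Orders AS o
CROSS APPLY dbo.GetItems(o.ItemXml) AS i;
```

Swap in OUTER APPLY if you need to keep the outer rows for which the function returns no rows. Compare the execution plans against your current join before committing to either.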
Your Xml parsing might be sped up by creating an Xml Schema collection for it and typing the incoming Xml with that Schema. I've even had Procs that accept multiple versions of the Xml, each validated against a different Schema.
Ex.
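CREATE PROC ParseSomething
(
    @data Xml
)
AS
BEGIN
    Declare @Data_v1 Xml(MySchema_v1)
    BEGIN TRY
        -- Assigning to the typed variable validates the Xml against the Schema
        Set @Data_v1 = @data
        -- At this point, if the Xml does not conform to the Schema,
        -- the assignment has raised an error and control is in the CATCH block
        -- (@Data_v1 will be Null only if @data was Null)
        -- TODO: Do your parsing off the @Data_v1 NOT your original @data
        -- this way the Xml parser has to do less thinking now that there is a valid schema
    END TRY
    BEGIN CATCH
        -- Return the validation error to the caller
        SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage
    END CATCH
END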
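
The proc above only compiles if the MySchema_v1 collection already exists. A minimal sketch of creating one and querying Xml typed with it follows; the items/item/sku structure is an assumption for illustration, so substitute your real schema:

```sql
-- Hypothetical schema: an <items> root holding <item sku="..."/> elements
CREATE XML SCHEMA COLLECTION MySchema_v1 AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="items">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="item" maxOccurs="unbounded">
            <xs:complexType>
              <xs:attribute name="sku" type="xs:string" use="required" />
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO
-- The assignment validates the document against the collection
DECLARE @typed xml(MySchema_v1);
SET @typed = N'<items><item sku="A1" /><item sku="B2" /></items>';

-- Shred the typed Xml; the types are already known from the schema
SELECT n.value('@sku', 'varchar(20)') AS Sku
FROM @typed.nodes('/items/item') AS t(n);
```

To support a second version, you could create a MySchema_v2 collection and try each typed assignment in its own TRY/CATCH block, parsing from whichever one succeeds.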