It has been widely publicized that Analysis Services 2005 uses XML for Analysis
1.1 (XMLA) as its only network protocol for communication between client and
server. XMLA is standard public protocol defined by XMLA council and full
description can be found at www.xmla.org.
However, while reading it, it becomes immediately clear, that the standard only
discuss client server communication over HTTP only. Indeed, here is a small
excerpt from the standard - "The specification is built upon the open Internet
standards of HTTP, XML, and SOAP". XMLA uses SOAP and SOAP has standard binding
to HTTP through the use of HTTP headers. However, while AS2005 can be configured
to work over HTTP, the typical way of connecting to AS2005 is directly through
TCP/IP. There is no standard binding of SOAP to TCP/IP in general, and in
particular XMLA standard says nothing about it. So what we are going to do is to
reveal piece by piece and layer by layer what happens on the wire when Analysis
Services client (either OLEDB, ADOMD.NET or AMO) works with Analysis Services
server through TCP/IP. By doing so, we will discover and discuss several
important performance and security tradeoffs which are useful for tuning
performance and for secure maintaining the server as well as for
troubleshooting. I also have to disclose, that the TCP/IP protocol to AS is
patented technology. It is not secret - because patents are public information -
but it cannot be freely copied or reimplemented by 3rd party.
I will structure this article in a form of detective story. Armed with network
monitor, we will intercept the communications and will analyze it. For the
purpose of experiment, we will be studying the response MDX query sent to the
Foodmart 2000 Sales cube migrated to Analysis Services 2005. I am using "MDX
Sample" application which shipped with AS2000 instead of SQL Management Studio
which shipped with AS2005. MDX Sample uses ADOMD which uses OLEDB as opposed to
SQL Management Studio which uses ADOMD.NET. This is actually not the reason why
I need to use MDX Sample, since both OLEDB and ADOMD.NET use exactly same wire
protocol. The reason is that we will need to change some connection string
properties while doing some experiments, and unfortunately it is not possible
with SQL Management Studio.
So let's start by sending the following simple MDX query and watching what will
come on the wire
| SELECT Measures.MEMBERS on COLUMNS FROM Sales |
What we get back doesn't look like XMLA message at all (here and below the
non-printable binary characters are substituted with dot character)
| ....application/sx+xpress...e......d...Uy.8k..P...a:|.>.Oo..+.....M\.h6..a+..........e...p:.
.&..%B..R..9.J7'......F-..z...,..#....y..6X.gx..*.....d.....D....k.m..C...T........_......>z
#.L....4....j...Z....~.c..tl..3...%..H.....RQ......f%.fn....oR.\.%..g.X..o..u.`.}.{....~..~X
.7Is...-...P.5.2... .+Vm'.3.LU.W....&.......{.}.....6h..W..C.PD..2lM-....v<.t....._..f0g..n.
m.?.". .S33.....F.............2(.......6{......'.......`.....2.gCc..I.&..../..l).<....>v.m.O
..d..*.&)...-....M......q6...s./..A....P4..*..K..4.=....c...z...,.Q.b]9U........y'.i.."N}...
noi..K..r.lptl."3......cw...Y..R5.Q.H...1.........#...h..A....28}o.XC..Gu...(...N......s..~.
...{...5|Q0...W..U.i8su..!.J.Qs..K%....*.;9.i3....H......[7...tc......;.......]..U..t.v...;v
.lB...$...(z..D.......&..1...P.5R.of..b.:2..(.....Z1..zfY.n.6+../s}...ND..\.0.UD..p.|.!.....
./fQ.........0.S..N..v->..}..w..<.c"2t.......p.t{..1.}....88.A............&..i8x.z.vDQ.....<
.Q'.D[;HJ.....,...1`d\...i.x...G...E.P.O...?.^..8\.!.!.?..Nr2.ONa.|..JE...B....Q..f...R.1.d8
\.)....H_..Rn#....._..@.k$>.Ad...4'b..>....w.B..t..j.#...../..4mFY.Z/.* -......f.>.7R.~.j.by
9...\......un.._......t...[.P........."..S.3....J....'N.ne!T..@.......G...9.....F....)xA3.v.
.. N..c.....W}.f... G.\Rf..6.m.8.qB$....`...K..I....FV.6*..68..z...~..........!8..Ta.t..'.L.
f.$W....]....6oF..-....S..`6........'b/`W.o<......cQfR#..r..........kLY............d......e.
....I....j=3.ppf..0..!pn..u.....(.+...........=.....qo.tgu....,....e#..[.5-..aY7.........N.!
....M.........B.b`aJ9M|.!6....k..6..i.......6U......x....F...x.-....9.uo....{.g..p.n.m.....'
I..n.[V..;.g<..7.....g.S|..T(YW....4X6bL..]...}...........{.`.1.E.5...(....a....#.Y....OEM".
..e..|.e..*.>.*@nn...w=0..C@...wD...#..5v..8O.[.....uuB..'. i.....j..?.."....W..\..-m...%..x
.c........0.5.u...i..Q..9............vB......P.q=......Y.j..L.j.j...$.C..NU........4P..7b...
..&..;.u.h..WE4T..1.$.Q.<*..m...s1F.........ri18.G.k....F....j..ey0.]...j....V.CUZ.J.O.VE...
..~o...;.>....E#......4. .E.l..,...7..E..........X.:[...8...~5.X.U....p.E.Of..L...q.x..j~.b.
`.....KB..+......B.y^l...>...I<.b&.E.....ED..3..%b.......k.DM"..FB..#I|....u...A5........y..
!....o..:..=g.....(4....7.......#kG...s.$"...~.?..6....9%'FrE.......p..c...F...~....i..|U...
.x...].(..=C.|;u.EL.(.{.;...<..U\ .3...OG....v.;57...:.......&....._g......_pW.......D9.!..#
.....'j..J%.c.Z|........qo...n6[g..Km"...h......G...1.lcsDo#.....o..E.'.Em..E.........F....F
..r...A........AA.R.U!......O....x._..W..k..........e...... t.&E.KBt..U.K....A..E$...2i&...4
.J.\].o......UM...i..E.......h.{..u...JKb....s-....1.G$n...p.|.2_9.;.........%..cB..]..d.p..
...].'..6\K^..].n.]o}.gQ.D..T.W.>g.xYy...3...X...y.{...6;.j3...-i......:...D....E!..5.....3.
......q.AU}..=.....E..Z.T7.[.P..q..Y.......M.[...V..e.... ....N.S.A..TN.......#4.G.A....?......9.9.#..{qS..IB...`B.....@...D.Ps.4.;..M.....$......?4[.X...~..!w@.....!. =DU.h.n
|
First few bytes where it says "application/sx+xpress" look meaningful, but
everything else appears to be complete garbage. We will come back to the
explanation of what the "application/sx+xpress" string means later. For now
let's decipher the rest of the message. One hint into what is happening here is
if you will try to send exactly same query to exactly same cube - you will see
completely different content. This is because what we actually see on the wire -
is encrypted content. That's right, by default AS2005 encrypts (and also
digitally signs) all the network messages. You may have heard about Trustworthy
Computing initiative which swept the entire Microsoft - well this is just one of
the hundreds design changes that were made to the product. For encryption and
digital signing of network messages AS uses encryption algorithm from the SSPI
provider used for network authentication. By default in AS2005 the SSPI package
is "Negotiate", which usually resolves itself to either Kerberos or NTLM,
depending on how network is set up. In both of these cases encryption is done by
symmetric encryption algorithm using the session key which is negotiated during
authentication. The performance impact on both server and client CPUs for doing
encryption/decryption is small - in my experiments it was below 5%, so it is
usually a good choice to continue to have everything encrypted. In some
scenarios, however, when response sizes are large and server is very loaded and
the network is trusted, it may be beneficial to disable encryption or digital
signing or both to improve performance. The level of encryption is controlled by
the standard OLEDB property DBPROP_INIT_PROTECTION_LEVEL. Below is the excerpt
from OLEDB documentation:
Indicates the level of protection of data sent between client and server. This
property applies only to network connections other than RPC connections; these
protection levels are similar to those provided by RPC. One of the following:
DB_PROT_LEVEL_NONE — Performs no authentication of data sent to the server.
DB_PROT_LEVEL_CONNECT — Authenticates only when the client establishes the
connection with the server.
DB_PROT_LEVEL_CALL — Authenticates the source of the data at the beginning
of each request from the client to the server.
DB_PROT_LEVEL_PKT — Authenticates that all data received is from the client.
DB_PROT_LEVEL_PKT_INTEGRITY — Authenticates that all data received is from
the client and that it has not been changed in transit.
DB_PROT_LEVEL_PKT_PRIVACY — Authenticates that all data received is from the
client, that it has not been changed in transit, and protects the privacy of the
data by encrypting it.
In OLEDB connection string as well as in ADOMD.NET this property is mapped to
string "Protection Level". Analysis Services supports the following
values:
"none" - however by default server will not accept non-authenticated
clients, so specifying this value will fail the connection
"connect" - only authentication is done, no encryption or digital signatures
"pkt integrity" - messages are digitally signed but not encrypted
"pkt privacy" - messages are both digitally signed and encrypted
The "call" and "pkt" values are not supported. As we mentioned before,
"Protection Level=pkt_privacy" is the default, but now we will disable it by
explicitly specifying "Protection Level=connect" in the connection string. Now
after executing the very same MDX query, we will get different result.
|
....application/sx+xpress...w:..]..............1...0...U.T.F.-.1.6...)h.T..At..p.:./..s.c.h.
e.m.aX...xH.lH.ox.px.o...*.r.g:.../X.n.v(.l..p8./.......E..................n8.:..k..........
.BX.d.y.........&u..n.........-8.i..rp...f@.-x.o..:..-H.n..lp.sh.s...E..e....8.ux.e.R..s..o8
.sX..................../.5..\.\r..t...........0s....-:..d..ah.).q.....oH........V.n.....I.k.
Q....sH..........w......3(.].2@.0../.X.M.....L.S(.=.-X...tX.n..eo.dh...... o..'.. ..'......l
-@.g...........!.r....NX.mH...aX.........w.I.a.....X...t.#..m.Dx.f(.u...........qx...i..i..d
H....J.k...VE.'........ ..?'.....p..e..T@.pH. ........my...#.........MX.m.b(.r^....s0.q.u(..
...........&+b.........e./......#.#8..._...m..n.Oh.cH.r.............0..a.........u..b .9.dH.
d...p..oH...sW@.-.CX.n....tx........s.k..p........t..i`.S*............H......c...y.#t(...`..
...8.s0.....n.g...........P..q0........DX....`......I.A.v......ux.........,}{.[q@.i.....,...
(..+.Q..Q....T.........'."......e...,..........}s........G..Z.....p......c..._.s..U.......o.
E.7./....gx...0D..Z... .....S..t.......o..cp.0.!.......s.........z........j....[..C.......dX
.c...(...L..&G....U..i..n8....Z[.(.....f.."........4............S@.z..h......s..g8.I.I8.t..3
.x....O0.1O..f.o.....C..y..........W.F...E...7.)..i..L(./.....D8.t..U...2..........T........
..1?.....j..A.1Ts........]..._................3.G._..0k..+...!....B.P....C0.l.......o......:
.......]...!.]...a.?.i.w...........D........w.........9"....O..d....x...#.............+o8...
. .P'......c8.K.X....O'v.............H._.......D..a.(.t..H............ o#.....S)..$......x.%
..?......&..b.. ..Sx..7..O*...'.!.r9w.DG`.8.!..).E`.x.o../..n..l...W..`s..r......5.3H.YPg...
.........\ .W-..@.(."G._.a...@.. 7........).#......*.$......0...//.x.+..%..T...+..[h.e..s..a
.s.]...U?.,..&P..W......E..B(.R...."_...I.Q8.Eh.N.A..EH........dp....a.......C..p-f....+G..-
.'........C..P.(I.Ox.O....L......((..y./.L...w.ZV..L.......Y.uX.H./.)H..O..N............%...
DX..*.......U.0.*?..DX.SX.L..YX.Ih.F..:..=..*.A"....S..C.r8.{.k..[.3....tP.r..8..A.a.......1
.+..?....[...i...M..............j.........&../.............../....&...[..;...+..m...........
m...............c.{.o.&.....................O+.[...4d@.c....+.......'...+............7......
.?.....,..o...i.8!.......O....+...m..u9 ..Q.i./`.+.......&W........Q........I."_......%../..
.o...I. O.........O..+.W.s..+........!......-......Y..{....m..............o.......88....o...
.....+.[...)..A.4rX...+.............................(P........e..........H7.......o.....+.2E
...4yw.. ..e.t..`..2....k......27................2_../....2......o....8.O...N............O.+
.[.G....r......+...."+.......).m./...............#../..O..y.hf....N....j...?.._.+...A....mi.
... ..I.t...#.......[...'..{...]........HDO.._...$........O.....4?..._...?........+.[..e..r
.y0...K.eA.+........X....K.Y...._......(?......#....h7.O../.....}/...........o.+.[...Fr.....
. 8.A5Sp.F.T..+..O.....)[......g..;.).../...Y.X'.......(O..........(.....O........+hw...M...
+........3.........I..].O.....H.......H'../..........t.............&Q.....f0.`.2.,..V.'l....
e...3.-P...AP.U..`..F.(tp.?.4..8....R..Ap.Th.D.....8_%........m......5./...C..L.._..R.......
.w...........6.0....{.1.x.7.1.....@..TX.p.+...8..2.......9.3..M....TeP.b(.r..:.4.&..T.._.Up.
.Cy................._....L(....sv..ifp.....0.;...3...G..CH...A................&s......Q.'...
........U..............o:(m..s.... ..e....'..'..w..3..Y.....2l..H.;.A...*..}]..s.......... \
.. +.(<.)x.....8........YH..Pi.&x.1.9......7x....u.......o.]....d...ch...+. ..s.......N..`..
.........;.....wm..?...dh.].;.........y..........................I............x=/....m.... .
...................'[....;. _...y..g......'.....?....O.Z.. ................?.......f...+...O
......W..y...........c}$l.ov... ?.....G.........../..-....?....h..k......Y............../...
&..TH.pH................y..&............0.Ga.DH...HW;.5....x.<.6...5.7..0@>-.0..TH..jQ.A....
..2.6..,(97.3..0....6q.1r.=.i....u.2...*.5p.68.7p.2..u.2r..[.B.?!r..$`.6..,B.8..18...%@.C3..
.5S........8p.8...=$.4!......k......3..9.....T6h.0..9(.........@..
|
Well, the result is different, but it still doesn't look like XMLA message - it
still looks mostly like garbage. Only in the first row we see something that
remotely resembles the string "1.0 UTF-16 http://schemas.xmlsoap.org", however
the further we move through that string, the more distorted it becomes. For
example, instead of "http:" it looks like "hTAtp:" and "xmlsoap" part is almost
not recognizable. What happens now is that the XMLA message gets compressed
before it is sent over the network. The patented compression algorithm that
Analysis Services uses is an adaptive one - this is why the beginning of the
message still can be read in clear, but quickly the algorithm picks up and the
stream becomes very well compressed. Since XMLA responses can be quite big and
may contain a lot of strings - compression of the messages should improve
performance of the network transmission. This is especially true for the WANs
and other low bandwidth networks. However, compression (and decompression) are
not for free - they do take CPU time. Analysis Services supports 10 levels of
compression which offer tradeoff between size and speed. The compression level
can be controlled by means of connection string property "Compression
Level". Integer values from 0 to 9 are valid compression levels. The default
compression level is 9, and it can be changed at the server by changing the
server config property Network\Responses\CompressionLevel. Note, that the
encryption was done on top of compression and not vice versa. There are good
reasons for that. One reason is to do with performance - because encryption
tends to add entropy to the data, and data with higher entropy is not
compressible as well as data with low entropy. Another reason is to do with
security - it is harder for the attacker to try to break encryption when more
parts of the encrypted message are not known. With XMLA there is a big percent
of the response which can be predicted - all the XMLA standard tags -
<Axis>, <Tuple>, <Cell> etc. When the data is compressed such
attacks are harder to mount. Sometimes it may be useful to turn compression off
completely - for example for troubleshooting - this can be achieved by
connection string property "Transport Compression". The two valid values
are (note that they are case sensitive)
"None" - for disabling compression
"Compressed" - for enabling compression
"Default" - for default behavior (same as "Compressed" in this version)
It is important to note here, that everything that we said about compression,
applies by default only to server responses, but doesn't apply to client
requests. By default no matter what properties are set - the requests are always
uncompressed. The reasoning for such behavior is because of the observation that
requests are typically small (just the content of MDX query) and responses are
large. But why not to compress the requests anyway - even if they are small -
what's the harm in it - you may ask. Here we need to be reminded about
Trustworthy Computing again. The server is designed to be robust and withstand
hackers attacks to break it. One of the common techniques to achieve this goal
is called Reducing Surface Area of Attack. By default server simply doesn't
even accept requests which are compressed, therefore if somebody will find
security hole in the decompression algorithm - he won't be able to exploit it to
attack the server. Of course we hope that no such hole would be ever found, but
by disabling decompression code on the server we create additional layer of
defense. The assumption that requests are small is usually a correct one,
however, there are few scenarios when requests can become quite big. The most
important scenarios probably are
Pipeline processing - when Integration Services pipeline pushes data to
Analysis Services either for cube processing or mining model training
Massive cell writeback - when client application updates a lot of cells and
sends them to the server
Frequent creation of complex UDMs - when XML DDL which defines UDM becomes
very big and is sent to the server frequently
Certain data mining scenarios - again when there is a lot of data to be
pushed to the server
If you ever run into this or similar situation, you can always change the
default server behavior by changing the following server config property
"Network\Requests\EnableCompression" from the default value 0 to value 1. Now to
continue our investigation of AS protocol let's connect with connection string
"Protection Level=connect;Transport Compression=None" and again execute our MDX
query. This time the response is notably bigger since no compression was applied
to it. Therefore we will only paste here the first few bytes
|
....application/sx.........1...0...U.T.F.-.1.6...)h.t.t.p.:././.s.c.h.e.m.a.s...x.m.l.s.o.a.
p...o.r.g./.s.o.a.p./.e.n.v.e.l.o.p.e./...s.o.a.p...E.n.v.e.l.o.p.e..........x.m.l.n.s.:.s.o
.a.p........)h.t.t.p.:././.s.c.h.e.m.a.s...x.m.l.s.o.a.p...o.r.g./.s.o.a.p./.e.n.v.e.l.o.p.e
./...B.o.d.y.........&u.r.n.:.s.c.h.e.m.a.s.-.m.i.c.r.o.s.o.f.t.-.c.o.m.:.x.m.l.-.a.n.a.l.y.
s.i.s...E.x.e.c.u.t.e.R.e.s.p.o.n.s.e.........x.m.l.n.s........&u.r.n.:.s.c.h.e.m.a.s.-.m.i.
c.r.o.s.o.f.t.-.c.o.m.:.x.m.l.-.a.n.a.l.y.s.i.s...r.e.t.u.r.n.........0u.r.n.:.s.c.h.e.m.a.s
.-.m.i.c.r.o.s.o.f.t.-.c.o.m.:.x.m.l.-.a.n.a.l.y.s.i.s.:.m.d.d.a.t.a.s.e.t...r.o.o.t........
...0u.r.n.:.s.c.h.e.m.a.s.-.m.i.c.r.o.s.o.f.t.-.c.o.m.:.x.m.l.-.a.n.a.l.y.s.i.s.:.m.d.d.a.t.
a.s.e.t...x.m.l.n.s.:.x.s.i........)h.t.t.p.:././.w.w.w...w.3...o.r.g./.2.0.0.1./.X.M.L.S.c.
h.e.m.a.-.i.n.s.t.a.n.c.e...x.m.l.n.s.:.x.s.d........ h.t.t.p.:././.w.w.w...w.3...o.r.g./.2.
0.0.1./.X.M.L.S.c.h.e.m.a... h.t.t.p.:././.w.w.w...w.3...o.r.g./.2.0.0.1./.X.M.L.S.c.h.e.m.a
...x.s...s.c.h.e.m.a..........t.a.r.g.e.t.N.a.m.e.s.p.a.c.e........0u.r.n.:.s.c.h.e.m.a.s.-.
m.i.c.r.o.s.o.f.t.-.c.o.m.:.x.m.l.-.a.n.a.l.y.s.i.s.:.m.d.d.a.t.a.s.e.t...e.l.e.m.e.n.t.F.o.
r.m.D.e.f.a.u.l.t.........q.u.a.l.i.f.i.e.d....0u.r.n.:.s.c.h.e.m.a.s.-.m.i.c.r.o.s.o.f.t.-.
c.o.m.:.x.m.l.-.a.n.a.l.y.s.i.s.:.m.d.d.a.t.a.s.e.t...x.m.l.n.s.:.x.s........ h.t.t.p.:././.
w.w.w...w.3...o.r.g./.2.0.0.1./.X.M.L.S.c.h.e.m.a....c.o.m.p.l.e.x.T.y.p.e.........n.a.m.e..
........M.e.m.b.e.r.T.y.p.e....s.e.q.u.e.n.c.e.........a.n.y.........n.a.m.e.s.p.a.c.e......
...#.#.t.a.r.g.e.t.N.a.m.e.s.p.a.c.e...m.i.n.O.c.c.u.r.s.........0...m.a.x.O.c.c.u.r.s......
...u.n.b.o.u.n.d.e.d...p.r.o.c.e.s.s.C.o.n.t.e.n.t.s.........s.k.i.p......a.t.t.r.i.b.u.t.e.
..........H.i.e.r.a.r.c.h.y...t.y.p.e.........x.s.:.s.t.r.i.n.g..........P.r.o.p.T.y.p.e....
..e.l.e.m.e.n.t...........D.e.f.a.u.l.t.....0..........n.a.m.e.....x.s.:.s.t.r.i.n.g...u.s.e
.........r.e.q.u.i.r.e.d.........t.y.p.e.....x.s.:.Q.N.a.m.e..........T.u.p.l.e.T.y.p.e.....
.....M.e.m.b.e.r......M.e.m.b.e.r.T.y.p.e.....u.n.b.o.u.n.d.e.d...........M.e.m.b.e.r.s.T.y.
p.e..........M.e.m.b.e.r......M.e.m.b.e.r.T.y.p.e.....0.....u.n.b.o.u.n.d.e.d..........H.i.e
.r.a.r.c.h.y.....x.s.:.s.t.r.i.n.g.....r.e.q.u.i.r.e.d...........T.u.p.l.e.s.T.y.p.e........
..T.u.p.l.e.....T.u.p.l.e.T.y.p.e.....0.....u.n.b.o.u.n.d.e.d.......g.r.o.u.p.... ......S.e.
t.T.y.p.e....c.h.o.i.c.e....!........M.e.m.b.e.r.s.....M.e.m.b.e.r.s.T.y.p.e.........T.u.p.l
.e.s......T.u.p.l.e.s.T.y.p.e.........C.r.o.s.s.P.r.o.d.u.c.t.....S.e.t.L.i.s.t.T.y.p.e.....
....U.n.i.o.n........r.e.f...."....S.e.t.T.y.p.e.....0.....u.n.b.o.u.n.d.e.d.............S.e
.t.L.i.s.t.T.y.p.e........S.e.t.T.y.p.e.....0.....u.n.b.o.u.n.d.e.d.........S.i.z.e.....x.s.
:.u.n.s.i.g.n.e.d.I.n.t..........O.l.a.p.I.n.f.o..........C.u.b.e.I.n.f.o............C.u.b.e
.....u.n.b.o.u.n.d.e.d............C.u.b.e.N.a.m.e.....x.s.:.s.t.r.i.n.g.........L.a.s.t.D.a.
t.a.U.p.d.a.t.e.....0.....x.s.:.d.a.t.e.T.i.m.e.........L.a.s.t.S.c.h.e.m.a.U.p.d.a.t.e.....
0.....x.s.:.d.a.t.e.T.i.m.e...............A.x.e.s.I.n.f.o............A.x.i.s.I.n.f.o.....u.n
.b.o.u.n.d.e.d............H.i.e.r.a.r.c.h.y.I.n.f.o.....0.....u.n.b.o.u.n.d.e.d............#
|
Now this looks much more human readable then before. The reason for the dot
after every character is because all the strings are in Unicode, and therefore
their second byte is NULL. But otherwise what we see here looks very much like
SOAP Envelope with the content looking like XSD schema for the MDDataSet result
- just like in the XMLA 1.1 spec. However, there is still one strange thing
about this result. We would expect XML, which means familiar < and >
brackets around tags - and they are nowhere to be seen. To make things even more
interesting, let's take a look at the end of the message, where we should have
CellData section containing actual cell values.
|
.......C.e.l.l.D.a.t.a.....;.5..C.e.l.l.....<.6...5.7..0...-.....TH.A......2.6.6.,.7.7.3...0
.0....6.7..1...-.=.i....A......2.2.5.,.6.2.7...2.3....6.7..2...-..[.B.?!A.....$.5.6.5.,.2.3.
8...1.3....6.7..3...-.5S...........8.6.8.3.7....6.7..4...-.....k..A......3.3.9.,.6.1.0...9.0
..........
|
This is even more interesting - note only < and > are missing, but all the
XML tags such as <Cell>, <Value>, <FmtValue> as well as
attributes such as <Ordinal> - everything is missing. There is single
CellData tag and single Cell tag, but there are no closing tags. And the result
contains some binary characters as well. All this is pretty strange, after all
we removed all the layers which could add binary content - such as compression
and encryption - so we should get clear XML now. Well, we kind of got XML here
(or, more precisely, we got XML Infoset). There are many different encoding for
XML, two most popular being UTF-8 and UTF-16. However, there are more encodings,
and there is no reason why someone wouldn't come up with new encoding. So this
is exactly what AS does. By default it uses proprietary encoding called SX. It
is exactly same encoding that SQL Server uses when it communicates with its
OLEDB driver when it sends results for the SQL SELECT statements which use "FOR
XML" clause. I have heard people calling it sometimes "binary XML", but I really
don't like this term. There is a lot of confusion around this term with many
different people calling many different things "binary XML" - now W3 standard
body has a special working group to study "binary XML". Analysis Services simply
uses encoding called SX to encode XML. The way SX works is maintains a
dictionary for things like XML tags, namespaces attribute names etc - so the
actual string would appear only once, and after that it is encoded in the
dictionary. Closing tags are never emitted, because of the symmetry property in
XML. These things mostly help with the size of the message, but what perhaps is
even more important is that SX encoding is XSD schema aware which means that
elements are stored in their native data type. To explain this with example -
suppose we have element with name Value and type double. In UTF-8 encoding it
would look something like
|
<Value xsi:type="xsd:double">3.3961089640000253E5</Value>
|
As you can see the actual element value has been converted to (rather lenghty)
string. However with SX encoding this example would look like just 8 bytes
holding the actual binary representation of the value. This goes beyond just
saving space - it actually saves CPU time and memory copying - since none of the
cell values need to be converted to strings for sending out into request - they
are just sent in their native representation. The data type awareness is, in my
opinion, the most important characteristics of SX encoding. Of course if the
data type of the element is string - then it will be a string. SX encoding only
works with Unicode strings, which is natural fit for AS, since it also keeps all
of its strings in Unicode. It is possible to control which encoding AS will use
by means of connection string property "Protocol Format". It can be one
of the following values
"XML" - UTF-8 encoding is used
"Binary" - SX encoding is used (the value name is unfortunately confusing)
"Default" - the default behavior (same as "Binary" in this version)
The same comment about responses vs. requests which was done in the compression
section also applies here. By default server doesn't accept requests in SX
encoding - but pretty much in the same cases where you would want to enable
compression on requests you would want to enable SX encoding on requests as
well. This is done by changing server config property with misleading name
Network\Requests\EnableBinaryXML from default value 0 to value 1. We can now
connect again to the server this time using new connection string "Protection
Level=connect;Transport Compression=None;Protocol Format=XML" and send our
query. This time the result looks like following (middle section skipped for
brevity):
................text/xml<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/
"><soap:Body><ExecuteResponse xmlns="urn:schemas-microsoft-com:xml-analysis"><return><root x
mlns="urn:schemas-microsoft-com:xml-analysis:mddataset" xmlns:xsi="http://www.w3.org/2001/XM
LSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><xs:schema targetNamespace="u
rn:schemas-microsoft-com:xml-analysis:mddataset" elementFormDefault="qualified" xmlns="urn:s
chemas-microsoft-com:xml-analysis:mddataset" xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs
:complexType name="MemberType"><xs:sequence><xs:any namespace="##targetNamespace" minOccurs=
"0" maxOccurs="unbounded" processContents="skip"/></xs:sequence><xs:attribute name="Hierarch
y" type="xs:string"/></xs:complexType><xs:complexType name="PropType"><xs:sequence><xs:eleme
nt name="Default" minOccurs="0"/></xs:sequence><xs:attribute name="name" type="xs:string" us
e="required"/><xs:attribute name="type" type="xs:QName"/></xs:complexType><xs:complexType na
me="TupleType"><xs:sequence><xs:element name="Member" type="MemberType" maxOccurs="unbounded
"/></xs:sequence></xs:complexType><xs:complexType name="MembersType"><xs:sequence><xs:elemen
t name="Member" type="MemberType" minOccurs="0" maxOccurs="unbounded"/></xs:sequence><xs:att
ribute name="Hierarchy" type="xs:string" use="required"/></xs:complexType><xs:complexType na
me="TuplesType"><xs:sequence><xs:element name="Tuple" type="TupleType" minOccurs="0" maxOccu
*** skipped ***
Caption>All Store Size in SQFT</Caption><LName>[Store Size in SQFT].[(All)]</LName><LNum>0</
LNum><DisplayInfo>1000</DisplayInfo></Member><Member Hierarchy="[Store Type]"><UName>[Store
Type].[All Store Type]</UName><Caption>All Store Type</Caption><LName>[Store Type].[(All)]</
LName><LNum>0</LNum><DisplayInfo>1000</DisplayInfo></Member></Tuple></Tuples></Axis></Axes><
CellData><Cell CellOrdinal="0"><Value xsi:type="xsd:double">2.66773E5</Value><FmtValue>266,7
73.00</FmtValue></Cell><Cell CellOrdinal="1"><Value xsi:type="xsd:double">2.2562723359999983
E5</Value><FmtValue>225,627.23</FmtValue></Cell><Cell CellOrdinal="2"><Value xsi:type="xsd:d
ouble">5.652381299999928E5</Value><FmtValue>$565,238.13</FmtValue></Cell><Cell CellOrdinal="
3"><Value xsi:type="xsd:long">86837</Value><FmtValue>86837</FmtValue></Cell><Cell CellOrdina
l="4"><Value xsi:type="xsd:double">3.3961089640000253E5</Value><FmtValue>339,610.90</FmtValu
e></Cell></CellData></root></return></ExecuteResponse></soap:Body></soap:Envelope>
|
This now looks very familiar - it is a classic XMLA response in MDDataSet format
and UTF-8 XML encoding. One interesting thing to discuss between SX and UTF-8
encodings is that in SX, as we noted above, all the strings are in Unicode,
therefore taking twice as much space as those in UTF-8. So it may seem that
UTF-8 could be a better encoding, especially if there are a lot of strings in
the resultset, since it will be smaller. In practice. however, since AS keeps
all of its strings in Unicode, when UTF-8 is used - it spends CPU time to
convert from Unicode before sending, and back to Unicode upon receiving. And
since by default the message is being compressed - all these extra NULL bytes
get eliminated anyway.
So now that we uncovered all of the layers, let's discuss the actual binding of
XMLA message to the TCP/IP protocol. We know that for HTTP this is done through
HTTP headers, and there is no standard binding of SOAP to TCP/IP. Actually, at
some point (around year 2000 when we started Yukon), there was an attempt to
standardize SOAP bindings to TCP/IP through protocol called DIME. You can find
description of DIME here - http://msdn.microsoft.com/msdnmag/issues/02/12/DIME/default.aspx
and the formal specification here - http://www.gotdotnet.com/team/xml_wsspecs/dime/draft-nielsen-dime-01.txt.
This spec never made it into real standard and expired (in favor of
WS-Attachments I suppose), but Analysis Services successfully adopted it - this
simple and lightweight protocol really fit all the needs without unnecessary
overhead. So now it should be clear what are these few bytes at the beginning of
each response - it is the DIME header, and the strings that we saw at the
beginning are DIME TYPEs. This is why compressed and SX encoded message had DIME
type "application/sx+xpress", but when we removed compression it became
"application/sx", and finally in the clear text encoding it was "text/xml".
In the closing I summarize that we saw some of the details of what exactly goes
on the wire when client connects to the server over TCP/IP, we learned several
connection string properties and server config properties which affect the data
format, we discussed security and performance implications of such changes and
justifications for the default values. Please remember, that all the discussion
was about TCP/IP protocol, and when Analysis Services is set up over HTTP - the
picture is different, although there are some similarities in the stack -
perhaps separate article is warranted to discuss what happens exactly with HTTP.
Also note that this information applies to the versions of OLEDB and ADOMD.NET
which shipped together with Analysis Services 2005. You may know that there is
version of ADOMD.NET which shipped about year and half ago on the Web. It can
connect to the AS2005 server, and it can take advantage of some layers discussed
here (such as encryption), but it doesn't have support for other layers
(compression and SX encoding) therefore when AS2005 server detects that it
works against that early version of ADOMD.NET, it downgrades protocol to
text/xml. Therefore for better performance it is recommended to upgrade
ADOMD.NET to the newer version.
Hopefully armed with all the information in this article, the Analysis Services
protocol will become demystified for you and you will be able to apply this
knowledge in the real-world settings.
More information and article updates can be found here:
Mosha Pasumansky - Microsoft OLAP - Analysis Services
Analysis Services 2005 protocol - XMLA over TCP/IP