Nice question. Thinking about it led me down all sorts of side tracks (which didn't stop once I'd reached an answer), some of which have left me quite confused and looking for more documentation, which is exactly the effect a question should have if it is to maximise what I learn from it.
I wonder whether the structures used will introduce performance issues when row length is significantly shorter than the hardware cache line length and, if they will, whether that matters. Obviously it doesn't matter much if scans in a particular order are rare and accesses to small clusters of adjacent records are also rare, and both of those conditions tend to hold in (most?) OLTP applications. Besides, it may be that the cache line length is usually short enough to eliminate the issue. But was that part of the design philosophy here? The remark somewhere in the documentation that all indexes can be considered as covering seems to indicate that it was, but I don't recall seeing anything to that effect in the early documentation on 2014. Actually, I don't know what typical cache line lengths are on the processors Windows runs on these days, so it may be a complete non-issue.
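To put rough numbers on the worry, here is a back-of-envelope sketch in Python. It is only an illustration under assumptions of mine: the 64-byte default for `line_bytes` is a guess at a common cache line size, the function names are made up, and rows are assumed to start on a line boundary. It compares how many cache lines get touched when rows sit back to back in memory with how many get touched when every row lives at its own unrelated address.

```python
import math


def lines_touched_contiguous(n_rows, row_bytes, line_bytes=64):
    """Cache lines read when n_rows rows are packed back to back,
    with the first row starting on a line boundary (assumption)."""
    return math.ceil(n_rows * row_bytes / line_bytes)


def lines_touched_scattered(n_rows, row_bytes, line_bytes=64):
    """Cache lines read when each row sits at its own unrelated,
    line-aligned address (assumption), so no two rows share a line."""
    return n_rows * math.ceil(row_bytes / line_bytes)


if __name__ == "__main__":
    # Compare short rows with rows longer than the assumed line length.
    for row_bytes in (16, 128):
        packed = lines_touched_contiguous(100, row_bytes)
        scattered = lines_touched_scattered(100, row_bytes)
        print(row_bytes, packed, scattered)
```

With these assumptions the two counts only differ when rows are shorter than a cache line, which is the case I was wondering about; once rows are a whole number of lines long the layouts come out the same.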