You're misunderstanding CI and ML.
CI is designed to catch these things. Developers do not magically write better code because CI exists. This is < 1 bug/month per dev. Shocking? I'd say that's pretty good. CI is designed to evaluate the code against the rules/tests/static code analysis, and to do it completely, every time. Developers should do this themselves, but they're human, and across 47k people, each making changes and compiling most days, they'll forget things. CI is independent validation and verification of the code on another machine. Honestly, I'm stunned it's only 30k bugs. I'd have expected 100k+/month, maybe more like 470k for 47k devs (10 per dev per month).
Likely there are more bugs caught than that; these are just the security bugs, not all bugs. There are also bugs that slip through, but the unit testing and the ML evolve and keep getting better at catching things.
The ML isn't written by hand. What happens is someone looks at code that produces various security bugs, say a buffer overrun. They find the C++ (or C#) that doesn't bounds-check a parameter, or something similar. They then set up some extraction for this as a factor. Think something like this:
some_code.substring() == [poor parameter checking]
This gives them factors. They have lots of these, and then they feed them through an ML algorithm that looks for patterns. Real developers evaluate code and give the algorithm some idea of what actually produces a bug and what doesn't. Over time, this trains the model (lots of repetitions here).
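To make that concrete, here's a minimal sketch in Python of what hand-built factor extraction could look like. The factor names and regexes here are mine, purely illustrative, not anything MS actually uses:

    import re

    # Hypothetical hand-written factors. Each regex encodes a pattern a
    # reviewer has seen produce bugs, e.g. a call made without checking
    # its parameters first.
    FACTORS = [
        ("substring_unchecked", re.compile(r"\.substring\s*\(")),
        ("strcpy_call",         re.compile(r"\bstrcpy\s*\(")),
        ("raw_malloc",          re.compile(r"\bmalloc\s*\(")),
    ]

    def extract_factors(snippet):
        # Turn a code snippet into a 0/1 vector: does each risky
        # pattern appear or not?
        return [1 if rx.search(snippet) else 0 for _, rx in FACTORS]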
This is the same way people learn about bugs. Imagine you saw declare @i varchar a few times. At some point, you realize this is a bug and stop doing it (or launch a pork chop at a dev). The ML is teaching the system to find this and automatically raise a flag.
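For fun, here's what a (hypothetical) automated check for that exact mistake could look like. In T-SQL, varchar declared with no length defaults to varchar(1), which silently truncates data, so it's exactly the kind of thing you want flagged:

    import re

    # declare @x varchar with no length becomes varchar(1) and
    # silently truncates data, so raise a flag on it.
    VARCHAR_NO_LENGTH = re.compile(r"declare\s+@\w+\s+varchar\b(?!\s*\()",
                                   re.IGNORECASE)

    def flags_varchar_bug(sql):
        return bool(VARCHAR_NO_LENGTH.search(sql))

    print(flags_varchar_bug("declare @i varchar"))      # True - flag it
    print(flags_varchar_bug("declare @i varchar(20)"))  # False - fine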
Over time, they get an ML model (not an algorithm) that is good at recognizing the code that does and does not cause issues. They deploy this, scan code, and have humans verify the results. Eventually, they turn the model loose and let it notify developers by failing CI with these bugs highlighted.
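As a sketch of what that training/deployment step looks like, reusing the toy extract_factors() from above plus scikit-learn (my choice of library; the labels are made up to show the shape of the data, nothing more):

    from sklearn.linear_model import LogisticRegression

    # Labels come from the human reviewers: 1 = this snippet produced a
    # real bug, 0 = it didn't. The vectors come from extract_factors().
    X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
    y_train = [1, 1, 0, 1, 0]

    model = LogisticRegression().fit(X_train, y_train)

    def ci_gate(snippet):
        # Fail the CI run when the model says the change looks risky.
        return model.predict([extract_factors(snippet)])[0] == 1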
In the real world, periodically or continually, humans randomly sample the ML results (both hits and misses) to see if the model needs to be retrained. It's a boring cycle: evaluate, set new factors, rerun the model against training data, re-run against sample data, review, repeat. This is what data scientists do, and it's a lot of manual analysis of samples, then letting the ML run against small samples, then large ones. It's, IMHO, a boring job for really, really detail-oriented numbers people.
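The sampling part can be dead simple; the shape of the results here is my own assumption, just to illustrate:

    import random

    def sample_for_review(results, k=50):
        # Pull a random sample of both flagged and unflagged code so a
        # human can hunt for false positives *and* false negatives
        # before deciding whether the model needs retraining.
        hits   = [r for r in results if r["flagged"]]
        misses = [r for r in results if not r["flagged"]]
        return (random.sample(hits,   min(k, len(hits))) +
                random.sample(misses, min(k, len(misses))))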
The goal is to do it right the first time, but there are pressures from the business to get things done, to spend resources on new things, and a bunch of other real-world software pressures. The quality has gone way up. It's not perfect, and never will be, but things like this ML are there to prevent simple mistakes. The bar keeps being raised. Not to where you want it, and certainly the focus on features is not something addressed here.
That is orthogonal to quality. Spending more time on a feature isn't a quality decision. Ex: STRING_SPLIT(). MS thinks it works fine and meets the spec. Many of us think it's woefully incomplete. From a quality standpoint, STRING_SPLIT() works. From a customer standpoint, it doesn't.