March 19, 2025 at 12:00 am
Comments posted to this topic are about the item Lower Your Attack Surface Area
March 19, 2025 at 3:01 pm
Interesting post! Would Dynamic Data Masking be the tool to use here? I'm assuming not, since the data could be unmasked on the development server. Thus the actual data would still be in two different locations. So, would it be best to just start with anonymizing or building synthetic data?
March 19, 2025 at 3:23 pm
Also, remember that it makes no difference how complex your service account passwords are, if they ultimately get stored in configuration files or source code.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
March 20, 2025 at 10:18 am
I know from previous experience that it is possible to lock down production. The caveat is that you have to think through your operational needs and invest as much in the design and build of those operational processes as you would for customer-facing features. In fact, I would go further and say that you need to invest MORE in the design and build of operational features. If you don't, you run into issues where security impedes legitimate need.
With our data pipelines we have a robust set of unit, component and integration tests. We also have security scanning for Docker images. The ability to scan Docker images tilts our design decisions to use containers where possible.
The challenge with patching infrastructure is that something that passed all security scans yesterday can fail with a critical finding today, because a new vulnerability has been discovered and added to the external vulnerability database. This means that scanning before deployment is not enough.
Our CI/CD pipeline is built on GitHub workflows, and we use Renovate to auto-patch our software. Our workflows auto-approve the merging of GitHub branches created as part of Renovate or Dependabot patching, but only if all tests, including the security scan, pass. The auto-merge approach is only possible because of the robustness of those tests.
We still use GitHub pull requests for changes made by team members. In theory we could auto-approve team members' changes when all tests pass, but in reality new features mean new tests, and those tests require greater scrutiny given their importance.
Deployments to any shared environment can only take place using a CI/CD pipeline. Only certain people can deploy to higher environments because we need to restrict the GitHub branch from which people can deploy. If this were a greenfield development, that would be easy, but there are legacy considerations that are hard to mitigate.
MFA is on everything we can put it on. Again, legacy considerations come into play, so this isn't 100%.
Observability is something we are investing a lot of time in. Careful thought and design of logging messages reduce the need to use DB queries to diagnose issues.
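To put the "lock down production" point in concrete SQL Server terms, a minimal least-privilege sketch might look something like the T-SQL below. Role, schema, and account names are purely illustrative, and it assumes matching database users already exist for the Windows principals.

```sql
-- Application identity gets stored-procedure-only access; nobody needs standing dbo.
CREATE ROLE AppExecutor;
GRANT EXECUTE ON SCHEMA::App TO AppExecutor;
ALTER ROLE AppExecutor ADD MEMBER [DOMAIN\svc_orders_app];

-- Operations gets diagnostics (DMVs) and logging tables, not customer data.
CREATE ROLE OpsMonitor;
GRANT VIEW DATABASE STATE TO OpsMonitor;
GRANT SELECT ON SCHEMA::Telemetry TO OpsMonitor;
ALTER ROLE OpsMonitor ADD MEMBER [DOMAIN\OpsTeam];
```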
March 21, 2025 at 12:36 pm
Interesting post! Would Dynamic Data Masking be the tool to use here? I'm assuming not, since the data could be unmasked on the development server. Thus the actual data would still be in two different locations. So, would it be best to just start with anonymizing or building synthetic data?
DDM doesn't help, as anyone who can get privileged access to the database (dbo level) can get at the real data. That's a concern for sure, as the attack surface area is still too high (for me).
I would look for anonymized/synthetic data sets, which do take work: either with a product (less work, more cost) or custom/bespoke code (ongoing maintenance and dev/DBA coding time).
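For illustration, here's a minimal T-SQL sketch (table, column, and user names are hypothetical) of why DDM alone doesn't shrink the attack surface: the mask only applies to low-privilege readers, and anyone with dbo-level rights or UNMASK sees the real values.

```sql
-- Hypothetical demo: DDM hides data from low-privilege users only.
CREATE TABLE dbo.Customers
(
    CustomerId int IDENTITY(1,1) PRIMARY KEY,
    SSN        char(11) MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)')
);
INSERT INTO dbo.Customers (SSN) VALUES ('123-45-6789');

CREATE USER AppReader WITHOUT LOGIN;
GRANT SELECT ON dbo.Customers TO AppReader;

EXECUTE AS USER = 'AppReader';
SELECT SSN FROM dbo.Customers;   -- masked: XXX-XX-6789
REVERT;

-- As dbo (or any principal granted UNMASK), the original value comes back.
SELECT SSN FROM dbo.Customers;   -- unmasked: 123-45-6789
```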
March 21, 2025 at 12:37 pm
Also, remember that it makes no difference how complex your service account passwords are, if they ultimately get stored in configuration files or source code.
Not necessarily; with modern managed service accounts and service principals we can avoid a lot of this. We can also inject passwords from a pipeline, so they always stay in a privileged space. If you do need to store this stuff in a repo, I'd make it a separate, private repo with limited privileges, but avoid that if possible.
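As a hedged sketch (assuming Azure SQL Database with a Microsoft Entra identity; the identity name is made up), a managed identity or service principal removes the password from the picture entirely:

```sql
-- The pipeline or app authenticates to Entra ID; the database only needs to
-- trust that identity, so there is no password to put in a config file.
CREATE USER [my-data-pipeline-identity] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [my-data-pipeline-identity];
ALTER ROLE db_datawriter ADD MEMBER [my-data-pipeline-identity];
```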
March 21, 2025 at 12:39 pm
I know from previous experience that it is possible to lock down production. The caveat is that you have to think through your operational needs and invest as much in the design and build of those operational processes as you would for customer-facing features. In fact, I would go further and say that you need to invest MORE in the design and build of operational features. If you don't, you run into issues where security impedes legitimate need.
Agreed. It takes work, and ongoing work, to ensure this remains secure. However, it's mostly a habit. The actual work isn't as much of a burden once you have the knowledge and habit to do the work.
March 24, 2025 at 12:36 pm
Eric M Russell wrote: Also, remember that it makes no difference how complex your service account passwords are, if they ultimately get stored in configuration files or source code.
Not necessarily; with modern managed service accounts and service principals we can avoid a lot of this. We can also inject passwords from a pipeline, so they always stay in a privileged space. If you do need to store this stuff in a repo, I'd make it a separate, private repo with limited privileges, but avoid that if possible.
Yes, our organizational standard is to use domain service accounts, or at least keep the credentials in Azure Key Vault, where the IP address is restricted and only the SQL Agent that runs the job has been granted access.
But I still occasionally see people developing SSIS packages or whatnot using SQL-authenticated accounts with the password embedded in the connection string.
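For anyone wanting a concrete alternative to that pattern, here's a rough T-SQL sketch (the account, proxy, and secret placeholders are illustrative) of running SSIS job steps under a credential and SQL Agent proxy rather than an embedded SQL login and password:

```sql
-- Keep the secret server-side as a credential (ideally pulled from Key Vault
-- at deployment time), then run the job step under a proxy instead of putting
-- a SQL login/password in the package connection string.
CREATE CREDENTIAL EtlServiceAccount
    WITH IDENTITY = 'DOMAIN\svc_etl',
         SECRET   = '<injected at deploy time, never committed>';

EXEC msdb.dbo.sp_add_proxy
     @proxy_name      = N'EtlProxy',
     @credential_name = N'EtlServiceAccount',
     @enabled         = 1;

EXEC msdb.dbo.sp_grant_proxy_to_subsystem
     @proxy_name     = N'EtlProxy',
     @subsystem_name = N'SSIS';
```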
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
March 24, 2025 at 4:35 pm
That’s a great question! Dynamic Data Masking (DDM) could help in limiting exposure to sensitive data, but it wouldn’t fully solve the core issue here. Since DDM only applies obfuscation at the query level and the underlying data remains unchanged, developers with the right privileges (or on a development server without masking rules) could still access the original SSNs. This means the same conflicting SSNs would still exist in different locations.
Given this, anonymization or synthetic data generation might be a more robust approach. Anonymization could work if you can irreversibly transform SSNs while maintaining uniqueness for indexing purposes. However, synthetic data might be the best choice if you need a dataset free from real-world conflicts but still structurally valid for testing.
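As a rough illustration only (table and column names are assumptions, and a real project would need collision handling and proper format rules), the two routes might look like this in T-SQL:

```sql
-- Anonymization: one-way hash of the real SSN plus a per-environment salt,
-- keeping values unique enough to index but irreversible.
UPDATE dbo.Customers
SET SSN = LEFT(CONVERT(varchar(64),
          HASHBYTES('SHA2_256', SSN + 'per-environment-salt'), 2), 11);

-- Synthetic data: generate SSN-shaped values with no link to real people.
SELECT CustomerId,
       FORMAT(ABS(CHECKSUM(NEWID())) % 900 + 100, '000') + '-' +
       FORMAT(ABS(CHECKSUM(NEWID())) % 100,       '00')  + '-' +
       FORMAT(ABS(CHECKSUM(NEWID())) % 10000,     '0000') AS SyntheticSSN
FROM dbo.Customers;
```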
Would love to hear your thoughts—are you leaning toward one approach over the other?