July 1, 2025 at 8:29 am
Dear all,
Hope this message finds you well.
We have an on-premises transactional SQL Server database that is heavily used for transactions.
Issue: We are building a chatbot that will query the database, so users can ask questions in natural language and get answers back (the AI generates the SQL). For that, the AI either needs to query the database directly, or ADF needs to extract the data into JSON somewhere and the questions get run against that JSON.
One way or the other, we don't want to work against the transactional SQL Server database, and we don't want to allow connections to it from the chatbot or even from Data Factory.
We were asked to use Azure SQL Database and connect directly to it, instead of to the "master" database (the true transactional on-premises DB).
Can someone please help me? Thank you very much.
July 1, 2025 at 9:19 am
I doubt it is ready for production use yet, but depending on your timelines you might want to look at Mirroring for SQL Server in Microsoft Fabric (Preview).
July 1, 2025 at 11:40 am
If you're running in Azure SQL Database, there are two mechanisms that can allow you to set up a read-only secondary that you can then point the AI at. First is geo-replication. Or, there's the read scale-out mechanism within the Premium and Business Critical tiers. Either of these will do what you're looking for.
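For the read scale-out route, a minimal sketch of what the client side typically looks like (the server, database, and table names here are placeholders, not from this thread):

-- Connect with read-only intent so queries are routed to the readable replica, e.g.:
-- Server=tcp:yourserver.database.windows.net;Database=YourDb;ApplicationIntent=ReadOnly;
-- Once connected, you can confirm you landed on the read-only copy:
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability');  -- returns READ_ONLY on the replica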
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 1, 2025 at 12:02 pm
Hello Grant,
Thank you very much for your reply. The problem is that the primary database is not in Azure; it's on-premises. I have the impression that the solutions you are suggesting are only possible if the primary DB is already in Azure?
July 1, 2025 at 12:43 pm
This shows how to connect to a local SQL Server using ADF:
Tutorial: How to access on-premises SQL Server from Data Factory Managed VNet using Private Endpoint
Presumably you would then need to set up CDC to know what data to collect.
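If you go the CDC route, enabling it looks roughly like the sketch below (the database, schema, and table names are placeholders; CDC reads the transaction log, so it does not rely on the tables having timestamp columns):

-- Enable CDC at the database level, then per table (assumes an edition that supports CDC)
USE YourTransactionalDb;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'Orders',
    @role_name            = NULL,   -- NULL = no gating role
    @supports_net_changes = 1;      -- requires a primary key on the table

-- An extract job (ADF or otherwise) can then pull net changes between LSNs:
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');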
The Fabric option might be easier to get going and could be more useful if you are doing analytics and AI. I am not convinced Fabric is ready for production use at present but Microsoft are throwing a lot of money at it.
July 1, 2025 at 2:35 pm
But can CDC be done if the tables don't have timestamps? Fabric is not ready for production, so it may not be a good option. I was trying to understand the options provided by Grant (geo-replication and read scale-out), but I have the impression that those do not work if the "master database" is on-prem?
July 4, 2025 at 6:56 pm
Do you need real-time data in the secondary database? If not, you could have an SSIS job that pulls data out of the primary and pushes it to the AI database on a schedule. You'd still want something like CDC unless you have a clean way to track new/changed data going into the system and only sync that. If you don't have a mechanism for tracking what is new or changed, SSIS gets messy: you either do a truncate and reload (which gets slower and requires more memory as the data grows), or take a merge approach (either the MERGE command or a separate INSERT and UPDATE approach, which also gets slower and requires more memory as the data grows). Truncate and reload leaves a window of time where the AI bot has no data (which may be OK or may be a disaster). The merge approach keeps the data available, but I have found it is often slower for large data sets.
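For the merge approach, a minimal sketch (the table and column names are placeholders, assuming SSIS has already landed the extracted rows in a staging table):

-- Upsert from the staged extract into the copy the AI bot queries
MERGE dbo.AiOrders AS target
USING staging.Orders AS source
    ON target.OrderId = source.OrderId
WHEN MATCHED AND (target.Status <> source.Status OR target.Amount <> source.Amount) THEN
    UPDATE SET target.Status = source.Status,
               target.Amount = source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderId, Status, Amount)
    VALUES (source.OrderId, source.Status, source.Amount);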
Service Broker is another option if you have a secondary instance/database to push data to, as it is an eventual-consistency tool for SQL Server, but it relies on triggers on the tables you want to sync and comes with a bit of overhead on both the SQL side and the admin side.
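The Service Broker plumbing, as a rough sketch only (object names are placeholders, and the trigger plus the reader that actually move the data are only described in comments):

-- Requires exclusive access to the database to enable the broker
ALTER DATABASE YourTransactionalDb SET ENABLE_BROKER;

CREATE MESSAGE TYPE RowChangedMessage VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT RowChangedContract (RowChangedMessage SENT BY INITIATOR);
CREATE QUEUE RowChangeQueue;
CREATE SERVICE RowChangeService ON QUEUE RowChangeQueue (RowChangedContract);

-- A trigger on each synced table would BEGIN DIALOG and SEND the changed keys as XML;
-- an activation procedure (or scheduled job) would RECEIVE from RowChangeQueue and
-- apply those changes to the AI copy.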
Now a different question: how often will the AI bot query the database? The reason I ask is that you may be able to get by with having the bot query your main data if you set the isolation level for the bot correctly, so that it doesn't cause blocking, or at least reduces it. Or you may only have a few light queries hitting your DB, and the AI bot's impact will be minimal. You could have the queries issued by the AI set the isolation level to READ UNCOMMITTED, which means they don't need shared locks or care about exclusive locks while doing the SELECTs, but has the drawback of reading uncommitted data. I wouldn't apply this to the database as a whole, but it might make sense for the AI bot queries. This is also a cost-saving approach, because anything in Azure, or spinning up a new instance or database, comes with additional cost (actual money, which would be monthly/yearly with anything Azure, plus disk space due to data duplication, CPU, network I/O, and disk I/O). Keeping it all in the main system may only impact CPU and disk I/O, and a little network I/O, but a lot less since you are not duplicating the data.
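Scoping that trade-off to just the bot's own session might look like this (table and column names are placeholders):

-- Set per session/batch on the AI bot's connection only, not database-wide
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT OrderId, Status, Amount
FROM dbo.Orders
WHERE Status = 'Open';   -- takes no shared row locks, but may read uncommitted ("dirty") rows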
I would recommend you start by doing a proof of concept with a subset of the production data on a test system and see the impact the AI bot has on the database. Then simulate a realistic load on the test DB without the AI bot connected to get some performance benchmarks. Next, run the realistic load on the test DB with the AI bot connected and see how badly the bot hurts performance. Then extrapolate that data and you should have an estimate of how it will impact production. Use that assessment to determine whether the data duplication is worth the effort, cost, and administrative overhead. You may be surprised by how big or small an impact the AI bot has on your system, and your boss will likely appreciate your cost-conscious approach to getting the AI bot set up.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.