December 21, 2020 at 6:45 pm
In the project I work on, the HMI panel logs data instantly to a CSV file.
1) My main question: the reason for using the CSV format is so that, if the connection to the server is cut off momentarily or for a while, no data is lost at that point. Can I do a retrospective scan of the CSV file via SQL or C# and save it to the table in the database? My file folder will contain the data for those missing dates in CSV format.
2) Since I will need to store a very large amount of data, can I create a separate table for each machine and for each day, and could that cause problems on the SQL side?
3) Is there a way to get this data in an instant loop with SQL? Right now I can pull all the data with a BULK INSERT query, but I want this to happen automatically.
C#
December 22, 2020 at 2:35 pm
To answer your questions:
1 - SQL (via SSIS) or C# can pull data from a CSV file (and multiple other formats) into SQL Server. A minimal C# sketch follows this list.
2 - You can create as many tables as you see fit. There are limits to the size a database file can be (a database can have multiple database files), but as long as you have disk space to store the data, you should be fine. You can also turn on compression to save some space at the cost of slightly more CPU.
3 - An instant loop with SQL is possible, but it is not recommended and can lead to problems if you don't code it carefully.
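To give you an idea of option 1 from the C# side, here is a minimal sketch that loads a CSV into a DataTable and bulk copies it into SQL Server. The file path, connection string, table name, and column layout are all assumptions on my part, so treat it as a starting point rather than working production code.

    // Minimal sketch: load a CSV into a DataTable and bulk copy it into SQL Server.
    // The path, connection string, dbo.HmiLog and the column names are assumptions.
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    class CsvToSql
    {
        static void Main()
        {
            var table = new DataTable();
            table.Columns.Add("RecordTimestamp", typeof(string));
            table.Columns.Add("Register1", typeof(string));
            table.Columns.Add("Register2", typeof(string));

            foreach (var line in File.ReadLines(@"C:\Logs\hmi_log.csv"))
            {
                var parts = line.Split(',');   // assumes no quoted fields with embedded commas
                table.Rows.Add(parts[0], parts[1], parts[2]);
            }

            using (var conn = new SqlConnection("Server=.;Database=HMI;Integrated Security=true"))
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.HmiLog" })
            {
                conn.Open();
                bulk.WriteToServer(table);     // one round trip for the whole file
            }
        }
    }

SqlBulkCopy maps columns by position here, so the destination table would need to match the order of the DataTable columns.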
Since you own the code for the HMI panel and how it does its logging (I presume), my approach would be to build this into the application layer. Have the application write to the SQL database as the "primary" destination and keep a CSV file as a backup in case the database writes fail. Then have a single SQL job that runs on startup of the SQL instance (and can be run manually in the event the instance didn't stop but, for example, the network had issues) that pulls all data from the CSV into SQL and then deletes the contents of the CSV. Make SQL your primary location for logging and CSV the secondary, rather than duplicating the data - once to CSV and once to SQL Server.
One risk I see with your approach of writing to CSV and pulling that into SQL is duplicate or missing data. What I mean is, let's say at 8:30:00.00000 you start pulling data out of the CSV into SQL. Your options here are to either lock the CSV file, in which case the HMI panel can't access it, OR leave it in shared access mode, in which case the contents will change while SQL is reading the file. What happens if SQL finishes the import at 8:32:00.00000 and at 8:32:01.00000 it removes all data from the CSV (to make future imports faster, so you don't need to re-import as much data), BUT at 8:32:00.50000 the HMI panel wrote another log entry? That last log entry would be lost. Alternately, if you don't clear out the log file when you finish an import, you are either pulling in duplicate data OR having to do sanity checks on each piece of data coming in to make sure you are not putting duplicates in the database. And the larger the CSV file gets, the slower that import will become.
With my approach, you could even have the logging service fire off the job that pulls data from the CSV and dumps it into the database. And depending on how instant the logs need to be and how many logs per second you are getting, you could buffer up several logs in memory before dumping them to the database. The downside of my approach is that you will see a delay when it decides it needs to switch to CSV mode rather than SQL mode, as you would be waiting for the SQL command to time out (30 seconds by default in .NET, I believe, but this is configurable). But if your HMI panel code is multithreaded, you could have a dedicated thread for log writes so end users wouldn't see the delay - the main thread spawns a child asynchronously and does not need to wait for the child to return a value. The child attempts to write to the SQL log and, if that fails, writes to a CSV file. Then you have another thread that does CSV file checking and cleanup - if a CSV file exists, it attempts to start the job on the SQL side and, if that fails, sends an email alert to the administrator of the system. Or you may want it to fail 5 times before sending the email (for example).
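As a rough sketch of that "write to SQL, fall back to CSV" idea - the table, columns, connection string, and file path here are all assumptions, not your actual HMI code:

    // Sketch of "write to SQL, fall back to CSV", fired off a worker thread so the
    // main/UI thread never waits. Table, column and file names are assumptions.
    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.Threading.Tasks;

    static class LogWriter
    {
        const string ConnStr = "Server=.;Database=HMI;Integrated Security=true;Connect Timeout=5";
        const string FallbackCsv = @"C:\Logs\hmi_backlog.csv";

        public static Task WriteAsync(DateTime stamp, double temperature)
        {
            return Task.Run(() =>
            {
                try
                {
                    using (var conn = new SqlConnection(ConnStr))
                    using (var cmd = new SqlCommand(
                        "INSERT INTO dbo.HmiLog (RecordTimestamp, Temperature) VALUES (@ts, @temp)", conn))
                    {
                        cmd.Parameters.AddWithValue("@ts", stamp);
                        cmd.Parameters.AddWithValue("@temp", temperature);
                        conn.Open();
                        cmd.ExecuteNonQuery();
                    }
                }
                catch (Exception)
                {
                    // SQL unavailable: append the reading to the backup CSV instead.
                    File.AppendAllText(FallbackCsv, $"{stamp:o},{temperature}{Environment.NewLine}");
                }
            });
        }
    }

The Task.Run call is what keeps the caller from waiting on the database or on the timeout when SQL is down.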
That would be my approach. I do not know your system, and there MAY be things in your system that do not allow this (such as the HMI panel code being closed source). But a question I have for you: if the SQL instance goes down, should the HMI panel software continue to run? With a website that has SQL Server on the back end, for example, if SQL goes down, the website goes down. This is by design to ensure data integrity, plus the website usually requires some data from the SQL Server. In your case, the logging is the only thing going into the database and no reads are coming from your tool (correct?), in which case having the database down MAY be an acceptable state.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
December 23, 2020 at 8:20 pm
First of all, thank you very much for your wonderful explanations and ideas.
1) First of all, I should add that my HMI panel logs the data in "DTL" format and then, if I run the panel's conversion program, it converts the files into "CSV" format. If I could pull the "DTL" format directly into the SQL database without running that CSV conversion program, my whole problem would be solved. Do you have any information about this?
2) To answer your question: if my SQL database crashes, the HMI panel can continue to log data, because the panel runs separately on the machine it controls, and since the SQL database will be on a computer or in the cloud, it can crash independently of the HMI panel.
3) Your approach really made sense to me, and I can tolerate a minimal amount of data loss. Based on your approach, I have an idea I would like to consult you on. I designed a Windows Forms application in C# and I can push data to my SQL database through this application. I can keep this data transfer running as long as my computer does not crash; if the computer crashes and the form application closes, I can notify the HMI panel and the panel can start logging data in CSV format from that point. Once a file has been created in CSV format, can I query it and perform the import?
4) How can I update the data I capture with my C# form application in the same row instead of writing it to a separate row each time?
5) And finally, how can I automatically create a separate database (folder) in SQL for each machine, and how can I create a table for each day? Is that possible automatically?
Again, thanks a lot for your detailed explanations. Sincerely,
December 24, 2020 at 2:59 pm
I'll answer things in order.
1:
Honestly, I am not certain what the DTL format is, but if you need another tool to convert it to CSV, then it really depends on whether SQL can handle the DTL format. On one hand, you could import the data as VARBINARY(MAX) and just have the raw DTLs in there, but then you lose the benefit of having it in the database to begin with - being able to query it.
Now, if DTL is just a fancy plain text format and converting to CSV is as simple as running a "replace" on it, SQL can handle that pretty easily.
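Purely as a hypothetical - I have no idea what DTL actually looks like - if it turned out to be plain text with, say, a semicolon delimiter, the conversion could be as small as this (the paths and the delimiter are assumptions):

    // Hypothetical: only valid if DTL is plain text with a different delimiter (here assumed ';').
    using System.IO;

    class DtlToCsv
    {
        static void Main()
        {
            var lines = File.ReadAllLines(@"C:\Logs\machine1.dtl");
            for (int i = 0; i < lines.Length; i++)
                lines[i] = lines[i].Replace(';', ',');   // swap the assumed delimiter for a comma
            File.WriteAllLines(@"C:\Logs\machine1.csv", lines);
        }
    }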
2:
That is kind of what I thought. If you have the source code behind the HMI panel, my approach would be to make the HMI panel code multi-threaded and write to SQL every time, and IF the write to SQL fails, have a CATCH block in the code that writes to CSV.
3:
Having a separate application for writing to the database would work too, but it really depends on how you want to handle it. If the HMI panel writes to the CSV (or DTL) file and your application reads from the file and writes to the database, you'd have to be careful about file locking and about not writing duplicate values to the database when you restart the application after a crash. You also run into the problem of duplicate data (database + source file), which means extra disk space used.
The advantage of this approach, though, is that your SQL writer application doesn't need to live on the HMI panel server or the database server; it can run independently. So updates to the SQL writer system would not impact any other system. If I were going this route, I would do it as a Windows/Linux service rather than a standalone application. The reason is that, at least with Windows services (I am not certain about Linux, but likely), you can restart it on crash AND have it auto-start on system start without needing to log in or touch the computer at all. It can be fully automated.
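If you go down the service route, one way to host it is the .NET worker-service model, roughly like the sketch below. The class name and the 30-second interval are placeholders, and it needs the Microsoft.Extensions.Hosting / Hosting.WindowsServices packages, so again treat it as an outline rather than finished code.

    // Sketch of a worker service that wakes up periodically and pushes any pending CSV files to SQL.
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Hosting;

    public class CsvUploadWorker : BackgroundService
    {
        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                // TODO: look for CSV files here and push their rows to SQL Server.
                await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken);
            }
        }
    }

    public static class Program
    {
        public static void Main(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .UseWindowsService()   // lets the same executable run as a Windows service
                .ConfigureServices(services => services.AddHostedService<CsvUploadWorker>())
                .Build()
                .Run();
    }

The service control manager then handles auto-start and restart-on-failure for you, which is the whole point of going this route.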
4:
This one is a bit more tricky, but falls under "it depends". My understanding was that this is a log. Generally, once a record is written to a log it remains unchangeable, so I would expect you DON'T update the row, but instead insert a new one. If that is not your intent, then you just need some way to identify which row should be changed and issue an UPDATE command instead of an INSERT.
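If you really do want a single "current values" row per machine that gets overwritten, the difference is just issuing an UPDATE keyed on whatever identifies the row. A minimal sketch - the table, key, and column names are my own inventions:

    // Sketch: overwrite the "current reading" row for one machine instead of inserting a new row.
    // dbo.CurrentReading, MachineId and Temperature are assumed names.
    using System.Data.SqlClient;

    static class CurrentReadingWriter
    {
        public static void Upsert(string connStr, int machineId, double temperature)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "UPDATE dbo.CurrentReading SET Temperature = @temp WHERE MachineId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@temp", temperature);
                cmd.Parameters.AddWithValue("@id", machineId);
                conn.Open();
                if (cmd.ExecuteNonQuery() == 0)
                {
                    // No row for this machine yet, so insert one instead.
                    cmd.CommandText = "INSERT INTO dbo.CurrentReading (MachineId, Temperature) VALUES (@id, @temp)";
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }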
5:
Personally, I wouldn't create a new database for each machine and I wouldn't create a new table for each day as that means the application user needs to have a LOT of permissions added to the database and increases the risk of the database being compromised. My approach would be to have a column in the database for the current date and time (which I'd probably call something like RecordTimestamp), a column for the computer name, a column for the IP address, a column for the user logged into the system, etc. Basically a column for each piece of data you MAY want to do lookups on that is NOT captured by the HMI panel log. Mind you, I'd probably end up normalizing some of that data to reduce the size per row in the database. Like the computer name and user name can both reside in their own tables and just have lookups (via foreign keys) back from the data table. Depending on the data stored in the HMI log, it may make sense to normalize some of that too.
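To illustrate the shape I am describing (every table and column name below is an assumption - your HMI log will dictate the real columns), it could look something like this, run from C# since that seems to be your toolset:

    // Rough illustration of the normalized shape described above; all names are assumptions.
    using System.Data.SqlClient;

    static class SchemaSetup
    {
        public static void Create(string connStr)
        {
            const string ddl = @"
    CREATE TABLE dbo.Machine (
        MachineId   INT IDENTITY PRIMARY KEY,
        MachineName NVARCHAR(100) NOT NULL UNIQUE
    );
    CREATE TABLE dbo.HmiLog (
        HmiLogId        BIGINT IDENTITY PRIMARY KEY,
        MachineId       INT NOT NULL REFERENCES dbo.Machine(MachineId),
        RecordTimestamp DATETIME2 NOT NULL,
        Register1       FLOAT NULL,
        Register2       FLOAT NULL
    );";
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(ddl, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

The point is one log table covering every machine and every day, with the machine name held once in a lookup table and referenced by a foreign key.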
SQL can handle a LOT of transactions per second, so I expect your workload won't be hard on the server if it is tuned well (good indexes, properly tuned stored procedures, etc.). I have one table on one of my systems that gets roughly 70,000 records inserted per day within an 8-hour workday window, and it is far from overloaded. That database has two tables with that same load, and two other databases on the instance carry larger loads. That system is our service broker / message broker: the main system sends messages to the broker, which passes them out to other systems.
I am not sure what your workload looks like, or what your HMI system logs, but I expect SQL can handle it.
One thing to keep in mind is if your SQL edition is "Express", you have a 10 GB limit per database. This may not be a problem for years or it could be a problem pretty quickly. I don't know what your data looks like. If you are using Standard/Enterprise, then this won't be a concern.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
December 25, 2020 at 8:03 am
Again, thanks a lot for your deep explanations, sir.
1:
I tried it, but I think it is not possible in SQL. I have to convert the DTL to CSV first, and then I can get it into the SQL table.
2:
The catch block makes sense to me. Can I make a comparison, and based on that comparison, can I recover the missing data with this catch block?
4:
I did it for only one row; you misunderstood me - this single-row read is only for the real-time reading, for example the temperature of my machine. But in this situation I need to right-click and press the Execute SQL button in SQL Server, or run this query:

SELECT TOP 1 [Register1], [Register2], [Register3], [Register4], [Register5]
FROM [HMI].[dbo].[KAYITLAR]

to read the data while my C# application was sending it. But I don't want to execute the SQL manually; I want to see the real-time changes in the SQL table automatically. If I press the Execute SQL button or run the query, I can see the change in these register values, so my C# application works correctly. How can I see this change of temperature in my SQL table row automatically?
5:
I agree with you; I won't create a separate table or database for each machine, because every record will have a specific machine name or ID number, so I can manage them easily.
I think my workload is no heavier than yours, so your explanations work very well for me. Now my only concern is how I can make a comparison in SQL to recover the old data that could not be captured because the laptop was off, and how I can run all the queries automatically for real-time watching.
December 30, 2020 at 2:32 pm
That sucks for point number 1. So you will need to dump it to DTL, then convert to CSV, then import to SQL, which means 3 copies of the data (bleh). In this case, I would go for a scheduled approach rather than real time. Have a SQL job that calls a PowerShell script which does the conversion from DTL to CSV and, once that completes, calls an SSIS package to pull the CSV into a SQL table. Then repeat for each new table you need.
For point number 2 - it is your code; you can have it work however you like. For example, the catch block could write to CSV and also keep the values in memory. Then, when SQL comes back up, it writes everything held in memory back to SQL and resumes the normal path out of the try-catch block.
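As a sketch of that buffer-and-replay idea - the queue, table, and file names are assumptions, and a real version would need locking if several threads call it:

    // Sketch of "buffer in memory while SQL is down, replay when it comes back".
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;

    static class BufferedLogger
    {
        // readings that failed to reach SQL, oldest first
        static readonly Queue<Tuple<DateTime, double>> pending = new Queue<Tuple<DateTime, double>>();

        public static void Write(string connStr, DateTime stamp, double temperature)
        {
            try
            {
                while (pending.Count > 0)             // replay anything buffered from earlier failures
                {
                    var old = pending.Peek();
                    InsertRow(connStr, old.Item1, old.Item2);
                    pending.Dequeue();                // drop it only after the insert succeeded
                }
                InsertRow(connStr, stamp, temperature);
            }
            catch (Exception)
            {
                pending.Enqueue(Tuple.Create(stamp, temperature));    // keep it in memory...
                File.AppendAllText(@"C:\Logs\hmi_backlog.csv",        // ...and on disk as a safety net
                    $"{stamp:o},{temperature}{Environment.NewLine}");
                // Note: rows later replayed from memory will still be sitting in this CSV,
                // so whatever imports that file afterwards needs to check for duplicates.
            }
        }

        static void InsertRow(string connStr, DateTime stamp, double temperature)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.HmiLog (RecordTimestamp, Temperature) VALUES (@ts, @temp)", conn))
            {
                cmd.Parameters.AddWithValue("@ts", stamp);
                cmd.Parameters.AddWithValue("@temp", temperature);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }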
Point 4 (point 3 was missed): I am not sure what you are trying to do. SQL is just a container for your data; it is not a presentation layer. SSMS will give you a point-in-time view of your data, but it won't give you a constantly updating view. If you want real-time, constant updates on your data, you will need some reporting framework behind it that allows for constant, real-time data pulls, OR you will need to develop your own.
Also, something to note about your query - TOP 1 without an ORDER BY will give you a single row, but you don't know which row. It MIGHT be the most recently added row, or it could be the least recently added row, or any row in-between. If you are using TOP, you should ALWAYS use ORDER BY.
So if you need real-time, constant data pulls from SQL, you technically can do it in SSMS, but it is a messy WHILE (1=1) loop with the SELECT inside. BUT there is no way to clear the results between each SELECT, so you will end up with a result window that scrolls on indefinitely.
Now, if you have a tool such as Redgate SQL Monitor, you could set this up as a custom metric and it can query the table once per minute to get your results. Alternately, I think Power BI should be able to handle it, but I've never done that. There are multiple tools on the market that can handle it too. If you are not interested in buying a tool, it wouldn't be too difficult to make an application that executes a query against the SQL database after a certain amount of time has passed and presents the results in a DataGridView control bound directly to the table.
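A bare-bones version of that "poll and rebind" application could look like the sketch below. The connection string, the RecordTimestamp column, and the 5-second interval are assumptions; your KAYITLAR table may need a different ORDER BY column.

    // Sketch: re-query the table every few seconds and show the latest rows in a grid.
    using System.Data;
    using System.Data.SqlClient;
    using System.Windows.Forms;

    public class LiveViewForm : Form
    {
        const string ConnStr = "Server=.;Database=HMI;Integrated Security=true";
        readonly DataGridView grid = new DataGridView { Dock = DockStyle.Fill, ReadOnly = true };
        readonly Timer timer = new Timer { Interval = 5000 };   // WinForms timer, ticks on the UI thread

        public LiveViewForm()
        {
            Controls.Add(grid);
            timer.Tick += (s, e) => ReloadGrid();
            timer.Start();
            ReloadGrid();
        }

        void ReloadGrid()
        {
            var table = new DataTable();
            using (var conn = new SqlConnection(ConnStr))
            using (var da = new SqlDataAdapter(
                "SELECT TOP 100 * FROM dbo.KAYITLAR ORDER BY RecordTimestamp DESC", conn))
            {
                da.Fill(table);
            }
            grid.DataSource = table;   // rebind so the grid shows the newest rows
        }

        [System.STAThread]
        static void Main() => Application.Run(new LiveViewForm());
    }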
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 8, 2021 at 1:51 pm
Firstly, sorry for the late answer; because of COVID-19 I had to stay in quarantine at home. And of course, thank you for your responses.
Above all, I would like to say that I will no longer deal with the DTL format, because I will use the panel's own program, which converts it to CSV format. I wrote a program that automatically downloads these DTL files into my C# form application.
For now, all I have to focus on is creating a separate table for each day, comparing the number of days across these tables and, if any days are missing, writing the tables for the missing days into the SQL database. If no day is missing, then going into the last day, comparing the number of records by ID, and writing the missing parts of that day.
How feasible do you think this is? I really need your answer.
January 8, 2021 at 2:06 pm
Possible? Yes. Would I do it? No.
The reason I wouldn't do it is you are going to have a whole bunch of tables that are named by the date. Each year you will have 365 tables. Maintenance is going to be a nightmare and designing anything that can grab data from those tables efficiently is going to be painful.
If it is the approach you NEED to go with due to some requirement, you can do it. My approach, though, would be to have 2 tables: the first table is for the live data, the second is for the archived data. Both tables contain a column for the current date/time. Then, rather than having one table per date, you end up with one table that holds all of the dates. Now you can easily build indexes on it, plus views, stored procedures, reports, and whatever else you need. If you go with the one-table-per-date approach, building reports is painful, index tuning may be painful, views and stored procedures are almost pointless, and permissions are a pain in the behind.
With my approach, I would set up a job to move data to the archive as it is no longer required. For example, if the data is only relevant for 1 year, I would move all data older than 1 year to the archive on a weekly basis. Then keep that data for as long as makes sense (maybe another year, maybe 10 years, maybe forever) and delete it when it is no longer required via a SQL Agent job.
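The move itself is just an INSERT ... SELECT plus a DELETE. In practice that T-SQL would sit in a SQL Agent job step; here it is wrapped in C# only to match the rest of these sketches, and the table names, column name, and 1-year cutoff are all assumptions:

    // Sketch of the weekly archive move; it assumes the archive table has the same
    // columns as the live table and no identity column of its own.
    using System.Data.SqlClient;

    static class ArchiveJob
    {
        public static void MoveOldRows(string connStr)
        {
            const string sql = @"
    DECLARE @cutoff DATETIME2 = DATEADD(YEAR, -1, SYSDATETIME());
    BEGIN TRANSACTION;
    INSERT INTO dbo.HmiLogArchive
    SELECT * FROM dbo.HmiLog WHERE RecordTimestamp < @cutoff;
    DELETE FROM dbo.HmiLog WHERE RecordTimestamp < @cutoff;
    COMMIT TRANSACTION;";
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 600 })
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }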
Just my 2 cents.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 8, 2021 at 4:27 pm
First of all, I would like to say that the information you provided was very useful, and I have decided not to create a daily table anymore. But if the customer requests information such as how much water was consumed on 09.01.2021, can I pull that information from the same single table?
My main question is this: the process of sending the data to the SQL database will be done by the computer (I mean the C# form app), and it can continue as long as the computer is on. Even when the computer is turned off, the logging in CSV format continues. When the computer is turned back on, how can I pull the data that was generated while it was off, and how can I make that comparison?
The point of this question is that my data in CSV format is always saved as log files. So if the computer shuts down for 1 hour on 09.01.2021 at 15:00, how can I get the data for 09.01.2021 starting from 15:00 rather than from 16:00?
January 8, 2021 at 8:39 pm
First question is easy. SELECT <columns> FROM <table> WHERE <date column> = CAST('09.01.2021' AS DATE)
Second question is a bit more tricky as it depends. This would likely be handled in your application and my approach would be to grab the latest date from the database when starting it and that way you know where to start from. Alternately, you could write a log to indicate which files have been fully processed and which files are partially processed and when the tool starts up, you check the log to know where to start working from.
But to me, those last problems sound more like application level problems than database level problems. My concern would be more that you have a scenario like your CSV file has 100 rows in it and in the middle of uploading that to SQL, your computer crashes. When it comes back up, your application needs to know where to resume the upload of data from. One solution is to grab the latest day of data from SQL and compare that to the CSV file and just add what is missed. Alternately, you could have a log that keeps track of everything that has gone across from the CSV to SQL. Or a 3rd option could be that you push across the entire contents of the CSV file in one go. The last option has the advantage that you wouldn't need to resume in the middle of a file, but the disadvantage that you are pushing across potentially large amounts of data in one go that could bog down the network or cause timeouts (depending on how your app is configured).
My favored approach, if the CSV file doesn't NEED to be retained, would be: read the first row from the CSV, send it to SQL, then remove the row from the CSV and read the new first row. Using this method, if something gets interrupted in the middle, you have 2 possible states: 1 (ideal) you can resume where you left off, as the first row has not gone to SQL yet; 2 (worst case) the row was written to SQL but not removed from the CSV, in which case you only need to grab the "newest" row from SQL and compare it with the first row of the CSV. In either case, you are only ever checking 1 row - the latest row entered into the SQL database.
Now, how to design the table is a whole other beast. Depending on the data in the CSV, it may make sense to have multiple tables that you can normalize for optimal performance and space considerations, or it may make sense to have a single table to hold all of the CSV data.
When inserting the data, I would store a timestamp for when the data went in as well as the timestamp for when the data was recorded. This can be useful for debugging.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 8, 2021 at 9:50 pm
Again and again, I would like to say that you understand and help me very well. I would love to contact you privately via email, LinkedIn, or WhatsApp.
As for our topic, your answer to the first question, together with the problems you described earlier with a separate table for each day, has convinced me to create one table for all the data.
In addition, I can get timestamp values while obtaining data from the HMI panel in CSV format, and I will definitely do this. Your favored approach was very interesting to me, as I DO NOT NEED to keep the CSV files.
I would definitely like to use this method, but my last request is for guidance on how to apply it. For example:
1) Does SQL perform this last-row comparison, or do I do it myself?
2) How can I automate the comparison in code, since there will be a separate CSV file for each day? (I know I will delete them, but their names will be different and each file will be separate from the others.)
3) How can I delete a data row from the CSV file after it has been pulled into SQL?
By the way, I can extract data from my HMI panel with my C# form application, write it to SQL, and watch it live instantly. In general, I have completed my coding for live monitoring in the form application.
January 8, 2021 at 10:11 pm
To answer your questions I would say "it depends". For you, I would say do it all application side and just push the row across to SQL with a simple INSERT statement.
On application start, you would do a SELECT against the database to grab the most recent row and store it in a variable. Next, you grab the top row from the CSV file and compare it with the row in the variable. If they match, discard both and continue processing the file. If they don't match, insert the row and continue processing the file.
It sounds like you are more familiar with C# than with SQL or SSIS, which is why I would use that approach if I were you. Alternately, you could use SSIS to pull data from the CSV and dump it to the table.
But to answer your questions in order:
1 - SQL is basically just a container for your data. It will only do what you ask it to do. You COULD ask it to compare the data and it will either give you a result (if it was found) or no result (if the row wasn't found) and then pass that information back to your application which would need to handle the response. OR you could just ask for the most recent piece of data and see what it comes back with. It is up to you how you want to handle it. But the thing to remember is that SQL will not do anything automatically for you. You still need to tell it what you want it to do.
2 - I would use a filesystem object to look for files in a specific folder and loop through each one. Once I have finished a file, its contents should be empty (I delete each row as I process it), and if the file is empty, I delete the file.
3 - To delete the row, you have a few options. My approach would be to pull the entire file contents into memory as a string array when I start. Then, as I process a "row", I delete the first element of the array and write the remaining array back to disk, overwriting the existing file. When the string array is empty, delete the file. A rough sketch of all three points follows.
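Something along these lines - the folder path, table, and column names are my own assumptions, and the string comparison against the newest SQL row only works if the CSV text and the database formatting line up, so treat this as an outline:

    // Rough sketch: compare against the newest row already in SQL, then consume each CSV file
    // one row at a time, deleting rows (and files) as they are loaded.
    using System.Data.SqlClient;
    using System.IO;
    using System.Linq;

    static class CsvCatchUp
    {
        const string ConnStr = "Server=.;Database=HMI;Integrated Security=true";

        public static void Run()
        {
            string lastRowInSql = GetNewestRow();

            foreach (var file in Directory.GetFiles(@"C:\Logs", "*.csv").OrderBy(f => f))
            {
                var rows = File.ReadAllLines(file).ToList();   // whole file in memory, as described above
                while (rows.Count > 0)
                {
                    string row = rows[0];
                    if (row != lastRowInSql)                   // skip the one row that may already be in SQL
                        InsertRow(row);
                    rows.RemoveAt(0);
                    File.WriteAllLines(file, rows);            // rewrite the file minus the processed row
                }
                File.Delete(file);                             // file is empty, so remove it
            }
        }

        static string GetNewestRow()
        {
            using (var conn = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand(
                "SELECT TOP 1 CONCAT(RecordTimestamp, ',', Temperature) FROM dbo.HmiLog ORDER BY RecordTimestamp DESC", conn))
            {
                conn.Open();
                return cmd.ExecuteScalar() as string ?? "";
            }
        }

        static void InsertRow(string csvRow)
        {
            var parts = csvRow.Split(',');
            using (var conn = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.HmiLog (RecordTimestamp, Temperature) VALUES (@ts, @temp)", conn))
            {
                // Relies on SQL Server converting the text; a real version would parse the values first.
                cmd.Parameters.AddWithValue("@ts", parts[0]);
                cmd.Parameters.AddWithValue("@temp", parts[1]);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }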
Now this approach won't work if the file is too large to fit in memory, but most computers have TONS of RAM and worst case could page to disk.
The above approach also assumes that performance is not a driving factor, but data consistency is. If performance is a driving factor, then I think using something like SSIS to pull in the CSV file(s) would be a better option.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.