Whats faster, flat files or SQL table

Question

Whats faster, flat files or SQL table

Digs

SSCertifiable

Points: 7333
More actions
June 5, 2009 at 1:31 pm

#384725

I have 5000 txt files of Date,open, high, low, close, volume data. Each file is named after the stock symbol, like goog.txt or msft.txt
They can be in one windows folder or many.
Each txt file has about 5000 records in each (15 years of data).
What is a faster forward read only approach for a website, place data in a single SQL table or read txt files.
Currently I am using txt files, I import all the txt file data (say for msft.txt) into an array via .net files import code, then I use the array to filter period ranges ( ie between this day and that day etc)..BUt my testing is only on 50 or so txt files in a folder, what happens when I have 5000, in folder, or maybe 1000 in 5 separate folders..
What wins, in speed, thats finding the data in a folder or finding the data in a SQL 2005 table ( indexed, using Idatareader with a WHERE clause.) so I can place into an array for manipulation ?

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 1

Because you have the data in an array, once loaded, it will probably be the fastest but may not scale well if you consume all of the memory available for the 5000 files.

A real flat file consisting of fixed size fields will probably be the fastest non-memory option if you want to read everything, but will need to be in the perfect sorted order in order to do any searches. That is, unless you build your own index and binary search algo.

In this case, I don't believe that the fastest is the best. Build a set of test files and try all of the options including loading the data into SQL Server. With the help of a little dynamic SQL, loading 5000 files of 5000 rows of all the same type is virtual child's play once you know how to do it.

If you want to test how quick SQL Server can find something in a million row table, lemme know and I'll fix you up with a data generator. Just let me know what you want the table to look like and how many stock symbols you might be after.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Digs SSCertifiable Points: 7333 More actions · Answer 2

Sorry I didnt make myself that clear,,,

Whats fastest, when you need to pull one TEXT file up into the asp.net application.

1) Pulling a single (say msft.txt) txt file out of 5000 txt files in a windows directory and reading into an array with then asp.net application.

or

2) Having all the 5000 txt files loaded into a SQL table already, then doing a SQL search via a WHERE clause to get the data for a single stock symbol like MSFT...

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 3

No, you where clear. You just didn't think I was. 😉

Which do you think will be faster if you have 5000 files with 5000 lines each for a total of 25 million lines of information to load into an array? Unless you're really good at writing your own database functionality, I believe you'll find that having that information already loaded into to something like a nice table where you can use the natural filtering tools of SQL Server is the way to go.

Of course, I don't know for sure and I don't believe you'll find many that do. That's why I suggested you actually do a test.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Digs SSCertifiable Points: 7333 More actions · Answer 4

.."Which do you think will be faster if you have 5000 files with 5000 lines each for a total of 25 million lines of information to load into an array? "..

I only want to load 5000 lines into an array (ie a single file), the question was whats was quicker SQL server index table with Idatareader to find the 5000 lines or finding a plain text file with a windows folder amoungst 4999 other files, to load a single file...

Anyways a test will be on the cards...;-)

LutzM SSC Guru Points: 107049 More actions · Answer 5

If I understand you correctly you're asking what's faster:

You have an asp.net application and you're asking for the faster of the following two options (using .net coding):

1) find a specific file out of 5000 in a windows folder and load the content into an array or

2) do a SELECT [values] FROM WHERE FileName = [search string] on a database where the files already are loaded and the table is properly indexed (my assumption) into an array

Both will result in an array with 5000 rows.

If my understanding from above is correct, I think this is more a question for a .NET forum since it comes down to what .NET function is faster: the file search and load or the select statement.

What we could help you with is to provide a data generator -as Jeff mentioned- so you can measure how long it would take SQL Server to find the 5000 records in a 25mill table.

From my point of view it's important though that you do the test on your own system since within this community there is a huge variety of hardware available and the test will show different duration for different hardware. Otherwise you'd be comparing apples and oranges.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Carl Federl One Orange Chip Points: 25384 More actions · Answer 6

With the text file having 6 fields (Date ,open , high, low, close, volume) and assuming each field is 8 bytes, this is 48 bytes per record. With 2.5 million records, this works out to 120 Mb of data. With such a small size, an in-memory database (IMDB) could be a better solution. There are some open source IMDB at codeplex.com and sourceforge.net.

SQL = Scarcely Qualifies as a Language