Chopping Off Data

  • Comments posted to this topic are about the item Chopping Off Data

  • Been involved with this too many times.

    Users think Excel can do everything and when it can't they don't comprehend. I've given them a Microsoft Access front end a few times which is always quirky to use. Most times they just want all of the Data so that they can sort it out themselves. The obvious answer is to talk to an IT professional and get the results that they were after in the first place.


  • Excel definitely has it's place in the world, but critical production applications aren't really that place.

    We've converted quite a few client-owned Excel tools where the data has outgrown the requirements and have pushed these into other more robust, version managed database technologies, like SQL Server or (more recently) Snowflake.

    I understand that financial regulators have been cracking down for a number of years on the use of Excel for certain reporting requirements as critical elements (such data lineage or provenance) just cannot be guaranteed in many cases. Strange that something as important as tracking the progress of a pandemic is done using this basic technology. A lesson learned the hard way!

    That's not to say we don't use Excel, of course. My favourite usage is repetitive ad-hoc code generation 🙂


  • A former colleague of mine, who is much smarter than I am, once opined that people that do what we do should be in some way licensed, especially when working with data such as the data in this example. If you do something like this, you lose your license. Simple. There are too many of these spreadsheet warriors in the wild, often practicing as "BI Developers" and businesses don't realise the damage they have done until it's too late. By then, of course, they have toddled off to their next job to repeat again. It isn't acceptable and in this case it is borderline criminal.

    I'm on LinkedIn

  • I was stuck in a traffic jam on the day this news item broke, and only going on what was not said, I suspect that:

    • they didn't use an 'enterprise' database because they didn't want to involve IT, with all the testing/QA/regulatory overhead in the work, as it would take 6 months and might never be delivered (quite a few of the UK's civil service projects die without delivery or take too long) - a laudable desire to 'cut the red tape'.
    • that the people commissioning the work might have been Arts graduates without an appreciation of scientific rigor in decision making (which seems to happen a lot in our civil service; because in the UK education system, there's normally only 3 subjects post-16 and one for your degree).

    Anyway, one of the analysts explained that the 65K rows only contained 14K tests, and that the decision maker had been aware of this limit at the start, but then it was forgotten to revisit the decision later on.   The rest, as they say, is history.

    I'm looking forward to other people giving ample examples of where Excel spreadsheets have been misused as databases ("when you import this sheet to the database, remember that when the cells are green this means the data is provisional...")!

  • I can't put this in clearer terms as a sage buddy of mine once told me a phenomenal line about any/all Excel Spreadsheets:


    His previous organization went through a serious death-by-spreadsheet purge to upgrade all the spreadsheets to a database. Which, you guessed it, was MS Access! Eventually, someone realized that a "personal" database is NOT (<>) the same as a "real" database like what we do here and they eventually figured it out after shelling out for contractor dollars and training budgets.

    So, any time I start hearing the Excel-Spreadsheet-Fairies talking I'm always there to clip their wings with database realities.


  • A little bit of self - promotion: Nowadays it is simple to transform tabular Excel in application


  • The upsetting thing about this from a UK tax payer point of view is that this Excel non-solution has costs millions.   There's also an issue with the choice of vendor.

    I'm sorry there are tools out there that can produce a crude but working  UI from a DB.  A fairly rudimentary DB is not that hard to set up.  Much as MS Access was derided it was also loved because it was easy to put together forms and reports.  As a starting point MS Access would have been a better bet and would have had a clear migration path.

    Given that this is a global pandemic and track & trace is an important component in controlling the disease I think this is a case where the UN could have stepped up, roped in Apple and Google and had an international tech solution.

    If you want to do a 5 minute job in 2 years give it to a government department.  I'm willing to bet that Google would have had the Android app on every non-Apple phone in 2 weeks flat.

    Yes, I know there are privacy issues but if every phone comes pre-installed with Facebook software and you can't get rid of any of it then that is a bit of a moot point.

  • I'll leave this meme here...


  • The article says that "When a CSV file longer than that is opened, the bottom rows get cut off and are no longer displayed."

    Did Excel provide any warning that the data was ignored? Or did Excel just silently proceed as if nothing was wrong?

    Just am curious.

    [font="Comic Sans MS"]Jonathan Gennick
    Brighten the corner, where you are[/font]

  • The Office suite is a desktop application. It has some workgroup interoperability, but it was never designed for a multi-threaded environment. In the US, finance professionals tend to be highly literate in Excel and want to see the data there. They should be brought to the understanding that Excel is not the proper format for server processing. It really is simple. If the amount of data is more than a single person can look at, say 65k rows, you need a different solution. Increasing the limitation to 1m is not exactly helpful, despite everyone screaming for it.

  • Jonathan Gennick wrote:

    Did Excel provide any warning that the data was ignored? Or did Excel just silently proceed as if nothing was wrong?

    Hi Jonathan, I think that your question provides an excellent example of the misunderstanding being discussed here, and I am delighted to provide an answer.

    CSV file format is comma separated values. It is a very old protocol, used for transferring data between systems. It has the advantages of being very flat, meaning that the file contains very little metadata and so is small as possible; the format is well-known, so it is easy to process; and the file contains nothing but text.

    In other words, it is not an Excel file. The format was designed to transfer data between systems. Just because Excel can open a CSV file and display it, that does not mean that the file is native to Excel or that Excel is the correct application to display it.

    This is not a criticism of you or your knowledge. Many people lack this understanding, which is the point of Steve's post.

  • Jonathan Gennick wrote:

    The article says that "When a CSV file longer than that is opened, the bottom rows get cut off and are no longer displayed."

    Did Excel provide any warning that the data was ignored? Or did Excel just silently proceed as if nothing was wrong?

    Just am curious.

    Me too.

    It's a concern and hopefully Public Health England have learned a lesson from this that we can all use.

  • There is a note that some data can be cut off, but it can be easy to miss, and if you are busy, you could close it without realizing that it applies to your file. The verbiage isnt completely clear, especially to laymen.

  • Answer since 2010 is Excel PowerPivot, which let's you have millions of rows in Excel, makes it like a reporting database. Then Power Query in 2013, which gives you good way to get the millions of rows of data. And now Power BI, which makes it pretty and powerful.

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply