RE: Getting started with SQL Azure

October 22, 2009 at 10:17 am

We will likely never use cloud computing for a number of reasons, most of which are legal issues. Many of these issues go back to the Electronic Communications Privacy Act of 1986. That was the law that required companies whose computing equipment is part of the e-mail infrastructure to retain copies of all e-mail messages that pass through their servers for at least 90 days. Pending legislation that has already passed the House of Representatives and is awaiting passage in the Senate would extend that holding period to two years.

Existing case law indicates that rights to privacy seem to vanish when data of any kind is stored in any more or less public repository. For example, in United States v. Councilman (case #03-1383, 6/29/2004), it was decided that defendant Councilman did nothing illegal when he directed his ISP to produce a program for internal use that would read e-mail messages sent to his customers from Amazon.com before those messages had been delivered to their intended recipients. The basis of that decision was that the messages were in temporary storage in RAM or on hard discs. Google "processes" Gmail subscribers' e-mail messages at the specific direction of its advertisers in order to enable them to target their advertising to users more precisely. In short, putting anything in a cloud environment sacrifices rights to privacy and confidentiality. Companies that say they keep tight reins on information obtained from their customers and that at the same time use cloud computing to mitigate expenses are engaging in either self-deception or sophistry.

Taking things a bit further, let's say that a company were to be sued civilly. Current legal practices emphasize the use of e-discovery, a process that requires opposing parties to produce volumes of digital evidence in response to demands from their counterparties. The scope of such demands includes all e-mail messages; instant messages; electronic documents; exact copies of entire hard discs from workstations and servers, either on premises or off, including those owned by 1099 contractors, home computers (if companies allow employees to do work from home), and offshore developers; cached files produced by web browsers; recoverable deleted files; cell phones and PDAs; and other forms of electronic communication that can be persisted in some form of storage.

Let's say that a company were to use Google's services for storage of data. Google maintains data centers around the globe, some of them in Eastern Europe and Russia. Some of the data centers where it holds data are not owned by Google itself; instead it leases space in those facilities. Now let's say that a client company gets sued and a demand for production of data commences. Imagine the cost and complexity of obtaining data from a data center located in Russia that is operated by Russian nationals under Russian law.

How expensive can it get? In the 2001 case of Rowe Entertainment, Inc. v. William Morris Agency, just shy of $11 million was spent on e-discovery before the litigants even appeared on the first day of the trial. Most patent infringement cases today cost around $4.5 million, most of that being e-discovery costs. It is assumed by many in the legal profession that the average case involving a small- to medium-sized company would entail between $2 million and $3.5 million in e-discovery costs.

Let's take a much smaller and more personal example. A friend of mine who runs a one-man software company recently sued a client for reverse engineering and decompiling software that he provided the client under a license prohibiting such acts. The actual monetary damages were not yet realized, but had they been, the amount would have been in the range of $150,000-200,000. The trial itself, which lasted four days, cost him $80,000. The e-discovery costs preceding the trial amounted to more than $300,000.

The latest fashion in computing management of documents is "de-duplication," a technique of data storage that keeps only the most current versions of documents and data in archives intended for either later reference or disaster recovery. Anyone who has ever worked with an attorney will know that attorneys are paper-centric in their thought processes. They don't want to see what you have now but rather what you had 3 years ago that you sent to the person who is now suing you. They want to see the revisions of documents, the process of communication, the promises made in the ebb and flow of the relationship before everything went sour. Those are the pieces of information that they would use to build your defense in court. De-duplication destroys every shred of that grist for the defense.
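To make the concern concrete, here is a toy sketch in Python. It assumes an archive that behaves the way I described above (only the newest version of each document survives) and compares it with one that keeps every revision. The class names are made up for illustration and do not correspond to any real product.

```python
class LatestOnlyArchive:
    """Hypothetical archive that keeps only the newest version of each document."""

    def __init__(self):
        self._docs = {}

    def save(self, name, content):
        # Each save overwrites whatever was stored before.
        self._docs[name] = content

    def history(self, name):
        return [self._docs[name]] if name in self._docs else []


class RevisionArchive:
    """Hypothetical archive that appends every revision instead of overwriting."""

    def __init__(self):
        self._docs = {}

    def save(self, name, content):
        self._docs.setdefault(name, []).append(content)

    def history(self, name):
        # Return a copy so callers cannot alter the stored revisions.
        return list(self._docs.get(name, []))


latest = LatestOnlyArchive()
revisions = RevisionArchive()
for draft in ["draft sent to client", "revised after call", "final signed copy"]:
    latest.save("contract.doc", draft)
    revisions.save("contract.doc", draft)

print(len(latest.history("contract.doc")))     # 1
print(len(revisions.history("contract.doc")))  # 3
```

The first archive can only answer "what do we have now?" while the second can also answer "what did we send three years ago?", which is the question a litigator asks.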

CTOs, DBAs, and application programmers often take the initiative to do things that promise to streamline operations and reduce capital costs. That is an excellent motive, but in the hindsight afforded to a litigator such actions can often be the death knell for a company. Destruction of evidence, even if unintentional, can lead to charges of spoliation, which can eviscerate a company's defense.

I am not an attorney, but I have come to believe that either a staff attorney or outside counsel should be made a part of the decision-making group when it comes to data practices in a company. Most attorneys couldn't put a simple Lego toy together in under 24 hours, let alone pretend to know what class inheritance or an iterative loop is. What they do know, however, is what the court system will expect to be produced in litigation, and it behooves any company that believes that it will eventually have to step into a courtroom as a defendant (i.e., everyone) to maintain its data accordingly.

What are the implications of this? It means that when data is stored, it needs to be stored in two entirely different ways, each with its own disaster recovery plan. The first way is the traditional IT way - minimal footprints, keeping only the most current data for purposes of recovery, and being able to produce a quick restore to get things running when bad things happen.

The second way targets the needs of the business as a legal entity. Data has to be categorized based on its likely use in a legal defense. Things can only be deleted according to specific, legally defensible written policies after so many months have elapsed. Revisions of documents have to be maintained as well as the final documents. When a final document has been produced, a hash of the document has to be produced and published so that later pretenders can be shown not to be authentic. The computational method for producing the hash has to be saved along with all software and operating systems that were used to produce the documents. Cloud computing should only be used for things that contain ABSOLUTELY NOTHING that is confidential, private, or even remotely so, since cloud storage forfeits any rights to privacy. That includes e-mail messages (which are usually the most damning pieces of evidence), documents and memos, and databases.
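For the hashing step mentioned above, a minimal Python sketch might look like the following. The choice of SHA-256, the function names, and the layout of the record are my own assumptions for illustration; what a court would actually accept is a question for counsel. The record stores the algorithm name next to the digest so that the method used to produce the hash is kept with it.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def fingerprint_document(path, algorithm="sha256", chunk_size=65536):
    """Hash a file and return a small record (hypothetical layout)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large documents do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file": Path(path).name,
        "algorithm": algorithm,  # saved so the method travels with the hash
        "hexdigest": digest.hexdigest(),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def matches_fingerprint(path, record):
    """Re-hash the file with the recorded algorithm and compare digests."""
    current = fingerprint_document(path, record["algorithm"])
    return current["hexdigest"] == record["hexdigest"]
```

A record produced by `fingerprint_document` when the document is finalized could be published or filed with a third party. Later, `matches_fingerprint` returns False for any copy whose bytes differ from the original, even by a single character.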

Your reaction might be, "If that's the case, is it really cheaper to store anything digitally instead of on paper?" Your reaction might be right. The nice thing about paper is that when a final document is produced, the possibility for alteration becomes severely limited. It can be locked in a vault, and that's the end of the life cycle for that document. Paper consumes natural resources and space, however, and it was the urge to reduce those costs that helped fuel the drive toward electronic storage. Now, however, the digital world has taken on a life of its own, and the space required to store the data contained in electronic documents is being overwhelmed by the space requirements and computing costs for the storage and manipulation of the metadata, or data about the data.

On a closing note, serious thought needs to be given to the way that humans behave in a natural environment. Deforestation led to electronic storage out of good intentions, but now the power consumed to maintain what is admittedly the fragile state of data threatens to do even more harm. The sources of power used to keep the growing millions of servers, workstations, phones, televisions, lighting systems, cooling systems, and communication devices operating are generally fueled by coal. The delivery systems for the physical equipment and the people who operate them are generally fueled by fossil fuels. The technology that permits the use of these fuels assumes that robbing the atmosphere of oxygen and then dumping the byproducts produced from combustion back into the atmosphere is an acceptable practice. In the face of what is increasingly appearing to be unstoppable climate change, the computing industry's initiative has been toward so-called "green" computing. The intention is good, but if the growth of the scope and intensity of computing activity and resource consumption continues unabated, green computing will mean nothing. Which is more important, losing trees or losing air? The IT industry, like many other industries, is rapidly approaching a point at which it must become sufficiently self-aware to be able to say that perhaps it needs to throttle its own growth as an act of social conscience.