How to Deal With Unspecified Document Volume in eDiscovery

Continuing my series of articles aimed at introducing my new friends in SA to the wonderful world of eDiscovery, I thought we should change tack and talk about some of the questions most often raised in a litigation case or investigation. These include: 'How many pages and documents do I have, and how much data do I have?'. These are unbelievably important questions, especially at the beginning of a case, because until a lawyer knows the answers he has no way of knowing whether he can organise a review of the documents in time for discovery. Furthermore, he cannot estimate costs, which the client is bound to want to know! The problem is that no one knows the precise answers to these questions until each and every document has been processed.
That being the case, how can we help to answer these questions at the beginning of a matter, which is exactly when everyone needs to know? We can do this based on experience, and we can give guidelines. They are estimates only, but they will at least help the lawyer and client enormously at the crucial stage of early case assessment. I must make it clear that the following estimates and formulae are mine alone, based on my own experience. Others within the industry will have different views based on theirs.

Let us begin with paper documents. For example, if I were told that there are 20 boxes of documents in binders, I would say there could be 35,000 pages, equating to approximately 11,700 documents. How can I possibly say that so quickly without seeing or counting them? I have handled a lot of cases in my time, and with experience we form a view as an educated best guess. Here is how it works. A typical lever arch binder holds approximately 350 pages and a typical banker's box holds 5 binders, hence 1,750 pages per box x 20 boxes = 35,000 pages. Before we go any further, note that this assumes every page bears single-sided text, which is the case more often than not; if the pages were all double-sided, the number of pages would double. Some pages will be a document in their own right, e.g. a one-page letter, but some documents will consist of more than one page, e.g. a five-page report. I therefore assume an average page/doc ratio of 3:1, hence 35,000 pages would consist of almost 11,700 documents. Sometimes it is possible at the outset to agree an estimated page/doc ratio, but in the absence of any real guide we use 3:1. I must repeat that this is only a guide based on averages for cost and time purposes; the actual page and document counts will naturally vary and will therefore affect both cost and time.
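The paper arithmetic above can be sketched in Python as a small estimator. The constant and function names are hypothetical, introduced only for illustration; the 350 pages per binder, 5 binders per box and 3:1 page/doc ratio are the rule-of-thumb averages from this post, not fixed industry values. The sketch also assumes that the same 3:1 ratio still applies when pages are double-sided.

```python
# Hypothetical helper: the constants are this post's rule-of-thumb averages,
# not measured values, and should be replaced when better figures are agreed.
PAGES_PER_BINDER = 350   # typical lever arch binder, single-sided text
BINDERS_PER_BOX = 5      # typical banker's box
PAGES_PER_DOC = 3.0      # default page/doc ratio of 3:1


def estimate_paper(boxes: int, double_sided: bool = False,
                   pages_per_doc: float = PAGES_PER_DOC) -> tuple[int, int]:
    """Return (estimated pages, estimated documents) for boxes of binders."""
    pages = boxes * BINDERS_PER_BOX * PAGES_PER_BINDER
    if double_sided:
        # Text on both sides of every sheet doubles the page count.
        pages *= 2
    # Assumption: the same page/doc ratio applies to the doubled page count.
    documents = round(pages / pages_per_doc)
    return pages, documents


if __name__ == "__main__":
    print(estimate_paper(20))                     # (35000, 11667)
    print(estimate_paper(20, double_sided=True))  # (70000, 23333)
```

If a different page/doc ratio is agreed at the outset, it can be passed as `pages_per_doc`; for example, `estimate_paper(20, pages_per_doc=5)` gives 35,000 pages and 7,000 documents.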

Electronic documents or data are more complex, but again we make educated guesses based on experience. With these we must first deal with file sizes. An average custodian's documents on his PC or laptop could equate to 15 gigabytes (GB) of usable data once collected. This will expand slightly when processed, because container files (such as zip files) are opened up and their contents released, perhaps to 18 GB. It will then decrease once de-duplication, which I have mentioned previously, has been carried out, perhaps to 12 GB. It is worth mentioning here that a mobile device, such as an iPhone or BlackBerry, could well hold 5 GB of data.
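As a minimal sketch, the volume changes described above can be expressed as two multipliers derived from the example figures: expansion from 15 GB to 18 GB (a factor of 1.2) and de-duplication from 18 GB to 12 GB (keeping two thirds). The function name and default factors are hypothetical; real expansion and de-duplication rates vary widely from one data set to another.

```python
# Hypothetical multipliers derived from the example figures in the text:
# 15 GB collected -> 18 GB after processing -> 12 GB after de-duplication.
EXPANSION_FACTOR = 18 / 15  # container files (e.g. zip) opened up in processing
DEDUP_FACTOR = 12 / 18      # proportion of data remaining after de-duplication


def processed_volume_gb(collected_gb: float,
                        expansion: float = EXPANSION_FACTOR,
                        dedup: float = DEDUP_FACTOR) -> dict[str, float]:
    """Estimate data volume (GB) at each processing stage for one custodian."""
    expanded = collected_gb * expansion
    deduplicated = expanded * dedup
    return {
        "collected": collected_gb,
        "expanded": expanded,
        "deduplicated": deduplicated,
    }


if __name__ == "__main__":
    for stage, gb in processed_volume_gb(15).items():
        print(f"{stage:>12}: {gb:.1f} GB")
```

Keeping the factors as parameters makes it easy to substitute a provider's actual expansion and de-duplication rates once a sample of the data has been processed.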

Before you lose the will to live, let me explain why these figures are important to a lawyer and the client at the very outset. Firstly, service providers are likely to charge for processing and hosting at a per-GB price. Secondly, it is imperative to have some idea of how many documents these figures represent, so that the lawyer can calculate his review time or think about rushing off to Court with an application for an extension of time.
Estimating the number of pages and documents in a given volume of data is difficult, but again I have to apply a best guess based on experience. My experience suggests that an average GB of data could contain 6,500 documents or 20,000 pages. Now, before we move on, let me say that if the data consisted of nothing but two-line emails with no attachments, then 1 GB could equate to as many as 100,000 pages, but I have never seen that scenario. There is always a mixture of document types along with emails, such as Word documents, Excel spreadsheets, PowerPoint presentations and the like, and these carry larger file sizes, which reduces the number of documents or records per GB; hence my average of 6,500 documents per GB of data. Therefore, returning to our custodian who has a net 12 GB of data from his PC/laptop and, say, 3 GB from his mobile device, the total of 15 GB could produce 97,500 documents or 300,000 pages. Very few cases have only one custodian, and if there were, say, 5, you could be dealing with almost half a million documents or 1.5m pages! Now you see why we do not want to be printing electronic data, and why it is imperative to use technology to filter these documents in a hosted solution for review. It is simply not possible, practical or cost-effective for a lawyer to look at every document (more on this when we consider proportionality in a future post).
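The per-GB arithmetic in the paragraph above can be sketched the same way. The 6,500 documents and 20,000 pages per GB are the author's averages for mixed data, and the function name is hypothetical.

```python
# Rule-of-thumb averages for a typical mix of emails and attachments.
DOCS_PER_GB = 6_500
PAGES_PER_GB = 20_000


def estimate_review_set(gb_per_custodian: float,
                        custodians: int = 1) -> tuple[int, int]:
    """Return (estimated documents, estimated pages) across all custodians."""
    total_gb = gb_per_custodian * custodians
    return round(total_gb * DOCS_PER_GB), round(total_gb * PAGES_PER_GB)


if __name__ == "__main__":
    # 12 GB net from the PC/laptop plus 3 GB from the mobile device.
    per_custodian_gb = 12 + 3
    print(estimate_review_set(per_custodian_gb))                # (97500, 300000)
    print(estimate_review_set(per_custodian_gb, custodians=5))  # (487500, 1500000)
```

Assuming that every custodian holds a similar volume is a simplification; where the volumes differ, the per-custodian estimates would simply be added together.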

Obviously there are many variables and no two cases are alike, but what I have outlined here is on a general “rule of thumb” basis, both within the industry and from the numerous cases in which I have been involved. We also know from experience that certain types of case will cause variances. For example, construction cases often have large files, which reduce the number of documents per GB. The largest case that I have personally dealt with had approximately 10m pages of hard copy and 4 terabytes of data (by the way, in the binary convention commonly used for storage, there are 1,024 gigabytes in a terabyte!). That equated to almost 30m documents. As you may imagine, this matter ran over a very long period, about 4 years, and the lawyers worked on the investigation for far longer than that.
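As a rough cross-check of the largest case described above, combining the paper and electronic rules of thumb does reproduce a figure of almost 30m documents. The function is a hypothetical illustration that reuses the 3:1 page/doc ratio and 6,500 documents per GB, and it follows the text in treating a terabyte as 1,024 GB.

```python
GB_PER_TB = 1024       # binary convention, as used in the text
DOCS_PER_GB = 6_500    # rule-of-thumb average for mixed electronic data
PAGES_PER_DOC = 3      # rule-of-thumb page/doc ratio for paper


def estimate_total_documents(paper_pages: int, data_tb: float) -> int:
    """Combine paper and electronic estimates into one document count."""
    paper_docs = paper_pages / PAGES_PER_DOC
    electronic_docs = data_tb * GB_PER_TB * DOCS_PER_GB
    return round(paper_docs + electronic_docs)


if __name__ == "__main__":
    total = estimate_total_documents(10_000_000, 4)
    print(f"{total:,}")  # 29,957,333
```

Of that total, roughly 26.6m documents come from the 4 terabytes of data and about 3.3m from the paper, which shows how quickly electronic volumes come to dominate a matter.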

Once again, this shows the need for expert assistance in these matters. A lawyer cannot be expected to know these numbers, and I hope I have been able to shed some light on best practices to employ during early data assessment.