For many months now, I have been wrestling with myself about this post. Should I write a very brief piece with a practical explanation using non-legal or non-technology language? Should I write a more technical explanation? In the end I settled on a combination of the two because I want SA to understand the basic principles of TAR, but also let them see how complex this subject and process is, without having readers lose the will to live!
Before looking at Technology Assisted Review (TAR, which is mostly how we refer to it in this industry), aka predictive coding, in the context of litigation or investigations, let me give you the simplest everyday analogy that many of us within this industry use to describe predictive coding. Many of us buy books online through Amazon and I am sure you will have noticed that once you have completed your purchase, Amazon then suggests other books you may wish to purchase based upon the one you have just bought. The suggestions made by Amazon go further than simply suggesting that because you have bought e.g. James Patterson’s latest novel you might want to look at his other publications. Suggestions are also based upon price, the genre of the book you have purchased, the author, and whether it was a paperback, hardback or Kindle version. In other words it looks at all of the aspects which are likely to have contributed to your thought processes in selecting that particular book and quickly applies the same “thinking” before recommending others. Google uses predictive techniques, as do credit scoring agencies, insurance risk underwriters and many other industries, so the concept is neither new nor restricted to the legal world. So there you have the very basic concept and we can now turn our attention to how predictive coding applies to litigation, arbitration, investigations and the like.
“Predictive coding is a type of machine-learning technology that enables a computer to help ‘predict’ how documents should be classified based on limited human input. The technology is exciting ... because the ability to automatically predict document responsiveness has the potential to save organisations millions in document review costs. The savings are mainly attributable to the fact that fewer dollars are spent paying lawyers to review and segregate responsive from non responsive documents when responding to discovery requests”, states the wonderfully entitled publication “Predictive Coding for Dummies”.
TAR, even though it is not new technology, is really hitting the headlines now in various parts of the world. I guess this is partly because of the larger volumes of data in cases, coupled with the vision of driving down the costs of Discovery. However, I would also venture to suggest that it is very much because there have been more decided cases in various jurisdictions referring to and supporting the use of TAR. No doubt there have been others (particularly in the USA) but I refer you to the following from the USA, Ireland and the UK respectively.
Da Silva Moore v. Publicis Groupe, No. 11 Civ.1279 (ALC) (AJP) (S.D.N.Y. April 26, 2012).
Rio Tinto PLC v. Vale S.A., 2015 WL 872294 (S.D.N.Y. Mar. 2, 2015)
Irish Bank Resolution Corporation Limited & Ors v. Quinn & Ors [2015] IEHC 175; [2015] 3 JIC 0306
Pyrrho Investments Limited and another v MWB Property Limited and others [2016] EWHC 256 (Ch)
Now let us not run away with the notion that these cases are trailblazers for the use of TAR in their respective countries, because the truth is that the solution has been in use for a number of years. (I did my first case involving TAR in 2012, and very successful it was too, and I know of its use as long ago as 2009!) It is simply that these cases represent landmark decisions because a Court actually decided in favour of the use of such technology. So, when people say that TAR is “changing the legal landscape”, there is now clear judicial support for that claim. The last mentioned case above, the Pyrrho judgment in the UK, is the most recent and is certainly receiving a great deal of attention. The law firms on either side (I have worked with both) collaborated in their thinking and actions about the use of TAR in this particular case and the Judge agreed with them on the basis of proportionality. I recommend you read more about this one. There has been plenty to read, the best material coming from Chris Dale in my view.
So, what exactly is TAR and when can it be used? Firstly, I, and many others in the industry, would say that it is best used when there are at least 100,000 “reviewable” documents after culling by other methods. I know that others have suggested there are instances of its use with a smaller volume and I accept that - my friend Andrew King from New Zealand referred to a case in one of his blog posts, where he had heard that TAR had been used with only 30,000 docs, and there would have been a good reason for that. For the purposes of this article I maintain my view on volumes. Put in very simple terms, TAR is a technology within an eDiscovery solution that effectively applies the thought processes of a reviewer to the remainder of the documents within the database and finds similar ones whilst excluding the others. “Come on”, I hear you say, “How can that be possible and how can I trust it?”
Firstly, you need to know that the solution you are using, or intend to use, actually has this technology, and secondly you want to know whether there is a person within your service provider company who has sufficient experience and knowledge of how to conduct this exercise with you. Then a reviewer, or a small number of reviewers, who know the case really well will be guided to review a sample set of, say, 1,000 or 1,500 documents. The resultant decisions on the relevance or otherwise of these documents would then be applied by your service provider against the other documents within the database. Using highly complex algorithms the software would then suggest a number of documents which appear to be similar to the ones marked from the first sample set, and together this set is known in the industry as a “seed set” or “training set”. Thereafter, a further sample of those suggested as being relevant would be reviewed by the same person or team and this would result in a more refined set of results, whereby some of the suggested documents would be classed in the same way as those in the original seed set, whilst others would fall away. Depending upon the number of documents within the database it is not uncommon to see something like 5 rounds of this iterative process, and each time the software is learning more about the thought processes that identify these documents. In essence, this review sampling would continue until there are no more documents being returned as relevant, or the amount is sufficiently small to enable the lawyers to make an educated decision, based upon proportionality, not to proceed further with this exercise.
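For those who like to see the moving parts, here is a deliberately tiny Python sketch of that review loop. It is an illustration only and not how any real eDiscovery product works: the word-counting scorer is a crude stand-in for the "highly complex algorithms", and every name in it (`train`, `score`, `review_rounds`, the `reviewer` function) is hypothetical and made up for this example.

```python
import random
from collections import Counter


def tokens(text):
    """Lower-case a document and split it into a set of words."""
    return set(text.lower().split())


def train(labelled):
    """Build word weights from the reviewer's decisions so far.

    labelled is a list of (text, is_relevant) pairs. A word gains a point
    each time it appears in a relevant document and loses one each time
    it appears in a non-relevant document.
    """
    weights = Counter()
    for text, is_relevant in labelled:
        for word in tokens(text):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Add up the weights of a document's words; above zero means 'looks relevant'."""
    return sum(weights[word] for word in tokens(text))


def review_rounds(documents, reviewer, sample_size=3, max_rounds=5, seed=0):
    """Toy version of the iterative sample, review and re-train cycle.

    reviewer is a function standing in for the human: it takes a document
    and returns True (relevant) or False (not relevant).
    Returns (labelled, predicted): the documents the human actually coded,
    and the unreviewed documents the toy model still suggests as relevant.
    """
    rng = random.Random(seed)
    unreviewed = list(documents)
    rng.shuffle(unreviewed)

    # Round 1: the human codes a random sample (the seed set).
    batch, unreviewed = unreviewed[:sample_size], unreviewed[sample_size:]
    labelled = [(doc, reviewer(doc)) for doc in batch]

    for _ in range(max_rounds - 1):
        weights = train(labelled)
        suggested = [doc for doc in unreviewed if score(weights, doc) > 0]
        if not suggested:
            break  # nothing more is being returned as relevant, so stop
        # The human checks a sample of the highest-scoring suggestions.
        suggested.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch = suggested[:sample_size]
        labelled += [(doc, reviewer(doc)) for doc in batch]
        unreviewed = [doc for doc in unreviewed if doc not in batch]

    weights = train(labelled)
    predicted = [doc for doc in unreviewed if score(weights, doc) > 0]
    return labelled, predicted
```

To try it, pass a list of short text strings as `documents` and something like `lambda doc: "invoice" in doc` as the pretend reviewer. The stopping rule mirrors the one described above: the loop ends when no further documents are suggested as relevant, or when the agreed number of rounds has been reached.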
Perhaps I could explain it better with a real case study. This one came from an Australian service provider, and the Manager there who dealt with the case (I knew him very well when he worked in the UK) is a very experienced eDiscovery Manager. I will summarise his case study - the case started out with 6.6m documents, which was cut in half by de-duplication, and then further culling by keyword searching produced a balance of 157,000 documents to be reviewed. It was clear however that there were still many documents within the 157,000 which were not relevant, and therefore it was decided to use TAR to weed them out and also accelerate the process, as time was of the essence. The lawyer completed three rounds of reviewing a sample of 1,000 documents each, and in each round between 6% and 7% were identified as responsive. The lawyer then tested the accuracy of the software (in this case it was Relativity, by the way) by conducting a QA exercise, for which the provider took a statistical sample of documents based upon a 95% confidence level and a 2% margin of error. This resulted in 2,226 documents for the lawyer to inspect and, largely, he agreed with the decisions made by the software. The software then applied this logic to the remaining 152,000 documents and in the end a total of 27,122 were found to be responsive. The review cost savings from using this method, rather than reviewing everything, amounted to the Rand equivalent of R3m - no brainer! From this example, by the way, you can see why I say you need at least 100,000 documents because, had there been only 20,000, the reviewer would have reviewed a large chunk anyway during the process. Just ponder on this for a moment and reflect on the time and cost saving here. Where proportionality is now the overriding feature in litigation cases, it is almost impossible to argue against the use of TAR in such matters, and it is another reason why we want eDiscovery solutions written into our Rules here in SA.
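If you are wondering where a QA sample size for "95% confidence level and 2% margin of error" comes from, the short Python sketch below uses a common textbook approach (Cochran's formula, with an optional finite population correction). This is my assumption about the general method only. The case study does not say which formula the provider used or which set of documents the sample was drawn from, so the sketch is not expected to reproduce the 2,226 figure, and the function name `sample_size` is simply made up for this illustration.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, proportion=0.5):
    """Estimate how many documents to sample for a QA check.

    confidence: e.g. 0.95 for a 95% confidence level
    margin:     e.g. 0.02 for a 2% margin of error
    population: total documents being sampled from (None = treat as very large)
    proportion: assumed share of responsive documents; 0.5 is the most
                cautious choice because it gives the largest sample
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a very large population
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction: smaller collections need fewer samples
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.95, 0.02))                      # very large population
print(sample_size(0.95, 0.02, population=157_000))  # the 157,000 in the case study
```

The point to take away is that the sample size depends mainly on the confidence level and margin of error chosen, and only slightly on how many documents there are, which is why a QA sample of a couple of thousand documents can say something meaningful about a set of more than 150,000.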
So is TAR foolproof? No, of course not, but are humans reviewing documents foolproof? The major disadvantage with TAR surrounds those documents with little or no text (spreadsheets for example) or poor quality text. I say this because TAR obviously relies upon text. There are one or two other possible disadvantages but most can be overcome in other ways as technology has improved. Remember here also that a main advantage of TAR is that it is driven by the lawyer/investigator, as he is the one who determines the “seed set”, QCs the results and makes a judgment call on the final percentages.
I hope that this post is of assistance to people here in SA in particular. So much is written on TAR elsewhere, undoubtedly better and more comprehensive than this, but I am continuing my theme of doing my level best, in my posts, to educate and inform those in SA who want and need to learn more about how eDiscovery technology can help, hopefully using understandable language. Let me direct you to all of my blog posts and, as ever, I repeat: contact me for further information via my contact page. Ask me questions; query something I have said; seek my advice or opinion; but above all else, think about how what I am saying can assist you, your clients or your company or organisation.