TAR (technology assisted review) or, as some say, predictive coding is arguably the most important feature of eDiscovery technology. It has actually been around since 2010 but has become more widely used as the years (and the technology) have progressed. In the UK, US and Ireland, that I know of, there are now decided cases advocating or ordering its use in appropriate cases. I also know it is used in other parts of the world but, to the best of my knowledge and belief, it has not yet been used in SA. Perhaps that is not surprising given that we do not even have eDiscovery as part of our Uniform Rules, but that does not mean that this fantastic technology cannot be used here right now if the case warrants it.
I have been encouraged by some of my contacts and regular readers to explain exactly what TAR is and how it works. Let me say from the outset that this is a very basic guide, meant simply to give SA practitioners and their clients a flavour. My colleagues from other jurisdictions will doubtless think this is “old hat” and that the industry has moved on a great deal, but give us a break here, guys. In SA we are at the beginning of the eDiscovery journey and people are curious and looking for information and education. As I said, I do not believe TAR has been used here as yet, although I am aware of some close calls! One of the immediate problems is that there are so few people in SA who have ever been involved in the use of TAR - in fact the number is fewer than the fingers of one hand. The first case in which I successfully recommended the use of TAR was back in 2012 in the UK and I will come back to that one later.
To begin, let us get some basics out of the way. Many in our industry contend that, for financial reasons, TAR is not worth considering unless there are at least 100,000 electronic documents to be reviewed. Indeed, in all the TAR cases which have attracted the attention of the Courts the volumes have been in the hundreds of thousands or millions. There are very good reasons why a large volume is recommended before TAR is used, which hopefully will become clear later, but I must say that I am aware of cases where TAR has been used when there were only 30,000 documents. Another basic point is that TAR, a form of AI or machine learning, relies upon predictive techniques which are long established by the likes of Google, credit scoring agencies and insurance risk underwriters. The point is that this is not something that has been “invented” recently by those in the eDiscovery industry. What has happened is that the industry has applied well tried and tested algorithmic techniques to the legal world.
Let us take a practical look, in what I hope is understandable language, at how TAR works before I make some important points by way of conclusion. First, a reviewer or small group of reviewers who know the case really well are guided to manually review a sample, say 1000 or 1500 docs, and determine the relevance (responsiveness) or otherwise of each document in this set. Personally, I always say that the final decision on this sample set should come from one person, perhaps the lead lawyer in the case, and I say that because if you ask 5 reviewers to look at 1000 docs you will not get exactly the same results - it is called human nature! This first set is very important and we often term it the control set.
After the control set is reviewed manually, a similar set is created which some of us term the training set, again of perhaps 1000 documents. The training set is usually not a random selection but one in which likely relevant documents are selected, and this may well have been achieved by using other features of eDiscovery technology e.g. keywords, clustering, concept searching etc. The idea is for human reviewers to teach the TAR system to distinguish relevant from non-relevant documents, and the system then effectively studies and learns the thought processes behind the reviewers’ decisions. The system then generates a computer “decision” based upon the knowledge it has gained from the reviewers and applies a prediction “score” to each document. This score is used by the reviewers to analyse and, importantly, rank the documents in the collection.
Next, we need to measure the performance of the TAR system. We instruct it to make predictions about the relevance or otherwise of the control set, and the decisions made by the system are compared to the decisions made by the reviewers on the same set of documents so that we can see how well they match.
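For readers who like to see the moving parts, here is a deliberately tiny sketch of the train, score and measure steps just described. It is not how any commercial TAR product works: the word-weight scoring, the function names (`train`, `score`, `evaluate`), the 0.5 cutoff and the four-document training set are all my own assumptions, chosen only to make the idea concrete.

```python
import math
from collections import Counter

def train(training_set):
    """Learn a weight per word from (text, is_relevant) pairs coded by reviewers.

    Hypothetical toy model: smoothed log-odds of a word appearing in
    relevant versus non-relevant documents.
    """
    rel, non = Counter(), Counter()
    for text, is_relevant in training_set:
        (rel if is_relevant else non).update(set(text.lower().split()))
    n_rel = sum(1 for _, r in training_set if r)
    n_non = len(training_set) - n_rel
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2)) - math.log((non[w] + 1) / (n_non + 2))
        for w in set(rel) | set(non)
    }

def score(weights, text):
    """Prediction score between 0 and 1; higher means more likely relevant."""
    total = sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return 1 / (1 + math.exp(-total))

def evaluate(weights, control_set, cutoff=0.5):
    """Compare system decisions with the reviewers' decisions on the control set."""
    tp = fp = fn = 0
    for text, is_relevant in control_set:
        predicted = score(weights, text) >= cutoff
        if predicted and is_relevant:
            tp += 1
        elif predicted and not is_relevant:
            fp += 1
        elif is_relevant:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Illustrative data only; real sets would hold 1000 or more documents each.
training_set = [
    ("supply contract pricing dispute", True),
    ("contract breach pricing", True),
    ("office party friday", False),
    ("lunch menu friday", False),
]
control_set = [
    ("pricing contract amendment", True),
    ("friday lunch order", False),
]

weights = train(training_set)
recall, precision = evaluate(weights, control_set)
print(f"recall={recall:.0%} precision={precision:.0%}")
```

Recall (how many of the truly relevant documents the system found) and precision (how many of the documents it flagged were truly relevant) are the two measures usually quoted when a system's decisions are compared with the reviewers' decisions.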
It is not good practice simply to rely on one use of a training set and it is always best to repeat the training process a few times on more selected documents (many modern solutions can now select these further training sets automatically rather than having them selected manually). After each round the decisions are set against the control set and it will be seen that the performance levels improve as the system continues to learn. The number of rounds required will depend entirely upon the performance levels the reviewers have determined to achieve and it is not at all unusual to perform up to five rounds. The required levels are set by the legal team and a suggestion may well be that “relevant” documents must be 95% accurate against the control set with, perhaps, a 2% margin of error. Any document not given a score within those parameters would therefore be classed by the system as not being relevant. So you begin to see that we are now reaching the point where the reviewers are happy that the accuracy of their decisions has been reflected by the system, which means we can now apply these decisions to the entire collection of documents. This will produce a prediction score for each and every document and the reviewers will be able to see how many documents fall within their selected parameters. It follows that a decision can be made as to the relevance or otherwise of every document in the collection without the majority of them having been subjected to “human eyes”. Now, the above simple explanation is the basic method but more and more modern solutions are bringing new techniques for TAR. For example, with some solutions that use more AI, we can apply whole documents as well as phrases to the solution and this is likely to reduce the incidence of false positives. It is not at all unusual for us to see that as much as 80% or more of the documents in the collection do not need to be manually reviewed. Imagine the time and cost savings!
Depending upon the volumes concerned, the savings can run into millions of Rand.
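To put some numbers on the "95% with a 2% margin of error" idea and on applying a cutoff to the whole collection, here is a rough sketch. It assumes the usual normal-approximation formula for the margin of error of a sampled proportion, which is only one reading of how a legal team might define its target; the function names, document IDs, scores and the 0.5 cutoff are invented for illustration.

```python
import math

def margin_of_error(accuracy, sample_size, z=1.96):
    """Normal-approximation margin of error for a sampled proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    return z * math.sqrt(accuracy * (1 - accuracy) / sample_size)

def split_by_cutoff(scores, cutoff):
    """Split a scored collection into documents for human review and the rest."""
    to_review = [doc_id for doc_id, s in scores.items() if s >= cutoff]
    set_aside = [doc_id for doc_id, s in scores.items() if s < cutoff]
    return to_review, set_aside

# A control set of 1500 documents on which the system agrees with the
# reviewers 95% of the time.
moe = margin_of_error(accuracy=0.95, sample_size=1500)
print(f"margin of error: {moe:.2%}")

# Hypothetical prediction scores for a five-document "collection".
scores = {
    "DOC-001": 0.91,
    "DOC-002": 0.12,
    "DOC-003": 0.08,
    "DOC-004": 0.67,
    "DOC-005": 0.03,
}
to_review, set_aside = split_by_cutoff(scores, cutoff=0.5)
print(f"{len(to_review)} for human review, {len(set_aside)} set aside")
```

On these assumptions a 1500-document control set gives a margin of error of roughly 1.1%, comfortably inside a 2% target, which is one reason control sets of this size are workable.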
Now, let me attempt to bring this all together by making some important points which may answer some questions:
- Some lawyers will say that they cannot possibly rely upon a system that effectively “ignores” 80% of the documents in a collection. It does not ignore documents, because the truth is that all documents are seen by “machine” eyes rather than “human” eyes, and the system is such that there are constant checks and balances, with lawyers performing sample reviews or reviews of the decisions made by the system. In any event, there is nothing in the rules of any jurisdiction that says every single document must be looked at - quite the opposite in fact, as proportionality is an overriding factor these days and it behoves lawyers to use whatever technological steps are available to reduce costs. Besides, I trust machines more than humans - they do not have “off days”, they do not get tired or sick!
- Why not just use keyword searching? Well, there is a trend in other jurisdictions to no longer rely merely on keywords. How can anyone possibly know every single keyword in a case which would produce relevant or responsive documents? If the list is too long and over-inclusive then more documents than necessary will be returned, and the object of eliminating or reducing the amount to review will be lost. Conversely, if the list is too short or inaccurate then documents will certainly be missed. Finally, where the collection is very large or time is very short, keyword searching alone will take too long. Interestingly, there was a case in the US recently in which the Judge told the parties what keywords to use. In truth, it is widely felt in other jurisdictions that keyword searching alone is now unreliable and, as I mentioned, there are many decided cases relating to TAR. These are important for practitioners to read and understand and therefore I will give you links to the 6 most important ones that I know of: Da Silva Moore, Rio Tinto, Irish Bank, Pyrrho, Brown, . The keen eyed amongst you in SA will notice that the second one on my list refers to Rio Tinto - familiar name?
- I referred earlier to the first TAR case in which I was involved back in 2012. I won’t go into great detail but what was unusual here was that my client was the corporation directly rather than its law firm. After the usual culling and filtering we were left with approx 150,000 docs and the client knew what they were looking for but did not want to pay their lawyers to review this number of docs. We advised TAR, to which the client readily agreed, and with our assistance they performed the control and training set reviews themselves as they knew the case so well. After sufficient rounds the client was happy that only approx 12,000 documents were left for their lawyers to review. It would be right to say that the lawyers were less happy, but the client insisted and there was a disagreement! I intervened and suggested that the lawyers review 1000 of the documents that the system had rejected as not relevant and, guess what, the lawyers had to agree that none of them were relevant. This was sufficient to convince the lawyers that the system could be trusted.
- Finally I want to talk about how you go about deciding if and when to use TAR. Firstly, you need quality advice from the outset, which can only come from people who know what they are doing and have experience. You also need to know whether the solution that you are using in the case actually has the capability of performing TAR! You must take advice because not every case would be suitable for TAR and only experienced people can tell you that. You want to work with people who have done this before and who can work with you and your client - choose very carefully! It may also be worth disclosing to the other side the method you are adopting and even informing the Court. I say that because, for example, in the UK this would be normal practice, as it would be raised at the Case Management Conference before the Judge - a feature which we do not adopt in SA as yet.
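The check from the 2012 case above, where the lawyers reviewed 1000 of the documents the system had rejected and found nothing relevant, is often called an elusion test, and a little arithmetic shows why it is persuasive. The sketch below assumes the 1000 documents were a random sample of the rejected set (I have not said how they were chosen, so treat that as an assumption) and uses the standard upper bound for "zero found in a sample". The function names and document IDs are invented for illustration.

```python
import random

def draw_elusion_sample(rejected_ids, sample_size, seed=0):
    """Pick a random sample of rejected documents for lawyers to check."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(rejected_ids, sample_size)

def elusion_upper_bound(sample_size, confidence=0.95):
    """Upper bound on the share of relevant documents left in the rejected
    set when the sample contained none at all."""
    return 1 - (1 - confidence) ** (1 / sample_size)

# Figures from the 2012 case: approx 150,000 docs after culling and
# approx 12,000 left for lawyer review, so approx 138,000 rejected.
rejected_ids = range(138_000)  # stand-in document IDs
sample = draw_elusion_sample(rejected_ids, sample_size=1000)

bound = elusion_upper_bound(sample_size=1000)
print(f"sample of {len(sample)} documents, none relevant")
print(f"at most about {bound:.2%} of rejected documents relevant (95% confidence)")
print(f"that is, no more than roughly {bound * len(rejected_ids):.0f} documents")
```

On those assumptions, a clean sample of 1000 suggests that no more than about 0.3% of the rejected documents are relevant, which is the sort of figure that can be put to a sceptical lawyer or to the other side.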
I know this is a very long post and it is complex, but I have tried to explain as much as I can in understandable language because I think it is very important for lawyers and their clients in SA to understand what can and should be achieved in cases with a large document population. As ever, do not hesitate to contact me for more information.
P.S. Just as I had completed the main body of this post I was notified of a matter in Australia in which TAR was used spectacularly - 778GB of data equated to 6.6m documents and was reduced to 157,000 documents for review in 31 hours. It took one lead lawyer, a service provider and an independent consultant working with TAR - I rest my case!