eDiscovery in SA - What else is there besides keyword searching?

Many months ago I embarked upon a series of posts aimed at explaining the process of eDiscovery in easy-to-understand language. The series is aimed primarily at my contacts in SA, in an attempt to increase their knowledge of what is, as yet, an infant industry. As often happens, I was sidetracked by other developments in the world of eDiscovery that required my attention, but now we are back to the series!

One of my earlier posts dealt with keyword searching, and in it I mentioned that there are various other methods of finding, or defensibly eliminating, documents using eDiscovery technology. In this post we will look at some of the most popular ones in a very basic and, hopefully, easy-to-understand manner.

Email threading

This is an excellent feature in cases that are email heavy. I am sure everyone understands how a chain of emails is created: you suggest a meeting to a recipient, he responds asking for dates, you suggest some and he responds again agreeing, then you suggest a venue and time, to which he responds with his agreement, and you confirm. That could take seven or more emails, all about the same thing, and because each response was created by hitting reply, this set of emails is a thread. Most, if not all, eDiscovery solutions have this feature (and if the one you are dealing with does not, then I suggest you do not use it!). The system connects related messages that otherwise might not appear together, allowing you to look at the whole group at once.

This has a significant effect upon review, as the reviewer can often make a single coding decision (e.g. relevant or not relevant) that deals with the entire thread rather than looking at every single email. I have seen threads running into hundreds of emails, and by grouping them together in this way, one look at the main, or pivot, email can tell you whether the thread is relevant, not relevant, privileged etc. Either way you will have “viewed” all of them and can tag the entire thread one way or the other.

Threading also goes some way towards building a story of the relationship between the authors and recipients of these emails, which is often invaluable. You may learn, simply from the number of email threads, that there is an important relationship in the matter between two individuals, which may then lead you to ask the system to isolate all communications between them for specific review. You are looking at an entire conversation rather than at individual emails piecemeal, which saves both time and cost.

I recall once being involved in a case where there was an extremely long thread of emails between a Director of the company and its external lawyers. This thread was then forwarded to the CEO of the company, and a further, longer thread ensued. In terms of review, all of these emails were privileged and were marked accordingly with one press of a button. I remember it because in all there were over 300 emails, which could have represented almost a day’s work for one reviewer had they been looked at separately.
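To make the idea concrete, here is a minimal Python sketch of threading. It assumes each email has already been reduced to an ID and the ID of the message it replies to (in real email these come from the Message-ID and In-Reply-To headers); commercial tools also compare subject lines and message bodies, which this sketch ignores. The field names and helper functions are hypothetical. It groups the seven-email meeting chain described above into one thread so that a single coding decision tags all of them.

```python
from collections import defaultdict

def build_threads(messages):
    """Group emails into threads by following each reply back to the email that started it.

    `messages` is a list of dicts with hypothetical keys "id" and "in_reply_to"
    (None for an email that starts a new conversation).
    """
    parent = {m["id"]: m["in_reply_to"] for m in messages}

    def root_of(msg_id):
        seen = set()
        # Walk up the reply chain while the parent email is in the collection;
        # `seen` stops the loop if the data contains a circular chain.
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads)

def tag_thread(threads, root_id, tag, tags):
    """Apply one coding decision to every email in the thread."""
    for msg_id in threads[root_id]:
        tags[msg_id] = tag

# The seven-email meeting arrangement, each created by hitting reply, plus one unrelated email.
emails = [{"id": "m1", "in_reply_to": None}]
emails += [{"id": f"m{i}", "in_reply_to": f"m{i - 1}"} for i in range(2, 8)]
emails.append({"id": "x1", "in_reply_to": None})

threads = build_threads(emails)
tags = {}
tag_thread(threads, "m1", "not relevant", tags)
print(len(tags))  # 7
```

Walking each reply back to the email that started the conversation is the simplest possible grouping rule; an email whose parent is missing from the collection is simply treated as the start of its own thread.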


Near-duplicate detection

This is another very useful feature of most eDiscovery solutions. During the eDiscovery process we will have de-duplicated the data set prior to hosting (I referred to de-duplication in this blog post http://www.terryharrison.co/blog/2015/5/25/processing-of-electronic-data-discovery). As the name implies, the software identifies duplicates and isolates them. They are not deleted, merely set aside, and can be recalled later if required, but clearly there is no point in having absolutely identical documents repeated in the data collection, being reviewed more than once and incurring unnecessary hosting costs.

De-duplication is an exact science, using computer algorithms to identify duplicates, and this often causes disagreements: clients will sometimes complain that they have seen the same document twice because one looks the same as the other. However, there will always be some difference between two such documents, because every document has its own unique digital fingerprint, and even the smallest difference means that one will not be recognised as a duplicate of the other.

This leads us to “near-duping”, or “find similar” as it is called in Relativity. In most cases there will be documents which are similar but not identical. A common example would be a Word doc that several people have worked upon, creating revisions, with the final version subsequently converted to PDF. These would not be duplicates: even though the final PDF would have the same text as the final Word doc, they are different files. So de-duplication would find multiple copies of the PDF but NOT the Word doc from which the PDF was created. Near-duping would find the Word doc because the documents are “similar”, and it may be relevant for you to know what revisions had been made and by whom. This is quite a complex example, and there are usually numerous other kinds of similar documents in existence; the whole point here is to group those documents together to assist the reviewer.

The application works by setting a similarity threshold, which can be changed very quickly. By this I mean, for example, that you tell the system you want to see documents which are 90% the same as the one you are looking at, or any percentage you wish, adjusting it until you are satisfied with the results. It is a process you would use in certain types of cases where there are likely to be a lot of similar documents, and one in which you would seek advice and assistance from a good service provider. There is no doubt at all that it is more than useful and significantly aids review in appropriate cases.
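The difference between exact de-duplication and near-duping can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Relativity or any other product works internally: exact duplicates are found by comparing a SHA-256 hash of each file’s bytes (tools commonly use hashes such as MD5 or SHA-1 for this), and “similarity” is assumed to be the overlap of three-word sequences (shingles) between the extracted text of two documents. The file names, texts and helper functions are hypothetical.

```python
import hashlib
import re

def fingerprint(data: bytes) -> str:
    """Exact-duplicate fingerprint: changing even one byte changes the hash."""
    return hashlib.sha256(data).hexdigest()

def deduplicate(files):
    """Keep the first copy of each exact duplicate; set later copies aside (not deleted)."""
    kept, set_aside, seen = [], [], {}
    for name, data in files:
        h = fingerprint(data)
        if h in seen:
            set_aside.append((name, seen[h]))  # (duplicate, the copy it matches)
        else:
            seen[h] = name
            kept.append(name)
    return kept, set_aside

def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word sequences taken from the document's extracted text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of shingles: 1.0 means identical text, 0.0 means nothing shared."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

def find_similar(pivot: str, candidates, threshold: float = 0.9):
    """Names of candidates whose text is at least `threshold` similar to the pivot."""
    return [name for name, text in candidates if similarity(pivot, text) >= threshold]

# Two byte-identical PDFs and the Word doc they came from (different bytes, same text).
files = [
    ("contract.pdf", b"%PDF final agreement"),
    ("contract_copy.pdf", b"%PDF final agreement"),
    ("contract.docx", b"PK final agreement"),
]
kept, set_aside = deduplicate(files)  # only contract_copy.pdf is set aside

final_text = "The parties agree that the purchase price is payable within thirty days of signature"
candidates = [
    ("draft_v1.docx", "The parties agree that the purchase price is payable within sixty days of signature"),
    ("contract.docx", final_text),
    ("invoice.txt", "Please pay the attached invoice by Friday"),
]
print(find_similar(final_text, candidates, threshold=0.9))  # ['contract.docx']
print(find_similar(final_text, candidates, threshold=0.5))  # ['draft_v1.docx', 'contract.docx']
```

Changing a single word in a short clause drops the score of the earlier draft to 60%, so it only appears once the threshold is lowered, which mirrors the adjustment of the threshold described above.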

Concept searching 

This is another method of using data analytics within eDiscovery software to facilitate review. One of the most popular analogies used to illustrate concept searching involves the keyword “diamond”. Appropriately, here in SA your case may involve a diamond mine, but the problem with the word “diamond” is that it can refer to many other things, such as jewellery, a “diamond” chip company or even a baseball field! You do not want to waste your time (and your client’s money) reviewing documents that talk about baseball or someone’s engagement and wedding rings so, effectively, you tell the system that you only want to see documents containing the word “diamond” when it refers to mining. The system makes this distinction from context, that is, from the other words that tend to appear alongside “diamond” in each document. This is a fairly simplistic way of describing concept searching, but for the purposes of this post I think you get the idea, and you can see how much time and cost it could save.
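The “diamond” example can be imitated with a deliberately simplified Python sketch. Real concept searching engines learn the relationships between words from the whole data set (for example using latent semantic indexing); here, as an assumption, the “concept” is just the vocabulary of one example passage about mining, and each document containing “diamond” is kept only if its other words are sufficiently similar (by cosine similarity) to that passage. The stopword list, threshold, documents and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: common words that say nothing about the topic.
STOPWORDS = {"a", "an", "and", "at", "for", "from", "in", "is", "of", "on", "our", "the", "to", "was", "with"}

def vector(text):
    """Word counts for a document, ignoring case and stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two word-count vectors (0.0 = no words in common)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def concept_filter(docs, keyword, example, threshold=0.2):
    """Keep documents containing `keyword` whose surrounding words resemble `example`."""
    concept = vector(example)
    concept.pop(keyword, None)  # judge by context, not by the keyword itself
    hits = []
    for name, text in docs:
        v = vector(text)
        if keyword in v:
            v.pop(keyword)
            if cosine(v, concept) >= threshold:
                hits.append(name)
    return hits

example = "Diamond mining: ore extraction from the kimberlite pipe at the mine shaft"
docs = [
    ("geology_report.txt", "Kimberlite ore from the diamond mine shaft was sent for extraction"),
    ("jeweller_invoice.txt", "Diamond engagement ring and wedding band, platinum setting"),
    ("baseball_email.txt", "Meet at the diamond for baseball practice on Saturday"),
]
print(concept_filter(docs, "diamond", example))  # ['geology_report.txt']
```

A plain keyword search for “diamond” would return all three documents; the context check keeps only the mining report. Because the “concept” here is hand-supplied rather than learned, this shows the principle only.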


Clustering

As the name suggests, clustering determines which documents are related to each other and gathers them together in clusters. It effectively organises documents into groups and has a fantastic effect upon review, as entire clusters of documents can be reviewed together. Each cluster is labelled, for example with the key terms its documents share, and again you can imagine the time and cost saved by reviewing these groups rather than searching for documents individually. It may be that an entire group can be tagged as relevant, not relevant or privileged, once again avoiding the tedious task of looking at every document. Where the data set is large or complex, clustering is absolutely invaluable. It helps to identify the major issues, and you can prioritise documents much more easily and quickly. Particularly relevant documents may need to be reviewed by a higher-cost lawyer, and these can be grouped for that lawyer’s review only, saving the client costs. Without doubt, clustering can improve the consistency of review and reduce human error.
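As an illustration only, here is a small Python sketch of clustering. Commercial tools use their own, more sophisticated algorithms; this assumes a simple single-pass approach in which each document joins the existing cluster whose combined word profile it most resembles (cosine similarity at or above a hypothetical threshold of 0.3), or else starts a new cluster, and each cluster is then labelled with its most frequent terms. The documents, threshold and function names are invented.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: common words that say nothing about the topic.
STOPWORDS = {"a", "an", "and", "at", "for", "from", "in", "is", "of", "on", "our", "the", "to", "was", "with"}

def vector(text):
    """Word counts for a document, ignoring case and stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"members": [...], "profile": summed word counts}
    for name, text in docs:
        v = vector(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(v, c["profile"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(name)
            best["profile"].update(v)  # add this document's counts to the cluster profile
        else:
            clusters.append({"members": [name], "profile": Counter(v)})
    return clusters

def label(c, n=3):
    """Label a cluster with its n most frequent terms."""
    return [w for w, _ in c["profile"].most_common(n)]

docs = [
    ("doc1", "Pricing cartel meeting with competitor pricing"),
    ("doc2", "Competitor pricing agreement cartel"),
    ("doc3", "Staff holiday party invitation"),
    ("doc4", "Holiday party catering invitation"),
]
for c in cluster(docs):
    print(label(c), c["members"])
```

Once the clusters exist, a reviewer could tag every member of a cluster in one action or route it to a more senior lawyer. A known weakness of this single-pass approach is that the result can depend on the order in which documents are processed.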

The above methods are just a few of the many features contained in a decent eDiscovery solution, and there are many other techniques available nowadays, such as visualisation, whereby the system provides graphic illustrations of, for example, relationships within a case, and audio search software, whereby recordings, phone messages etc. can be searched. Furthermore, of course, there is predictive coding, or technology assisted review (TAR), which is one of the hottest topics in the world of eDiscovery right now. It is without doubt a subject in its own right, and I am busily preparing a very basic overview of TAR for the SA market, which will be published next.

I must make it clear, and repeat, that what I have described is just a basic snapshot of some of the things that can be done. Each of these topics merits a fuller description in its own right, but that is not my aim here. I want to continue to educate and inform those involved in cases and investigations in SA, showing that technology materially assists by saving time and money AND ensures that the right documents are found and not missed. As ever, it requires good preparation by the lawyers/investigators and considerable liaison with the right service providers.