Data's complexity and volume spur legal tech innovation

Analysis 17 Mar 2023

Europe Investigations, Litigation and Forensics

The world is constantly evolving. When was the last time you visited a library to undertake some research? What happened to those red telephone boxes that could be found in our town centres? In the last few decades, technology that may once have seemed like science fiction has become the norm. Our children have grown up in a connected world with the Internet, social media, and instant messaging. We are now constantly connected, and the devices we use daily are fundamental to our lives, whether to arrange a meeting, send an email, work or stay in touch with friends and family. Long gone are the days when businesses depended on filing cabinets to store their critical business information. Our use of technology in every facet of our lives generates an exponentially growing number of data points, which makes a good data governance strategy, and a well-planned approach to any legal compliance request, of utmost importance in keeping costs under control.

Collecting data from the multitude of endpoints that exist is a challenge. Traditionally, forensic work involved connecting directly to target devices to perform forensic imaging, but since the pandemic, remote collections have become much more common. With the adoption of hybrid working, approaches to collecting data have changed as technology has evolved to support more frequent remote collections and collection directly from cloud-based systems such as Microsoft 365, which can streamline the collection process and avoid the bottlenecks traditionally encountered with data transfers.

The costs of data processing, which can seem substantial, are often minimal compared with the costs of document review, whether in response to a regulatory enquiry or in a litigation matter. In litigation, the requirement for proportionality gives some control over the extent of what must be reviewed. In other cases, such as a cyber breach, terabytes of exfiltrated data can be in scope for review in order to meet regulatory obligations to notify data subjects. This warrants a sophisticated, technology-driven approach to keep costs under control.

In a recent cyber breach response case on which Control Risks was engaged, almost 1TB of data was located on a compromised server. Physically reviewing all of it could have cost millions. Using a combination of data analytics, including detailed file listing analysis to exclude non-PII files, deduplication, focussed PII searches and data subject mapping, we reduced the document population, and the associated review costs, by 92%.
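The culling steps described above can be illustrated with a short Python sketch. It assumes each collected file is available as a (path, content) pair with extractable text; the list of non-PII extensions and the two regular expressions are hypothetical placeholders, not the rules used in the case described, and real PII detection needs far richer, locale-aware patterns. Data subject mapping is not shown.

```python
import hashlib
import re
from pathlib import PurePath

# Hypothetical list of system/binary extensions assumed to hold no PII.
NON_PII_EXTENSIONS = {".dll", ".exe", ".sys", ".tmp"}

# Illustrative PII patterns only (assumptions, not a complete rule set).
PII_PATTERNS = [
    re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email address
    re.compile(rb"\b[A-Z]{2}\d{6}[A-D]\b"),        # UK National Insurance number format
]

def cull(files):
    """Return the paths worth reviewing from an iterable of (path, content_bytes)."""
    seen_hashes = set()
    keep = []
    for path, content in files:
        # 1. File listing analysis: drop extensions assumed not to contain PII.
        if PurePath(path).suffix.lower() in NON_PII_EXTENSIONS:
            continue
        # 2. Deduplication: identical content yields an identical SHA-256 digest.
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # 3. Focussed PII search: keep only files matching at least one pattern.
        if any(p.search(content) for p in PII_PATTERNS):
            keep.append(path)
    return keep
```

Each step shrinks the population before the next runs, so the cheap checks (file extension, hash lookup) spare the more expensive content search from processing files that would be discarded anyway.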

With review being the most expensive and least predictable part of a typical eDiscovery matter, it is easy to see why analytics has become a core component of document review exercises. Email threading, which ensures that only the most inclusive emails in each thread are reviewed, is perhaps the simplest and most widely used of these methods and has become almost a de facto standard on every case.
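Email threading can be sketched in simplified form in Python. The sketch assumes each email carries its own Message-ID and the Message-ID of the email it replies to (as in the standard `In-Reply-To` header), and treats an email as inclusive when no other email in the set replies to it, on the assumption that a reply quotes its parent in full. Commercial threading tools apply further checks, for example for unique attachments or edited quoted text, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    message_id: str
    in_reply_to: Optional[str]  # Message-ID of the parent email, if any
    subject: str

def inclusive_emails(emails):
    """Return emails that no other email in the set replies to (thread leaves).

    Assumes every reply quotes its parent in full, so a leaf contains the
    whole conversation above it and the earlier emails need not be reviewed.
    """
    replied_to = {e.in_reply_to for e in emails if e.in_reply_to}
    return [e for e in emails if e.message_id not in replied_to]
```

In a thread where one email receives two separate replies, both replies are kept, because each branch carries content the other lacks.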

The types of data encountered are also evolving. The adoption of Microsoft Teams, Slack and other collaboration applications in many organisations, which has accelerated since the pandemic, requires different techniques to make the data reviewable. Understanding the context of thousands of one-line chat messages is impossible unless you have a way to stitch them together. Fortunately, technology continues to evolve to handle these newer forms of communication, allowing long conversations involving multiple participants to be narrowed down to specific time periods and key participants. Non-relevant parts of conversations can be filtered out so that only the relevant material is produced.
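One simple way to stitch chat messages together is to group them by channel and start a new conversation whenever there is a long silence. The Python sketch below assumes messages have already been exported with a channel name, timestamp, sender and text; the 60-minute gap is a hypothetical, tunable threshold rather than any product's default. The `focus` helper then narrows the stitched conversations to a time window and a set of key participants.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby

@dataclass
class ChatMessage:
    channel: str
    sent: datetime
    sender: str
    text: str

def stitch(messages, gap=timedelta(minutes=60)):
    """Group one-line messages into conversations per channel.

    A new conversation starts when the silence since the previous message
    in the same channel exceeds `gap` (an assumed threshold).
    """
    conversations = []
    ordered = sorted(messages, key=lambda m: (m.channel, m.sent))
    for _, channel_msgs in groupby(ordered, key=lambda m: m.channel):
        current = []
        for msg in channel_msgs:
            if current and msg.sent - current[-1].sent > gap:
                conversations.append(current)
                current = []
            current.append(msg)
        if current:
            conversations.append(current)
    return conversations

def focus(conversations, start, end, participants):
    """Keep only messages sent within [start, end] by the named participants."""
    result = []
    for conv in conversations:
        kept = [m for m in conv if start <= m.sent <= end and m.sender in participants]
        if kept:
            result.append(kept)
    return result
```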

When approaching any investigation, developing good keywords to focus the review is usually a sensible starting point. Keywords can easily be applied to emails, Microsoft Office documents and, once it has been OCR’d, image content. Advanced processing now also allows non-textual content such as video and audio files to be searched in the same way, with transcription available as an option during processing. Machine translation can likewise be performed during processing so that foreign language content can be searched and reviewed by non-native speakers. Keyword searches are now also supplemented with sentiment searches, which detect not just what is being said but how it is being said.
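A keyword hit report of the kind used to test search terms can be sketched in Python. The sketch assumes OCR, transcription and translation have already produced plain text for each item (the document IDs are hypothetical) and supports only a trailing `*` wildcard; review platforms offer much richer query syntax than this.

```python
import re

def keyword_hits(documents, keywords):
    """Map each keyword to the IDs of documents it hits.

    `documents` maps a document ID to its extracted text (native, OCR'd,
    transcribed or translated). A trailing '*' acts as a wildcard, so
    'bribe*' matches 'bribe', 'bribes' and 'bribery'.
    """
    compiled = {}
    for kw in keywords:
        stem = re.escape(kw.rstrip("*"))
        suffix = r"\w*" if kw.endswith("*") else ""
        compiled[kw] = re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)
    report = {kw: [] for kw in keywords}
    for doc_id, text in documents.items():
        for kw, pattern in compiled.items():
            if pattern.search(text):
                report[kw].append(doc_id)
    return report
```

Running a report like this before review starts shows which terms are too broad or return nothing, so the keyword list can be refined early.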

We are all aware of how widespread CCTV is, and people have easy access to technology, from Ring doorbells to mobile phones, that generates videos and images which can also form part of a document review exercise. Technological innovation means that this content, too, can be processed and searched using object detection, avoiding the need to manually review lengthy video recordings and thousands of irrelevant images.

Another challenge arising from the increasing digitisation of all aspects of life is the amount of sensitive data and PII that becomes part of any document review exercise. This poses challenges when disclosing documents to other parties during legal proceedings: the documents must be reviewed for PII and privileged material, and any sensitive information redacted, before they are exchanged with the other parties. This process can be greatly streamlined using technology that automatically redacts sensitive information. Advanced platforms can redact thousands of instances of sensitive information in seconds and can even apply inverse redactions, redacting an entire document except for the parts where a particular person is mentioned, a technique widely used in responding to data subject access requests (DSARs) to keep costs under control.
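Both kinds of redaction can be sketched with regular expressions in Python. The email-address pattern is an illustrative assumption and would miss many forms of sensitive data; the inverse redaction splits text into sentences on terminal punctuation, a deliberate simplification of how review platforms locate mentions of a data subject.

```python
import re

# Illustrative pattern only: real redaction rules cover many more PII types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact(text, patterns, mask="[REDACTED]"):
    """Replace every match of each pattern with the mask."""
    for pattern in patterns:
        text = pattern.sub(mask, text)
    return text

def inverse_redact(text, subject, mask="[REDACTED]"):
    """Redact every sentence except those that mention `subject` (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s if subject.lower() in s.lower() else mask for s in sentences]
    return " ".join(kept)
```

For a DSAR, `inverse_redact` keeps only the sentences naming the requester, which is why it suits requests where most of each document concerns other people.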

The adoption of supervised learning is perhaps technology’s best answer to growing data volumes and rising review costs. Machine learning models trained on the review team’s coding decisions can score the likely relevance of the documents not yet reviewed. This allows the review to be prioritised and can give considerable insight into the likelihood of finding relevant content in the unreviewed population. This machine input can prove invaluable in ensuring the proportionality of any document review exercise.
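The idea of learning from coding decisions can be illustrated with a small multinomial naive Bayes classifier in pure Python. This is a swapped-in, deliberately simple model, not the algorithm used by any particular review platform: it is trained on documents the review team has coded as relevant or not relevant, and it ranks unreviewed documents by the log-odds of relevance so that the likeliest relevant material is reviewed first.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def train(coded):
    """Fit word counts from (text, is_relevant) pairs coded by reviewers."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in coded:
        counts[label].update(tokens(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def relevance_score(model, text):
    """Log-odds of relevance under naive Bayes with Laplace smoothing.

    Positive scores favour 'relevant'; words never seen in training are ignored.
    """
    counts, docs, vocab = model
    total = sum(docs.values())
    score = 0.0
    for label, sign in ((True, 1), (False, -1)):
        denom = sum(counts[label].values()) + len(vocab)
        logp = math.log((docs[label] + 1) / (total + 2))
        for tok in tokens(text):
            if tok in vocab:
                logp += math.log((counts[label][tok] + 1) / denom)
        score += sign * logp
    return score

def prioritise(model, unreviewed):
    """Order unreviewed (doc_id, text) pairs so the likeliest relevant come first."""
    return sorted(unreviewed, key=lambda d: relevance_score(model, d[1]), reverse=True)
```

As reviewers code more documents, the model can be retrained and the remaining population re-ranked, which is the basic loop behind prioritised review.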

Where are we now, and what is coming next? Technology is getting better all the time at helping us make decisions about the documents we need to review. AI models are being developed that will make it possible to profile human behaviour from the outset, before any keywords are applied. What we learn from one investigation can be applied to similar investigations, particularly within the same organisation, supplementing the experience of our investigations team with the machine equivalent of a veteran detective, surfacing what is relevant without the need for lengthy investigations. It is difficult to imagine what data types we might see in 20 years’ time; however, I am confident that our workflows and the associated technology will continue to innovate to keep review processes streamlined.

*This article was first published with ThoughtLeaders4 FIRE Magazine