How Much Data is Needed for Your Investigation?

November 29, 2022

The more data you collect in an investigation, the more time-consuming the investigation becomes. Computer hard drives, smart phones, tablets, cloud storage and email servers are frequent sources of evidence. They can contain an overwhelming quantity of data and can open the investigation, and any subsequent litigation, to questions about whether the collected data was properly handled and reviewed.

In government investigations, this has become particularly challenging, as reported recently in the Wall Street Journal. Prosecutors handling white collar crime cases in the U.S. Attorney’s Office for the Southern District of New York (where many Wall Street cases are prosecuted) have reportedly been struggling to keep up with the exponential increase in the complexity of their investigations and the corresponding mountains of electronic evidence that agents and prosecutors collect, process, load onto review platforms and then sometimes spend thousands of hours sifting through. The situation in the Manhattan U.S. Attorney’s Office has reached the point where the current U.S. Attorney, Damian Williams, is calling it a crisis and has sought the assistance of DOJ headquarters. In the meantime, Mr. Williams has taken several steps to modernize his office’s ability to manage the terabytes of data it collects, including “rolling out a new electronic-evidence unit whose staff includes a lawyer specializing in document review. The office has also recently begun using a new tracking system for evidence.”

One outgrowth of the growing challenge of analyzing large quantities of data is a move toward being more selective about how much data is collected in the first place. While certain government investigations may not have a choice in the matter, private sector investigations have more latitude in how many email accounts and devices they collect and how far back on the calendar they choose to go to analyze what happened. As a practical matter, electronic discovery costs in an investigation can easily run into the millions of dollars, and clients and their counsel are well aware of how quickly those costs can spiral out of control. It is a good idea for client and counsel to work together to decide what evidence is collected, from whom, and how far back on the calendar the review should reach. This will enable the review team to gauge the number of documents in the initial scope and estimate how many people will be needed and for what period of time. Having some idea of the expected costs of the investigation tends to lessen the anxiety associated with the process.
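The scoping arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name and every input (document count, review speed, working hours, weeks available and hourly rate) are hypothetical figures chosen for the example, not numbers from any actual investigation, and a real estimate would also have to account for processing, hosting and second-level review.

```python
import math


def estimate_review_effort(num_documents, docs_per_hour, hours_per_week,
                           weeks_available, hourly_rate):
    """Rough first-pass review estimate. Every input is an assumption
    that the client and counsel would need to agree on up front."""
    # Total reviewer-hours needed to look at every document once.
    review_hours = num_documents / docs_per_hour
    # Hours a single reviewer can contribute within the deadline.
    hours_per_reviewer = hours_per_week * weeks_available
    # Round up: a fraction of a reviewer still means one more person.
    reviewers_needed = math.ceil(review_hours / hours_per_reviewer)
    estimated_cost = review_hours * hourly_rate
    return {
        "review_hours": review_hours,
        "reviewers_needed": reviewers_needed,
        "estimated_cost": estimated_cost,
    }


# Hypothetical scope: 400,000 documents, 50 documents per reviewer-hour,
# 40-hour weeks, an 8-week deadline and a $60 hourly rate.
estimate = estimate_review_effort(400_000, 50, 40, 8, 60.0)
print(estimate)
```

With these made-up inputs the sketch works out to 8,000 review hours, 25 reviewers and $480,000. Halving the number of documents in scope halves both the hours and the cost, which is the practical argument for being selective at the collection stage.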

In a recent investigation, we initially sought to obtain the email accounts of nearly 20 individuals. The data transfer was fraught with challenges and we ended up receiving the email of only 4 people, which still amounted to over 4 million electronic files. The other 16 accounts held a combined total of nearly 2 terabytes of data. This happy accident made the eDiscovery part of the investigation much more manageable. As fortune would have it, the bulk of the evidence we collected came from one person’s email account and, as a result, we did not need to revisit whether the investigation should be expanded to include other email accounts and devices.

What also helped drive efficiencies in the investigation was the use of artificial intelligence tools, including machine learning and something called “predictive coding”. Specialized email review platforms are often equipped with AI tools for this purpose. After the review team has reviewed and categorized a portion of the electronic evidence, these tools examine which documents the team has deemed relevant and then build algorithms to identify additional electronic evidence that has attributes in common with the evidence already deemed relevant. My own experience is that this process of predictive coding is startlingly accurate and saves a great deal of time and effort sifting through irrelevant evidence.

In another example of how smart tools can drive efficiencies, we noted in the course of the investigation that, with the exception of the main subject, most of the people with whom he was discussing illegal activity were using their personal email accounts. We styled a search for all communications between this one individual and anyone with a Gmail, Yahoo, AOL, Outlook or other popular personal email domain and discovered a cache of additional relevant emails that our initial keyword searches had not yielded. This process also helped provide a measure of negative assurance that our initial review had caught most of the relevant files on the first pass.
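For readers who want to see what a personal-email-domain search like the one described above might look like outside a review platform, here is a minimal Python sketch. It is not the search we actually ran: the `Message` structure, the function names, the sample addresses and the domain list are all hypothetical, and a real review platform would expose this kind of filter through its own search syntax.

```python
from dataclasses import dataclass, field

# Assumed list of popular personal email domains; extend as needed.
PERSONAL_DOMAINS = frozenset({"gmail.com", "yahoo.com", "aol.com", "outlook.com"})


@dataclass
class Message:
    """Hypothetical, simplified stand-in for an email in the review set."""
    sender: str
    recipients: list = field(default_factory=list)
    subject: str = ""


def domain_of(address):
    """Return the lower-cased domain of an address, or '' if there is none."""
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].strip().lower()


def personal_account_messages(messages, subject_address,
                              personal_domains=PERSONAL_DOMAINS):
    """Keep messages where the subject of the investigation is a participant
    and at least one other participant uses a personal email domain."""
    subject_address = subject_address.strip().lower()
    hits = []
    for msg in messages:
        participants = {a.strip().lower() for a in [msg.sender, *msg.recipients]}
        if subject_address not in participants:
            continue
        others = participants - {subject_address}
        if any(domain_of(a) in personal_domains for a in others):
            hits.append(msg)
    return hits


# Made-up sample data for illustration only.
messages = [
    Message("subject@corp.example", ["friend@gmail.com"], "lunch"),
    Message("colleague@corp.example", ["subject@corp.example"], "budget"),
    Message("Pal@Yahoo.com", ["subject@corp.example", "colleague@corp.example"], "re: deal"),
    Message("friend@gmail.com", ["other@aol.com"], "unrelated"),
]
hits = personal_account_messages(messages, "subject@corp.example")
print([m.subject for m in hits])
```

The filter matches on participants rather than on keywords, which is why it can surface messages that a keyword search misses. Comparing its hits against the documents already tagged as relevant is one way to get the kind of negative assurance mentioned above.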

Most internal investigations require some amount of computer forensics and email review using electronic discovery tools.  Be thoughtful about how you approach the collection and analysis of data.  Target the individual devices and email accounts most likely to yield the evidence you need. To guard against the possibility of the destruction of evidence you may later need, consider preserving a larger population of devices and email accounts for possible analysis later.

Electronic evidence is a vital part of investigations in both the private and public sectors. While the sheer volume of electronic data is growing exponentially by the day, the number of tools and techniques available to investigators to efficiently manage its analysis is keeping pace. Embrace them and you may find that the electronic evidence part of your investigation contributes the majority of what you need, and in record time.


