Every modern dispute has a common sticking point: mobile and cloud data. Companies know they need it, counsel knows it is discoverable, and regulators assume you are retaining it correctly. Yet time and again, we see organizations attempt to cut corners through self-collection – particularly across Microsoft 365, Teams, SharePoint, and mobile devices – only to incur significantly higher costs later on.
The problem is complexity. The digital evidence landscape is now so intricate that even minor errors ripple into delays, processing failures, inflated review costs, or allegations of spoliation. Recent industry research highlights how collection complexity and defensibility – which self-collection often undermines – drive higher overall eDiscovery costs. For example, the 2025 eDiscovery Collection Update underlines how collection activities are becoming a critical and increasingly costly part of total discovery spend. Self-collection might seem to be the cheaper option, but that perception is misleading. Why does it fail so often, and what does a defensible alternative really require?
The illusion of “simple” self-collection
On the surface, self-collection feels straightforward. Microsoft has made self-service exports incredibly accessible – a few clicks in Purview or an eDiscovery (Standard) search can produce user mail, chats, documents and logs. The trouble is that exported data is not the same thing as reviewable, defensible evidence.
Three recurring issues cause most of the problems we see.
1. Teams data is not “one thing”
Teams is less a single data source and more a mesh of chat streams, channel conversations, meeting chats, call logs, file references, SharePoint site structures, OneDrive links and shared channels. These often cross multiple tenancies.
A quick Purview export typically produces JSON blobs, message fragments and a web of links that review platforms struggle to handle. Legal reviewers lose threading, context and metadata. Counsel then asks the inevitable question: can this be reconstructed so it can actually be reviewed? That is when the real cost begins.
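To make the reconstruction problem concrete, here is a minimal Python sketch of re-threading a flat message export. It is an illustration only: the field names (id, replyToId, createdDateTime, from, body) and the sample messages are assumptions, and a real Purview or Teams export will be structured differently and will need far more handling (edits, deletions, reactions, attachments, shared channels).

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical export: a flat, unordered list of messages. All field names
# and values here are assumptions made for illustration.
SAMPLE_EXPORT = json.dumps([
    {"id": "3", "replyToId": "1", "createdDateTime": "2025-03-01T09:07:00Z",
     "from": "Priya", "body": "Attached in the channel."},
    {"id": "1", "replyToId": None, "createdDateTime": "2025-03-01T09:00:00Z",
     "from": "Sam", "body": "Where is the draft contract?"},
    {"id": "4", "replyToId": None, "createdDateTime": "2025-03-02T14:30:00Z",
     "from": "Sam", "body": "New topic: board pack."},
    {"id": "2", "replyToId": "1", "createdDateTime": "2025-03-01T09:05:00Z",
     "from": "Lee", "body": "Legal has it."},
])


def parse_timestamp(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rebuild_threads(messages):
    """Group replies under their root message and sort each thread by time."""
    threads = defaultdict(list)
    for message in messages:
        root_id = message["replyToId"] or message["id"]
        threads[root_id].append(message)
    for thread in threads.values():
        thread.sort(key=lambda m: parse_timestamp(m["createdDateTime"]))
    # Order the threads themselves by their earliest message.
    return sorted(
        threads.values(),
        key=lambda t: parse_timestamp(t[0]["createdDateTime"]),
    )


threads = rebuild_threads(json.loads(SAMPLE_EXPORT))
for thread in threads:
    for position, message in enumerate(thread):
        indent = "" if position == 0 else "    "
        print(f"{indent}{message['createdDateTime']} "
              f"{message['from']}: {message['body']}")
```

Even this toy version depends on every message carrying a reliable parent reference and timestamp. When an export drops or alters either field, the grouping cannot be recovered from the data alone.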
2. Hyperlinks quietly break the record
Teams, Outlook, SharePoint and OneDrive increasingly use links instead of attachments. So a single email might point to a versioned SharePoint document, a folder with restricted permissions, a file that has been renamed multiple times, or a collaborative document where edits are not stored as discrete versions.
When IT runs a quick export, those links often fail to resolve. They may point to sites that IT cannot access, versions that no longer exist, tenant-specific URLs, or locations belonging to external users. Downstream, this creates “document stubs,” references to content with nothing behind them. Reviewers are thus forced to track down documents individually, or to request secondary collections, undermining the cost savings that self-collection is meant to achieve.
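One way to surface stubs early is to compare the links found in collected messages against a manifest of what was actually collected. The Python sketch below assumes a simplified setup: the message structure, the contoso URLs and the idea of a URL-based manifest are all hypothetical, and only links on sharepoint.com hosts are considered. Matching on URL alone would also miss renamed files and specific versions.

```python
import re
from urllib.parse import unquote, urlparse

# Assumption: the links of interest sit on SharePoint / OneDrive for Business
# hosts, which end in ".sharepoint.com". Other link types are ignored here.
LINK_PATTERN = re.compile(r"https://[^\s\"'<>]+")
CLOUD_HOST_SUFFIXES = (".sharepoint.com",)


def extract_cloud_links(text):
    """Return SharePoint/OneDrive links found in a block of message text."""
    links = []
    for raw_url in LINK_PATTERN.findall(text):
        url = raw_url.rstrip(".,;)")  # drop trailing sentence punctuation
        host = urlparse(url).hostname or ""
        if host.endswith(CLOUD_HOST_SUFFIXES):
            links.append(url)
    return links


def normalise(url):
    """Compare on host and decoded path only, ignoring query strings and case."""
    parts = urlparse(url)
    return (parts.hostname or "").lower() + unquote(parts.path).rstrip("/").lower()


def find_stubs(messages, collected_urls):
    """List (message id, link) pairs whose target is not in the manifest."""
    collected = {normalise(url) for url in collected_urls}
    stubs = []
    for message in messages:
        for link in extract_cloud_links(message["body"]):
            if normalise(link) not in collected:
                stubs.append((message["id"], link))
    return stubs


# Hypothetical sample data for illustration.
messages = [
    {"id": "m1", "body": "See https://contoso.sharepoint.com/sites/Deal/"
                         "Shared%20Documents/SPA_v3.docx?web=1 for the latest."},
    {"id": "m2", "body": "Pricing is in https://contoso-my.sharepoint.com/"
                         "personal/lee_contoso_com/Documents/pricing.xlsx."},
    {"id": "m3", "body": "No links here."},
]
collected_urls = [
    "https://contoso.sharepoint.com/sites/Deal/Shared%20Documents/SPA_v3.docx",
]

stubs = find_stubs(messages, collected_urls)
for message_id, link in stubs:
    print(f"{message_id}: linked content not collected -> {link}")
```

A report like this does not fix the gap, but it tells counsel how many linked documents are missing before review begins rather than after.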
3. Purview and E5 exports are not forensic preservation
Purview is a powerful governance tool, but it is not a forensic suite. Default exports were not designed to preserve evidence in the manner courts increasingly expect. Timestamps shift, file system metadata disappears, chat order changes, version histories flatten, and custodian context vanishes.
Once that happens, review platforms misinterpret data as duplicates, misthread conversations, or classify critical records as noise. Opposing counsel spots missing or inconsistent fields, and you find yourself defending the collection methodology instead of the merits of the case.
None of this is hypothetical; these are exactly the issues we are asked to resolve after the fact.
The silent risk: spoliation through self-collection
If these problems were only about efficiency, they would be frustrating but manageable. The larger problem is that self-collection introduces the single biggest legal risk of all: quiet loss or alteration of evidence.
Spoliation happens most often in three places.
Custodians “cleaning up” is the first. Custodians delete messages, empty folders, modify chats or remove apps without realizing that they are altering discoverable evidence. Mobile devices can silently purge older texts or WhatsApp messages, depending on their retention settings. Teams retention policies can trim older chats. iMessage threads can vanish during OS updates. Whenever a custodian handles their own data – even with the best intentions – they inevitably alter it.
Second, IT may copy files instead of creating an image. We regularly see PST exports instead of proper mailbox collections, file share copy-and-paste jobs that skip hidden or system files, screenshots of texts instead of forensic extractions, logical iPhone backups missing most of the chat metadata, or admin-level mailbox forwarding instead of preservation. These shortcuts change timestamps, alter access dates, and strip metadata. Where a duty to preserve applies, that can amount to spoliation.
Third, consumer “backup” tools add another layer of risk. There is an entire ecosystem of inexpensive tools marketed as text message extractors or phone backup utilities. They were built for parents or power users, not for litigation. Their output usually lacks hash values and audit logs, and often has missing attachments, incomplete threads and altered timestamps. Courts increasingly treat these as unreliable methods and may exclude what they produce.
When these patterns collide with state privacy obligations or statutes like the CCPA, the stakes only increase. Organizations can find themselves defending not only their evidence handling but also their handling of personal data.
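The copy-instead-of-image problem described above is easy to demonstrate. The short Python sketch below uses a throwaway file with a backdated modification time (the file name and date are invented for illustration) and compares a plain copy with a metadata-aware copy. The content hash survives both, but the plain copy silently replaces the modification time with the moment of copying.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical evidence file created in a temporary working directory.
workdir = Path(tempfile.mkdtemp())
original = workdir / "board_minutes.txt"
original.write_text("Minutes of the meeting held on 4 January 2021.\n")

# Backdate the file so it resembles a document last modified in 2021.
backdated = 1609804800  # 2021-01-05 00:00:00 UTC
os.utime(original, (backdated, backdated))

naive_copy = workdir / "naive_copy.txt"
shutil.copy(original, naive_copy)      # copies content and permission bits

careful_copy = workdir / "careful_copy.txt"
shutil.copy2(original, careful_copy)   # also attempts to preserve timestamps

for label, path in [("original", original),
                    ("shutil.copy", naive_copy),
                    ("shutil.copy2", careful_copy)]:
    stat = path.stat()
    print(f"{label:13} sha256={sha256_of(path)[:12]} mtime={int(stat.st_mtime)}")
```

Note that even the metadata-aware copy is not a forensic method. It makes a best effort at timestamps but does not preserve everything a file system records, and it produces no log of what was done or by whom.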
How self-collection drives cost across the entire lifecycle
Self-collection rarely fails in just one place. It reverberates through processing, review and production.
On the processing side, malformed Teams data triggers ingestion failures, processing exceptions, and dozens of hours of manual data repair. Hyperlinked documents become unresolved references that must be tracked down. Time that should be spent on analysis is wasted simply making data readable for the review platform.
During review, broken threads, missing attachments and malformed metadata force reviewers to work harder just to reconstruct what happened. That drives up reviewer hours, multiplies quality control cycles, requires re-running analytics, and extends timelines. Counsel also spends more time explaining the gaps to opposing parties or regulators.
When the time comes to produce, if your production includes link stubs or metadata that does not line up with what the platform expects, opposing counsel is likely to challenge it. In more contentious matters, they may move for an adverse inference if they conclude that evidence is missing.
At that point, the “free” in-house collection has become the most expensive part of the case.
Why organizations still try to self-collect
Given these risks, why do so many organizations still rely on self-collection for modern data sources?
Three drivers show up consistently. First, there is a belief that “IT can handle it.” IT is outstanding at uptime, security and performance. It is not trained for forensic defensibility, chain of custody, or preservation integrity. These are different disciplines.
Second, there is overconfidence in Microsoft tooling. Purview is powerful, and E5 licensing puts export capabilities at everyone’s fingertips. That convenience can create a false sense of sufficiency.
Third, there is budget pressure. Collection looks like an easy place to save money. A forensic estimate is seen as optional rather than as risk mitigation. Ironically, once you add the cost of cleaning up a failed self-collection, recollecting, and defending the process, doing it properly from the outset almost always costs less.
The solution is not to abandon Microsoft’s tools or internal capabilities. It is to recognize where they stop being sufficient for litigation or regulatory scrutiny, and where a forensic process needs to begin.
What a defensible collection really looks like
Courts today are less interested in whether you used a particular vendor than in whether your process was complete, transparent and repeatable. In practice, that means a defensible collection should meet four tests.
First, it must demonstrate forensic preservation. That includes hashing and logging at each step, leaving metadata unaltered, and capturing all relevant systems in a way that can be verified later.
Second, it must allow full context reconstruction. Teams threads, SharePoint versions, mobile chats and cloud logs need to be captured and parsed in a way that review platforms can actually use, rather than turned into a pile of partially readable exports.
Third, it must maintain a clear chain of custody. Every touchpoint is documented, every action is repeatable, and every artifact is reproducible.
Fourth, there must be expert interpretation. A qualified expert should be able to explain where the data came from, what was and was not captured, why the method is reliable, and how it aligns with legal requirements.
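As a simplified illustration of "hashing and logging at each step" and a documented chain of custody, the Python sketch below keeps a custody log in which every entry records the evidence hash and is itself hashed together with the previous entry. This is a teaching sketch under stated assumptions, not a description of any particular forensic tool: the function names, field names, actors and the item name are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_bytes(data):
    return hashlib.sha256(data).hexdigest()


def append_entry(log, actor, action, item_name, item_hash):
    """Append a custody entry whose hash also covers the previous entry's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "item": item_name,
        "item_sha256": item_hash,
        "previous_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    serialised = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = sha256_bytes(serialised)
    log.append(entry)
    return entry


def verify_log(log):
    """Recompute every entry hash and check that the chain is unbroken."""
    previous_hash = GENESIS_HASH
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "entry_hash"}
        expected = sha256_bytes(json.dumps(body, sort_keys=True).encode("utf-8"))
        if entry["previous_hash"] != previous_hash or entry["entry_hash"] != expected:
            return False
        previous_hash = entry["entry_hash"]
    return True


# Hypothetical evidence item and handlers, for illustration only.
evidence = b"contents of a collected export"
evidence_hash = sha256_bytes(evidence)

log = []
append_entry(log, "examiner_a", "collected", "teams_export.zip", evidence_hash)
append_entry(log, "examiner_b", "received for processing", "teams_export.zip",
             sha256_bytes(evidence))  # re-hashed on receipt

print("log intact:", verify_log(log))

# Changing any earlier entry breaks the chain from that point on.
tampered = [dict(entry) for entry in log]
tampered[0]["actor"] = "someone_else"
print("tampered log intact:", verify_log(tampered))
```

The design choice worth noting is the chaining. Because each entry's hash covers the one before it, an after-the-fact edit to any step is detectable, which is what allows a process to be described as repeatable and verifiable rather than simply asserted.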
This is the level of clarity that courts now expect when they ask, “How did you get this data, and what might be missing?”
The real cost: self-collection vs controlled collection
Self-collection looks cheap because the upfront cost is zero. There is no invoice for an IT export, or a custodian forwarding emails to Legal. The hidden cost drivers show up later when malformed exports need to be reprocessed, when gaps require repeat collections, and when review teams must spend additional time stitching conversations and documents back together. Hosting costs increase as duplicative or malformed data accumulates. Motion practice over incomplete or inconsistent productions adds more expense and risk. On top of that, there is the heightened exposure around spoliation, and the expert fees required to defend the original collection decisions.
By contrast, a controlled forensic collection typically involves a single predictable, agreed scope and one clean, complete dataset that is ready for review. The data arrives in a format that the platform can ingest, with intact metadata and context, so there is no need for rework. Disputes over method are far less likely, and discovery delays driven by collection errors are avoided. In litigation, efficiency and predictability matter, but defensibility matters even more. When you weigh the full lifecycle of a matter, the cost of doing it right at the start is almost always lower than the cost of repairing it later.
Academic research on eDiscovery increasingly emphasizes that the collection phase is foundational to the integrity, defensibility and efficiency of the entire discovery process. Kato Nabyre’s academic paper E-Discovery and the Language of Digital Evidence (2025) explores how inadequate or informal data practices introduce technical and legal risks that often require remediation later, increasing overall cost and complexity. By contrast, supervised or controlled collection methods reduce downstream re-work, discovery disputes and regulatory scrutiny, ultimately lowering total discovery costs despite higher upfront effort.
How Control Risks approaches these challenges
At Control Risks, we treat digital evidence the same way we treat physical evidence. It is handled under a chain of custody, preserved for courtroom scrutiny, and collected with repeatable forensic methods across Microsoft 365, mobile devices, cloud platforms and enterprise systems. Our goal is to ensure that when your data is questioned, the process behind it stands up to challenge.
In practice, that means our teams are often called in to rebuild broken Teams exports so they can be reviewed properly, recover missing SharePoint or OneDrive content that sits behind dead hyperlinks, reconstruct mobile message histories from complex device ecosystems, and normalize Slack or Google Workspace structures so they can be meaningfully compared to other evidence. We also work to recreate version histories and activity logs when those are central to the fact pattern in a dispute or investigation.
Just as importantly, we advise clients on when not to collect. In many matters, preserving data in place until the scope is clear will reduce cost without sacrificing defensibility, provided it is done with the right controls and documentation. The difference in philosophy is simple. Self-collection creates questions. A disciplined forensic collection process produces answers.
In discovery, shortcuts are an illusion
The legal system does not care whether the data was collected cheaply. It cares whether it was collected defensibly. In an era where one missing Teams thread can shift the outcome of a dispute, the cost of doing it right is almost always lower than the cost of fixing it later.
If your organization is relying on self-collection for modern data sources, or if you are unsure whether your current approach can withstand scrutiny, now is the time to reassess. Digital evidence is unforgiving. Your collection process does not have to be.
Article written by: Lucas Woodland