OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions

OpenAI Wins Relief From ChatGPT Data Preservation Order in NYT Copyright Lawsuit

Court Lifts Broad Data Retention Order in Landmark AI Copyright Case

A federal judge has significantly narrowed the data preservation requirements imposed on OpenAI in the ongoing copyright infringement lawsuit brought by The New York Times, a notable development in the legal battle over AI training data. According to court documents, Judge Ona T. Wang’s October 9 order ends the previous mandate that required OpenAI to preserve all ChatGPT output logs indefinitely and replaces it with a more targeted preservation framework.

From Blanket Preservation to Focused Data Retention

The original preservation order, issued in May 2025, compelled OpenAI to maintain comprehensive records of ChatGPT interactions to facilitate the NYT’s investigation into alleged copyright violations. The newspaper had filed suit in late 2023, claiming that OpenAI trained its AI models using the publication’s copyrighted content without permission or compensation.

OpenAI had vigorously contested the broad preservation requirement, arguing in court filings that it constituted overreach and posed significant privacy risks to ChatGPT users. The company maintained that the order would force it to retain massive amounts of user data, far beyond what was reasonably necessary for the litigation.

New Framework Balances Legal Needs and User Privacy

Under the revised order, OpenAI’s obligation to preserve all ChatGPT output logs ended as of September 26. However, the ruling maintains several key exceptions: chat logs already preserved under the previous order remain accessible, and OpenAI must continue retaining data associated with ChatGPT accounts that the NYT has specifically flagged as potentially relevant to the copyright claims.

Judge Wang’s decision establishes a more balanced approach that acknowledges both the evidentiary needs of the case and the privacy concerns raised by OpenAI. According to the ruling, the NYT may expand the number of flagged user accounts as it works through the previously preserved records.

Broader Implications for AI Industry and Copyright Law

This development comes amid increasing legal scrutiny of how AI companies train their models and handle user data. The case represents one of the highest-profile tests of how copyright law applies to generative AI systems trained on publicly available content.

Legal experts following the case note that the modified preservation order could influence how courts balance discovery requirements against the operational burdens and privacy implications for AI companies. As artificial intelligence systems like ChatGPT become more integrated into daily life, the ruling may shape how similar data preservation issues are handled in future AI-related litigation.

Ongoing Legal Battle and Industry Impact

The copyright lawsuit continues to unfold as both parties prepare for further legal proceedings. The NYT’s original complaint alleged that OpenAI’s models could reproduce substantial portions of Times articles when prompted, suggesting extensive use of copyrighted material during training.

Meanwhile, the AI industry is watching closely, as the outcome could significantly affect how companies approach training data acquisition and data retention policies. Many AI firms face similar copyright challenges, and the resolution of this case may inform broader industry practices around data management and legal compliance.