OpenAI pushes back against NYT request for millions of conversations, citing user trust
OpenAI is pushing back against a request from The New York Times for access to roughly 20 million ChatGPT conversations, warning that releasing such a dataset could expose sensitive user information and undermine public trust.
OpenAI is resisting a sweeping legal demand from The New York Times that seeks access to tens of millions of ChatGPT conversation logs. The Times, engaged in ongoing litigation with the company, initially requested more than one billion chats. That figure has since been reduced to roughly 20 million conversations covering a two-year period from late 2022 to late 2024. Even so, OpenAI argues that producing such a dataset would expose private user information and undermine confidence in the platform.
A core issue is data permanence. Until recently, users could delete their chat histories with the assurance that they would be erased within 30 days. That changed in May 2025, when a court-ordered preservation requirement forced OpenAI to retain even deleted conversations indefinitely in connection with certain legal matters. This created a substantial archive of highly personal interactions that were never intended to persist beyond short-term use.
OpenAI has told the court that it will attempt to strip identifying details from the records, but privacy and security researchers note that de-identification offers only limited protection. Large language model conversations often contain unique phrasing, contextual clues, or specific personal information that can allow individuals to be re-identified, particularly when cross-referenced with other datasets. The request therefore raises concerns not just about theoretical risk but about meaningful exposure of users’ confidential queries, professional materials, health information, or private thoughts.
The dispute underscores an emerging tension in the digital ecosystem. Content owners and rights-holders increasingly seek access to AI training data and user interactions to pursue legal claims or assess compliance. AI developers, by contrast, must balance these demands against privacy obligations and the trust of the millions of people who rely on their tools for sensitive tasks. Courts will now have to consider how far discovery rights extend when the records in question are personal conversations with an AI system.
The case's significance extends well beyond a single lawsuit. Its outcome may shape future norms for data retention, deletion guarantees, and the governance of conversational AI. It also signals a shift in how digital permanence is understood: the question is no longer only whether data is stored indefinitely, but how it may later be compelled, repurposed, or reconstructed in ways users never anticipated. As legal battles over AI intensify, the question of who controls conversational data, and under what conditions, is becoming a defining issue of the next era of digital regulation.
