OpenAI is once again sparking debate over its handling of training data. According to a recent report, external contractors were allegedly asked to upload real work samples from past and current jobs: not theoretical examples, but actual deliverables from professional practice. The incident shows how far AI companies are now willing to go to train their models on the highest-quality, most practical data available.
The artificial intelligence market is developing rapidly. Companies like OpenAI are under pressure to make their systems increasingly powerful. A key lever for this is training data that reflects real-world workflows. At the same time, this increases the legal and ethical risks. The current report makes it clear how thin the line has become between technological progress and problematic data usage practices.
Contractors should upload actual work
According to Wired, OpenAI and the training data company Handshake AI are asking external contractors to describe tasks they have performed in previous or current jobs. They are also asking them to upload examples of real work they have actually created themselves.
The presentation explicitly states that summaries or abstract descriptions are not desired. Instead, concrete results should be submitted. Examples of acceptable materials include Word documents, PDFs, PowerPoint presentations, Excel files, images, and even complete code repositories. The apparent goal is to train AI models with the most authentic working material possible.
Part of a larger strategy by the AI industry
According to the report, this approach is not isolated. Rather, it is part of a larger strategy within the AI industry. A growing number of companies are relying on external contractors to generate high-quality training data. The hope is that, through real-world examples from everyday work, AI models will be able to automate more office and knowledge work in the long term.
Tasks such as document creation, analysis, presentation preparation, or software development can be represented much more accurately with real work samples than with synthetic or highly simplified data.
Handling sensitive and protected data
According to the report, OpenAI instructs contractors to remove protected and personal data before uploading. The instructions point to a ChatGPT-internal tool called "Superstar Scrubbing," which is intended to help strip sensitive content.
However, the responsibility largely lies with the contractors themselves. They must decide which information is confidential, which contains personal data, and what may be uploaded. The report does not describe a central pre-screening process by the company.
Legal concerns and criticism
This is precisely where the criticism begins. Intellectual property lawyer Evan Brown told Wired that any AI lab pursuing this approach is taking a significant risk, because so much trust is placed in external contractors.
Contractors must assess for themselves what is confidential and what is not. Mistakes could result in copyrighted works, internal company documents, or sensitive personal data ending up in training sets, with possible legal consequences and lasting damage to trust in AI providers.
No comment from OpenAI
According to Wired, an OpenAI spokesperson declined to comment. This leaves open what internal review mechanisms exist, how frequently such uploads occur, and how potential violations are handled.
The growing need for real-world training data
The report underscores the AI industry's growing demand for realistic training data. OpenAI is apparently trying to align its models more closely with real-world work processes to further strengthen their capabilities in office tasks. At the same time, significant legal and ethical risks arise when real work products from external contractors are used as training material.
Whether this approach is sustainable in the long term depends on how carefully sensitive data is handled and how clearly responsibilities are distributed. The discussion about training data, intellectual property, and data protection is therefore likely to intensify. (Image: rafapress / DepositPhotos.com)