The Atlantic recently revealed that Meta, the parent company of Facebook, Instagram and WhatsApp, used LibGen, a vast database of pirated material, to train its AI models.
https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/