Wmt16
Hugging Face
Dataset Card for "wmt16"
Dataset Summary
Warning: The Common Crawl corpus data (training-parallel-commoncrawl.tgz) has known issues. The non-English files contain many English sentences, and their "parallel" English sentences are not aligned: they are uncorrelated with their counterparts. We have contacted the WMT organizers, and in response, they have indicated that they do not have plans to update the Common Crawl corpus data. Their rationale pertains… See the full description on the dataset page: https://huggingface.co/datasets/wmt/wmt16.
Columns
- translation (type: mixed)
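
As a rough illustration of how the translation column might be consumed, the sketch below assumes each row stores a dictionary keyed by language code (for example "de" and "en"), which is how the column appears when loaded with the Hugging Face `datasets` library. The sample rows and the `iter_pairs` helper are hypothetical and only stand in for real data.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: each row looks like {"translation": {"<src>": "...", "<tgt>": "..."}}.
Record = Dict[str, Dict[str, str]]


def iter_pairs(
    records: Iterable[Record], src: str = "de", tgt: str = "en"
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from rows with a 'translation' column.

    `iter_pairs` is a hypothetical helper for illustration, not part of any library.
    """
    for record in records:
        pair = record["translation"]
        yield pair[src], pair[tgt]


# Hypothetical sample rows standing in for real data. With the `datasets`
# library installed and network access, real rows could be obtained with e.g.
#   load_dataset("wmt/wmt16", "de-en", split="validation")
sample_records = [
    {"translation": {"de": "Guten Morgen.", "en": "Good morning."}},
    {"translation": {"de": "Danke schön.", "en": "Thank you very much."}},
]

pairs = list(iter_pairs(sample_records))
for source, target in pairs:
    print(f"{source} -> {target}")
```

Given the alignment problems noted in the warning for the Common Crawl portion, it may be worth inspecting a handful of unpacked pairs like this before training on them.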