Wmt16

Hugging Face

Dataset Card for "wmt16"

Dataset Summary

Warning: There are issues with the Common Crawl corpus data (training-parallel-commoncrawl.tgz):

  • Non-English files contain many English sentences.
  • Their "parallel" sentences in English are not aligned: they are uncorrelated with their counterpart.

We have contacted the WMT organizers, and in response, they have indicated that they do not have plans to update the Common Crawl corpus data. Their rationale pertains…

See the full description on the dataset page: https://huggingface.co/datasets/wmt/wmt16.

wmt16--de-en.parquet (500 rows)

Columns

  • translation mixed
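
The single "translation" column can be unpacked into source and target sentences. The sketch below assumes each cell is a mapping keyed by language code ("de" and "en"), which is how the de-en file name suggests the pairs are stored. The helper name split_translation and the two sample rows are hypothetical placeholders, not data taken from the file.

```python
from typing import Dict, List, Tuple

Row = Dict[str, Dict[str, str]]


def split_translation(rows: List[Row], src: str = "de", tgt: str = "en") -> List[Tuple[str, str]]:
    """Turn rows with a nested 'translation' mapping into (source, target) pairs.

    Assumption: every row looks like {"translation": {"de": "...", "en": "..."}}.
    """
    pairs = []
    for row in rows:
        record = row["translation"]
        pairs.append((record[src], record[tgt]))
    return pairs


# Hypothetical placeholder rows, written by hand for illustration only.
rows = [
    {"translation": {"de": "Guten Morgen.", "en": "Good morning."}},
    {"translation": {"de": "Danke.", "en": "Thank you."}},
]

pairs = split_translation(rows)
for source, target in pairs:
    print(f"{source} -> {target}")
```

In practice the rows would come from the parquet file itself, for example through pandas.read_parquet or the Hugging Face datasets library, rather than from a hand-written list.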
