For decades, machine translation between natural languages fundamentally relied on human-translated documents known as parallel texts, which provide direct correspondences between source and target sentences. The notion that translation systems could be trained on non-parallel texts,
independently written in different languages, was long considered unrealistic. Fast forward to the era of large language models (LLMs), and we now know that, given sufficient computational resources, LLMs exploit incidental parallelism in their vast training data, i.e., they identify parallel
messages across languages and learn to translate without explicit supervision. LLMs have since demonstrated the ability to perform translation tasks with impressive quality, rivaling systems specifically trained for translation. This monograph explores the fascinating journey that led to this point,
focusing on the development of unsupervised machine translation. Long before the rise of LLMs, researchers were exploring the idea that translation could be achieved without parallel data. Their efforts centered on encouraging models to discover cross-lingual correspondences through techniques such as the mapping of word embedding spaces, back-translation, and parallel sentence mining. Although much of the research described in this monograph predates the mainstream adoption of LLMs, the insights gained remain highly relevant. They offer a foundation for understanding how and
why LLMs are able to translate.
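As a concrete flavor of the first of these ideas, the sketch below illustrates one common way to align two independently trained word embedding spaces via orthogonal Procrustes and translate by nearest-neighbour search. It is a minimal, self-contained illustration: the embeddings, dimensions, and seed dictionary are synthetic stand-ins, not data or methods drawn from any specific system described in this monograph.

```python
# Minimal sketch: mapping one word embedding space onto another with
# orthogonal Procrustes alignment. All arrays below are synthetic and
# purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monolingual embeddings: 1000 words per language,
# each represented by a 50-dimensional vector.
X = rng.standard_normal((1000, 50))   # source-language embeddings
Y = rng.standard_normal((1000, 50))   # target-language embeddings

# Assume the first 200 rows of X and Y are known translation pairs
# (a seed dictionary); in fully unsupervised settings this seed is
# induced automatically rather than given.
X_seed, Y_seed = X[:200], Y[:200]

# Solve min_W ||X_seed W - Y_seed||_F with W orthogonal:
# the closed-form solution is W = U V^T from the SVD of X_seed^T Y_seed.
U, _, Vt = np.linalg.svd(X_seed.T @ Y_seed)
W = U @ Vt

# Map all source embeddings into the target space, then "translate" a
# source word by cosine-similarity nearest neighbour among target words.
mapped = X @ W
query = mapped[0]
scores = (Y @ query) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(query))
print("nearest target word index:", int(np.argmax(scores)))
```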