Edit Distance has always been a great metric to measure the changes made to something, according to at least one person (me). In our MT world, we are usually talking about the MT output as a starting point, and a final version of the target language as the end point. This target language can be a post-edited version created from the MT output, or it can be a purely human translation created from scratch, without seeing the MT output. The Edit Distance measures the changes from starting point to end point. Edit Distance can be a metric for quality: if your starting point is of good quality, it needs few changes to get to the end point. So a low Edit Distance is good. It could also be seen as an indicator of productivity: if you need to make few changes, your work will tend to be faster than if you have to make lots of changes.
Edit Distance is probably at the core of nearly all known MT quality scores such as BLEU, TER and others. All of them compare an initial version to a final version and measure what changed. But these metrics are adding more complex features to try to be closer to human evaluations. Edit distance may just be the simplest of all metrics.
One word about all scores: whatever score you like, that score is usually much better when used to compare things than to provide absolute verdicts. Are you comparing multiple MT engines? Or different versions of one MT engine? You may get reliable results saying that engine X is better than Y or that version 2 is better than 1, or that Neural is better than Phrase-based. Or you may find out that Transformers is the best movie ever is a better technology than other NMTs. But the absolute statement “this MT is good”, based on any score, has been usually harder to trust.
Now that we know where we are, let’s go back to simple, and take a look at Edit Distance.
The core of Edit Distance calculation is an algorithm called Levenshtein distance, which finds the minimum number of changes (additions, deletions or substitutions) to change something into something else.
This is how it works:
Let’s say you want to change characters. If I want to change Rose > Violet (you know where this is going):
- Change R to V = Vose
- Insert an i = Viose.
Don't do anything to o. - Change s to l = Viole
Don't do anything to e. - Add a t = Violet.
The Edit Distance is 4 for a total of 4 operations, or 4 characters.
Funny enough, Violet to Rose is also 4: Violet > Riolet > Rolet >Roset > Rose. (And if you thought that “Edit Distance being the same in both directions” was really something “funny enough,” you may be a fellow nerd.)
Now let’s change words:
Roses are sometimes red > Violets are blue and you are sweet
- Change Roses to Violets = Violets are sometimes red
Don't do anything to are. - Remove sometimes = Violets are sometimes red
- Change red to blue = Violets are blue
- Add and = Violets are blue and
- Add you = Violets are blue and you
- Add are = Violets are blue and you are
- Add sweet = Violets are blue and you are sweet
The Edit Distance is 7 for a total of 7 operations, or 4 words.
The calculation is simple, but you see that there are already two variations in how we can calculate.
So one question already is here: What am I gonna watch after Game of Thrones ended? Should we use characters or words?
What is your take on this?
You can find the first two articles in this series at Going the Distance — Edit Distance 2 and Going the Distance — Edit Distance 3.