Translation rating systems for VNs revisited

Previously I’ve taken a look at the problem of “The people who most need translations can’t judge translations, and the amateur translations out there can be of good or dismal quality, so how can we translators (as people who can judge) try to communicate things clearly and simply?”

In a 15-second summary, I wrote that there are two things we needed: 1) a web of trust and reputation, where a person’s judgments are only as valuable as the weight of their generally accepted skill (translators considered ‘good’ have more weight than ‘bad’ ones and ‘unskilled’ ones, etc.), and 2) a common vocabulary, not a bag of vague words like “readable”, “fluent”, and “good”, where everyone has their own opinion of what those words mean.

Now, I’m going to go back and poke at a few weak places in the system.

“Good” Rule-breaking

When I laid out the rating scale of A to F, I was really focusing on the “minimal criteria for an acceptable translation.” This is assuming that all of us here in the community are bad at what we do, and the most important work is identifying the works that do the most harm by severely misrepresenting the original text.

It makes the assumption that, from an information standpoint, there is “One True Story” that’s being translated: the “information content” I mentioned previously. How you choose to tell the story is deliberately left out of it: use high prose, draw diagrams, make wavy gestures, it doesn’t matter.

When we get higher in the realm of skill and quality, this assumption is patently false. Just as a piece of text can be read and interpreted in different ways, a translator has to pick an interpretation as they write. They can try to encompass as many different layers of interpretation and reading as they can, but they can’t cover them all, or even most of them.

“The cat went home” may sound like a straightforward translation, but the line can take on new meanings if it’s worded differently, such as, “The cat returned (home, from context)”. Depending on the story setting, one, or the other, or something else entirely can work well. You can’t even say that one is right or wrong, just whether one is better than another for some reason relating to the story.

The raw “informational meaning” of the text isn’t everything, and as you start drawing on imagery from the target language, you’ll often be confronted with a choice of how you want to deal with a passage where “replacing it with something else that’s similar” is a valid option.


I tossed aesthetics to the side previously, saying “just tack on a number 1-5 for your judgment of how aesthetically pleasing it is.” It doesn’t get much more subjective than that, and there are certainly reasons why one person’s ‘unreadable’ is still useful for someone else. However, upon reflection, the problem of aesthetics is a very deep one.

The thorniest issue is the issue of grounding the scale. There are probably two schools of thought on where to look for grounding. The first is grounding it on some notion of personal judgment of “prettiness”. This is what I had in mind originally when I thought of the aesthetic scale. If it looks pretty to you, reads like satin and silk, give it a 5. Essentially, you decide if it sounds good and leave it at that.

There’s a second school of thought. You can ground the thing against the original work. The theory is that if the original work reads like the pinnacle of stylized poetry, the translation should be close to that. If it’s not close, by the judge’s metric, then it has failed, aesthetically speaking.

The second proposal has its merits, in that an experienced translator probably is thinking about it. However, whether they follow it or not is a conscious choice. Look at the different translations of the Bible: they all take various tones for various reasons, so it’s an open question which you want to judge as your 5.

Bringing in the web of trust

As for other issues, “Aesthetics” probably shouldn’t be the catchall miscellaneous bin, cut off from “Information”. That division was an arbitrary one I put up to create a way to talk about minimal requirements, and as you go up the ladder, both are critical. The aesthetics category could be divided into multiple pieces: how much someone likes a style, how similar it is to the original style, how well the translator executed their choice of style, and so on.

However, breaking it down would make the scale too complicated. No one wants to take out a grading rubric when doing quick judgments on a work. So, I’m going to propose a bit of a clarification to the murky aesthetic scale.

Attach a 1-5 to the letter grade of A to F, where the number is how much you like the aesthetics of the work. You can like it, or not, for any of the reasons discussed above, or from gut feel.
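To make the combined rating concrete, here is a minimal sketch of how a rating like “A5” could be written down and checked. The names `Rating` and `parse_rating` are made up for illustration, and I’m assuming that the letter scale uses every letter from A to F, that A is the best grade, and that 5 is the most pleasing.

```python
from dataclasses import dataclass

# Assumption: the letter scale includes every letter from A to F.
LETTERS = "ABCDEF"


@dataclass(frozen=True)
class Rating:
    grade: str       # letter grade for information content, "A" to "F"
    aesthetics: int  # how much the rater likes the aesthetics, 1 to 5


def parse_rating(text: str) -> Rating:
    """Turn a string such as "A5" or "c2" into a Rating, or raise ValueError."""
    text = text.strip().upper()
    if len(text) != 2 or text[0] not in LETTERS or text[1] not in "12345":
        raise ValueError(f"not a valid rating: {text!r}")
    return Rating(grade=text[0], aesthetics=int(text[1]))
```

Calling `parse_rating("A5")` gives a rating with grade `"A"` and aesthetics `5`, while something like `"G9"` is rejected.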

Why, after all that debate above, did I just throw it all out? Think about where the ratings come from, and the problem I wanted to tackle at the start. Common terminology is being addressed by the scale. The other half is the web of trust and reputation. Anyone can grade anything A5; however, we should consider ratings from established translators to be ‘worth more’ because they are better equipped to judge than the average joe. Then, abusing the power of averaging over many raters, interesting things fall out.

An established translator would probably have thought about the issues of aesthetics, picking a style, breaking the information content, etc. In that case, it would factor into their rating. Meanwhile, “average joe” raters wouldn’t be considering such issues since they can’t, and so would most likely be judging what they can: readability and prettiness in their eyes.

So now, we can extract the information of “readability” from one group of ratings, and “fidelity to the original’s style, all things considered” from another group’s. So long as you can tell the groups apart, it can fall into place.
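As a rough sketch of that idea, the snippet below averages the aesthetics numbers separately for each group of raters, weighting each rating by the rater’s reputation. The group labels, the weights, and the sample ratings are all invented for illustration; how reputation actually gets assigned is left open here.

```python
from collections import defaultdict


def average_aesthetics_by_group(ratings):
    """Reputation-weighted mean of the 1-5 aesthetics score, per rater group.

    `ratings` is an iterable of (group, weight, score) tuples, where `group`
    is a hypothetical label such as "translator" or "reader", `weight` is the
    rater's reputation, and `score` is their 1-5 aesthetics number.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for group, weight, score in ratings:
        weighted_sum[group] += weight * score
        weight_total[group] += weight
    # Skip any group whose weights add up to zero to avoid dividing by zero.
    return {
        group: weighted_sum[group] / weight_total[group]
        for group in weighted_sum
        if weight_total[group] > 0
    }


# Made-up ratings for a single translation.
sample = [
    ("translator", 3.0, 2),
    ("translator", 2.0, 3),
    ("reader", 1.0, 5),
    ("reader", 1.0, 4),
]
averages = average_aesthetics_by_group(sample)
```

With this sample, the reader average comes out well above the translator average, which would read as “pleasant to read, but the established translators have reservations about the style”.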

