Limitations of Computers as Translation Tools Part 1

As should be more than evident from other contributions to this volume, the field of computer translation is alive and well; if anything, it is now entering what may prove to be its truly golden era. But there would be no need to point this out if certain problems from an earlier time had not raised lingering doubts about the overall feasibility of the field. Just as other authors have stressed the positive side of various systems and approaches, this chapter will attempt to deal with some of these doubts and questions, both as they may apply here and now to those planning to work with computer translation systems and also in a larger sense as they may be connected to some faulty notions about language held by the general public and perhaps some system developers as well. Explaining such doubts and limitations forthrightly can only help all concerned by making clear what is likely, and what is less likely, to work for each individual user. It can also clarify what the underlying principles and problems in this field have been and to some extent remain.

To begin with, the notion of computer translation is not new. Shortly after World War II, at a time when no one dreamt that word processors, spreadsheets, or drawing programs would be widely available, some of the computer's prime movers, Turing, Weaver and Booth among them, were already beginning to think about translation. (1) They saw this application mainly as a natural outgrowth of their wartime code-breaking work, which had helped to defeat the enemy, and it never occurred to them to doubt that computer translation was a useful and realizable goal. The growing need to translate large bodies of technical information, heightened by an apparent shortage of translators, was one factor in their quest.
But perhaps just as influential was a coupling of linguistic and cultural idealism, the belief that removing `language barriers' was a good thing, something that would promote international understanding and ensure world peace. Two related notions were surely that deep down all human beings must be basically similar and that piercing the superstratum of language divisions could only be beneficial by helping people to break through their superficial differences. (2) Underlying this idealism was a further assumption that languages were essentially some kind of code that could be cracked, that words in one tongue could readily be replaced by words saying the same thing in another. Just as the key to breaking the Axis code had been found, so some sort of linguistic key capable of unlocking the mysteries of language would soon be discovered. All these assumptions would be sorely tested in the decades ahead.

Some Basic Terms

Some of the most frequently used terms in this field, though also defined elsewhere in the book, will help the reader in dealing with our subject. It will quickly become evident that merely by providing these definitions, we will also have touched upon some of the field's major problems and limitations, which can then be explained in greater detail. For example, a distinction is frequently made between Machine Translation (usually systems that produce rough text for a human translator to revise) and Computer Assisted Translation devices (usually but not invariably software designed to help translators do their work in an enhanced manner). These are often abbreviated as MT and CAT respectively. So far both approaches require the assistance or active collaboration to one extent or another of a live, human translator. Under Machine Translation one finds a further distinction between Batch, Interactive, and Interlingual Approaches. A Batch method has rules and definitions which help it `decide' on the best translation for each word as it goes along.
It prints or displays the entire text thus created with no help from the translator (who need not even be present but who nonetheless may often end up revising it). An Interactive system pauses to consult with the translator on various words or asks for further clarification. This distinction is blurred by the fact that some systems can operate in either batch or interactive mode. The so-called Interlingual approach operates on the theory that one can devise an intermediate `language' (in at least one case a form of Esperanto) that can encode sufficient linguistic information to serve as a universal intermediate stage, or pivot point, enabling translation back and forth between numerous pairs of languages, despite linguistic or cultural differences. Some skepticism has been voiced about this approach, and to date no viable Interlingual system has been unveiled. Batch and Interactive systems are sometimes also referred to as Transfer methods to differentiate them from Interlingual theories, because they concentrate on a trade or transfer of meaning based on an analysis of one language pair alone.

I have tried to make these distinctions as clear as possible, and they do apply to a fair extent to the emerging PC-based scene. At the higher end, on mini and mainframe computers, there is however a certain degree of overlap between these categories, frequently making it difficult to say where CAT ends and MT begins.

Another distinction is between pre-editing (limiting the extent of vocabulary beforehand so as to help the computer) and post-editing (cleaning up its errors afterwards). Usually only one is necessary, though this will depend on how perfect a translation is sought by a specific client. "Pre-editing" is also used to mean simply checking the text to be translated beforehand so as to add new words and expressions to the system's dictionary. The work devoted to this type of pre-editing can save time in post-editing later.
A more extreme form of pre-editing is known as Controlled Language, whose severely limited vocabulary is used by a few companies to make MT as foolproof as possible. Advocates of MT often point out that many texts do not require perfect translations, which leads us to our next distinction, between output intended for Information-Only Skimming by experts able to visualize the context and discount errors, and `Full-Dress' Translations, for those unable to do either. One term that keeps showing up is FAHQT for Fully Automatic High Quality Translation, which most in the field now concede is not possible (though the idea keeps creeping in again through the back door in claims made for some MT products and even some research projects). (3) Closer to current reality would be such descriptions as FALQT (Fully Automatic Low Quality Translation) and PAMQT (Partly Automatic Medium Quality Translation). Together, these three terms cover much of the spectrum offered by these systems. Also often encountered in the literature are percentage claims purportedly grading the efficiency of computer translation systems. Thus, one language pair may be described as `90% accurate' or `95% accurate' or occasionally only `80% accurate.' The highest claim I have seen so far is `98% accurate.' Such ratings may have more to do with what one author has termed spreading `innumeracy' than with any meaningful standards of measurement. (4) On a shallow level of criticism, even if we accepted a claim of 98% accuracy at face value (and even if it could be substantiated), this would still mean that every standard double-spaced typed page would contain five errors—potentially deep substantive errors, since computers, barring a glitch, never make simple mistakes in spelling or punctuation. It is for the reader to decide whether such an error level is tolerable in texts that may shape the cars we drive, the medicines and chemicals we take and use, the peace treaties that bind our nations. 
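The page-level arithmetic behind such percentage claims can be sketched in a few lines of Python. The figures of 250 words and 25 lines per page are assumptions chosen to approximate a standard double-spaced typed page; they are not stated in the text, and the function name is purely illustrative.

```python
# Assumed layout of a standard double-spaced typed page (not given in the text).
WORDS_PER_PAGE = 250
LINES_PER_PAGE = 25


def errors_per_page(accuracy_percent, words_per_page=WORDS_PER_PAGE):
    """Expected number of wrongly translated words on one page
    if the accuracy claim is read as a per-word success rate."""
    return words_per_page * (100 - accuracy_percent) / 100


for claim in (98, 95, 90):
    per_page = errors_per_page(claim)
    per_line = per_page / LINES_PER_PAGE
    print(f"{claim}% accurate: {per_page:g} errors per page, {per_line:g} per line")
```

The sketch treats accuracy as a per-word rate, which is only one possible reading; the claims themselves rarely say what is being counted.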
As for 95% accuracy, this would mean one error on every other line of a typical page, while with 90% accuracy we are down to one error in every line. Translators who have had to post-edit such texts tend to agree that with percentage claims of 90% or less it is easiest to have a human translator start all over again from the original text. On a deeper level, claims of 98% accuracy may be even more misleading: does such a claim in fact mean that the computer has mastered 98% of perfectly written English or rather 98% of minimally acceptable English? Is it possible that 98% of the latter could turn out to be 49% of the former? There is a great difference between the two, and so far these questions have not been addressed. Thus, we can see how our brief summary of terms has already given us a bird's eye view of our subject.

Practical Limitations

There are six important variables in any decision to use a computer for translation: speed, subject matter, desired level of accuracy, consistency of vocabulary, volume, and expense. These six determinants can in some cases be merged harmoniously together in a single task, but they will at least as frequently tend to clash. Let's take a brief look at each:

1. Speed. This is an area where the computer simply excels: one mainframe system boasts 700 pages of raw output per night (while translators are sleeping), and other systems are equally prodigious. How raw the output actually is, and how much post-editing will be required (another factor of speed), will depend on how well the computer has been primed to deal with the technical vocabulary of the text being translated. Which brings us to our second category:

2. Subject matter. Here too the computer has an enormous advantage, provided a great deal of work has already gone into codifying the vocabulary of the technical field and entering it into the computer's dictionary.
Thus, translations of aeronautical material from Russian to English can be not only speedy but can perhaps even graze the "98% accurate" target, because intensive work over several decades has gone into building up this vocabulary. If you are translating from a field whose computer vocabulary has not yet been developed, you may have to devote some time to bringing its dictionaries up to a more advanced level. Closely related to this factor is:

3. Desired level of accuracy. We have already touched on this in referring to the difference between Full-Dress Translations and work needed on an Information-Only basis. If the latter is sufficient, only slight post-editing, or none at all, may be required, and considerable cash savings can be the result. If a Full-Dress Translation is required, however, then much post-editing may be in order and there may turn out to be, depending once again on the quality of the dictionaries, no appreciable savings.

4. Consistency of vocabulary. Here the computer rules supreme, always assuming that correct prerequisite dictionary building has been done. Before computer translation was readily available, large commercial jobs with a deadline would inevitably be farmed out in pieces to numerous translators with perhaps something resembling a technical glossary distributed among them. Sometimes the task of "standardizing" the final version could be placed in the hands of a single person of dubious technical attainments. Even without the added problem of a highly technical vocabulary, it should be obvious that no two translators can be absolutely depended upon to translate the same text in precisely the same way. The computer can fully exorcize this demon and ensure that a specific technical term has only one translation, provided that the correct translation has been placed in its dictionary (and provided of course that only one term with only one translation is used for this process or entity).

5. Volume.
From the foregoing, it should be obvious that some translation tasks are best left to human beings. Any work of high or even medium literary value is likely to fall into this category. But volume, along with subject matter and accuracy, can also play a role. Many years ago a friend of mine considered moving to Australia, where he heard that sheep farming was quite profitable on either a very small or a very large scale. Then he learned that a very small scale meant from 10,000 to 20,000 head of sheep, a very large one meant over 100,000. Anything else was a poor prospect, and so he ended up staying at home. The numbers are different for translation, of course, and vary from task to task and system to system, but the principle is related. In general, there will be, all other factors being almost equal, a point at which the physical size of a translation will play a role in reaching a decision. Would-be users should carefully consider how all the factors we have touched upon may affect their own needs and intentions. Thus, the size and scope of a job can also determine whether or not you may be better off using a computer alone, some computer-human combination, or having human translators handle it for you from the start. One author proposes 8,000 pages per year in a single technical specialty with a fairly standardized vocabulary as minimum requirements for translating text on a mainframe system. (6)

6. Expense. Given the computer's enormous speed and its virtually foolproof vocabulary safeguards, one would expect it to be a clear winner in this area. But for all the reasons we have already mentioned, this is by no means true in all cases. The last word is far from having been written here, and one of the oldest French companies in this field has just recently gotten around to ordering exhaustive tests comparing the expenses of computer and human translation, taking all factors into account.
(5) As we can see quite plainly, a number of complications and limitations are already evident. Speed, wordage, expense, subject matter, and accuracy/consistency of vocabulary may quickly become mutually clashing vectors affecting your plans. If you can make allowances for all of them, then computer translation can be of great use to you. If the decision-making process involved seems prolonged and tortuous, it perhaps merely reflects the true state of the art not only of computer translation but of our overall knowledge of how language really works. At least some of the apparent confusion about this field may be caused by a gap between what many people believe a computer should be able to do in this area and what it actually can do at present. What many still believe (and have, as we shall see, continued to believe over several decades, despite ample evidence to the contrary) is that a computer should function as a simple black box: you enter a text in Language A on one side, and it slides out written perfectly in Language B on the other. Or better still you read it aloud, and it prints or even speaks it aloud in any other language you might desire. This has not happened and, barring extremely unlikely developments, will not happen in the near future, assuming our goal is an unerringly correct and fluent translation. If we are willing to compromise on that goal and accept less than perfect translations, or wish to translate texts within a very limited subject area or otherwise restrict the vocabulary we use, then extremely useful results are possible. Some hidden expenses may also be encountered—these can involve retraining translators to cooperate with mainframe and mini computers and setting up electronic dictionaries to contain the precise vocabulary used by a company or institution. 
Less expensive systems running on a PC with built-in glossaries also require a considerable degree of customizing to work most efficiently, since such smaller systems are far more limited in both vocabulary and semantic resolving power than their mainframe counterparts. Furthermore, not all translators are at present prepared to make the adjustments in their work habits needed for such systems to work at their maximum efficiency. And even those able to handle the transition may not be temperamentally suited to make such systems function at their most powerful level. All attempts to introduce computer translation systems into the work routine depend on some degree of adjustment by all concerned, and in many cases such adjustment is not easy. Savings in time or money are usually only achieved at the end of such periods. Sometimes everyone in a company, from executives down to stock clerks, will be obliged to change their accustomed vocabularies to some extent to accommodate the new system. (6) Such a process can on occasion actually lead, however, to enhanced communication within a company.

--------------------------------------------------------------------------------

Alex Gross served as a literary advisor to the Royal Shakespeare Company during the 1960s, and his translations of Dürrenmatt and Peter Weiss have been produced in London and elsewhere. He was awarded a two-year fellowship as writer-in-residence by the Berliner Künstler-Programm, and one of his plays has been produced in several German cities. He has spent twelve years in Europe and is fluent in French, German, Italian and Spanish. He has published works related to the translation of traditional Chinese medicine and is planning further work in this field. Two more recent play translations were commissioned and produced by UBU Repertory Company in New York, one of them as part of the official American celebration of the French Revolutionary Bicentennial in 1989.
Published play translations are The Investigation (Peter Weiss, London, 1966, Calder & Boyars) and Enough Is Enough (Protais Asseng, NYC, 1985, Ubu Repertory Co.). His experience with translation has also encompassed journalistic, diplomatic and commercial texts, and he has taught translation as part of NYU's Translation Certificate Program. In the last few years a number of his articles on computers, translation, and linguistics have appeared in the United Kingdom, Holland, and the US. He is the Chairperson of the Machine Translation Committee of the New York Circle of Translators, is also an active member of the American Translators Association, and has been involved in the presentations and publications of both groups.

http://language.home.sprynet.com/lingdex/limtran1.htm