
Taiwan Today

Taiwan Review

A Computer that reads Chinese

May 01, 1986
The Chinese characters are "decomposed" for processing.
A microcomputer which can read, process, and store handwritten Chinese characters? Sounds like a prop for a science fiction fantasy. But it not only already exists; in two or three years it may be on the market as the "fifth-generation" Chinese computer.

For the time being, the emergence of this next generation is in process in a lab of the Department of Electronic Engineering of the Republic of China's National Tsinghua University, at Hsinchu "Science City."

A few shoes, parked in the corridor outside, indicate that something is going on behind the locked door, and they are almost always the same shoes. Only a very few outsiders have yet been invited to slip into a pair of proffered straw courtesy-slippers (with little cats embroidered on them) and enter a room that looks satisfyingly like, and indeed is, an inventors' workshop.

The inventors (Professor Hsu Wen-hsing, ten graduate students, and six undergraduates) have been working here for one to three years. Their efforts to date are described in 10 papers; one, prepared specifically for outsiders, is still too full of technical terminology to be really comprehensible to non-experts.

What it is all about, however, becomes clear with a careful glance at the equipment around the room: The idea is to "feed" a computer handwritten Chinese characters (in this case, displayed or simply written down in front of a camera), substituting direct input from a camera and other devices for the routine use of a keyboard. The computer, in turn, must learn to "read" or "understand" them.

"After years of research, I finally realized that it would be a deadend to try just to expand input through the keyboard," explained Professor Hsu. After two years of research in Japan, he carted his "optical character reader" with him to the 'Republic of China, the only place in the world making real progress in such an effort.

One process step involves electronic-camera "acquisition" of the characters.

The first four generations of Chinese computers, all married to keyboard input, have long since proved that the Chinese language characters can indeed be handled by computers—something that just a few years ago was considered almost impossible. Today, the advanced Chinese-language computers have noted limitations, but are comparable to systems handling English-language programs.

The world leader in such Chinese computers has long been IBM—who else? Its model 5550 computer was designed to store 1,000 commonly used characters directly within its own system. But recently, Multitech, a division of Taiwan's Sertek International Inc., announced a stunning breakthrough—"fourth generation" Chinese computers. Utilizing ROM (read-only memory) chips, this unique design (model DCS-570) can handle 30,000 characters per second. Such speed allows display in a text mode, a major step forward in computerization not only of Chinese, but of any non-European ideographic language.

But what about that dead end in keyboard input identified by Professor Hsu? The major limitation, of course, involves the sheer multitude and complexity of Chinese characters, since each character is usually just one word. Consider the 26 standard letters of the Latin alphabet: the Chinese "counterpart" totals something like 50,000 characters (not all of which, of course, are in common use).

A view of equipment in the image processing lab.

Just to be barely newspaper literate, one needs to know about 2,000 characters, and to be college educated, about 5,000. Coding and fitting together a sufficient number of characters or character parts on a keyboard of reasonable size is an insurmountable problem, a fact that worked against truly common use of Chinese-character typewriters long before computers were in general use.

In any case, Professor Hsu finally concluded that handwriting would continue to prevail among the Chinese, not only because of computer keyboard limitations, but because written Chinese is an embodiment of Chinese culture, combining artistic beauty of shape with complex expressions of meaning that are deeply rooted in ancient Chinese history.

Therefore, he says, computer recognition of handwritten characters is the real challenge for today's Chinese computer designers.

The difficulties are evident, beginning with poorly written characters and ending with the necessary simplification of characters which are just too complex. And somewhere between is the problem of everybody's personal style of writing, something Chinese admire and strive for.

It may sound like a joke, but he says that we may finally see the development of computers individually programmed to recognize the operator's personal script—just like having a personal robot-secretary.

But today, in his secluded laboratory, the task is still to teach the computer to read properly. Or, to be more exact, to find out how best to teach the computer to begin with, and then how to improve its consequent recognition rate and overall speed.

So far, the "student-computer" can "recognize" a single Chinese character in 20 seconds, with a 94.3 percent accuracy rate. But this, Hsu says, is far from the eventual expectation. From the experimental background already built up, at least 50 characters should be recognizable in one second, he holds. That notably beats the top keyboard-input achievements, now stagnating at 40 characters per minute.
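The rates quoted above are easier to compare once they share a unit. The following is only a worked piece of arithmetic on the article's own figures; the variable names are illustrative and not taken from the lab's work.

```python
# Figures quoted in the article; names are illustrative only.
SECONDS_PER_CHAR_NOW = 20        # prototype: one character in 20 seconds
TARGET_CHARS_PER_SECOND = 50     # Professor Hsu's stated expectation
KEYBOARD_CHARS_PER_MINUTE = 40   # best keyboard-input rate cited

# Convert everything to characters per minute.
current_per_minute = 60 / SECONDS_PER_CHAR_NOW
target_per_minute = TARGET_CHARS_PER_SECOND * 60

# How far the prototype is from the goal, and how the goal compares
# with the cited keyboard rate.
speedup_needed = target_per_minute / current_per_minute
target_vs_keyboard = target_per_minute / KEYBOARD_CHARS_PER_MINUTE

print(f"prototype today: {current_per_minute:.0f} characters per minute")
print(f"stated target:   {target_per_minute} characters per minute")
print(f"speedup needed:  {speedup_needed:.0f}x")
print(f"target versus keyboard: {target_vs_keyboard:.0f}x")
```

On these numbers the prototype reads about 3 characters per minute, so it is still slower than the cited keyboard rate, while the stated target would be 75 times faster than that rate.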

Obviously, computer adaptations to handwritten characters are a harder task than for standardized printed forms. And by the time the computer can be organized to cope well with handwriting, computer adaptations to printed Chinese characters will be more advanced.

Cheng Fang-hsuan, a doctoral student, focuses the camera on a board with some characters written on it. And instantly, the monitor flickers out the same characters. Yes, the computer has actually read and displayed them, but they look somewhat out of shape, maybe too stylized.

Professor Hsu now explains "image processing," "stroke extraction," "radical extraction," and "radical matching," all of which boils down to a kind of exercise in "divide and conquer": taking apart the characters according to the rules for writing them, or writing backwards, from the end to the beginning.

Take the character for "word". It is composed of just eight strokes and is one of the simplest. The computer has to count the strokes, then determine the length, direction, and position of each of them in order to recognize the word itself.
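The stroke-by-stroke comparison described above can be pictured with a toy sketch. This is not the lab's algorithm, which the article does not detail; the `Stroke` fields, the direction labels, the distance formula, and the two tiny reference characters are all assumptions chosen for illustration. It also assumes the strokes arrive in writing order.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Stroke:
    x: float         # midpoint position, normalized to 0..1 (assumed encoding)
    y: float
    length: float    # normalized length
    direction: str   # hypothetical labels: "h" (horizontal), "v" (vertical)


def stroke_distance(a: Stroke, b: Stroke) -> float:
    """Smaller is more similar; a direction mismatch adds a fixed penalty."""
    penalty = 0.0 if a.direction == b.direction else 1.0
    return hypot(a.x - b.x, a.y - b.y) + abs(a.length - b.length) + penalty


def match_score(candidate, reference):
    """Count the strokes first; only equal counts are compared further."""
    if len(candidate) != len(reference):
        return None
    return sum(stroke_distance(a, b) for a, b in zip(candidate, reference))


def recognize(candidate, references):
    """Return the label of the closest reference, or None if none fits."""
    best_label, best_score = None, float("inf")
    for label, reference in references.items():
        score = match_score(candidate, reference)
        if score is not None and score < best_score:
            best_label, best_score = label, score
    return best_label


# Two toy two-stroke reference characters (illustrative values only).
REFERENCES = {
    "十": [Stroke(0.5, 0.5, 0.9, "h"), Stroke(0.5, 0.5, 0.9, "v")],
    "二": [Stroke(0.5, 0.3, 0.5, "h"), Stroke(0.5, 0.7, 0.9, "h")],
}

# A slightly wobbly handwritten cross.
sample = [Stroke(0.48, 0.52, 0.85, "h"), Stroke(0.53, 0.50, 0.95, "v")]
print(recognize(sample, REFERENCES))
```

The stroke count acts as a cheap first filter, and length, direction, and position then separate characters with the same number of strokes.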

More difficult characters, in two complex parts, are first dealt with via the part called the "radical": the computer first extracts that portion and examines its position, direction, and length. Since a great number of compound characters share the same radicals, using them as references makes it possible to identify a large number of characters more quickly. At most, for example, 2,000 radical "reference patterns" are needed to recognize 13,000 characters. (For initiates, that also saves on memory requirements, since each pattern needs only 90 bytes, and the whole program takes up, at the most, 180K-bytes.)
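The memory figures in that parenthesis can be checked directly, and the idea of using a shared radical to narrow the search can be shown with a toy lookup table. The sketch below is illustrative only: `RADICAL_INDEX` and `candidates_for` are hypothetical names, the handful of characters listed is a small sample, and the per-character comparison assumes, purely for the sake of contrast, that a whole-character pattern would also cost 90 bytes.

```python
# Figures quoted in the article.
BYTES_PER_PATTERN = 90
MAX_RADICAL_PATTERNS = 2_000
CHARACTERS_COVERED = 13_000

# 2,000 patterns at 90 bytes each matches the 180K-byte figure quoted.
radical_table_bytes = BYTES_PER_PATTERN * MAX_RADICAL_PATTERNS

# Hypothetical comparison: one 90-byte pattern per whole character.
# (The 90-byte size for a whole character is an assumption.)
per_character_bytes = BYTES_PER_PATTERN * CHARACTERS_COVERED

# Toy index from a radical to characters that contain it (tiny sample).
RADICAL_INDEX = {
    "言": ["語", "話", "說", "詞"],
    "木": ["林", "校", "樹", "橋"],
}


def candidates_for(radical: str) -> list[str]:
    """Once the radical is matched, only its characters need checking."""
    return RADICAL_INDEX.get(radical, [])


print(radical_table_bytes)       # bytes for the radical reference patterns
print(per_character_bytes)       # bytes under the hypothetical alternative
print(candidates_for("言"))
```

Matching the radical first shrinks both the stored reference set and the list of characters left to compare.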

"Some of our hardware is already years old and needs to be replaced," remarks Professor Hsu, pointing out a computer in the midst of one array of equipment. The grouping looked fairly disorganized to an outsider.

Doctoral student Cheng Fang-hsuan works with the fledgling optical recognition system.

Basically, there were just four "parts" to that system. First, the camera, looking like any home-video camera; second, an image processor; third, a pair of computers; fourth, a monitor.

For the more technically minded, a communication link (IEEE-488) transmits image data from the image processor to a hard disk in a PDP 11/23 computer. The recognized result then passes over an RS-232C link, encoded in the Chinese telegraphic code, to a personal computer with a database of characters.

Is this developing "optical character reader" someday going to be efficient enough to make keyboard input totally obsolete? By no means, says Professor Hsu. The keyboard, apart from being more accurate, is potentially always much faster than handwriting—an experienced keyboard operator, for example, can average 50 words per minute, while even the fastest secretarial pen cannot write down more than 40 readable words per minute.

The keyboard adapts well to routine, and will continue to be very useful. But for most Chinese people, pen and paper will remain the main elements for written communication, and they will find "optical-read" computer systems in their future too.
