Via the SIGIR Forum comes this 2006 SIGIR report that examines current issues in, and the future of, multilingual information access (MLIA). The report critically reviews the Quaero Project and similar initiatives. In six sessions, multilingual participants shared research free from unrealistic expectations, wishful thinking, or SEO hearsay.
The keynote speaker was David A. Evans, CEO of Clairvoyance Corporation (formerly a professor of computer science and linguistics at Carnegie Mellon). Clairvoyance has substantial practical experience in multilingual applications for both Asian and European languages. The keynote position is summarized by this quote from its abstract:
“Despite the remarkable success of cross-language information retrieval (CLIR) and translingual information retrieval (TLIR) systems to perform on a par with monolingual IR systems in research and evaluation contexts, there has been relatively little commercial development (or success) of TLIR systems and applications. This is due, in part, to lack of demand in the marketplace, but also, in perhaps greater measure, to the special requirements that may be associated with TLIR applications – requirements that are not typically addressed (or assessed) in our research evaluations.”
In his presentation, David Evans showed a graph of current and projected worldwide demand for globalization products, including multilingual search and machine translation software. Noting the projected five-year growth from 173 million US dollars (2004) to 263 million (2009), Dr. Evans wryly observed:
“These are not numbers which excite venture capitalists.”
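To put those figures in perspective, here is a minimal sketch of the arithmetic. It assumes the 2004 and 2009 values are the endpoints of a five-year span and that growth compounds annually; the function name `cagr` is just an illustrative helper, not something from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the keynote, in millions of US dollars.
demand_2004 = 173.0
demand_2009 = 263.0

total_growth = demand_2009 / demand_2004 - 1   # growth over the whole period
annual_rate = cagr(demand_2004, demand_2009, 5)

print(f"Total growth 2004-2009: {total_growth:.0%}")
print(f"Implied annual growth:  {annual_rate:.1%}")
```

Under these assumptions the market grows by roughly half over five years, which works out to a little under 9% per year.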
Dr. Evans then described the failed attempts by his company and Japanese partners to introduce multilingual aspects into existing monolingual information management products.
His conclusions from these experiences were sobering:
1. In 2006, the market for multilingual globalization support is “not there yet”
2. Quality and scope of machine translation is a major gating factor
3. The demand for CLIR, per se, is low
4. To be successful today, CLIR systems (already very complex) must be fashioned around “solutions” – integrated into systems that may need CLIR functionality only as a means to other ends
5. We must be (very) patient; or perhaps we should rethink our goals and refocus our applications
In the discussion following the keynote, other participants from commercial firms offering similar products said that their organizations had reached the same conclusions.
MAIN THEMES EMERGING FROM THE WORKSHOP
From the workshop and the discussion by participants, a number of different themes emerged to both challenge researchers in this area and suggest new avenues of research and development:
1. Identification of the real-world (commercial) use case. This was a main topic of the keynote paper and a running theme throughout the workshop.
2. The difficulty of technology transfer into existing applications (see the paper by Braschler et al.).
3. The need for new evaluation methodologies that evaluate the whole system, including aspects relating to usage, and not just system performance from a technical perspective
4. Following on from 3, the importance of replication of research results, implying a need for appropriate tools (see the paper on Data Curation)
5. Again related to 3, the need to study user behaviour – and to define reliable techniques for such studies (see the papers by He/Oard and Clough et al.)
6. The need to study the relationship between cross-language retrieval, machine translation and multilingual summarization (see the presentations by Chin-Yew Lin and D. K. Evans)
7. The pressure to move studies from text to mixed media and “new” genres (see Ao Feng, Gey, Jones)
8. The need for digital content to be prepared with access in mind – and the need for markup. The potential of the semantic web for CLIR must be investigated (information processing and retrieval must move from words or features to concepts via markup)
Dr. E. Garcia