New Tools for Old Tasks: A Digital Approach to the Investigation of the Malay Language

Zuraidah Mohd Don and Gerry Knowles


Digital tools designed for linguistic analysis offer new ways of approaching old tasks, and they now routinely enable tasks which were formerly impossible. This paper is based on MaLex, a collection of data tables and procedures designed to represent the intuitive knowledge of speakers of Malay, which provides the infrastructure for solving problems in linguistic analysis.

This paper reports the use of the MaLex parser to investigate the adjectival system of Malay, including superlatives and the formation of manner adverbials. Although linguists have always been able to identify possible syntactic rules and try them out on small datasets, the automatic parser can extract examples from a large corpus, and it is much more effective than a linguist at ascertaining the ordering of rules and tracing their interaction. Since the extracted examples are also syntactic constituents, they are the appropriate units for translation, and in this case they are translated into English. The phonological component concatenates the phonological representations of constituents to form higher-level structures. A planned extension, not yet completed, is intended to increase the range of waveform annotations that can be used as input for linguistic analysis.

Malay is a suitable language for this research because, although it is under-investigated relative to its importance as one of the main languages of ASEAN, it has extensive written records that make it possible to compile large corpora for research. A human linguist can get started on a very small amount of data, and the same is true of the approach pioneered by MaLex. For this reason, MaLex could prove to be a suitable model for the digital investigation of under-researched languages.