Unicode discussion

Introduction

The encoding of Malayalam chillus is among the most debated topics on the Unicode Indic list. The UTC assigned Eric Muller to study the various arguments and submit a final proposal on this issue. Following Eric's report, on May 19th, 2006 the UTC decided to encode chillus atomically and placed them in the pipeline.

In August 2006, the Kerala Chief Minister wrote to the Unicode Consortium that his government was studying Malayalam encoding issues, specifically the chillus. He asked the UTC to go slow on the chillu standardization track and promised, in the meantime, to build a public consensus. At his request, the UTC vote on chillus has been delayed.

In L2/06-261, Naga Ganeshan proposes an alternate way to encode chillus: use MALAYALAM CONSONANT SIGN CILLU to produce all chillus. For example, Chillu-RA/RRA would be encoded as <RA, CILLU SIGN>. The problems with this proposal are listed below.

Issues

ര്‍ represents both RA and RRA

[Image: Keralapanineeyam, peethika, 4. varnnavikaarangal]

The lexicographic analysis by A. R. Rajarajavarma in Keralapanineeyam reveals that ര്‍ represents both of the base characters RA and RRA. It is a well-established fact that ര്‍ represents the base character RA (ര) in many Malayalam words, as seen in the following example joins:

അവര്‍ + അല്ല > അവരല്ല
മലര്‍ + അല്ല > മലരല്ല

Similarly, ര്‍ represents RRA (റ) in some others:

ഞായര്‍ + അല്ല > ഞായറല്ല
ചാവേര്‍ + അല്ല > ചാവേറല്ല
പയര്‍ + അല്ല > പയറല്ല
പെരിയാര്‍ + അല്ല > പെരിയാറല്ല

While the words above are of Dravidian origin, many words with Chillu-RRA are of Arabic or Urdu origin:

കരാര്‍ + അല്ല > കരാറല്ല
ഖദര്‍ + അല്ല > ഖദറല്ല

[Image: ചാവേര്‍ in the dictionary by C. Madhavan Pillai]

[Image: ഞായര്‍ in the dictionary by C. Madhavan Pillai]

[Image: ഖദര്‍ in the dictionary by C. Madhavan Pillai]

Etymology of ARR's example

Consider the example that A. R. Rajarajavarma (ARR) gives for this: കാര്‍മേഘം /kaarr_mEgham/ (the fourth word in the image). According to Sabdatharavali, the adjective കാര്‍- is etymologically derived from കരു and കറു. ARR specifies കരു as the basic root of all the related words. Sabdatharavali describes കാര്‍- as a form of കറുത്ത or കറുപ്പ്, meaning 'black' (never as the alternate ര-form കരി). These statements can all be true only if കരു generates കറു, which in turn generates കാര്‍-.

This completes the evidence that the base letter of ര്‍ in this example is RRA (റ).

Transliteration of ര്‍ as /r/ in foreign texts

Some foreign authors, such as Daniels/Bright, have transliterated കാര്‍മേഘം with a single 'r', meaning ര. However, there are inherent problems in treating foreign texts (with respect to Malayalam) as the most authoritative references. It is like consulting an English-Arabic dictionary to study English. In a reputed text the basic principles will be correct, but finer details may be missing, because only non-natives refer to that text. They would not notice the finer problems, so no feedback reaches the publisher for correction. It can therefore be argued that native language studies should supersede studies in other languages. In this case, ARR should supersede Daniels/Bright.

By the same argument, Gundert's dictionary cannot be taken as the authoritative book on Malayalam grammar or lexicon. First, he was a foreigner. Second, it was written more than 130 years ago, without much study of the roots of the language. Many scholars have since defined the grammar and structure of the language. The importance of his dictionary is that it was the first comprehensive one. In that sense, he is for Malayalam what Thomas Blount is for English. That is, his dictionary deserves only an authority comparable to that of Glossographia.

Collation order of ര്‍ in dictionaries

Dictionaries give a collation ordering, but that does not prove much about a glyph's base character. For example, two code points can differ only at the tertiary level of collation, which would place them very close together or make them indistinguishable in a sorted list.
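The point about tertiary differences can be illustrated with a minimal sketch of multi-level collation. The weights below are invented for illustration and are not taken from any real collation table; the element names (`RA`, `RRA`, `CHILLU`) are placeholders as well. The sketch only shows that an element can sort right next to RA without that adjacency saying anything about its base letter.

```python
# Hypothetical three-level collation weights: (primary, secondary, tertiary).
# ASSUMPTION: these numbers are illustrative only, not from a real table.
WEIGHTS = {
    "A": (1, 0, 0),
    "KA": (5, 0, 0),
    "RA": (10, 0, 0),
    "CHILLU": (10, 0, 1),  # same primary weight as RA, differs only at tertiary
    "RRA": (11, 0, 0),
}


def sort_key(elements):
    """Build a key that compares all primaries, then secondaries, then tertiaries."""
    return tuple(
        tuple(WEIGHTS[e][level] for e in elements) for level in range(3)
    )


words = [
    ["A", "RRA"],
    ["A", "CHILLU", "KA"],
    ["A", "RA", "KA"],
]

ordered = sorted(words, key=sort_key)
for word in ordered:
    print(word, sort_key(word))
```

With these made-up weights the word containing `CHILLU` lands immediately after the word containing `RA`, purely because the table says so. A dictionary's ordering reflects the compiler's choice of weights, not the etymological base of the chillu.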

Also, dictionaries differ a lot in the collation ordering they assume, and many variations exist right now. For example, in some dictionaries ല് is treated as a half-ല, as in പില്ക്കാലം, as well as a form of ത, as in വാത്സല്യം.

Native user behaviour

One more thing should be noted in this context. Malayalees always pronounce ര്‍ as RRA_dead, not as RA_dead. So whenever they need to write RRA_dead, they invariably use this chillu.

Possible solutions and their implications

Encode ര്‍ as chillu-ര only

That is, only <RA, CILLU SIGN> will form ര്‍. This would produce incorrect Unicode spellings for ഞാര്‍, ഞായര്‍, പയര്‍, വയര്‍, കാര്‍മേഘം, etc. Since soundness is the most important property of an encoding system, this scheme should be rejected.

Both RA and RRA with CILLU SIGN form ര്‍

This solution suggests that ര്‍ can be written in two ways: <RA, CILLU SIGN> and <RRA, CILLU SIGN>. That means a document (e.g. a wiktionary.org document) written by multiple people using various input tools can quite possibly contain different spellings for the same rendering of a word. Neither reader nor writer will be able to ensure spelling correctness by visual inspection.

Along with this soundness issue, this solution opens a serious IDN security issue. For example, suppose there is a site called www.ഞായര്‍.com in which ര്‍ is written using <RA, CILLU SIGN>. An attacker could spoof this address by presenting www.ഞായര്‍.com with <RRA, CILLU SIGN> and redirecting web traffic to a malicious website.

The same problems exist if we allow both <RA, CILLU SIGN> and <RRA, VIRAMA> to form ര്‍.
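The soundness and spoofing concerns can be made concrete with a small sketch. The code points for RA, RRA and the other Malayalam letters are the real ones; the proposed CILLU SIGN has no code point, so the sketch assumes a private-use character (U+E000) as a stand-in. Under that assumption, the two spellings of ഞായര്‍ would be different strings that no normalization form unifies, and they would yield different encoded labels for a domain name.

```python
import unicodedata

RA = "\u0d30"   # MALAYALAM LETTER RA
RRA = "\u0d31"  # MALAYALAM LETTER RRA
# ASSUMPTION: CILLU SIGN was only proposed; a private-use code point
# stands in for it here purely for illustration.
CILLU_SIGN = "\ue000"

PREFIX = "\u0d1e\u0d3e\u0d2f"  # NYA, VOWEL SIGN AA, YA

spelling_ra = PREFIX + RA + CILLU_SIGN
spelling_rra = PREFIX + RRA + CILLU_SIGN


def code_points(text):
    """Return a readable list of the code points in text."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_after_nfc(a, b):
    """Check whether two strings become identical under NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(code_points(spelling_ra))
print(code_points(spelling_rra))
print("equal after NFC:", same_after_nfc(spelling_ra, spelling_rra))

# The Punycode form is what a domain label is built from, so two
# different code point sequences give two different labels.
label_ra = spelling_ra.encode("punycode")
label_rra = spelling_rra.encode("punycode")
print(label_ra != label_rra)
```

If both sequences were required to render identically, a reader would see one word while software would see two distinct strings, which is exactly the situation an attacker needs.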

Notes

Fallback for Chillu-KA

When chillus are encoded atomically, a font's claim to support Malayalam Unicode can be easily verified: it has to implement all the characters in the Malayalam code chart.

When chillus are implemented through a chillu sign, the font designer is free to choose a subset from the collection of all chillus. This scenario is similar to selecting which conjuncts to implement in a font. If some chillus are left out, there can be problems.

Imagine a scenario in which a traditional font has implemented chillu-KA and a modern font has not. Since chillu-KA is not used in contemporary texts, this is a probable scenario. The rendering engine then needs to fall back to a reasonable output when the font has a cmap entry for CILLU SIGN but does not support the chillu form for <KA, CILLU SIGN>. Its choices are the following:

Possible solutions and their implications

VIRAMA fallback for Chillu-KA

This solution suggests that ക് can be written in two ways: <KA, VIRAMA> and <KA, CILLU SIGN>. That means a document (e.g. a wiktionary.org document) written by multiple people using various input tools can quite possibly contain different spellings for the same rendering of a word. Neither reader nor writer will be able to ensure spelling correctness by visual inspection.

Along with this soundness issue, this solution opens a serious IDN security issue. For example, suppose there is a site called www.ദൃക്‌‌സാക്ഷി.com in which ക് is written using <KA, VIRAMA>. An attacker could spoof this address by presenting www.ദൃക്‌‌സാക്ഷി.com with <KA, CILLU SIGN> and redirecting web traffic to a malicious website.

A separate vertical diacritical tail as fallback

Since this diacritical vertical tail is not a native sign in Malayalam, a reader cannot read it. Thus a correct word written using chillu-KA in a traditional font will become unreadable in a modern font. This is exactly what we need: it informs the reader that the text contains a special form of KA that the current font cannot render.
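The two fallback choices can be compared with a toy model. Everything here is hypothetical: a font is modelled as a plain set of consonants for which it has a chillu glyph, and "rendering" returns a descriptive string instead of a glyph. The sketch only shows why a VIRAMA fallback collides with an existing spelling while a separate tail does not.

```python
# ASSUMPTION: a font is modelled as the set of consonants it has chillu
# glyphs for; rendered output is a descriptive string, not a real glyph.
def render_cillu_sequence(consonant, font_chillus, fallback):
    """Render <consonant, CILLU SIGN> using the given fallback policy."""
    if consonant in font_chillus:
        return f"chillu-{consonant}"
    if fallback == "virama":
        # Reuse the visible virama form of the consonant.
        return f"{consonant}+visible-virama"
    # Otherwise show the consonant with a separate, non-native tail mark.
    return f"{consonant}+tail"


def render_virama_sequence(consonant):
    """Render the ordinary <consonant, VIRAMA> sequence."""
    return f"{consonant}+visible-virama"


traditional_font = {"NA", "NNA", "LA", "LLA", "RA", "KA"}
modern_font = {"NA", "NNA", "LA", "LLA", "RA"}  # no chillu-KA

plain = render_virama_sequence("KA")
with_virama_fallback = render_cillu_sequence("KA", modern_font, "virama")
with_tail_fallback = render_cillu_sequence("KA", modern_font, "tail")

print(with_virama_fallback == plain)  # two spellings, one appearance
print(with_tail_fallback == plain)    # the tail keeps them distinguishable
```

In this model the VIRAMA fallback makes <KA, CILLU SIGN> and <KA, VIRAMA> look the same in the modern font, which reproduces the two-spellings problem, while the tail fallback keeps the unsupported form visibly different.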

Bloating of Malayalam Code page

One of the arguments for encoding a chillu sign is that there are so many chillus that they will bloat the Malayalam code page. This argument is exaggerated. Malayalam has only 7 chillus, for the following letters: ന, ണ, ല/ത, ള/ഴ, റ/ര, ക, യ.

Occasionally, conjunct forms for Consonant + Visible Virama are found in older texts. They should not be confused with chillus.

Naming of ര്‍

Regarding ര്‍, there is a generalization/ambiguity embedded in the deep structure of the language itself. We know that ര്‍ is Chillu-RA in most words and Chillu-RRA in some. At the script level, Malayalam does not provide any way to resolve ര്‍ between RA and RRA.

Suppose we name ര്‍ Chillu RA. Then for the word ഞായര്‍, Unicode would have to describe the component letters as including Chillu RA. That is wrong; it has to be Chillu RRA. But we have only one encoding for ര്‍. So it has to be named something meaning 'chillu of letters RA and RRA'.

Unfortunately, Unicode cannot be better than the script.

Conclusion

The polyvalence of certain chillus prevents CILLU SIGN from being a valid proposal for producing Malayalam chillus. The fallback behaviour of CILLU SIGN also has to be given consideration.
