I am compiling these notes as I explore the Tamil script தமிழ் எழுத்து tamiḻ eḻuttu as used for the Tamil language. They may be updated from time to time.
The page contains brief notes on general script features. See also the companion document, Tamil Character Notes, which describes the characters used in Tamil script one by one.
For more detailed information, especially about the history and phonology of the Tamil script, follow the links in the text and at the bottom of the page.
The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs.
Text runs from left to right.
Example of Tamil:
மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.
Consonants carry an inherent vowel ʌ, usually written a.
There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonants, and symbols are allocated on a phonemic rather than a phonetic basis. The latter means that க, for example, may be pronounced as any of the allophones k ɡ x ɣ or h, according to where it appears relative to other sounds in a word, but its pronunciation doesn't change the word.
Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.
Because the core set of Tamil consonants is considerably smaller than that of most Indic scripts, Tamil adds 'grantha' letters to cover sounds in Sanskrit and English.
For compatibility with modern communication it also presses U+0B83 TAMIL SIGN VISARGA (called āytam ஆய்தம்) into service to produce fricative sounds from stops. ஃப gives f, eg. ஃபீசு fiːsɯ (fees). ஃஜ gives z, eg. ஃஜிரொக்ஸ் ziroks (Xerox).
The Unicode Standard v5.2, p289, also describes a method of extension that uses superscript digits to represent transcriptions of languages such as Sanskrit and Saurashtra, eg. ப² = pha, ப³ = ba, and ப⁴ = bha.
Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. Nasals and approximants are always voiced.
The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idayinam (medium consonants), which are important for the rules of pronunciation.
The written form of Tamil has rules for the pronunciation of consonants, in particular plosives, that make for complementary distribution. These rules break down to varying degrees when dealing with Sanskrit loan words and the colloquial spoken form of Tamil (particularly in northern areas). For more read Tamil phonology and [Krishnamurthi] pp23-28.
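As a rough illustration only, the basic rule stated above (plosives are unvoiced word-initially or when doubled, and voiced elsewhere) can be sketched in Python. The function name `plosive_voicing` and the choice of just four plosive letters are assumptions made for this example. The sketch ignores fricative allophones, loan words and colloquial speech, so it should not be read as a full account of Tamil phonology.

```python
# A minimal sketch of the simplified voicing rule described in the text.
# Assumption: only KA, TTA, TA and PA are treated as plosives here.
PULLI = "\u0BCD"  # TAMIL SIGN VIRAMA (the dot that removes the inherent vowel)
PLOSIVES = {"\u0B95", "\u0B9F", "\u0BA4", "\u0BAA"}  # ka, tta, ta, pa


def plosive_voicing(word):
    """Return (index, letter, voiced) for each plosive letter in word."""
    result = []
    for i, ch in enumerate(word):
        if ch not in PLOSIVES:
            continue
        initial = i == 0
        # Second half of a doubled consonant: same letter + pulli precede it.
        doubled_second = i >= 2 and word[i - 1] == PULLI and word[i - 2] == ch
        # First half of a doubled consonant: pulli + same letter follow it.
        doubled_first = word[i + 1:i + 3] == PULLI + ch
        voiced = not (initial or doubled_first or doubled_second)
        result.append((i, ch, voiced))
    return result


pakkam = "\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"   # pa, ka, pulli, ka, ma, pulli
tangam = "\u0BA4\u0B99\u0BCD\u0B95\u0BAE\u0BCD"   # ta, nga, pulli, ka, ma, pulli
for word in (pakkam, tangam):
    print(word, [voiced for _, _, voiced in plosive_voicing(word)])
```

In the first word the doubled க is reported as unvoiced, while in the second word the க after a nasal is reported as voiced.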
Consonant clusters are normally represented with a dot, called puḷḷi (the Tamil virama), over the character(s) not followed by a vowel, rather than with the conjunct glyphs used by most other Indic scripts. Older versions of the Tamil script had more conjunct forms. The modern script has two common exceptions: க்ஷ kʃa and ஸ்ரீ ʃri.
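To see how a cluster is stored, the following short Python sketch lists the code points behind க்ஷ. The helper name `describe` is made up for this example; it relies only on the standard `unicodedata` module. Whether the sequence is displayed as a conjunct or as a consonant with a visible puḷḷi is decided by the font and rendering engine, not by the stored text.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The conjunct is stored as consonant + pulli (virama) + consonant.
KSSA = "\u0B95\u0BCD\u0BB7"
for line in describe(KSSA):
    print(line)
```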
There are independent and combining forms of all vowels, except the inherent vowel, which has no combining form.
Independent vowel forms used to be used at the beginning of metrical groups, but now they are used at the beginning of a word, eg. இந்த inta (this), but also internally to represent 'overlong' vowel sounds, eg. பெரீஇஇய periːiiya (reeeeally big).
Some vowel signs precede the consonant or consonant cluster, and others are represented by glyphs on both sides of it.
Some vowel signs produce significantly different ligated shapes as they combine with the base consonant.
Alternative vowel forms. Each of the three two-part vowel signs can be written in two different ways. A single code point per vowel sign is the preferred form and the form in common use for Tamil.
க 0B95 + ொ 0BCA ≡ க 0B95 + ெ 0BC6 + ா 0BBE
க 0B95 + ோ 0BCB ≡ க 0B95 + ே 0BC7 + ா 0BBE
க 0B95 + ௌ 0BCC ≡ க 0B95 + ெ 0BC6 + ௗ 0BD7
Whichever approach you use, the vowel signs must come after the consonant or consonant cluster that they surround. In the case of multi-character vowel signs, the order is also important and should be as shown above.
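The equivalences listed above are canonical equivalences in Unicode, so they can be checked with normalization. The sketch below uses Python's standard `unicodedata.normalize`; the names `PAIRS` and `same_after_nfc` are invented for this example. It also shows that reversing the order of the two parts produces a different, non-equivalent string.

```python
import unicodedata

KA = "\u0B95"
# Single code point -> the two-part sequence it is canonically equivalent to.
PAIRS = {
    "\u0BCA": "\u0BC6\u0BBE",  # vowel sign O  = E  + AA
    "\u0BCB": "\u0BC7\u0BBE",  # vowel sign OO = EE + AA
    "\u0BCC": "\u0BC6\u0BD7",  # vowel sign AU = E  + AU length mark
}


def same_after_nfc(a, b):
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for single, parts in PAIRS.items():
    print(f"U+{ord(single):04X}", same_after_nfc(KA + single, KA + parts))

# Swapping the order of the parts breaks the equivalence.
print(same_after_nfc(KA + "\u0BCA", KA + "\u0BBE\u0BC6"))
```

Normalizing incoming text to NFC before comparing or searching is one way to make sure both spellings are treated alike.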
Although modern Tamil uses fewer conjunct ligatures than most other Indic scripts, a Tamil font still needs many ligatures, mostly for combinations of base consonant and vowel sign.
See The Unicode Standard v5.2, pp 291-294, for a list and description of Tamil ligatures. You can also look up some of the slightly less common ligatures in the 'shape' view of the Tamil picker.
There is a set of Tamil numbers, but modern Tamil text typically uses Western digits.
The Tamil system inserts characters to indicate tens, hundreds, and thousands. For a description of the algorithm, see CSS3 Lists and Unicode Technical Note #21.
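The following Python sketch shows the general idea for the numbers 1 to 9999. It reflects my understanding of the traditional system, in which the characters for ten, hundred and thousand follow a multiplier digit and a multiplier of one is left out. That last point, the function name `to_tamil_number` and the restricted range are assumptions of this example; the documents referenced above remain the authoritative description.

```python
# Tamil digits 0-9 are U+0BE6..U+0BEF.
DIGITS = [chr(0x0BE6 + i) for i in range(10)]
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"


def to_tamil_number(n):
    """Sketch: write 1..9999 using Tamil ten/hundred/thousand characters."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only handles 1..9999")
    out = []
    for value, marker in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        count, n = divmod(n, value)
        if count:
            # Assumption: a multiplier of 1 is omitted (10 is just the ten sign).
            if count > 1:
                out.append(DIGITS[count])
            out.append(marker)
    if n:
        out.append(DIGITS[n])  # remaining units digit
    return "".join(out)


for number in (10, 11, 2453):
    print(number, to_tamil_number(number))
```

Under these assumptions 2453 is written as 2, thousand, 4, hundred, 5, ten, 3, with no zero digit needed anywhere.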
Tamil speakers tend to think of grapheme clusters containing consonant plus vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.
To assist with this Unicode provides named character sequences that apply standardised names to whole syllables. These can then be mapped to the private use area for applications wanting to work with Tamil in this way. See The Unicode Standard v5.2, pp 294-296. For normal Tamil data interchange, however, the standard codepoints should be used.
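For simple cases, consonant-plus-vowel units can be approximated without any private use mapping by attaching each combining mark to the character before it. The Python sketch below does this with the standard `unicodedata` module. The function name `simple_clusters` is invented here, and this is a simplification rather than the full Unicode text segmentation algorithm: for instance, it does not treat a conjunct such as க்ஷ as a single unit.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # Tamil vowel signs and the pulli have general category Mc or Mn.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word 'tamiḻ', five code points
print(len(tamil), simple_clusters(tamil))
```

The five stored code points are grouped into three user-perceived units, which is closer to how the text above says Tamil speakers think about the script.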
Western punctuation appears to be used generally.
The Unicode Standard v5.2, p 294, mentions that the danda and double danda are sometimes used, along with other unified punctuation in the Devanagari block.
This is a list of main characters or character combinations needed for Tamil.
|Consonants||க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன|
|Grantha consonants||ஶ ஜ ஷ ஸ ஹ க|
|Independent vowels||அ ஆ இ ஈ உ ஊ|
|Vowel signs||ா ி ீ ு ூ ெ ே ை ொ ோ ௌ ௗ|
|Symbols||் ஃ ௺ ௹ ௳ ௴ ௵ ௶ ௷ ௸ ௐ|
|Numbers||௦ ௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯ ௰ ௱ ௲|
Content first published 3 February, 2010. This version 2014-08-27 9:53 GMT
Copyright © 2010-2014 Richard Ishida, All Rights Reserved.