UTN #17: TAB to Unicode Conversion

Unicode Technical Note #17

TAB to Unicode Conversion

Version: 1
Authors: P. Chellappan (chellappan@vsnl.com)
Date: 23 September 2004
This Version: http://www.unicode.org/notes/tn17/tn17-1.html
Previous Version: none
Latest Version: http://www.unicode.org/notes/tn17/


Summary

TAB is the official Tamil bilingual encoding scheme of the Government of Tamilnadu, the state with the largest Tamil-speaking population in the world. A vast amount of Tamil text in digital libraries, online newspapers, magazines, and similar sources is available today in this encoding. As Unicode is fast becoming the encoding of choice, there is a need to convert TAB-encoded text to Unicode.

TAB is a glyph encoding scheme, while Unicode is a character encoding scheme. Hence the relationship between Tamil letters in TAB and those in Unicode may be one-to-one, one-to-many, many-to-one, or many-to-many.

This note is split into two parts. The first part describes, in simple C-like pseudocode, how to determine the sequence of TAB codes that makes up a Tamil letter. The second part provides a cross-mapping table to convert this sequence into the corresponding Unicode sequence.
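
As a rough illustration of the two steps described above, the following Python sketch segments a glyph-encoded byte string and looks each segment up in a mapping table. It is not the algorithm or the table from this note: the glyph byte values (G_KA, G_E_SIGN, G_AA_SIGN, G_TI) are hypothetical placeholders rather than real TAB assignments, the greedy longest-match segmentation is a simplification chosen for brevity, and the pass-through of bytes below 0x80 as ASCII is an assumption. Only the Unicode code points on the right-hand side are real.

```python
# Hypothetical glyph codes, for illustration only. These are NOT the real
# TAB byte values; the actual assignments are in the note's mapping table.
G_KA = 0xA1       # glyph for the consonant KA
G_E_SIGN = 0xA2   # glyph for vowel sign E, drawn to the LEFT of the consonant
G_AA_SIGN = 0xA3  # glyph for vowel sign AA, drawn to the right
G_TI = 0xA4       # single ligature glyph for TTA + vowel sign I

# Glyph sequence (visual order) -> Unicode string (logical order).
MAPPING = {
    (G_KA,): "\u0B95",                            # one-to-one: KA
    (G_TI,): "\u0B9F\u0BBF",                      # one-to-many: TTA + VOWEL SIGN I
    (G_KA, G_AA_SIGN): "\u0B95\u0BBE",            # KA + VOWEL SIGN AA
    (G_E_SIGN, G_KA): "\u0B95\u0BC6",             # reordered: KA + VOWEL SIGN E
    (G_E_SIGN, G_KA, G_AA_SIGN): "\u0B95\u0BCA",  # many-to-many: KA + VOWEL SIGN O
}

MAX_KEY_LEN = max(len(key) for key in MAPPING)


def convert(glyphs: bytes) -> str:
    """Convert a glyph-encoded byte string to Unicode by greedy longest match."""
    out = []
    i = 0
    while i < len(glyphs):
        if glyphs[i] < 0x80:
            # Assumption: the lower half of the bilingual encoding is ASCII.
            out.append(chr(glyphs[i]))
            i += 1
            continue
        # Step 1: find the longest glyph sequence starting at i that forms a letter.
        for n in range(min(MAX_KEY_LEN, len(glyphs) - i), 0, -1):
            key = tuple(glyphs[i:i + n])
            if key in MAPPING:
                # Step 2: emit the corresponding Unicode sequence.
                out.append(MAPPING[key])
                i += n
                break
        else:
            raise ValueError(f"no mapping for glyph 0x{glyphs[i]:02X} at offset {i}")
    return "".join(out)
```

With these placeholder codes, convert(bytes([G_E_SIGN, G_KA, G_AA_SIGN])) consumes three glyphs and yields the two code points U+0B95 U+0BCA, showing why a byte-by-byte lookup is not enough: the left-side vowel sign glyph precedes the consonant in the glyph stream but follows it in Unicode.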

Status

This document is a Unicode Technical Note. It is supplied purely for informational purposes and publication does not imply any endorsement by the Unicode Consortium. For general information on Unicode Technical Notes, see http://www.unicode.org/notes/.

Contents

The body of this note is contained in the file "tab_to_unicode.pdf".