Aim
The Tamil digitisation project was started with two aims: to develop software that converts
printed Tamil books into digital form, and to publish a collection of valuable books in digital
form on the Internet. Many institutions worldwide have undertaken Tamil digitisation projects
over a long period. The “Noolaham” institution and the “Mathurai” project continue to make
significant contributions through their projects.
Tamil digitisation tasks take scanned images of printed documents as their initial data. These
images by themselves are of limited use: searching for a word or phrase in the contents, editing
the contents, automated translation, and any other research that needs the contents of the books
are impractical, if not impossible. Once the letters and symbols in such images are recognised and
obtained in editable form, they become useful in many contexts. This is the main objective of
the Tamil digitisation project, which has contributed significantly to supporting research
that needs Tamil content and to making books available in an electronically readable form that
can be converted to whatever format suits different readers.
The main participants in this project, which is sponsored by the Information and Communication
Technology Agency (ICTA), are the University of Colombo School of Computing and the Department
of Computer Science, University of Jaffna. Another objective of this project is to find ways for
institutions at the two ends of Sri Lanka, one in Jaffna and one in Colombo, which differ in
many respects, to carry projects forward and complete them successfully through collaboration.
Project Tasks
Interest Group: The foremost task was to set up an interest group of Tamil scholars who have contributed in various ways to the development of Tamil-language matters, in order to publicise the outcome of the Tamil digitisation project. The project's time and operational constraints meant that the members had to be selected from within the peninsula.
Identifying Printable Tamil Letters: An important task in developing the Tamil digitisation software is identifying a defined set of the Tamil letters that are in use. The kinds of letters in use are given below:
௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯ ௦
அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ ஃ
க் ங் ச் ஞ் ட் ண் த் ந் ப்
ம் ய் ர் ல் வ் ழ் ள் ற் ன்
க கா கி கீ கு கூ கெ கே கை கொ கோ கௌ
கு ஙு சு ஞு டு ணு து நு பு
மு யு ரு லு வு ழு ளு று னு
கூ ஙூ சூ ஞூ டூ ணூ தூ நூ பூ
மூ யூ ரூ லூ வூ ழூ ளூ றூ னூ
ஸ ஷ ஜ ஹ க்ஷ ஸ்ரீ ஶ
ஸு ஷு ஜு ஹு க்ஷு ஶு
௹ ௺ ௸ ௳ ௴ ௵ ௶ ௷
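The consonant rows of the list above follow a regular pattern: each consonant appears with the pulli, on its own, and with each vowel sign. The following is a minimal Python sketch of how such a set could be generated from Unicode code points and how recognised text could be split back into letters. It is an illustration only and is not taken from the project's software; the names CONSONANTS, VOWEL_SIGNS, PULLI, consonant_forms, build_inventory and split_letters are hypothetical, and it covers only the 18 native consonants.

```python
# Illustrative sketch only; all names here are hypothetical and are not
# part of the project's software.

# The 18 native Tamil consonants, in the order used in the list above.
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"

# Dependent vowel signs (aa, i, ii, u, uu, e, ee, ai, o, oo, au).
VOWEL_SIGNS = [chr(cp) for cp in (0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
                                  0x0BC6, 0x0BC7, 0x0BC8,
                                  0x0BCA, 0x0BCB, 0x0BCC)]

# The pulli (virama) mark, which removes the inherent vowel.
PULLI = chr(0x0BCD)


def consonant_forms(consonant):
    """Return the 13 written forms of one consonant:
    with pulli, bare, and with each of the 11 vowel signs."""
    return [consonant + PULLI, consonant] + [consonant + s for s in VOWEL_SIGNS]


def build_inventory():
    """Return every consonant form for the 18 native consonants."""
    return [form for c in CONSONANTS for form in consonant_forms(c)]


def split_letters(text):
    """Split text into letters by attaching each vowel sign or pulli
    to the character that precedes it."""
    marks = set(VOWEL_SIGNS) | {PULLI}
    letters = []
    for ch in text:
        if ch in marks and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


if __name__ == "__main__":
    print(len(build_inventory()))  # 234 (18 consonants x 13 forms)
    # The word "Tamil" written with explicit code points.
    print(split_letters("\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"))  # ['த', 'மி', 'ழ்']
```

This simple splitter treats a conjunct such as க்ஷ as two units (க் and ஷ); a real recogniser would need an explicit rule for conjuncts like க்ஷ and ஸ்ரீ, and for the Grantha letters, numerals and symbols in the list.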
Defining Criteria for the Selection of Documents for the Project: Defining the criteria for selecting documents to include in the project was another main task, given the project duration set by the sponsors. A set of criteria was defined with the help of the interest group.
Selection of Documents: The interest group has identified a set of books as the Stage I selection.
Saving Documents as Images: For the digitisation task, scanned and photographic images of the documents selected according to the defined criteria were obtained and saved in a prescribed format and resolution. The Noolaham institution made a remarkable contribution to this task by supplying documents from the Noolaham Project and images obtained by experts, and by providing photographic equipment for imaging.
Development of Tamil Digitisation Software - “அகரமறியி”: The University of Colombo School of Computing developed the software in collaboration with the Department of Computer Science, University of Jaffna. The software is named “அகரமறியி” (“Aharamariyi” - Identifier of Tamil Letters).
Digitising Scanned Images of Documents: A set of documents identified according to the defined selection criteria was scanned or photographed and saved as images, and about 1000 pages have been digitised, proof-read and saved in Stage I.
Publishing Digitised Documents on the Internet: The digitised documents will be made available at http://www.csc.jfn.ac.lk/tdp
Copyright © 2013 - All Rights Reserved - Department of Computer Science, Faculty of Science, University of Jaffna