Sunday, December 1, 2013

Preeti to Unicode converter

On many Nepali news sites that use Unicode, one can see strange non-Devanagari characters and errors that look like typos. My understanding is that the text is typed in TTF fonts like Preeti or Kantipur and later converted into Unicode with one of the many Unicode converters available, and that the errors come from those converters. I have made a converter that takes care of these issues.

Try it here.



I looked into the source code of the existing converters, and most of them rely on convoluted if statements, which makes them hard to extend and test. The errors can be attributed to these causes:
  • incomplete mapping of characters
  • the converters don't take into consideration variations in typing, i.e. the ordering of shirbindu, chandrabindu and halanta
  • the converters are written for Preeti, while text typed in other fonts needs to be treated slightly differently
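To illustrate the ordering problem, here is a minimal Python sketch (not code taken from any of the converters). It assumes one specific typing variation: a chandrabindu or shirbindu entered before a dependent vowel sign, whereas Unicode text expects the vowel sign first. The function name and the single rule are hypothetical; a real converter would need more rules, including ones for halanta.

```python
import re

# Chandrabindu (U+0901) or shirbindu/anusvara (U+0902) placed BEFORE a
# dependent vowel sign (U+093E..U+094C). Unicode text expects the vowel
# sign first, so the two captured groups are swapped.
MISORDERED_SIGN = re.compile("([\u0901\u0902])([\u093e-\u094c])")

def normalize_sign_order(text):
    """Move a chandrabindu/shirbindu after the vowel sign that follows it."""
    return MISORDERED_SIGN.sub(r"\2\1", text)
```

Text that is already in the expected order passes through unchanged, so the rule can be applied to converter output regardless of how the source was typed.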
This converter, nep-ttf2utf, is open source software that aims to solve these problems. It began as a port of foss-np's 2utf8 into Python and JavaScript. It features:
  • support for Kantipur, Preeti, PCS Nepali, FONTASY HIMALI TT and Sagarmatha, and it is easily extensible to support other fonts
  • clean regex-based conversion rules
  • test suites that check for correct conversion and prevent regressions
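The idea of a character map plus regex rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not nep-ttf2utf's actual rule table or API: `PREETI_MAP` covers just a handful of Preeti keys, and the names `PREETI_MAP`, `POST_RULES` and `preeti_to_unicode` are made up for the example.

```python
import re

# Small, illustrative subset of the Preeti key-to-Unicode mapping.
PREETI_MAP = {
    "g": "\u0928",  # na
    "k": "\u092a",  # pa
    "n": "\u0932",  # la
    "d": "\u092e",  # ma
    "s": "\u0915",  # ka
    "u": "\u0917",  # ga
    "t": "\u0924",  # ta
    "e": "\u092d",  # bha
    "o": "\u092f",  # ya
    "x": "\u0939",  # ha
    ";": "\u0938",  # sa
    "f": "\u093e",  # vowel sign aa
    "l": "\u093f",  # vowel sign i (typed BEFORE its consonant in Preeti)
    "]": "\u0947",  # vowel sign e
}

# Regex rules applied in order after the character-by-character mapping.
POST_RULES = [
    # The i sign is typed before the consonant; Unicode stores it after.
    (re.compile("\u093f([\u0915-\u0939])"), "\\1\u093f"),
    # aa sign followed by e sign is how the o sign is typed.
    (re.compile("\u093e\u0947"), "\u094b"),
]

def preeti_to_unicode(text):
    """Map each character, then fix ordering and composition with regexes."""
    mapped = "".join(PREETI_MAP.get(ch, ch) for ch in text)
    for pattern, replacement in POST_RULES:
        mapped = pattern.sub(replacement, mapped)
    return mapped

print(preeti_to_unicode("g]kfn"))  # नेपाल
```

Keeping the mapping and the rules as plain data is what makes this style easy to extend to another font (swap in a different table) and easy to test (each known-bad input becomes one assertion).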

7 comments:

  1. Actually, it would be more useful if you could provide the reverse (Unicode to Preeti) translation, since Unicode translation is really easy and can be done manually if required, whereas Preeti is a dinosaur method with heaps of difficulty, although Preeti is sometimes required (for example, when editing old documents).

    Replies
    1. There are a few good tools available on the web that convert Unicode to Preeti.
      Great to hear that you can manually translate Preeti to Unicode rather easily, good luck.

  2. बच्चालाई औषधी सफलतापूर्वक खुवाउनको लागि सुझावहरू (Tips for successfully giving medicine to a child)

  3. SofDk]gdf ;xeflutfsf] nflu lgod
          
              ;Dddf # k6s, jf ;f] eGbf a9L k6s k};f k7fpg] u|fxs dxfg'efjnfO{ @))) o]g Sof; Aofs k|bfg k|bfg ul/g]5.



    o;          cjlwleq slDtdf ! k6s /]ld6 ug{' x'g] ;Dk"0f{ u|fxs dxfg'efax?n] hfkflgh ld7fO{ ;]6 k|fKt ug{' x'g]5.

  4. This comment has been removed by the author.

  5. How do I type () in Kalimati Unicode?
