Planet SMC

March 25, 2018

Balasankar C

FOSSAsia 2018 - Singapore

Heya,

So I attended my first international FOSS conference - FOSSAsia 2018 at the Lifelong Learning Institute, Singapore. I presented a talk titled “Omnibus - Serve your dish on all the tables” (slides, video) about Chef Omnibus, a tool I use on a daily basis for my job at GitLab.

The conference was four days long, and my main aim was to network with as many people as I could. Well, I planned to attend sessions too, but unlike earlier times when I attended all the sessions, these days I am more focussed on certain topics and technologies and tend to attend only sessions on those (for example, DevOps is an area I focus on; blockchain isn’t).

One additional task I had was to staff the Debian booth at the exhibition from time to time. It was mainly handled by Abhijith (who is a DM). I also met two other Debian Developers there - Andrew Lee (alee) and Héctor Orón Martínez (zumbi).

I also met some other wonderful people at FOSSAsia, like Chris Aniszczyk of CNCF, Dr Graham Williams of Microsoft, Frank Karlitschek of NextCloud, Jean-Baptiste Kempf and Remi Denis-Courmont of VideoLan, Stephanie Taylor of Google, Philip Paeps (trouble) of FreeBSD, Harish Pillai of RedHat, Anthony, Christopher Travers, Vasudha Mathur of KDE, Adarsh S of CloudCV (who is from MEC College, which is quite familiar to me), Tarun Kumar of Melix, Roy Peter of Go-Jek (whom I already knew, thanks to the Ruby conferences I attended), Dias Lonappan of Serv and many more. I also met some whom I knew only digitally, like Sana Khan, yet another (:D) Debian contributor, from COEP. I also met some friends like Hari, Cherry, Harish and Jackson.

My talk went OK without too much stuttering, and I am kind of satisfied with it. The only thing I forgot to mention during the talk was that I had stickers (well, I later placed them on the sticker table and they disappeared within minutes, so that was OK. ;))

PS: Well, I had to cut down quite a lot of my explanation and drop my demo due to the limited time. This caused me to miss many important topics like omnibus-ctl or the cookbooks that we use at GitLab. But I had a few participants come up and meet me after the talk with questions about Omnibus – its similarity to Flatpak, its relevance in the times of Docker, etc. – which was good.

Some photos are here:

Abhijith in Debian Booth

Abhijith with VLC folks

Andrew's talk

With Anthony and Harish: Two born-and-brought-up-in-SG-Malayalees

With Chris Aniszczyk

At Debian Booth

With Frank Karlitschek

With Graham Williams

MOS Burgers - Our breakfast place

Premas Cuisine - The kerala taste

The joy of seeing Malayalam

With Sana

Well, Tamil, ftw

Zumbi's talk

March 25, 2018 12:00 AM

March 04, 2018

Santhosh Thottingal

Typoday 2018

Santhosh and I jointly presented a paper at Typoday 2018. The paper was titled ‘Spiral splines in typeface design: A case study of Manjari Malayalam typeface’. The full paper is available here. The presentation is available here.

Typoday is the annual conference where typographers and graphic designers from academia and industry present their ideas and showcase their work. Typoday 2018 was held at the Convocation Hall, University of Mumbai.

 

by Kavya Manohar at March 04, 2018 05:11 PM

February 10, 2018

Santhosh Thottingal

Manjari 1.5 version released

A new version of Manjari typeface is available now. Version 1.5 is mainly a bug fix release.

In version 1.3, the build tooling of the project was changed from FontForge to fontmake. Two weeks back, a few people reported that the font no longer works in MS Word and WordPad: the font selector lists the font, but when it is selected the content remains the same. It works in all other applications without any issues, which is why the bug went unnoticed.

Debugging the issue was not easy since the font works everywhere else. I did a line-by-line diff of the TTX format (the XML font format) of the old and new versions of the font and found that the OS/2 ulUnicodeRange and ulCodePageRange values were set to 0 in version 1.3. Apparently these values really are checked by MS Word and WordPad; if they are missing, Word and WordPad simply reject the font. Correct values for these fields are set now.
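
For anyone who wants to check these fields in a font themselves, a quick inspection with the fontTools library (the same toolkit behind the TTX format) looks roughly like this; the file name is just a placeholder:

from fontTools.ttLib import TTFont

font = TTFont("Manjari-Regular.ttf")  # path to any TTF/OTF file
os2 = font["OS/2"]

# These bitfields declare which Unicode blocks and codepages the font
# claims to cover; Word and WordPad reject the font if they are all zero.
print(hex(os2.ulUnicodeRange1), hex(os2.ulUnicodeRange2))
print(hex(os2.ulCodePageRange1), hex(os2.ulCodePageRange2))

# Malayalam is bit 23 of ulUnicodeRange1 in the OpenType OS/2 specification.
if not os2.ulUnicodeRange1 & (1 << 23):
    print("Malayalam bit is not set in ulUnicodeRange1")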

New version 1.5 is available now. You can download the latest fonts from https://smc.org.in/fonts/#manjari

by Santhosh Thottingal at February 10, 2018 07:02 AM

February 09, 2018

Rajeesh K Nambiar

Sundar — a new traditional orthography ornamental font for Malayalam

There is a dearth of good Unicode fonts for Malayalam script. Most publishing houses and desktop publishing agencies still rely on outdated ASCII era fonts. This not only causes issues with typesetting using present technologies, it makes the ‘document’ or ‘data’ created using these fonts and tools absolutely useless — because the ‘document/data’ is still Latin, not Malayalam.

Rachana Institute of Typography (rachana.org.in) has designed and published a new traditional orthography ornamental Unicode font for Malayalam script, for use in headings, captions and titles. It is named after Sundar, who was a relentless advocate of open fonts, open standards and open publishing. He dreamed of making available several good quality Malayalam fonts, particularly created by Narayana Bhattathiri with his unique calligraphic and typographic signature, freely and openly to the users. The font is licensed under OFL.

The font follows traditional orthography for Malayalam, rather than the unpleasing reformed orthography which was solely introduced due to the technical limitations of typewriters in the ’70s. Such restrictions do not apply to computers and present technology, so it is possible to render the classic beauty of Malayalam script using Unicode and Opentype technologies.

‘Sundar’ is designed by K.H. Hussain — known for his work on the Rachana and Meera fonts which come pre-installed with most Linux distributions; and Narayana Bhattathiri — known for his beautiful calligraphy and lettering in Malayalam script. Graphic engineers of STM Docs (stmdocs.in) did the vectoring and glyph creation. Yours truly took care of the OpenType feature programming. The font can be freely downloaded from rachana.org.in.

The source code of ‘Sundar’, licensed under OFL is available at https://gitlab.com/rit-fonts/Sundar.

by Rajeesh at February 09, 2018 07:53 AM

January 17, 2018

Balasankar C

Introduction to Git workshop at CUSAT

Heya,

It has been a long time since I wrote anything. In the last year I attended some events, like FOSSMeet, DeccanRubyConf and GitLab’s summit, and didn’t write anything about them. The truth is, I forgot I used to write about all these things and never found the motivation to do so.

Anyway, last week, I conducted a workshop on Git basics for the students of CUSAT. My real plan, as always, was to do a bit of FOSS evangelism too. Since the timespan of the workshop was limited (10:00 to 13:00), I decided to keep everything to the bare basics.

I started with an introduction to what a VCS is and how it became necessary. As a prerequisite, I talked about FOSS, the concept of collaborative development, the open source development model etc. It wasn’t easy, as my audience were not only CS/IT students but also students from other departments like Photonics and Physics. I am not sure if I was able to help them understand the premise clearly. I then went on to talk about what Git does and how it helps developers across the world.

IIRC, this was the first talk/workshop I did without a slide show. I was too lazy and busy to create one. I just had one page saying “Git Workshop” and my contact details. So guess what? I used a whiteboard! I went over the basic concepts like repositories, commits, the staging area etc. and started with the hands-on session. In short, I talked about the following (a minimal command sequence is sketched after the list):

  1. Initializing a repository
  2. Adding files to it
  3. Adding files to the staging area
  4. Committing
  5. Viewing commit logs
  6. Viewing what a specific commit did
  7. Viewing a file’s contents at a specific commit
  8. Creating a GitLab account (Well, use all opportunity to talk about your employer. :P)
  9. Creating a project in GitLab
  10. Adding it as a remote repository to your local one
  11. Pushing your changes to remote repository
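
For reference, the hands-on part boiled down to roughly this sequence of commands (the repository name, file name and GitLab username are placeholders for illustration):

git init demo-repo                     # 1. initialize a repository
cd demo-repo
echo "hello" > notes.txt               # 2. add a file to the working directory
git add notes.txt                      # 3. stage the file
git commit -m "Add notes"              # 4. commit
git log                                # 5. view commit logs
git show <commit-id>                   # 6. see what a specific commit did
git show <commit-id>:notes.txt         # 7. view a file's contents at that commit
# 8-9: create a GitLab account and a new project in the web UI
git remote add origin git@gitlab.com:<username>/demo-repo.git   # 10. add the remote
git push -u origin master              # 11. push your changes to the remote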

I wanted to talk about clone, fork, branch and MRs, but time didn’t permit. We wound up the session with Athul and Kiran talking about how they would like the students to join the FOSSClub of CUSAT and help organize similar workshops, and how it can help them as well. I too did a bit of “motivational talk” about how community activities can help them get a job, based on my personal experience.

Here are a few photos, courtesy of Athul and Kiran:

January 17, 2018 12:00 AM

January 12, 2018

Santhosh Thottingal

The ‘u’ vowel signs in Malayalam (മലയാളത്തിലെ ‘ഉ’കാര ചിഹ്നങ്ങൾ)

It is the reformed Malayalam script that is in today’s textbooks and that is taught in schools. So formal education gives us little opportunity to become familiar with the stylistic variations of Malayalam’s traditional script. Yet these letterforms are right in front of us, in wall writings, on bus boards, and in the handwriting of elders who learned to write traditional Malayalam. Most of the conjuncts that were split apart as part of the script reform still find their way, knowingly or unknowingly, into our handwriting without any mistakes. But with the detached signs, especially the ു and ൂ signs, the styles get mixed up when they are written joined to a consonant. See the picture below.

The use of u-signs in wall writing.

The reformed script can be seen inside the green marks and the traditional script in blue. What is indicated with the red mark is a style that is not customary in Malayalam. The u-signs of Malayalam alone come in eight varieties. Without any writing practice, such mistakes will inevitably creep in.

This article briefly introduces these signs and how they join consonants. It also explains, based on a few early Malayalam books, how they were introduced in older times.

U-signs of the traditional script in several styles, used repeatedly in combination.

Vowel signs in Malayalam

Malayalam has thirty-seven consonant letters and fifteen vowels (eighteen, if the not-so-common ൠ, ഌ, ൡ are included). Vowel letters stand independently, in general, only at the beginning of words. At all other times they stand joined to consonants, modifying the consonant sounds. What joins in this way is not the vowel letter itself but the vowel sign that denotes it. The variety of letterforms produced when vowel signs join consonants is a characteristic of Malayalam. Vowel signs may be placed to the left or to the right of the consonant. See the table.

Vowel letters and vowel signs of Malayalam.

As seen in the table, the signs ു, ൂ and ൃ stick to the consonant and alter its very shape. It was only after the Malayalam script was reformed in 1971 that these signs came into use standing detached to the right of the consonant. In the reformed script these signs have only such detached forms. This can also be seen in rows 5, 6 and 7 of the table.

In what ways can the ു and ൂ signs join a consonant?

Among the vowel signs, the ones that join consonants in the most varied styles are ‘ു’ and ‘ൂ’. When they join, they change the very shape of the consonant, and the change differs from consonant to consonant. Perhaps the changes that occurred as the script evolved are the reason for this. In the traditional script, the stylistic variations when the u and uu signs join a consonant can be of eight kinds in all. They are summarised here.

 

The u-signs in Malayalam
  • When the u sign joins a consonant, the consonant undergoes one of four kinds of change.
  • The change in shape when the u sign joins ക or ര is indicated in the picture by the name kunippu (കുനിപ്പ്). This applies to these two letters and to all conjuncts ending in ക; ങ്ക, ക്ക, സ്ക, സ്ക്ക all belong to this group. Since there is a special sign for conjuncts ending in ര, the kunippu never has to be used there.
  • When the u sign joins ഗ, ഛ, ജ, ത, ഭ, ശ or ഹ, the style is a tail (വാല്) form that extends to the left and comes back to the right. Conjuncts ending in these letters follow the same style.
  • When the u sign joins ണ or ന, the change is an inward curl (ചുരുട്ട്). This also applies to conjuncts ending in these letters, for example ണ്ണ, ന്ന, ക്ന.
  • The other 24 consonants use the form called chuzhippu/kunukku (ചുഴിപ്പ്/കുണുക്ക്). Since this is the most widely used style, the mistake of copying it for every consonant is very common. That is exactly what is shown with the red mark in the wall writing at the beginning.
The uu-signs in Malayalam

When the uu sign joins a consonant, the consonant likewise undergoes one of four kinds of change.

  • The uu signs of ക, ഭ, ര, ഗ, ഛ, ജ, ത, ഭ, ശ, ഹ can take two forms: the kunippu-with-loop form (കുനിപ്പിട്ട വളപ്പ്) and the kunippu-with-tail form (കുനിപ്പിട്ട വാല്).
    • In the kunippitta valappu, the stroke curves around the letter to the left after the kunippu. In the kunippitta vaalu, it goes to the left after the kunippu and comes back to the right.
    • For ക, ര and ഭ, the kunippitta valappu is the more common form today.
    • For ഗ, ഛ, ജ, ത, ഭ, ശ and ഹ it is the kunippitta vaalu form. This is what is indicated first in the picture.
  • The uu forms of ന and ണ are a shape that could be called churuttuvaalu (ചുരുട്ടുവാല്), the curl extended outward.
  • The other 24 consonants take the uu sign as a double chuzhippu. Mistakenly treating this too as the common form for all consonants is equally widespread.

Note: since no established common names could be found for the shapes of these signs, kunippu, chuzhippu, churuttu and vaalu are names coined by the author herself. Other names may well exist.

The ‘u’ signs in early script primers

These letterforms cannot be learnt from modern textbooks, but many of the early script primers describe them in great detail. The real picture of the writing styles becomes clear only when one looks at manuscripts; as printing standardised many things, some of the variety would have been lost.

Malayalam letters were first printed with movable types (ആണിയച്ചുകൾ, ജംഗമാച്ചുകൾ) in Rome in 1772. This Latin book, ‘Alphabetum Grandonico-Malabaricum’ (ആൽഫബെത്തും ഗ്രന്ഥോനിക്കോ മലബാറിക്കും), prepared by Father Clement Pianius, was made to help Western missionaries learn the Malayalam script. The same types were later used to print Samkshepavedartham, the first complete Malayalam book.

The section of Alphabetum Grandonico-Malabaricum that discusses the u-signs

‘Alphabetum Grandonico-Malabaricum’ discusses the variety of the ‘u’ and ‘uu’ signs of Malayalam in detail. A Malayalam translation of this book, with explanations, written by Father Emmanuel Attel (ഇമ്മനുവേൽ ആട്ടേൽ), is available to us today, and many of the forms mentioned above can be seen in it. In his explanation, the uu-signs of Malayalam do not have the kunippitta vaalu form as we understand it today: ക, ഗ, ഛ, ജ, ത, ശ, ഹ, ര, ഭ all take the kunippitta valappu form. He also expresses the opinion that if the ‘ൂ’ sign took the tail form when joining ര, there would have been a little more clarity. The churuttuvaalu and chuzhippu forms were used then just as they are now. Note also that the shapes of ര and ത in that book were quite different from today’s, so രൂ and തൂ would not be mistaken for each other even if both were written with the valappu.

From the Malayalam translation of the Latin book Alphabetum Grandonico-Malabaricum

This is the explanation in a textbook written by a foreigner who learned Malayalam before the 1800s, and it is also what appears in the first printed Malayalam alphabet primer.

The Malayalam script was next presented in a comparable textbook fashion by Rev. George Mathen in 1863. He presents the u-signs with the chuzhippu as the general form and the others as exceptions (see picture). For the uu sign, Mathen’s book “മലയാഴ്മയുടെ വ്യാകരണം” (A Grammar of Malayalam) says that with ക, ഗ, ഛ, ജ, ത, ശ, ഹ, ര and ഭ either of the forms, kunippitta vaalu or kunippitta valappu, may be used.

From Rev. George Mathen’s Malayalam grammar book

What we have seen is how those who, two centuries ago, opened the way for the early growth of the language through printing technology described the letterforms of Malayalam. They even tried to bring the entire variety of the handwritten forms into their types.

The aim of the 1971 script reform was to simplify the script in order to overcome technological limitations. But the result was that, in addition to the forms already in existence, a new detached style also came in; the forms that had existed until then could not, after all, be wiped out overnight. And owing to lack of familiarity and lack of practice, confusion about which sign form goes with which consonant has become widespread among users of the language.

by Kavya Manohar at January 12, 2018 02:10 PM

January 06, 2018

Santhosh Thottingal

Stylistic Alternates for ച്ച, ള്ള in Manjari and Chilanka fonts

The ligatures for the Malayalam conjuncts ച്ച, ള്ള have less popular variants as shown below

The second form is not seen in print but often in handwritten Malayalam. I have seen it a lot on bus boards, especially in Thiruvananthapuram. There are no digital typefaces with the second style, except the Chilanka font I designed, which uses the second variant of ച്ച. I got a lot of appreciation for that style variant, but I also received requests for the first form of ച്ച. I had a private copy of Chilanka with that variant and had given it to whoever requested it. I also received some requests for the second style of ള്ള. For the Manjari font too, I received requests for the second variant.

Today I am announcing new versions of the Manjari and Chilanka fonts, with these two forms as optional variants, without the need for a different copy of the font. In a single font, you get both variants using the OpenType stylistic alternates feature.

The default styles of ച്ച and ള്ള are not changed in the new version. The fonts come with an option to choose a different form.

Choosing the style for webfonts using CSS

Use the font-feature-settings CSS property to choose a style. For the element or class in the HTML, use it as follows:

For style 1:

font-feature-settings: "salt" 1;

For style 2:

font-feature-settings: "salt" 2;
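
For instance, a small stylesheet that loads Manjari as a webfont and exposes both variants through classes might look like the sketch below (the font URL and class names are placeholders):

@font-face {
  font-family: "Manjari";
  src: url("fonts/Manjari-Regular.woff2") format("woff2");
}

.manjari         { font-family: "Manjari", sans-serif; }
.manjari.style-1 { font-feature-settings: "salt" 1; }  /* first variant of ച്ച, ള്ള */
.manjari.style-2 { font-feature-settings: "salt" 2; }  /* second variant of ച്ച, ള്ള */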

Choosing the style variant in LibreOffice

In place of the font name in the font selector, append :salt=1 for the first style or :salt=2 for the second style. For example, give Manjari Regular:salt=2 as the font name to get the second style.

Choosing the style variant in XeLaTeX

fontspec allows choosing alternate style variants using the Alternate=N syntax. Note that N starts from 0, so for style 1 use Alternate=0 and for style 2 use Alternate=1. Refer to section 2.8.3 of the fontspec documentation.

\documentclass[11pt]{article}
\usepackage{polyglossia}
\newfontfamily{\manjari}[Script=Malayalam]{Manjari}
\begin{document}

\manjari{\addfontfeature{Alternate=1}കാച്ചാണി, വെള്ളയമ്പലം}

\end{document}

This will produce the following rendering:

Choosing the style variant in Inkscape

Inkscape’s font selection dialog has a feature to choose font style variants. It uses the property values of CSS font-feature-settings.

In Adobe InDesign, selecting the ligature will offer the stylistic alternate(s), if any, to choose from.

Updated fonts

Updated fonts are available in SMC’s font download microsite https://smc.org.in/fonts

by Santhosh Thottingal at January 06, 2018 09:35 AM

December 10, 2017

Santhosh Thottingal

Number spellout and generation in Malayalam using Morphology analyser

Writing a number like 6493 as six thousand four hundred and ninety three is known as the spellout of that number. The most familiar example of this is on cheques. Text-to-speech systems also need to convert numbers to words.

Source: https://commons.wikimedia.org/wiki/File:Sample_cheque.jpeg by User:Tshrinivasan

The reverse process, converting a phrase like six thousand four hundred and ninety three to the number 6493 – number generation – is also common. In software, it is often required in speech recognition and, in general, in any kind of semantic analysis of text.

Numbers and their conversion to English words are not really a complex problem to solve with a computer. But how about other languages? In this article, I discuss the nature of these words in Malayalam and an approach to parse and generate numbers written as words.

Malayalam number spellout

In Malayalam, the spellout of a number forms a single word. For example, the number 108 is നൂറ്റെട്ട് – a single word. This word is formed from the adjective form of നൂറ് (100) and എട്ട് (8). While these two words are glued together, Malayalam phonological rules are also applied, resulting in the single word നൂറ്റെട്ട്. This word-formation characteristic is present for almost all possible numbers you can imagine. Parsing the word നൂറ്റെട്ട് and interpreting it as 108, or converting 108 to നൂറ്റെട്ട്, is an interesting problem in Malayalam computing.

I came across this problem while trying to develop a dictionary-based spellchecker years back. Such a dictionary would need all these single words for all possible numbers, right? Then how big would it be? Later, when I was researching the Malayalam morphology analyser, I encountered this problem again. You cannot have all these words as entries in a lexicon – it is not practical. At the same time, you should be able to parse these words and also generate them with the correct morpho-phonological rules of Malayalam.

As I mentioned in the introduction article of my Malayalam morphological analyser project, Malayalam is a heavily agglutinative language. While I was learning finite-state transducer technology, Malayalam number words were one of the obvious candidates to try out. These numbers perfectly model Malayalam word formation: they get agglutinated and inflected, during which morpho-phonological rules get applied. നൂറ്റെട്ടിലായിരുന്നു, നൂറ്റെട്ടിനെ, നൂറ്റെട്ടോ? നൂറ്റെട്ടാം, നൂറ്റെട്ടാമത്തെ, നൂറ്റെട്ടര – all are examples of words you get on top of the number word നൂറ്റെട്ട്. Also, it is not just two-word agglutination: പതിനാറായിരത്തൊരുനൂറ്റെട്ട് – 16108 – is an example where പതിനാറ് (16), ആയിരം (1000), നൂറ് (100) and എട്ട് (8) all join to form a single word. In fact this is a common word you often see in literature because of the myth about Lord Krishna. The current year, 2017, is often written as രണ്ടായിരത്തിപ്പതിനേഴ്.

Let us examine the nature of this word formation.

Ones

Numbers between 0 and 9 have the words പൂജ്യം, ഒന്ന്, രണ്ട്, മൂന്ന്, നാല്, അഞ്ച്, ആറ്, ഏഴ്, എട്ട്, ഒമ്പത് respectively. The word ഒമ്പത് is sometimes written as ഒൻപത് too, which is phonetically similar to ഒമ്പത്. Each of these words ending with a virama (്) is sometimes written with samvruthokaram too: ഒന്ന് – ഒന്നു്, രണ്ടു്, മൂന്നു്, നാലു് etc.

Tens

The number 10 is പത്ത്. Multiples of ten up to 80 roughly follow the pattern:

Adjective form of [രണ്ട്|മൂന്ന്|നാല്|അഞ്ച്|ആറ്|ഏഴ്|എട്ട്] + പത്.

So, they are ഇരുപത്(20), മുപ്പത്(30), നാല്പത്(40), അമ്പത്(50), അറുപത്(60), എഴുപത്(70), എൺപത്/എമ്പത്(80). But at 90, a new form emerges – തൊണ്ണൂറ് – which has no root in ഒമ്പത് (9). Instead it is more like something just before നൂറ് (100).

The numbers 11-19 are unique words. പതിനൊന്ന്, പന്ത്രണ്ട്, പതിമൂന്ന്, പതിനാല്, പതിനഞ്ച്, പതിനാറ്, പതിനേഴ്, പതിനെട്ട്, പത്തൊമ്പത് respectively.

All other two-digit numbers between the multiples of ten follow the pattern

[Word for 10x] + [Word for Ones]

So, 21 is ഇരുപത്(20) + ഒന്ന്(1). But to form a single word, an adjective form is used, which is similar to the female-gender inflection of Malayalam nouns: ഇരുപത്തി + ഒന്ന്. Phonological rules must be applied to combine these two words. The vowel sign ി(i) at the end of ഇരുപത്തി introduces a new consonant യ(ya). Also, the first letter of ഒന്ന് – the vowel ഒ – changes to its vowel-sign form ൊ. So we get ഇരുപത്തി + യ + ൊന്ന്, which results in ഇരുപത്തിയൊന്ന്. This phonological rule is actually agama sandhi / ആഗമ സന്ധി as per Malayalam grammar rules. But ഇരുപത്തിയൊന്ന് has a more popular form, ഇരുപത്തൊന്ന്, which is generated by dropping ി + യ from the generation process.

The words for the 20s can be generated similarly: ഇരുപത്തിരണ്ട്(22), ഇരുപത്തിമൂന്ന്(23), ഇരുപത്തിനാല്(24), ഇരുപത്തിയഞ്ച്/ഇരുപത്തഞ്ച്(25), ഇരുപത്തിയാറ്/ഇരുപത്താറ്(26), ഇരുപത്തിയേഴ്/ഇരുപത്തേഴ്(27), ഇരുപത്തിയെട്ട്/ഇരുപത്തെട്ട്(28), ഇരുപത്തിയൊമ്പത്/ഇരുപത്തൊമ്പത്(29). For all other two-digit numbers the pattern is the same. Note that തൊണ്ണൂറ് (90) has the prefix form തൊണ്ണൂറ്റി, so 98 is തൊണ്ണൂറ്റിയെട്ട്/തൊണ്ണൂറ്റെട്ട്.

Hundreds

100 is നൂറ്. Its prefix form is നൂറ്റി. Multiples of 100 are somewhat similar to the multiples of 10 we saw above. They are ഇരുന്നൂറ്(200), മുന്നൂറ്(300), നാനൂറ്(400), അഞ്ഞൂറ്(500), അറുനൂറ്(600), എഴുന്നൂറ്(700), എണ്ണൂറ്(800), തൊള്ളായിരം(900). Here too, 900 deviates from the others: the word is related to 1000 (ആയിരം) rather than to 100 – just like the case of 90, തൊണ്ണൂറ്, discussed above.

Forming three-digit numbers is, in general, the prefix of the multiple of hundred followed by the tens form explained above. So 623 is അറുനൂറ് + ഇരുപത്തിമൂന്ന് = അറുനൂറ്റിയിരുപത്തിമൂന്ന്, or the more popular short form അറുനൂറ്റിരുപത്തിമൂന്ന്. 817 is എണ്ണൂറ്റി + പതിനേഴ് = എണ്ണൂറ്റിപ്പതിനേഴ്, with gemination of the consonant പ as per the phonological rules. 999 is തൊള്ളായിരത്തിത്തൊണ്ണൂറ്റിയൊമ്പത് or തൊള്ളായിരത്തിത്തൊണ്ണൂറ്റൊമ്പത് or തൊള്ളായിരത്തിത്തൊണ്ണൂറ്റിയൊൻപത്.

Numbers between 100 and 199 may optionally be prefixed by ഒരു – the adjective form of ഒന്ന്(1): 101 – ഒരുന്നൂറ്റിയൊന്ന്, 122 – ഒരുന്നൂറ്റിയിരുപത്തിരണ്ട് etc. നൂറ്(100) can also be ഒരുന്നൂറ്.

Thousands

1000 is ആയിരം. ആയിരത്തി is the prefix for all other four-digit numbers up to one lakh (ലക്ഷം, 100000). Multiples of 1000 can be generated by suffixing ആയിരം. For example, 4000 is നാല് + ആയിരം = നാലായിരം; 6000 – ആറായിരം. But 5000 is അയ്യായിരം, and അഞ്ചായിരം is a less popular version. 8000 is എട്ട് + ആയിരം = എട്ടായിരം, but എണ്ണായിരം is the popular form. 10000 is പത്ത് + ആയിരം = പത്തായിരം, but പതിനായിരം is the more familiar version. പതിനായിരം is the suffix for multiples of 10000: ഇരുപതിനായിരം, മുപ്പതിനായിരം, നാല്പതിനായിരം, അമ്പതിനായിരം, അറുപതിനായിരം, എഴുപതിനായിരം, എൺപതിനായിരം, തൊണ്ണൂറായിരം. 3000 is മുവ്വായിരം rather than മൂന്നായിരം, so 73000 is എഴുപത്തിമുവ്വായിരം or എഴുപത്തിമൂന്നായിരം.

Numbers between 1000 and 1999 may optionally be prefixed by ഒരു – the adjective form of ഒന്ന്(1): 1008 – ഒരായിരത്തിയെട്ട്, 1122 – ഒരായിരത്തിയൊരുന്നൂറ്റിയിരുപത്തിരണ്ട് etc. ആയിരം(1000) can also be ഒരായിരം.

Lakhs & Crores

100,000 is ലക്ഷം; ലക്ഷത്തി is its prefix. 1,00,00,000 is കോടി; കോടി itself is the prefix. 12,00,090 is പന്ത്രണ്ടുലക്ഷത്തിത്തൊണ്ണൂറ്. 99,00,00,00,00,00,00 is തൊണ്ണൂറ്റൊമ്പതുലക്ഷംകോടി.

Why morphology analyser?

From the above explanation of word formation for numbers in Malayalam, one can see that there are patterns, and also a lot of exceptions. But still, isn’t it possible to write a generator as just a rule-based program in a programming language? I agree – yes, it is possible. But besides mapping these numbers to word forms and handling the exceptional rules, there are a few other things we saw: when words are agglutinated, phonological rules come into action, and these words can be inflected again. We also want the conversion to be bidirectional – not just word generation, but converting those words back into a number. All of this would make such a program quite complicated, and it would have to duplicate many things from the morphology analyser. That is why I used the morphology analyser here.

What are the morphemes in a string like ആയിരത്തിത്തൊള്ളായിരത്തിത്തൊണ്ണൂറ്റിയാറ്? ആയിരം, തൊള്ളായിരം, തൊണ്ണൂറ്, ആറ്? Sounds good, but we see that തൊള്ളായിരം is ഒമ്പത് + നൂറ്, and തൊണ്ണൂറ് is ഒമ്പത് + പത്ത്. So, expanding it, we get ആയിരം, ഒമ്പത്, നൂറ്, ഒമ്പത്, പത്ത്, ആറ്. But this sequence does not convey the meaning of the single word it came from. What is missing? Can we consider തൊള്ളായിരം and തൊണ്ണൂറ് as single morphemes? We can, but…

  • If തൊള്ളായിരം is a morpheme, it means it is in the lexicon. That makes every other three-digit number also eligible to be listed as an item in the lexicon. So ultimately, we go back to the large lexicon/dictionary issue I mentioned at the beginning of the article.
  • Semantically, any number spellout originates from the ones and their place values. So തൊണ്ണൂറ് is 9<tens>.

I have not seen any morphology analyser dealing with number spellout; it seems Malayalam numbers are quite unique in this respect. I read a few academic papers on dealing with this complexity using rule-based approaches (see References) and an automata-like paradigm language (Richard Gillam – A Rule-Based Approach to Number Spellout).

The approach I arrived at after trying out some choices is as follows:

  • Introduce morphology tags for positional values. This is similar to POS tags, but here they are applied to number spellouts: <ones>, <tens>, <hundreds>, <thousands>, <lakhs>, <crores> are those tags.
  • Parse a spellout down to the atomic morphemes in it – ഒന്ന്, രണ്ട്, മൂന്ന്, നാല്, അഞ്ച്, ആറ്, ഏഴ്, എട്ട്, ഒമ്പത്, പൂജ്യം.
  • These morphemes will have the tags mentioned above.

To illustrate this, let us use some examples.

As you can observe, only the atomic numbers are used as morphemes, and place values are indicated using tags. You can also see that the analysis is easy for a program to interpret in order to generate the number.

For example, if the analysis is രണ്ട്<ones><thousands> ഒന്ന്<tens> ഏഴ്<ones>, replace the words with their numbers and the tags with their place values. You get

2*1*1000 + 1*10 + 7*1  =  2000+10+7 = 2017
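
A minimal sketch of that replacement step in Python (assuming the analysis string is already available from the analyser; the word and tag tables are copied from the description above):

WORDS = {'പൂജ്യം': 0, 'ഒന്ന്': 1, 'രണ്ട്': 2, 'മൂന്ന്': 3, 'നാല്': 4,
         'അഞ്ച്': 5, 'ആറ്': 6, 'ഏഴ്': 7, 'എട്ട്': 8, 'ഒമ്പത്': 9}
TAGS = {'<ones>': 1, '<tens>': 10, '<hundreds>': 100,
        '<thousands>': 1000, '<lakhs>': 100000, '<crores>': 10000000}

def analysis_to_number(analysis):
    total = 0
    for token in analysis.split():
        # each token is an atomic number word followed by one or more place-value tags
        word = token[:token.index('<')]
        value = WORDS[word]
        for tag, multiplier in TAGS.items():
            if tag in token:
                value *= multiplier
        total += value
    return total

print(analysis_to_number('രണ്ട്<ones><thousands> ഒന്ന്<tens> ഏഴ്<ones>'))  # 2017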

I said that the advantage of a morphology analyser is that you can generate the word from the analysis string – the bidirectional property. This means that if you have a number, you can generate its spellout. For that we first need to do some maths on the number. For example, for the same number 2017, we can divide incrementally by lakhs, thousands, hundreds and tens and arrive at the following formation

2017 = 2*1000 + 0*100 + 1*10+ 7*1

Which can be converted to:

രണ്ട്<thousands>ഒന്ന്<tens>ഏഴ്<ones>

The morphology analyser can then easily generate the word രണ്ടായിരത്തിപ്പതിനേഴ് by applying all the grammatical rules.
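
The place-value decomposition itself is simple arithmetic; a rough sketch of it in Python (with the digit words written out inline, and assuming single-digit coefficients for each place, as in this example) could be:

DIGITS = {0: 'പൂജ്യം', 1: 'ഒന്ന്', 2: 'രണ്ട്', 3: 'മൂന്ന്', 4: 'നാല്',
          5: 'അഞ്ച്', 6: 'ആറ്', 7: 'ഏഴ്', 8: 'എട്ട്', 9: 'ഒമ്പത്'}
PLACES = [(10000000, '<crores>'), (100000, '<lakhs>'), (1000, '<thousands>'),
          (100, '<hundreds>'), (10, '<tens>'), (1, '<ones>')]

def number_to_analysis(number):
    parts = []
    for value, tag in PLACES:
        digit, number = divmod(number, value)
        if digit:  # zero place values are dropped, as in the 2017 example
            parts.append(DIGITS[digit] + tag)  # coefficients above 9 would need nested tags
    return ' '.join(parts)

print(number_to_analysis(2017))  # രണ്ട്<thousands> ഒന്ന്<tens> ഏഴ്<ones>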

 

If you are eager to try out this conversion, I wrote a quick JavaScript-based number-to-word convertor using the APIs of the morphology analyser.

See the Pen Malayalam number parser by Santhosh Thottingal (@santhoshtr) on CodePen.

I did not write a convertor from the spelled-out word to the number; you are free to write one. The web interface of mlmorph is available for trying out some analysis too – https://morph.smc.org.in/

Inflections

Some illustrations on inflected spellout analysis

Ordinals

Ordinal forms of numbers are used to show position. Examples are first, third etc. In Malayalam, examples are ഒന്നാം, പതിനെട്ടാം, ഏഴാമത്, ഒമ്പതാമത്തെ etc. Supporting those forms is just like supporting inflections. See the screenshot below.

Technical details

Known issues

  • Some commonly used forms like മുപ്പത്തിമുക്കോടി are not supported yet. There are also variations like മുവ്വായിരം, മൂവായിരം.
  • If there are multiple ways to generate a number word, the system generates all such forms. But some of these forms may be very obscure and not used at all.
  • There is a practice of inserting a space after some prefixes like ആയിരത്തി, ലക്ഷത്തി, കോടി. In the model I assumed the words are generated as a single word.

Summary

We analysed the word formation for the spellout of numbers in Malayalam. The use of a morphology analyser for the analysis and generation of these word forms was introduced. A demo program that converts numbers to their word forms, considering all morphophonological rules, was presented. An algorithm for converting a spelled-out word to a number was given with an example. A programmable API and a web API are provided by the system.

References

by Santhosh Thottingal at December 10, 2017 02:46 PM

November 26, 2017

Santhosh Thottingal

Towards a Malayalam morphology analyser

Malayalam is a highly inflectional and agglutinative language. This has posed a challenge for all kinds of language processing. Algorithmic interpretation of Malayalam’s words and their formation rules continues to be an untackled problem. My own attempts to study and try out some of these characteristics were a big failure in the past. Back in 2007, when I tried to develop a spellchecker for Malayalam, the infinite number of words this language can have by combining multiple words together, and then inflecting them, was a big challenge. The dictionary-based spellchecker was a failed attempt. I had documented these issues.

I was busy with my type design projects for the last few years, but continued to search for a solution to this problem. Last year (2016), during the Google Summer of Code mentor summit at the Google campus in California, mentors working on language technology had a meeting and I explained this challenge. It was suggested that I look at Finnish, Turkish, German and other similarly inflected and agglutinative languages and their attempts to solve this. So, after the meeting, I started studying some of those projects – Omorfi for Finnish, SMOR for German, TRMorph for Turkish. All of them use finite-state transducer technology.

There are multiple FST implementations for linguistic purposes – foma, XFST (the Xerox Finite State Toolkit), SFST (the Stuttgart Finite State Toolkit) and HFST (the Helsinki Finite State Toolkit). I chose SFST because of its good documentation (in English) and the availability of reference systems (TRMorph, SMOR). And now we have mlmorph – the Malayalam morphology analyser project, in development here: https://github.com/santhoshtr/mlmorph

I will document the system in detail later. Currently it is progressing well. I was able to solve arbitrary levels of agglutination with inflection. Nominal and verbal inflections are being solved one by one. I will try to provide a rough high-level outline of the system below.

  • Lexicon: This is a large collection of root words, collected and manually curated, classified into various part-of-speech categories. The collection is separated into nouns, verbs, conjunctions, interjections, loan words, adverbs, adjectives, question words, affirmatives, negations and so on. Nouns themselves are divided into pronouns, person names, place names, time names, language names and so on. Each of them gets a unique tag, which will appear when you analyse such words.
  • Morphotactics: Morphology rules about agglutination and inflection. This includes agglutination rules based on samasam (സമാസം) and the case inflections – accusative, vocative, nominative, genitive, dative, instrumental, locative and sociative. Also plural inflections, demonstratives (ചുട്ടെഴുത്തുകൾ) and indeclinables (അവ്യയങ്ങൾ). For verbs, all possible tense forms, converbs, adverbial particles, concessives (അനുവാദകങ്ങൾ) and so on.
  • Phonological rules: These are applied on top of the results from morphotactics. For example, morphotactics gives ആൽ<noun>, തറ<noun>, ഇൽ<locative> as ആൽ<noun>തറ<noun>ഇൽ<locative>. But after the phonological treatment it becomes ആൽത്തറയിൽ, with consonant doubling after ൽ, and ഇ becoming യി.
  • Automata definition for the above: This is where you say, in a regular-expression-like language, that nouns can be concatenated any number of times, followed by optional inflection, and so on.
  • Programmable interface, web API, command-line tools and a web interface for demos.

What can it do now? The following screenshot is from its web demo. You can see complex words analysed into their stems, inflections, tense etc.

Note that this is bidirectional. You can give it a complex word and it will give the analysis. Similarly, when you give root words and POS tags, it will generate the complex word from them. For example:

ആടുക<v><past>കൊണ്ടിരിക്കുക<v><present> =>  ആടിക്കൊണ്ടിരിക്കുന്ന
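
For illustration, the Python bindings that were later published as the mlmorph package expose this bidirectionality roughly as sketched below; the exact class and method names, and whether this particular tag string is accepted, are assumptions based on that package rather than on this post:

from mlmorph import Analyser, Generator

analyser = Analyser()
# analyse: complex surface word -> list of (analysis, weight) candidates
print(analyser.analyse('ആടിക്കൊണ്ടിരിക്കുന്ന'))

generator = Generator()
# generate: analysis string -> possible surface forms
# (tag string taken from this post; the released tag scheme may differ)
print(generator.generate('ആടുക<v><past>കൊണ്ടിരിക്കുക<v><present>'))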

Covering all possible word-formation rules for Malayalam is an ambitious project, but let us see how much we can achieve. Now the effort is more on the linguistic aspects of Malayalam than on the technical ones. I will update about the progress of the system here.

 

 

by Santhosh Thottingal at November 26, 2017 06:50 AM

November 03, 2017

Santhosh Thottingal

Eureka magazine with Manjari font

The Eureka children’s science magazine is now printed in the Manjari font I designed. Happiness is seeing your favorite childhood magazine in your own font!

by Santhosh Thottingal at November 03, 2017 04:37 AM

October 29, 2017

Santhosh Thottingal

Indesign CC automatic hyphenation for Indian languages

More and more publishers are starting to use InDesign CC and Unicode. One of the many advantages publishers get with Unicode and InDesign CC is automatic hyphenation. A few of my friends told me that they don’t know how to use hyphenation. Even though I had never used InDesign before, I decided to figure it out. In my Windows 10 virtual machine, I installed InDesign CC 2018.

The following is a tutorial on how to get perfect hyphenation for text in Indian languages in InDesign. I use Malayalam as the example.

InDesign CC 2018 comes with Hunspell hyphenation dictionaries. These hyphenation dictionaries were written by me a long time back. See https://github.com/smc/hyphenation

From the menu Edit -> Preferences -> Dictionary, set Language and Hyphenation to “Hunspell”.

Create a text frame and add content to it. Make sure that the composer is set to the Adobe World-Ready Paragraph Composer. You can access it from the paragraph settings as shown below. Without this setting, the Indic text won’t render correctly.

Tick the “Hyphenation” option in the paragraph settings. Select an appropriate font for the content. Choose the language of the content as Malayalam or whichever Indic language you are working on. See the screenshot below. Justify the content.

The content will get automatically hyphenated. If you resize the column width or insert more content, the text will be automatically re-hyphenated.

The exported PDF will look like:

You can see the hyphenation rules in the installation folder: C:\Program Files\Adobe\Adobe InDesign CC 2018\Resources\Dictionaries\LILO\Linguistics\Providers\Plugins2\AdobeHunspellPlugin\Dictionaries

Patterns are available for Assamese, Bengali, Panjabi, Gujarati, Marathi, Tamil, Telugu, Odia, Kannada and Malayalam.

I have not tried older InDesign versions, so I don’t know from which version this feature is available. But I don’t see a reason not to use the latest version either.

by Santhosh Thottingal at October 29, 2017 06:57 AM

Scribus gets Malayalam Hyphenation support

Scribus now has support for Malayalam hyphenation.

I filed a bug report to add Malayalam hyphenation rules to Scribus, and they are now added. The hyphenation rules are based on the TeX hyphenation patterns I wrote.

How to use

You need Scribus 1.5.4 or later. It is not yet available as a release while I am writing this, but once released you can get it from https://www.scribus.net/downloads/

  • Start a new document. Add text frames and content. You will need narrow columns to have word-breaking contexts – for example the 2 columns I use for the demo here.
  • Select the text and set the font to a Malayalam font like Manjari. Set the language to Malayalam.
  • In the hyphenation properties, set the hyphenation character to blank; otherwise visible hyphens will appear.
  • Set the text justified.
  • From the menu Extras -> Hyphenate Text. Done.

Here is the output:

Hyphenated two column content

 

by Santhosh Thottingal at October 29, 2017 05:19 AM

October 14, 2017

Santhosh Thottingal

Trufont now has SVG paste, drag and drop support

TruFont, the font-editing application written with Python 3, ufoLib, defcon and PyQt5, now has support for pasting SVG images as glyphs. It also now supports dragging and dropping SVG files.

For my font design workflow I mainly use Inkscape to design the master drawings and then use a font editor for further editing. I am migrating the fonts we maintain from FontForge (which is no longer actively developed) to TruFont. But not having SVG support in TruFont was a blocker for me. So today I filed two patches and got them merged into TruFont master.

There are still some known issues. Mainly, the pasted SVG is vertically flipped. The editor can flip it back, but the original issue needs investigation.

by Santhosh Thottingal at October 14, 2017 04:14 PM

September 17, 2017

Rajeesh K Nambiar

Switching Raspbian to Pixel desktop

Official Raspbian images based on Debian Stretch have the Pixel desktop environment by default and will log new users into it. But if you have had a Raspbian installation with another DE (such as LXDE), here are the steps to install the Pixel desktop and log into it.

  1. apt-get install raspberrypi-ui-mods
  2. sed -i 's/^autologin-user=pi/#autologin-user=pi/' /etc/lightdm/lightdm.conf
  3. update-alternatives --set x-session-manager /usr/bin/startlxde-pi
  4. sed -i 's/^Session=.*/Session=lightdm-xsession/' /home/${USER}/.dmrc

Make sure the user’s ‘.dmrc’ file is updated with the new session, as that is where the lightdm login manager looks to decide which desktop should be launched.
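
A quick way to verify the result (assuming the default user pi) is to check the active session-manager alternative and the saved session:

update-alternatives --display x-session-manager   # shows the currently selected session manager
cat /home/pi/.dmrc                                # the Session= line read by lightdm at login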

by Rajeesh at September 17, 2017 05:40 AM

March 03, 2017

Nandaja Varma

Life as a Recurser - A fraction of the whole

I have been living in New York City as one among the incredible Recurse Center community for the past four weeks. Which means one third of my time as a Recurser is behind me and I can’t think of a better occasion to blog about RC. I have been procrastinating blogging for so long, but today is the day I finally do it!

What is Recurse Center, you ask?

The Recurse Center is a free, self-directed, educational retreat for people who want to get better at programming, whether they’ve been coding for three decades or three months.

But so far, RC has been that and a lot more for me. After the first week’s panic of overwhelming experiences (living in New York City, sharing a hipster-looking workspace with extremely talented people from all around the world, planning what to focus on during my time here, etc., to name a few), RC has turned out to be one of the best experiences I have ever had in my life. I learn from the nicest and coolest bunch of nerds I have ever met, I live with two amazing fellow Recursers, and I live in the biggest city in the world! What more can I ask for?!

Getting into the learning mode was not easy. After all that globe-trotting, it took some serious effort to get back to programming and to decide what to work on. After a lot of talking with people and thinking, I decided to make my computer science foundations stronger first. While I was at it, I decided to pick up a new programming language. Since I was thinking about re-writing the DHT part of PeARS in another programming language, it felt right to pick up Erlang, which has amazing concurrency support. Erlang syntax didn’t appeal to me much, so I decided to go with Elixir, and it turns out it works great in a distributed setup. I have solved a lot of algorithms, implemented a couple of interesting data structures, etc. in Elixir so far, and I think I am getting the hang of the language. Alongside, I am catching up with all the reading that I have been putting off. I read books on Algorithms & Data Structures, High Performance Python, Metaprogramming in Elixir, Purely Functional Data Structures (starting to, i.e.), etc. I attend some very interesting study groups, like the Python study group, the algorithms study group, a white-paper-a-week study group (YES!!), the 7 databases in 7 weeks study group, etc. I am in the process of figuring out if I can write a transpiler for Python bytecode to work on BEAM (the Erlang VM). It is a bit ambitious since one is stack based and the other is register based, but I am researching the possibilities and I have people around to help.

The best thing about RC isn’t just learning, it is collaborative learning. I pair-programmed with people on a lot of the code I wrote, and it has influenced the way I write code and the way I think while doing it. I now do TDD, which I was always too lazy to do. I now think in a functional way when coding, and it makes even my Python code cleaner. It has been great to know and learn what other people are working on, at least to know my options. An open mind is something I packed with me, but I am trying hard not to lose focus on the things I want to accomplish by myself. I sometimes feel like I don’t do much and that I get too side-tracked, but the fact that I keep learning and working makes me happy.

A usual Thursday evening at RC

Don’t get the wrong impression that RC is just about learning 24/7 and burning yourself out. We have a lot of social events happening throughout the week. The RC alums drop by every Thursday and that’s a lot of fun. We have game nights with pizza and beer. And did I mention that we have a 3-day weekend? We hang out outside RC and sometimes even do the touristy sightseeing things, although the weather hasn’t totally been in our favor. Believe it or not, it is actually snowing outside! In the middle of March! I am not complaining though. It is the cherry on top of the cake for me, since this is the first time I am actually seeing snow showers! Also, I take kick-boxing lessons here. So much fun!

That's just the way we roll. ;-)

So, things I look forward to doing in the next 8 weeks: spring in New York (!!), having a lot of fun, working on my transpiler, implementing things from Purely Functional Data Structures, attending the interview prep on Fridays, building a web development framework in Elixir, implementing a lot more of the exotic data structures and algorithms, and anything else that might please me. :)

So long!

Life as a Recurser - A fraction of the whole was originally published by Nandaja (നന്ദജ) at Skill Will Prevail on March 03, 2017.

by Nandaja (നന്ദജ) (nandaja.varma@gmail.com) at March 03, 2017 05:12 PM

December 06, 2016

Nandaja Varma

A walk(followed by sitting and sleeping on benches, and what not) to remember

I am sitting near a fireplace on the outskirts of Threlkeld, a Lake District village. The view from the window is absolutely breathtaking. I can see mist-covered farms grazed by sheep, snow-clad mountains, winter-struck trees, and tiny little stone cottages. I can’t help but contemplate my journey these past couple of weeks around the United Kingdom and Ireland, and I can’t control my urge to write about how incredible an adventure it has been. Backpacking in this part of the world is very easy in terms of navigating your way through, but not so much when it comes to your budget, especially when you have Indian Rupees in your bank account (which sadly has hit a record low). But I must say, being on a low budget made my journey 10 times more interesting - thanks to all the incredible people I met along the way who helped me out or shared the same low-on-budget situation.

My journey this time started in Cambridge, where I met the rest of the PeARS team members and had an amazing time working, meeting great people from the university, exploring Cambridge, and a lot more. We even got to go up to the roof of King’s College chapel (a privilege reserved for the fellows of King’s College), attend the Friday choir service, enjoy tea in a room where Alan Turing possibly once did the same, and a lot more things which will remain in my memories as long as my brain is in a state to remember things. After our small get-together in Cambridge, Hrishi and I set out on a journey to explore the rest of the UK. We had the best of times roaming around London, walking all around Amesbury trying to have a look at Stonehenge, surviving by eating the apples we plucked from the trees on the roadside, trying to hitch rides to save money, walking around the bays in Cardiff, getting to the port by bus in the freezing cold at a late hour to catch a ferry to Ireland, and crashing with some awesome, cool people in Dublin who were angels in our low-budget situation. From there I had to continue my journey alone, since he had to get back home. I traveled around Ireland hitching, trekking, couchsurfing, drinking, and doing all kinds of fun stuff. Ireland was a mesmerizing beauty. I think Ireland is definitely the most beautiful place I have ever seen, and the Irish are the nicest people I have ever met. From Belfast in Northern Ireland, I again took a ferry to a port in Scotland. I stayed with an amazing bunch of backpackers in Glasgow, then road-tripped around the Scottish Highlands from Inverness with a couple of guys I met in the hostel, and then reached Edinburgh.

All I wrote so far was a build-up for the beautiful, mighty old Edinburgh. It is not that Edinburgh was the best part of my whole trip, but the most interesting thing happened in Edinburgh. So here it goes…

After my incredible journey around the Scottish Highlands, I arrived in Edinburgh after a 5-hour bus journey, tired, famished and desperately in need of a bed. I walked half an hour from the bus station to get to the hostel I had reserved a bed in, for 8 pounds a night. As it turned out, I had got all the dates wrong and booked it for the next day. So there I was, on a Saturday night in Edinburgh, freezing, with no place to stay. Initially I thought it could be a nice thing after all. It is one of the safest cities in the world. I could just walk about the whole night, enjoying the night life, saving the 8 pounds, and check in the next day. Boy, was I wrong! I got out and started walking, only to realize how cold it was. My spirits went down and I decided to try some other hostels in the area. I started hopping from one hostel to another, only to realize that most places were fully booked due to the weekend tourists and the rest were charging humongous sums of money for a night (the highest was 50 pounds a night! For a bed! In a hostel!). To tell you the truth, I didn’t have that much money left for the rest of my trip. So I started walking again and came face to face with the harsh truth that I was stranded in the city of Edinburgh in that bone-chilling cold. After what felt like an eternity of walking (my watch said I had only walked for an hour, but I was so damn sure it was wrong!), I heard an angel singing “Shine On You Crazy Diamond” from somewhere. I walked and walked to find out where the angel was singing from, to finally arrive in front of a pub where live music was going on. There were bouncers in front of it, from whom I got to know that the pub was open till 3 o’clock in the morning. A roof above my head till 3 AM! I immediately went inside and ordered myself the cheapest beer they had, which wasn’t that cheap though. Then the party crowd arrived, most of them very drunk (so very nice). I started talking and even dancing, and drinking some more. In effect I spent more than what I would have spent on my supposed hostel booking. The crowd got too much and I decided to get out. From one of the bouncers, I got to know that the railway station would be open all night and I could sleep on a bench there. I reached the railway station at half past 2, only to realize that they close it at 12 o’clock.

I started walking again, badly needing to urinate, and saw a McDonald’s that was open and packed with a lot of drunken Scottish people. I went straight in, ordered myself a cheeseburger and a coffee (I was hungry and cold) and used their facilities. Even McDonald’s had bouncers, and I started talking with them. I helped them clean up the mess the party people had made. But unfortunately it was about to close and I had to hit the road again. The situation outside was very interesting. People shouting, throwing up, snogging, peeing, etc. I found a nice little bench near a park and decided to spend the rest of the night there. It was almost half past 3 by then. I took out my Kindle and started reading Notes from a Small Island by Bill Bryson. He always brings my spirits up. Ironically, the part that my Kindle showed was the one where Bill Bryson is in Dover, with no place to stay and trying to get some sleep on a bench. The Universe was screwing with me big time.

Drunken Scottish people offer all kinds of things if they see you on a roadside bench in the middle of the night. These things can range from drinks, smokes, drugs and kisses to more non-materialistic things like peace of mind, enlightenment, and ecstasy (or does that belong to the first category?), but no one offered a place to stay. I decided to walk again because it was too cold to just sit there. I walked and walked. I helped people with useful information, like where they could throw up, urinate, find a bus, etc., because I had seen it all. I finally arrived at another McDonald’s, which was open 24 hours. The place was again packed, but I decided to get another coffee and sit there till dawn. I got a coffee and found a place at the far end so no one would notice. To my dismay, a bouncer did notice and asked me to leave (it was very humiliating, of course). I started walking again. The clock was hitting 5 now. Almost there, I thought. I saw a guy walking with a lot of takeaway food and soft drinks in his hands (more than he could handle), biting on his room key. I asked him if he wanted some help. He said yes and immediately asked how much I wanted. He thought I was a homeless person trying to get some money for my next beer. I politely let him know that I was just homeless for that particular night, and he accepted my offer to help him for free (should have charged him 10 pounds!). While we were walking he said they were having an after-party at his place and asked me to join them. How bad can it be? I decided to go in. His friends were 5 Scots in their early 20s, 2 girls and 3 boys, who were very drunk. The two girls were spooning with a guy, and one of the girls was topless. Another guy was wearing one of the girls’ bras and dancing. It was a very interesting group of people. They offered me some expensive drink, which I politely declined. One of the girls started talking to me about how much she wants to do “world travel”, which is the reason why she dropped out of college thrice. I listened to all of their ramblings till half past 6 and then took my leave (to my great relief). It was morning, birds were chirping, and my miseries were coming to an end. I went back to my first hostel and they let me into the warmth of their sitting room. And I survived the night…

If you’re still with me, please get your dates right and Don’t Panic!

I only have one more week to go on this trip, so I doubt if I will have such interesting experiences in the days to come. Thank goodness for that! Bye for now. I should get back to my tea and the incredible view now.

Peace.

A walk(followed by sitting and sleeping on benches, and what not) to remember was originally published by Nandaja (നന്ദജ) at Skill Will Prevail on December 06, 2016.

by Nandaja (നന്ദജ) (nandaja.varma@gmail.com) at December 06, 2016 05:12 PM

November 15, 2016

Rajeesh K Nambiar

Improvement in converting video/audio files with VLC

VLC media player has long had the ability to convert video/audio files into the various formats it supports. There is a dedicated “Convert/Save” menu for converting single or multiple files at once into a different format, with limited ‘editing’ features such as specifying a start time, caching options etc. It is quite useful for basic editing/cropping of multimedia files.

As an example, one of the easiest ways to create a custom iPhone ringtone is to create a “.m4r” (AAC format) file exactly 40 seconds long. It is a matter of selecting your favourite music file and doing a “Convert/Save” with an appropriate “Profile”. A “Profile” specifies the video/audio encoding to be used, which can be easily customized by selecting different audio and video codecs.

The options “Caching time”, “Play another media synchronously” (think adding a different sound track to a video clip) and “Start time” can be specified under the “Show more options” button, and even more advanced functionality is available by making use of the “Edit Options” line. Internally, all the options specified on this line are passed to the converter.
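
These are the same item options VLC accepts on the command line, so the 40-second ringtone example above can be expressed roughly as follows (file names, time range and bitrate are placeholders, and option spellings may vary slightly between VLC versions):

vlc song.flac --start-time=30 --stop-time=70 \
    --sout "#transcode{acodec=mp4a,ab=192,channels=2}:std{access=file,mux=mp4,dst=ringtone.m4r}" \
    vlc://quit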

There was one thing lacking in this “Convert/Save” dialog though – there was no way to specify a “Stop time” akin to the “Start time” in the GUI (it can be specified manually in the “Edit Options”, but you need to calculate the time in milliseconds). The VLC 2.x series convert dialog looks as follows – notice the lack of “Stop time”:

vlc-convert-old

Being bugged by this minor annoyance, I set out to add the missing “Stop time” functionality. Going through the codebase of VLC, it was relieving to see that the converter backend already supports a “:stop-time=” option (akin to “:start-time=”). It was then a matter of adding “Stop time” to the GUI and properly updating the “Edit Options” when the user changes the value.

A working patch was then sent to the vlc-devel mailing list for review and feedback. After 5 rounds of review and constructive feedback from Filip Roséen, the code (including some existing code) was cleaned up and is now committed to the master branch. This functionality should be available to users in the upcoming 3.0 release. Screenshot below:

vlc-convert-stop-time

 

by Rajeesh at November 15, 2016 06:05 PM

October 30, 2016

Nandaja Varma

Subjecting myself to a potential failure

I am at a point now where I do not officially have a job. The work I do right now demands a better grasp of many programming fundamentals and of a programming language other than Python - preferably a JVM-based one. I am not a very good programmer or exceptionally intelligent. So I have decided to invest a huge deal of time and effort to learn what I should have learned ages ago.

Learning Lisp or one of its dialects by reading SICP is something that I have tried a zillion times starting from my second year in college and have failed miserably. Another major endeavor that I failed at was trying to implement all basic and advanced data structures using that Lisp dialect I just mentioned(Haskell was the obvious choice back then because, Hey, Lambda calculus!).

I have decided to try this again this time. I got really inspired by Eli Bendersky’s notes that he made during his ‘taming SICP’ endeavor. I am going to start SICP again, solving the problems in Clojure instead of Scheme. Why am I so caught up on a Lisp, you ask? The first thing I ever read about functional programming languages is the way, in pure functional programming, programs are written with such abstraction that each piece is like a black box, doing its operations without worrying or caring about the outside world, without any side effects. This sounded very appealing to me. Also, someone told me Lisp is not for me, but for real geeks(I promise, this is just a tertiary or a quaternary reason).

I think Clojure is a beautiful looking language. I fell in love at first sight with the way it binds arguments to functions using square brackets. It is JVM based and suits the PeARS requirements perfectly. Also, I was blown away by the talks given by Rich Hickey. Hands down the best programming talks I’ve ever listened to(check them out if you haven’t yet. Please. Starting here, maybe?).

So this is me putting myself out there. The game plan is laid:

  • Start reading the book, of course.
  • Solve the problems using Clojure.
  • Implement some algorithms and data structures in Clojure(I started learning Clojure a bit back).
  • Make notes every week and publish them. Even if it is - “I am such a loser, didn’t do anything this week”.



Cheers!

P.S. The title might come out very pessimistic. But I love it when I rise above the expectations, even when it’s my own. :-)

Subjecting myself to a potential failure was originally published by Nandaja (നന്ദജ) at Skill Will Prevail on October 30, 2016.

by Nandaja (നന്ദജ) (nandaja.varma@gmail.com) at October 30, 2016 05:12 PM

September 07, 2016

Balasankar C

SMC/IndicProject Activities- ToDo List

Heya,

So, M.Tech is coming to an end and I should probably start searching for a job soon. Still, it seems I will be having a bit of free time from mid-September. I have got some plans about the areas I should contribute to in SMC/Indic Project. As of now, the bucket list is as follows:

  1. Properly tag versions of fonts in SMC GitLab repo - I had taken over the package fonts-smc from Vasudev, but haven’t done any update on that yet. The main reason was fontforge being old in Debian. Also, I was waiting for some kind of official release of new versions by SMC. Since the new versions are already available in the SMC Fonts page, I assume I can go ahead with my plans. So, as a first step I have to tag the versions of fonts in the corresponding GitLab repo. Need to discuss whether to include TTF file in the repo or not.
  2. Restructure LibIndic modules - Those who were following my GSoC posts will know that I made some structural changes to the modules I contributed in LibIndic. (Those who don’t can check this mail I sent to the list). I plan to do this for all the modules in the framework, and to co-ordinate with Jerin to get REST APIs up.
  3. GNOME Localization - GNOME Localization has been dead for almost two years now. Ashik has shown interest in re-initiating it and I plan to do that. I first have to get my committer access back.
  4. Documentation - Improve documentation about SMC and IndicProject projects. This will be a troublesome and time consuming task but I still like our tools to have proper documentation.
  5. High Priority Projects - Create a static page about the high priority projects so that people can know where and how to contribute.
  6. Die Wiki, Die - Initiate porting Wiki to a static site using Git and Jekyll (or any similar tool). Tech people should be able to use git properly.

Knowing myself better than anyone else does, I understand there is every chance of this being a “never-being-implemented plan” (അതായത് ആരംഭശൂരത്വം :D), but I still intend to do this in an easy-first order.

September 07, 2016 04:47 AM

September 04, 2016

Nandaja Varma

Backpacking In And Around North India

When I quit my job, everyone kept asking - ‘So, what next?’ to which I politely replied - ‘Nothing decided yet’. In my mind, I already had a plan though. I wanted to travel. Where to? I knew that as well. I wanted to explore my vast and diverse country. I thought I would finish off my country first and then go on to the South-East Asian countries, as if it were even possible to “finish off” seeing India. I have been on the road for a couple of months now and I haven’t even scratched the surface of the country. Every single place I have been to so far had some novelty or other to offer. Looking back, this journey has been life changing. The experiences I have had - some good and some bad - gave me a lot of perspective on my own life. Moreover, I realised how perfectly content I am just being with myself. I always used to think I was an introvert. I was a bit shy in public and never felt comfortable in a big group. Travelling alone, once I got comfortable being with myself, I started getting along with people so well. Someone wise once told me - ‘If you are not happy being with yourself, you can never be happy being with other people’. I understand what he meant now.

Every single memory I made will be cherished eternally, but I do still have some favourites. Like sleeping on a railway station platform, riding a bike up in the Himalayas, the trek I did in Pokhara, the 10-day silent Vipassana meditation retreat (story for a whole new blog post), celebrating my birthday with some total strangers I met, hitchhiking in the outskirts of Nepal, the delicious Tibetan bread that a lady prepared for me after inviting me into her abode in Darjeeling, etc. Unlike my last trip, this time I was working alongside on the project PeARS [product placement] and it felt so relaxing to work while looking at ever-changing landscapes.

Only one word that can describe my feeling when this picture was taken - Liberation! And that makes this picture my favourite.

All the stories you hear growing up - how people are so bad in the country, why a woman should never travel alone, etc. - make you so prejudiced about people. I didn’t have a single bad experience of that sort (I, of course, travelled sensibly and cautiously so as not to get into trouble). People turned out to be really nice and most of the locals were curious to know why a girl from Kerala was travelling alone in the North. I went completely mental one day and decided to take a local bus from the India-Nepal border to Varanasi. The journey took almost 12 hours and was extremely uncomfortable. I was the only woman on the bus. The bus driver and the conductor made sure I never felt insecure. They made me sit in the front seat, kept talking to me whenever they could and even offered me tea. Well, my faith in humanity is restored, folks!

People have asked me before why I like travelling. I hated that question. One of my friends said travelling is a fad. Truth be told, I didn’t have anything to say in my defence. I think I now have an answer. The feeling that your life has started following a routine and everything has kind of become static is something I could never make peace with. And I think my initial attraction towards travelling was the absence of this feeling. Once I started travelling, I started to enjoy observing the day-to-day life of people. People leading their lives in manners completely different from the way we do back home. The way of life is heavily influenced by the local culture, the climate, heredity, religion, and what not. You learn so much about the history of a place just by observing the common people. On top of it all, the fellow travellers you meet, with so many adventures of their own to talk about, leave you inspired and wanderlust-ing for more.

In four days, I am heading back home and I feel like I have had to end this journey rather abruptly, like I haven’t had enough of it. But I think I will never have enough of it, because the road is where I belong, with changing images but one unchanging theme - Life. I return home, not with a heavy heart but with a lot of unforgettable experiences, to plan my next adventure.



Namaste!

Backpacking In And Around North India was originally published by Nandaja (നന്ദജ) at Skill Will Prevail on September 04, 2016.

by Nandaja (നന്ദജ) (nandaja.varma@gmail.com) at September 04, 2016 05:12 PM

August 23, 2016

Anwar N

GSoC 2016 IBus-Braille-Enhancement Project - Summary

Hi,
   First of all, my thanks to Indic Project and Swathanthra Malayalam Computing (SMC) for accepting this project. Hats off to my mentors, Nalin Sathyan and Samuel Thibault. The project was awesome and I believe that I have done my best, without any prior experience.

Project Blog : http://ibus-braille-enhancement.blogspot.in/


Now let me outline what we have done during this period.

Braille-Input-Tool (The on-line version)
  Just like Google transliteration or Google Input Tools online. This is required because it's completely operating-system independent and it's a modern method which never forces the user to install an additional plugin or a specific browser. The user might use this from temporary places like an internet cafe. It is written using jQuery and HTML, and works well on GNU/Linux, Microsoft Windows, Android, etc.

See All Commits : https://github.com/anwar3746/braille-input/commits/gh-pages
Test with following link : http://anwar3746.github.io/braille-input/


IBus-Braille enhancements
See All Commits : https://gitlab.com/anwar3746/ibus-braille/activity

1 IBus-Braille integrated with Liblouis : The Liblouis software suite provides an open-source braille translator, back-translator and formatter for a large number of languages and braille codes. So maintaining and shipping separate braille maps (located at /share/ibus-sharada-braille/braille) with ibus-braille was a bad idea. Through this, we completely adapted IBus-Braille to use Liblouis. The conversion is done on entire words instead of letter by letter, i.e. the conversion happens after writing the braille unicode directly and pressing space (a small illustration follows the commits below).
Commit 1 : https://gitlab.com/anwar3746/ibus-braille/commit/6826982fa39cbd2e155bfb389658e16cc57b0dae
Commit 2 : https://gitlab.com/anwar3746/ibus-braille/commit/7032cf7b0c8cea7ce6c619c39750f5110effcfa3
Commit 3 : https://gitlab.com/anwar3746/ibus-braille/commit/46ec83a1caab75b2b25bbd06e1156d927b33c211

See Picture of Ibus-Braille preferences given below
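To give an idea of what the Liblouis adoption buys us, here is a minimal sketch (in Python, not the actual IBus-Braille code) of whole-word translation through liblouis' Python bindings; the table name below is only an example and the right Malayalam table depends on the liblouis installation:

# Sketch: whole-word translation via liblouis (table name is an assumption)
import louis

tables = ["en-us-g2.ctb"]   # a Malayalam table would be used here instead
word = "braille"
cells = louis.translateString(tables, word)       # text -> braille cells
text = louis.backTranslateString(tables, cells)   # braille cells -> text
print(cells, text)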

2 8-Dot braille Enabled : Yes, there are languages having more than 64 characters, which can't be handled with the 64 combinations of 6-dot braille. Music notations like “Abreu” and LAMBDA (Linear Access to Mathematics for Braille Device and Audio Synthesis) use an 8-dot braille system, and Unicode supports 8-dot braille.
Commit 1 : https://gitlab.com/anwar3746/ibus-braille/commit/54d22c0acbf644709d72db076bd6de00af0e20b9

See key/shortcut page picture of ISB preferences dot setting

3 Dot 4 issue Solved : In IBus-Braille, when we type in bharati braille for languages such as Malayalam, Hindi, etc., we have to use 13-4-13 to get the letter ക്ക (Kka). But according to the braille standard, in order to get ക്ക one should press 4-13-13. This made beginners do extra learning before they could start typing. Through this project we solved this issue, and a conventional-braille-mode switch is provided in preferences in order to switch between the two behaviours.

Commit : https://gitlab.com/anwar3746/ibus-braille/commit/089edca78d31355c3ab0e08559f0d9fe79929de6

4 Add Facility to write direct Braille Unicode : Now one can use IBus-Braille to type braille dot notation directly as Unicode braille characters. The output may be sent to a braille embosser. A braille embosser is an impact printer that renders text in braille characters as tactile braille cells (a small sketch of the mapping follows the commit below).

Commit : https://gitlab.com/anwar3746/ibus-braille/commit/4c6d2e3c8a2bbe86e08ca8820412201a52117ad1
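The mapping behind this facility is simple: Unicode braille patterns start at U+2800 and dot n corresponds to bit n-1. A small illustrative sketch (not the project's actual code):

# Sketch: convert a set of pressed dots to a Unicode braille character
def dots_to_char(dots):
    value = 0
    for dot in dots:                 # dots are numbers 1..8
        value |= 1 << (dot - 1)
    return chr(0x2800 + value)

print(dots_to_char([1, 3]))          # ⠅ (K in English braille)
print(dots_to_char([1, 3, 4, 6]))    # ⠭ (X)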


5 Three to Six for people who can use only one hand : A three-key implementation which uses a delay factor between key presses. For example, 13 followed by 13 with a delay less than the delay factor (e.g. 0.2 s) will give X; if the delay is more, the output will be KK. If one wants to type a letter whose combination uses only dots 4, 5 and 6, they have to press the "t" key first. The key and the Conversion-Delay can be adjusted from preferences (a rough sketch of this timing follows the commit below).

Commit : https://gitlab.com/anwar3746/ibus-braille/commit/dda2bd83ba69fb0a0f6b526a940bc878bf230485
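A rough sketch of that timing logic (the names and the merge rule here are simplified assumptions, not the actual implementation): if the second chord arrives within the conversion delay, its dots are shifted to 4-6 and merged with the first chord; otherwise the two chords stay separate letters.

# Sketch: merge two 3-dot chords into one 6-dot cell when typed quickly
CONVERSION_DELAY = 0.2    # seconds, configurable in preferences

def combine(first, second, gap):
    if gap < CONVERSION_DELAY:
        # one cell: the second press supplies dots 4-6
        return [sorted(first + [d + 3 for d in second])]
    return [sorted(first), sorted(second)]   # two separate cells

print(combine([1, 3], [1, 3], gap=0.1))   # [[1, 3, 4, 6]] -> X
print(combine([1, 3], [1, 3], gap=0.5))   # [[1, 3], [1, 3]] -> K K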

6 Arabic language added
Commit : https://gitlab.com/anwar3746/ibus-braille/commit/bd0af5fcfabf891f0b0e6649a3a6c647b0d5e336

7 Many bugs solved
Commit : https://gitlab.com/anwar3746/ibus-braille/commit/da0f0309edb4915ed770e9ab41e4355c2bd2c713
others are implied

Project Discourse : https://docs.google.com/document/d/16v-BMLLzWmzbo1n5S-wDTnUmFV-cwhoon1PeJ0mDM64/edit?usp=sharing
IBus-Sharada-Braille (GSoC 2014) : http://ibus-sharada-braille.blogspot.in/

Plugins for firefox and chrome
    Once installed, this plugin will work with every text entry on web pages - no need for copy-paste. The extensions are written in JavaScript.
See All Commits : https://github.com/anwar3746/braille-browser-addons/commits/master


Modification yet desirable are as following

1 Announce extra information through Screen Reader: When the user expands an abbreviation, or a contraction having more than 2 letters is substituted, the screen reader does not announce it. We have to write an Orca (screen reader) plugin for IBus-Braille.

2 A UI for Creating and Editing Liblouis Tables

3 Add support for more Indic languages and mathematical operators via liblouis

[Screenshots: Braille-input-tool (online version); Liblouis integration; Conventional Braille, Three Dot mode and Table Type selection; Chrome Extension; Direct braille unicode typing; Eight dot braille enabled]

by Anwar N (noreply@blogger.com) at August 23, 2016 04:39 AM

August 22, 2016

malayaleecoder

GSoC — Final Report!

So finally it’s over. Today is the last date for submission of the GSoC project. This entire ride was very informative as well as full of experiences. I thank the Indic Project organisation for accepting my GSoC project, and my mentors Navaneeth K N and Jishnu Mohan for helping me out fully throughout this project.

The project kicked off with the aim of incorporating the native libvarnam shared library by writing JNI wrappers. But unfortunately that method came to a stall when we were unable to import the libraries correctly due to the lack of sufficient official documentation. So my mentor suggested an alternative approach: making use of the Varnam REST API. This has been successfully incorporated for 13 languages, with the caveat that the app requires an internet connection. Along with it, the suggestions which come up are the ones returned by Varnam, in priority order. I would be contributing further to Indic Project to make the library method work in action. Apart from that, see the useful links below (a rough sketch of the REST-based approach follows the list),

  • this and this is related to adding a new keyboard with “qwerty” layout.
  • this is adding a new SubType value and a method to identify TransliterationEngine enabled keyboards.
  • this is adding the Varnam class and setting the TransliterationEngine.
  • this and this deals with applying the transliteration by Varnam and returning it back to the keyboard.
  • this is the patch to resolve the issue, program crashes on switching keyboards.
  • this makes sure that after each key press, the displayed word is refreshed and the transliteration of the entire word is shown.
  • this makes sure that on pressing deletion, the new word in displayed.
  • this creates a template such that more keyboards can be added easily.
  • this makes sure that the suggestions appearing are directly from the Varnam engine and not from the inbuilt library.
  • The lists of the commits can be seen here which includes the addition of layouts for different keyboards and nit fixes.
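For reference, a hedged sketch of the REST-based approach (in Python for brevity; the endpoint path and response field below are assumptions for illustration, not the documented Varnam API):

# Sketch: transliterate a word via a Varnam-style REST endpoint (URL assumed)
import requests

def transliterate(word, lang="ml"):
    url = "https://api.varnamproject.com/tl/{}/{}".format(lang, word)  # assumed
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json().get("result", [])   # assumed response field

print(transliterate("adutha"))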

Add Varnam support into Indic Keyboard


The project as a whole is almost complete. The only thing left to do is to incorporate the libvarnam library into the apk and then we can call that instead of the Varnam class given here. The ongoing work for that can be seen below,

malayaleecoder/libvarnam-Android

//Varnam
varnamc -s ml -t "Adutha ThavaNa kaaNaam" //See you next time

by Vishnu H Nair at August 22, 2016 08:06 PM

Sreenadh T C

It’s a wrap!

“To be successful, the first thing to do is to fall in love with your work — Sister Mary Lauretta”

Well, the Google Summer of Code 2016 is reaching its final week as I get ready to submit my work. It has been three to four months of serious effort and commitment. To be frank, this has to be one of those things for which I was fully motivated and to which I gave my 100%.

Well, at first, the results of training weren’t that promising and I was actually let down. But then my mentor and I had a series of discussions on submitting, during which she suggested that I retrain the model excluding the data sets (audio files) of those speakers which produced the most errors. After completing the batch test, I noticed that four of the data sets had the worst accuracy, shockingly below 20%. This was dragging the overall accuracy down.

So, I decided to delete those four data sets and retrain the model. It was not that big of a deal, so I thought it was not going to be a drastic change from the current model. But the result put me into a state of shock for about 2–3 seconds. It said:

TOTAL Words: 12708 Correct: 12375 Errors: 520
TOTAL Percent correct = 97.38% Error = 4.09% Accuracy = 95.91%
TOTAL Insertions: 187 Deletions: 36 Substitutions: 297
SENTENCE ERROR: 9.1% (365/3993) WORD ERROR RATE: 4.1% (519/12708)
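For reference, those percentages follow from the standard word error rate arithmetic applied to the counts above:

# Word error rate from the batch-test counts shown above
total, insertions, deletions, substitutions = 12708, 187, 36, 297

percent_correct = 100.0 * (total - deletions - substitutions) / total   # 97.38
wer = 100.0 * (insertions + deletions + substitutions) / total          # 4.09
accuracy = 100.0 - wer                                                  # 95.91
print(round(percent_correct, 2), round(wer, 2), round(accuracy, 2))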

Now, this looks juicy and near to perfect. But the thing is, the sentences are tested as they were trained. So, if we change the structure of the sentences that we ultimately give it to recognize, it will still have issues putting out the correct hypothesis. Nevertheless, it was far better than the previous model.

So I guess I will settle with this for now, as the aim of the GSoC project was to start the project and show proof that this can be done, but I will keep training better models in the near future.

Google Summer of Code 2016 — Submission

  1. Since the whole project was carried under my personal Github repository, I will link the commits in it here : Commits
  2. Project Repository : ml-am-lm-cmusphinx
  3. On top of that, we (me and the organization) had a series of discussions regarding the project over here: Discourse IndicProject

Well, I have been documenting my way through the project over here at Medium starting from the month of May. The blogs can be read from here.

What can be done in near future?

Well, this model is still in its early stage and still cannot be used error-free, let alone be applied in applications.

The data set is still buggy and has to be improved with better, cleaner audio data and a more tuned Language Model.

Speech Recognition development is rather slow and is obviously community based. All of this is possible with collaborative work towards achieving a user-acceptable level of practical accuracy, rather than just quoting a statistical, theoretical accuracy.

All necessary steps and procedure have been documented in the README sections of the repository.

puts "thank you everyone!"

by Sreenadh T C at August 22, 2016 07:01 AM

August 21, 2016

Arushi Dogra

GSoC Final Report

It's almost the end of the GSoC internship. From zero knowledge of Android, to writing a proposal, the proposal getting selected, and finally 3 months working on the project - it was a great experience for me! I have learned a lot and I am really thankful to Jishnu Mohan for mentoring me throughout.

Contributions include :-

All the tasks mentioned in the proposal were discussed and worked upon.

Layouts 
I started with making the designs of the layouts. The task was to make Santali Olchiki and Soni layouts for the keyboard. I looked at the code of the other layouts to get a basic understanding of how phonetic and inscript layouts work. Snapshot of one of the view of Santali keyboard :

Screen Shot 2016-08-21 at 6.53.03 PM

Language Support Feature 
While configuring languages, the user is prompted about the locales that might not be supported by the phone.

Screen Shot 2016-08-21 at 6.33.25 PM

Adding Theme Feature
Feature is added at the setup to enable user to select the keyboard theme

Screen Shot 2016-08-21 at 6.49.21 PM

Merging AOSP code
After looking at everything mentioned in the proposal, Jishnu gave me the job of merging the AOSP source code into the keyboard, as the current keyboard doesn’t have the changes that were released along with the Android M code drop, because of which the target SDK is not 23. There are a few errors yet to be resolved and I am working on them 😀

Overall, it was a wonderful journey and I will always want to be a contributor to the organisation as it introduced me to the world of open source and opened a whole new area to work upon and learn more.
Link to the discourse topic : https://discourse.indicproject.org/t/indic-keyboard-project/45

Thank You!  😀


by arushidogra at August 21, 2016 01:29 PM

August 17, 2016

Balasankar C

GSoC Final Report

Heya,

It is finally the time to wind up the GSoC work in which I have been buried for the past three months. First of all, let me thank Santhosh, Hrishi and Vasudev for their help and support. I seem to have implemented, or at least proved, the concepts that I mentioned in my initial proposal. A spell checker that can handle inflections in root words, generate suggestions in the same inflected form, and differentiate between spelling mistakes and intended modifications has been implemented. My major contributions were to

  1. Improve LibIndic’s Stemmer module. - My contributions
  2. Improve LibIndic’s Spell checker module - My contributions
  3. Implement relatively better project structure for the modules I used - My contributions on indicngram

1. Lemmatizer/Stemmer

TLDR

My initial work was on improving the existing stemmer that was available as part of LibIndic. The existing implementation was a rule-based one that was capable of handling a single level of inflection. The main problems of this stemmer were

  1. General incompleteness of rules - Plurals (പശുക്കൾ), Numerals(പതിനാലാം), Verbs (കാണാം) are missing.
  2. Unable to handle multiple levels of inflections - (പശുക്കളോട്)
  3. Unnecessarily stemming root words that look like inflected words - (ആപത്ത് -> ആപം following the rule of എറണാകുളത്ത് -> എറണാകുളം)

The above-mentioned issues were fixed. The remaining category is verbs, which needs more detailed analysis.
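To make the rule-based idea concrete, here is a toy sketch (not LibIndic's actual rule table or code): each rule maps an inflected suffix to its root form, and rules are applied repeatedly to peel off multiple levels of inflection. Note that the real implementation also has to guard against over-stemming genuine root words such as ആപത്ത്, which is exactly problem 3 above.

# Toy suffix-rule stemmer; the two rules below are only illustrative
rules = {
    "ത്ത്": "ം",   # എറണാകുളത്ത് -> എറണാകുളം
    "ക്കൾ": "",     # പശുക്കൾ -> പശു
}

def stem(word):
    changed = True
    while changed:                       # repeat for multi-level inflections
        changed = False
        for suffix, replacement in rules.items():
            if word.endswith(suffix):
                word = word[:-len(suffix)] + replacement
                changed = True
    return word

print(stem("എറണാകുളത്ത്"))   # എറണാകുളം
print(stem("പശുക്കൾ"))        # പശു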

Long Version

A demo screencast of the lemmatizer is given below.

So, comparing with the existing stemmer algorithm in LibIndic, the one I implemented as part of GSoC shows considerable improvement.

Future work

  1. Add more rules to increase grammatical coverage.
  2. Add more grammatical details - Handling Samvruthokaram etc.
  3. Use this to generate sufficient training data that can be used for a self-learning system implementing ML or AI techniques.

2. Spell Checker

TLDR

The second phase of my GSoC work involved making the existing spell checker module better. The problems I could identify in the existing spell checker were

  1. It could not handle inflections in an intelligent way.
  2. It used a corpus that needed inflections in them for optimal working.
  3. It used only levenshtein distance for finding out suggestions.

As part of GSoC, I incorporated the lemmatizer developed in phase one into the spell checker, which handles the inflection part. Three metrics were used to rank suggestion words - Soundex similarity, Levenshtein distance and Jaccard index. The inflector module that was developed along with the lemmatizer was used to generate suggestions in the same inflected form as that of the original word.
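A sketch of how two of those metrics can be computed and combined (the weights and the bigram-based Jaccard variant are illustrative assumptions, not the exact LibIndic code):

# Sketch: Levenshtein distance + Jaccard index over character bigrams,
# combined into a single weighted similarity score
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def jaccard(a, b):
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 1.0

def score(word, candidate, w_lev=0.5, w_jac=0.5):
    lev_sim = 1 - levenshtein(word, candidate) / max(len(word), len(candidate))
    return w_lev * lev_sim + w_jac * jaccard(word, candidate)

print(round(score("കാണം", "കാണാം"), 2))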

Long Version

A demo screencast of the spell checker is given below.

3. Package structure

The existing modules of libindic had an inconsistent package structure that gave no visibility to the project. Also, the package names were too general and didn’t convey the fact that they were used for Indic languages. So, I suggested and implemented the following changes:

  1. Package names (of the ones I used) were changed to libindic-<module>. Examples would be libindic-stemmer, libindic-ngram and libindic-spellchecker. So, the users will easily understand that this package is part of the libindic framework, and thus meant for Indic text.
  2. Namespace packages (PEP 420) were used, so that import statements of libindic modules are of the form from libindic.<module> import <language> (see the sketch below). So, the visibility of the project ‘libindic’ is increased quite a bit.
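A sketch of that import form in use (the class and method names below are assumptions for illustration):

# Sketch: importing a language-specific implementation from the libindic namespace
from libindic.stemmer import Malayalam as Stemmer   # class name assumed

stemmer = Stemmer()
print(stemmer.stem(u"പശുക്കളോട്"))   # method name assumed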

August 17, 2016 04:47 AM

August 16, 2016

Anwar N

IBus-Braille Enhancement - 3

Hi,
 A hard week passed!

1 Conventional Braille Mode enabled : Through this we solved the dot-4 issue, and now one can type using braille without any extra learning

commit 1 : https://gitlab.com/anwar3746/ibus-braille/commit/089edca78d31355c3ab0e08559f0d9fe79929de6

2 Handle config parser exceptions : a corrupted ISB configuration file could prevent it from starting, so I solved this with proper exception handling

commit 2 : https://gitlab.com/anwar3746/ibus-braille/commit/da0f0309edb4915ed770e9ab41e4355c2bd2c713

3 Liblouis integration : I think our dream is about to come true!  But we are still struggling with vowel substitution in the middle of words.
commit 3 : https://gitlab.com/anwar3746/ibus-braille/commit/6826982fa39cbd2e155bfb389658e16cc57b0dae
commit 4 : https://gitlab.com/anwar3746/ibus-braille/commit/46ec83a1caab75b2b25bbd06e1156d927b33c211
commit 5 : https://gitlab.com/anwar3746/ibus-braille/commit/7032cf7b0c8cea7ce6c619c39750f5110effcfa3

by Anwar N (noreply@blogger.com) at August 16, 2016 08:35 PM

August 09, 2016

Sreenadh T C

What now?

“Now that the basic aim was fulfilled, what more can we work on, given there is almost half a month to GSoC Submission!”

Well, as of now the phoneme transcription was done purely based on the way the word is written and not on the actual speech pattern. What I mean is that there are some exceptions where we write a word one way and pronounce it differently. This was pointed out by Deepa ma'am. She also asked if I could convert some of the existing linguistic rules (algorithms) that were made with Malayalam TTS in mind, so that they could be used to re-design the phoneme transcription. This could also turn out to be helpful in the future, for example for a fully intelligent phoneme transcriber for Malayalam language modeling.

This is what we are working on right now, and am literally like scratching my head over some loops in Python!

juzzzz jokinnn
The basic idea is to iterate over each line in the ‘ml.dic’ file and validate the transcription I made earlier against the set of rules, correcting entries (if found invalid) as it goes.

Seems pretty straight forward! Will see how it goes!

Update — 4th August

Whew, this is going nuts! OK, so I first tried using lists to classify the different types of phones. All was good until I reached a point in the algorithm where I have to check whether the current phoneme in the transcription is a member of a particular class of phonemes (when I say class of phonemes, I just mean the classification, not a class in the programming sense). Of course I can search a list for the presence of an element, and that is quite sufficient for small comparisons. Our case is different. We are talking about around 7000 words in a file, on top of which each line goes through a significant number of if-elif clauses.

This could slow things down and make the script less efficient (we will eventually see the difference). So I went back to the Python documentation and read about the set types (set and frozenset).

A set object is an un-ordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. — said the Python doc.

This is exactly what I wanted. I mean, I don’t have to do any manipulation of the phoneme classes, so there is no real point in using a list. Furthermore, a set supports ‘in’, with which membership can be checked without any additional searching procedure. How cool is that!
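A small illustration of the difference (the class contents here are made-up placeholders, not the real phone classes):

# Membership testing with a frozenset: O(1) on average, vs O(n) for a list
VOWEL_SIGNS = frozenset({"AA", "I", "II", "U"})   # one "class" of phones

def is_vowel_sign(phone):
    return phone in VOWEL_SIGNS

print(is_vowel_sign("AA"), is_vowel_sign("K"))   # True False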

here!

Update — 9th August

So, after some tests of the script, I generated the dictionary file once again, this time applying some of the TTS rules. Now SphinxTrain is running with this dictionary file. Hopefully, there will be some change in the accuracy!

left panel with new dictionary, right panel with old dictionary

This might as well be the last development phase update if all goes well. Then it is submission time.

puts 'until then ciao'

by Sreenadh T C at August 09, 2016 01:56 PM

Anwar N

IBus-Braille Enhancement - 2

Hi, this week I was fighting with my final semester exams, and they are over! Also within this week I added the facility for typing braille Unicode directly.

https://gitlab.com/anwar3746/ibus-braille/commit/4c6d2e3c8a2bbe86e08ca8820412201a52117ad1

Instead of converting to Unicode, I added it as a new language so that one can later edit and use it.

by Anwar N (noreply@blogger.com) at August 09, 2016 03:40 AM

July 31, 2016

malayaleecoder

GSoC Progress — Week 8 & 9

Awesome, something good is happening :)

CMake was giving me some trouble in the beginning. After clearing all the dependency issues with the CMake example, I was able to successfully run endless-tunnel on my phone. Following a similar pattern to how the modules are incorporated in the CMake app, we tried to incorporate the varnam module. The code for the attempt is given here.

Now there comes a problem :| I have documented the issue here,

Adding a new native module using CMake yields "Error: exception during working with external system:"

After 9 days, there has still not been a single response :( So as an alternative we have decided to use the Varnam API. I have completed the class for it but have yet to link it to the keyboard input from the Indic Keyboard app. This part is the agenda for next week.

//Pascal
program HelloWorld(output);
begin
writeln('That''s all for now, see you next time!')
end.

by Vishnu H Nair at July 31, 2016 04:53 PM

Sreenadh T C

‘He’ just recognized what I said!

Yipeeee!!
Well, the title says it all. The computer just recognized what I said in my mother tongue! A major step in the right direction.

For this to happen, I had to complete the Acoustic Model training. So then!

What is Acoustic Model!

Well, it is a set of statistical parameters used to learn the language by representing the relation between the audio signal and the corresponding linguistic features that make up that speech (phonemes and transcriptions!).

To produce these we need to set up a database structure as documented by the CMU SphinxTrain team. Some of these files were common to the Language Model preparation like the phoneme transcription file. After setting up the database it should look like this irrespective of the language!

The training is straight forward if you get the database error free which was not my case! Thank you! ( ** if you get it error free on the first run, you are probably doing it wrong! ** )

I had to solve two issues (1 and 2) before I could run the training without any hiccups! It took a day to make the patches work in the files. The documentation didn’t mention that the phone set should contain a maximum of 255 phones due to a practical limitation, though theoretically there is no problem (found that out myself from the CMU help forums). That was the issue: Reduce phoneset to a max of 255 #31. I successfully reduced it to what is found in the current repository version.

Update — July 27

Acoustic Model is ready for testing!

How??!!
$ sphinxtrain -t ml setup

This command will setup the ‘etc’ and ‘wav’ folder as mentioned above. Now we need to setup the sphinx_train.cfg config file which is excellently documented by the team.

Once that is out of the way, run the training.

$ cd ml

and,

$ sphinxtrain run

and,

wait!!

..

….

still waiting!!

.

..

Finally its done! Took quite a lot of time!

Not only that, my Zenbook finally started showing heating and fan noise. That sixth gen Intel needed some extra air! ( ** nice! ** ).

Update — July 29

Well, this means the GSoC 2016 aim has been achieved, which was to develop the Language Model and Acoustic Model. The only thing left is to keep testing them.

The discussion with Deepa ma'am helped bring out a possibility for improving the accuracy, which I am working on as a branch in parallel with the testing.

With that in mind for the coming week, that’s it for this week

puts "until then ciao!"

by Sreenadh T C at July 31, 2016 07:46 AM

July 30, 2016

Anwar N

IBus-Braille Enhancement - 1

Hi,
  This week I forked the IBus-Braille project from the SMC GitLab repository and added two things.

1 Eight-Dot braille enabled. Now one can add languages with 8 dots. The default keys are Z for dot 7 and period for dot 8. These can be remapped using preferences. 
https://gitlab.com/anwar3746/ibus-braille/commit/54d22c0acbf644709d72db076bd6de00af0e20b9


2 Arabic Language added and tested with users
https://gitlab.com/anwar3746/ibus-braille/commit/bd0af5fcfabf891f0b0e6649a3a6c647b0d5e336

See commits : https://gitlab.com/anwar3746/ibus-braille/commits/master

by Anwar N (noreply@blogger.com) at July 30, 2016 03:23 AM

July 26, 2016

Arushi Dogra

Updates on work

My next task was, instead of showing all layouts, to filter them on the basis of language. As a first option I decided to do the filtering based on locale. So instead of ACTION_INPUT_METHOD_SUBTYPE_SETTINGS we could use ACTION_LOCALE_SETTINGS, but the problem here was that it gave a list of all the locales in the system instead of the locales in our app. So I skipped this idea, and then decided to create a list and enable user selection on that. But there was no way to connect that to the enabled system subtypes. I was stuck on this for quite some time. We ditched the plan and moved on to the “Theme selection” task.

I am currently working on the Theme Selection task. I have successfully added the step, but now I am working on adding a fragment instead of the whole activity. After I am done with this, I will move on to adding the images of the themes. I will hopefully complete this task by the weekend.

Also, after a meeting with my mentor, it was decided that after this task I will work on merging the AOSP source code into the keyboard, as the current keyboard doesn’t have the changes that were released along with the Android M code drop, because of which the target SDK is not 23. So my next task will be merging the AOSP code, which will give the benefit of runtime permissions. 😀


by arushidogra at July 26, 2016 12:34 AM

July 25, 2016

Balasankar C

4 Days. 22 Hours. LaTeX.

Heya folks,

One of the things I love doing is teaching what I know to others. Though it is a cliché, I know from experience that when we teach others, our knowledge expands. From 10 students, you often get 25 different doubts, and a minimum of 5 of them would be ones you haven’t even thought of yourself earlier. In that way, teaching drives our curiosity to find out more.

I was asked to conduct a LaTeX training for B.Tech students as a bridge course (happening during their semester breaks. Poor kids!). The usual scenario is faculty taking the class and we PG students assisting them. But, since all the faculty members were busy with their own subjects’ bridge courses, and LaTeX was something like an additional skill that the students need in their next semesters for report preparation, I was asked to take it with the assistance of my classmates. At first, I was asked to take a two-day session for third-year IT students. But later, the HOD decided that both CS and IT students should have that class, and guess what - I had to teach for four days. Weirdly, the IT class was split across two non-continuous dates - Monday and Wednesday. So, I didn’t have to take classes for four consecutive days, but only three. :D

The syllabus I followed is as follows:

  • Basic LaTeX – Session I
    1. Brief introduction about LaTeX, general document structure, packages etc.
    2. Text Formatting
    3. Lists – Bullets and Numbering
  • Graphics and Formulas – Session II
    1. Working with Images
    2. Tables
    3. Basic Mathematical Formulas
  • Academic Document Generation (Reports and Papers) – Session III
    1. Sectioning and Chapters
    2. Header and Footer
    3. Table of Contents
    4. Adding Bibliography and Citations
    5. IEEETran template
  • Presentations using Beamer – Session IV

As (I, not the faculty) expected, only half of the students came (classes during semester breaks - I was surprised even half came!). Both the workshops - for CS and IT - were smooth, without many issues or hindrances. Students didn’t hesitate much to ask doubts or for tips on how to do stuff that I didn’t teach (unfortunately, I didn’t have time to go off-syllabus, so I directed them to the Internet. :D). Analysing the students, the CS students were more lively and interactive, but they took some time to grasp the concepts. Compared to them, even though kind of silent, the IT students learned stuff fast.

By Friday, I had completed 4 days, around 22 hours of teaching and that too non-stop. I was tired each day after the class, but it was still fun to share the stuff I know. I would love to get this chance again.

IT Batch





CSE Batch


July 25, 2016 12:00 AM

July 24, 2016

Sreenadh T C

Developing the Language Model

Finally, I can start the work towards Milestone — 2, which is completing the development of Language Model for Malayalam. Time to completely switch to Ubuntu from here on. Why?

Well, all the forums related to CMU Sphinx keep saying that they won’t monitor reports from Windows anyway, and since all the commands and code mentioned in the documentation are more inclined towards Linux, let’s just stick with it as well. After all, when it comes to Open Source, why should I develop using Microsoft Windows? (** Giggle **)

What is a Statistical Language Model?

Statistical language models describe more complex language, which in our case is Malayalam. They contain probabilities of words and word combinations. Those probabilities are estimated from sample data (the sentence file) and automatically have some flexibility.

This means, every combination from the vocabulary is possible, though probability of such combination might vary.

Let’s say you create a statistical language model from a list of words, which is what I did for my major project work; it will still allow decoding of word combinations (phrases or sentences, for that matter), though that might not be our intent.

Overall, statistical language models are recommended for free-form input where the user could say anything in a natural language, and they require far less engineering effort than grammars - you just list the possible sentences using the words from the vocabulary.

Let me explain this with a traditional Malayalam example:

Suppose we have these two sentences “ ഞാനും അവനും ഭക്ഷണം കഴിച്ചു ” and “ ചേട്ടൻ ഭക്ഷണം കഴിച്ചില്ലേ ”.

If we use the statistical language model of this set of sentences, then it is possible to derive more sentences from the words( vocabulary ).

ഞാനും (1) , അവനും (1) , ഭക്ഷണം (2) , കഴിച്ചു (1) , ചേട്ടൻ (1) , കഴിച്ചില്ലേ (1)

That is, we can have sentences like “ ഞാനും കഴിച്ചു ഭക്ഷണം ” or maybe “ഭക്ഷണം കഴിച്ചില്ലേ ”, or “ അവനും കഴിച്ചില്ലേ ” and so on. It’s like the transitive property of equality, but in a more complex manner. Here it is related to the probability of occurrence of a given word after another word. This is calculated using the sample data that we provide as the database.

Now, you might be wondering what the numbers inside the parentheses mean. Those are nothing but the number of occurrences of each word in the given complete set of sentences. This is calculated by the set of C libraries provided by a toolkit that I will introduce shortly.
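A tiny illustration of that counting step (this is what the toolkit's text2wfreq does for the whole corpus), including how a conditional probability falls out of the counts:

# Unigram and bigram counts from the two example sentences above
from collections import Counter

sentences = [
    "ഞാനും അവനും ഭക്ഷണം കഴിച്ചു",
    "ചേട്ടൻ ഭക്ഷണം കഴിച്ചില്ലേ",
]

unigrams = Counter(w for s in sentences for w in s.split())
bigrams = Counter(pair for s in sentences
                  for pair in zip(s.split(), s.split()[1:]))

print(unigrams["ഭക്ഷണം"])                                   # 2
# estimated P(കഴിച്ചു | ഭക്ഷണം) = count(ഭക്ഷണം കഴിച്ചു) / count(ഭക്ഷണം)
print(bigrams[("ഭക്ഷണം", "കഴിച്ചു")] / unigrams["ഭക്ഷണം"])  # 0.5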

Update — July 18

Okay!

Let’s start building. If you remember from my previous blog post/articles, you can recollect me writing about extracting words and then transcribing those to phonetic representation. Those words are nothing but the vocabulary that I just showed.

For building a language model from such a large-scale vocabulary, you will need to use specialized tools or algorithms. One such set of algorithms is provided as C libraries under the name “CMU-Cambridge Statistical Language Modeling Toolkit”, or CMU-CLMTK for short. You can head over to their official page to know more about it. I have already installed it, so we are ready to go.

So according to the documentation,

The first step is to find out the number of occurrences. (text2wfreq)

cat ml.txt | text2wfreq > ml.wfreq

Next we need to turn the .wfreq file into a .vocab file without the counts and other details - just the words.

cat ml.wfreq | wfreq2vocab -top 20000 > ml.vocab

Oops, there are some issues with the generated vocab file regarding repetitions and additional words here and there which are not required. This might have happened when I filtered the sentences file but forgot to update, or skipped updating, the transcription file. Some delay in the further process. It's already late at night! I need to sleep!

Update — July 19

‘Meld’. Thank you StackExchange

With this guy, it's easy to compare everything and make changes simultaneously. It should be done by today!

.

.

Done!

Okay, now that the issue has been handled, we are getting somewhere. It should be pretty much straightforward now.

Next we need to find the list of every id n-gram which occurred in the text, along with its number of occurrences, i.e. generate a binary id 3-gram file of the training text (ml.txt), based on this vocabulary (ml.vocab).

By default, the id n-gram file is written out as binary file, unless the -write_ascii switch is used in the command.

The -temp ./ switch can be used if you want to run the command without root permission and use the current working directory as the temp folder. Or you can just run it as root, in which case it will by default use /usr/tmp as the temp folder.

cat ml.txt | text2idngram -vocab ml.vocab -temp ./ > ml.idngram

Finally, we can generate the Language Model. This can either be an ARPA model or a Binary.

idngram2lm -idngram ml.idngram -vocab ml.vocab -binary ml.bin

or

idngram2lm -idngram ml.idngram -vocab ml.vocab -arpa ml.arpa

Even though ARPA is available, using the binary format of the language model is recommended for faster operations.

Here is the basic work-flow.

as provided by the Toolkit Documentation.

That’s it. The Language Model is complete. I can now go ahead into next step, that is building and training the Acoustic Model.

by Sreenadh T C at July 24, 2016 06:13 AM

July 21, 2016

Balasankar C

Kerala State IT Policy - A Stakeholder Consultation

Heya folks,

Last Saturday, that is 16th July, I attended a meeting regarding the upcoming Kerala State IT Policy. It was a stakeholder consultation organized by DAKF, Software Freedom Law Centre and Ernakulam Public Library Infopark branch. The program was presided over by Prasanth Sugathan of SFLC (I had met him during Swatanthra, when I helped Praveen in the Privacy track) and was inaugurated by M. P Sukumaran Nair, advisor to the Minister of Industries. The agenda of the meeting was to discuss the suggestions that need to be submitted to the Government before they draft the official IT policy, which will be in effect for the next few years. I attended the meeting representing Swathanthra Malayalam Computing. Even though the meeting had a small audience, some of the key topics were brought into the mix.

Professor Jyothi John, retired principal of Model Engg. College, discussed MOOCs as a way to improve the education standard of the State. He also talked about improving the industry-academia-research relationship, which is in a pathetic state as of now. I was asked to say a few words. But, since SMC hadn’t taken any official stand or prepared points for the meeting, I talked about my own views on the issue. Obviously, my topics were more focused on language computing, digital empowerment of the language, as well as how FOSS should be the keystone of the IT policy. I also mentioned the E-Waste problem that Anivar had discussed the other day on the Whatsapp group.

Me Talking

Me Talking | PC: Sivahari

Mr. Joseph Thomas, the president of FSMI, also talked about the importance of FOSS in the IT policy (Kiran Thomas had some pretty strong disagreements with it. :D ). Following that, Babu Dominic from BSNL talked about their success stories with FOSS and how the project was scrapped by the government. There were some brilliant insights from Satheesh, who is a social entrepreneur now and once ran an IT-based company.

Following that, the meeting took the form of a round table discussion where interesting points regarding E-Waste and the money-saving nature of FOSS (Microsoft has been targeting institutions for pirated copies, not home users) were raised by Mr. Bijumon, Asst Professor of Model Engg College. Mr. Jayasreekumar, who is a journalist, talked about the important issue that the downtrodden people, or the people in the lower socio-economic belt, were not part of the discussion, and the image of the digital divide that carves. We have to seriously increase the diversity of participants in these meetings, as a large part of the population has no representation in them. Such meetings will only be fruitful if the sidelined communities, who should also benefit from this policy, are brought in to participate.

The general theme of the meeting pointed towards how the IT policy should focus more on the internal market, and how it should help entrepreneurs compete with foreign competitors, at least in the domestic market.

News Coverage in Deshabhimani

News Coverage | PC: Deshabhimani

More and more meetings of this nature are a must, if the state is to advance in the domain of IT.

July 21, 2016 12:00 AM

July 20, 2016

Anwar N

work progress in browser-addon

Hi,
 About two months passed. We do many testing on online braille-input tool. And some widgets rearranged for user comforts. In the recent weeks we made a good progress in both Firefox and Chrome browser addons. But still we suffer from a grate problem with  these addons, The plugins are not working in google chat and Facebook chat entry's.  We are seeking the solution...

by Anwar N (noreply@blogger.com) at July 20, 2016 08:15 PM

July 19, 2016

Balasankar C

GSoC Update: Week #7 and #8

Heya,

The last two weeks saw less coding and more polishing. I was fixing the LibIndic modules to utilize the concept of namespace packages (PEP 420) to obtain the libindic.module structure. In the stemmer module, I introduced the namespace package concept and it worked well. I also made the inflector a part of the stemmer itself. Since the inflector's functionality was heavily dependent on the output of the stemmer, it made more sense to make the inflector a part of the stemmer rather than an individual package. Also, I made the inflector language-agnostic, so that it accepts a language parameter as input during initialization and selects the appropriate rules file.

In the spellchecker too, I implemented the namespace concept and removed the bundled packages of the stemmer and inflector. Other modifications were needed to make the tests run with this namespace concept, to fix coverage to pick up the change, etc. On the coding side, I added weights to the three metrics so as to generate suggestions more effectively. I am thinking about formulating an algorithm to make the comparison of suggestions and root words more efficient. Also, I may try handling spelling mistakes in the suffixes.

This week, I met with Hrishi and discussed the project. He is yet to go through the algorithm and comment on it. However, he suggested splitting the languages out into separate files and making __init__.py cleaner (just importing these split language files). He was OK with the work so far, as he had tried out the web version of the stemmer.

Hrishi testing out the spellchecker

July 19, 2016 12:47 AM

July 16, 2016

malayaleecoder

GSoC Progress — Week 6 & 7

Why doesn’t it work!!!!!

Alright, for the past two weeks, my mentor and I have been trying hard to call the varnam library from Java. First we tried loading the prebuilt library in Android Studio and then using the methods from Java, which didn’t work :(

Now we are on a different route of compiling varnam ourselves with CMake. For this we are following the CMake example given here. Another thing to note is that CMake support requires the canary Android Studio, which can be downloaded here. It all started off well, until it was seen that OS X has a problem running it.

Now I am getting it all set up on Linux as well as Windows (just in case :P). Sorry for not writing any technical details; I will make up for it next week.

//Rust
fn main() {
println!("That's all for now, see you next time!");
}

by Vishnu H Nair at July 16, 2016 06:49 AM

Sreenadh T C

Mentioning the huge contributions

“ In open source, we feel strongly that to really do something well, you have to get a lot of people involved. — Linus Torvalds ”

I have always loved the idea of Open Source and have been fortunate enough to be participating in one of the world’s best platforms for a student to develop, grow, and learn. Google Summer of Code 2016 has gone past its mid-term evaluation, and so have I. The last couple of weeks have been at a slower pace compared to the weeks in June.

contribution graph — May-June-July

This is simply because I was ahead of my schedule at the mid-term evaluation period, and also because I didn’t want to rush things and screw them up. But I thought this is the right time to mention the contributions that have been taking place towards this open source project.

Gathering recordings or speech data for training would mean that a lot of people have to individually record their part, and then send it to me. Now this might seem simple enough to some of you out there, but believe me, recording 250 lines or sentences in Malayalam with all its care is not going to be that interesting.

Nonetheless, pull requests have been piling up on my Repository since the early days of the project. The contribution has been really awesome.

What more can you ask for than your little brother - who has absolutely no idea what the heck I am doing - deciding to record 250 sentences in his voice so that I could be successful in completing the project! (** aww… you little prankster… **)

And he did all this without making much of a mistake or even complaining about the steps I instructed him to follow. He was so careful that he decided to save only after I confirmed each and every sentence as he recorded them. (** giggles **). For those who are interested in knowing what he contributed, take a look at this commit and this. Oh, and by the way, he is just 11 years old :) .

To not mention other friends along with this, would be unfair.

So here is a big shout out to all 18 other guys and gals without whom this would not have reached this far.

I know this blog post was not much about the project when looked at from one aspect but, when you look at it from another point of view, this is one of the most important parts of my GSoC work.

With the final evaluation coming up in 4.5 weeks or so, it is time to start wrapping up my work and put up a final submission in such a manner that someone with the same enthusiasm, or even more, can take this up and continue working on it to make it better.

I guess that’s it for this week’s update. More to follow as I near the completion of this awesome experience.

puts "until then ciao!"

by Sreenadh T C at July 16, 2016 06:14 AM

July 11, 2016

Arushi Dogra

Update on work

The week started with continuing the task of detecting supported locales. I was facing some problems initially. I was trying to change the contents of a static file at runtime, which I later realised couldn’t be done. So, as directed by my mentor, I changed the approach and decided to prompt the user at setup time about which languages might not be supported by the phone.
It looks something like this:

Screenshot_2016-07-12-00-24-21

Unfortunately my system crashed, and the later part of my time was given to formatting the laptop, taking backups, installing the OS and setting up the project again. Then I went home for 3 days for my parents' wedding anniversary.

My next task: improving the setup wizard. Since the user might not be interested in all the languages, instead of showing all the layouts at once, we are planning to first ask the user to choose the language and then the corresponding layout in it. I have to discuss this task more with Jishnu.


by arushidogra at July 11, 2016 07:16 PM

July 08, 2016

Hrishi

The Democracy of Trolls - on Manorama News' Niyanthrana Rekha

I took part, representing ICU, in a discussion on the topic 'The Democracy of Trolls' on Manorama News' Niyanthrana Rekha. V.T. Balram, Uzhavoor Vijayan, V.V. Rajesh, Ardra Nambiar and Subhash Nair also participated.

by Hrishi at July 08, 2016 07:46 PM

July 06, 2016

Anwar N

Braille-Input-Tool : The final touch

Hi,

            With these two weeks we have done a lot of testing with users and made many additions according to their needs. The first one is key reassigning. As you know, there are many keyboard variants, and users also like to set their own keys instead of using f, d, s, j, k and l. But this creates the need to save user preferences. So we did this using jStorage, and it's working fine.
https://github.com/anwar3746/braille-input/commit/9e8bb0b5ef9a54d61dfa5081d0966ec9d10f01a0


Key reassigning can be done by clicking the "Configure Keys" button, which will pop up a set of entries where the user can remap their keys. A restore option is also provided there.
https://github.com/anwar3746/braille-input/commit/3d3469ab8a68711ba0189d61f02c7231297ded3a


New and Save are the basic features that should be provided by an online editor.
https://github.com/anwar3746/braille-input/commit/074829d2f4be81b7fa984931a90a108e3bac03ab

Changing the font color, font size and background color is very important for people with partial visual impairment. To keep the page accessible, we chose a combobox containing a list of major colors instead of providing a graphical color picker.
https://github.com/anwar3746/braille-input/commit/f1f6d3de308386d08977f40bc417c4c1ac0b3eb9

Various bugfixes
https://github.com/anwar3746/braille-input/commit/9b8cbc8d54051e9cb330514aacc6d8e6066cf7c6
https://github.com/anwar3746/braille-input/commit/d8127ceb3dc567bfb1778a437d29c2cfe989b24f
https://github.com/anwar3746/braille-input/commit/d3a01c17db64d4fabbad29b18d605992b633270f
https://github.com/anwar3746/braille-input/commit/f34104bfb55c3e4e7735a23016ee913311444702

Braille-Input-Tool : http://anwar3746.github.io/braille-input/
See all commits : https://github.com/anwar3746/braille-input/commits/gh-pages


by Anwar N (noreply@blogger.com) at July 06, 2016 09:09 PM

July 05, 2016

Balasankar C

GSoC Update: Week #5 and #6

Heya,

The last two weeks were spent mostly in getting a basic spellchecker module to work. In the first week, I tried to polish the stemmer module by organizing the tags for different inflections in an unambiguous way. These tags are to be used in the spellchecker module to recreate the inflected forms of the suggestions; for this purpose, an inflector module was added, which takes the output of the stemmer module and reverses its operations. Apart from that, I spent time testing the stemmer module and made many tiny modifications, like converting everything to a single encoding, using Unicode everywhere, and, above all, changing the library name to an unambiguous one: libindic-stemmer (the old name, stemmer, was way too general).

In the second week, I forked out the spellchecker module, converted the directory structure to match the one I've been using for the other modules, and added a basic building-testing-integration setup with the pbr-testtools-Travis combination. I also implemented the basic spell checking and suggestion generation system. As in the stemmer, marisa_trie is used to store the corpus. Three metrics are used to generate suggestions: Soundex similarity, Levenshtein distance and the Jaccard index. With that, I got my MVP (Minimum Viable Product) up and running.
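To make the suggestion step concrete, here is a minimal sketch of ranking candidates from a marisa_trie corpus using Levenshtein distance and a bigram Jaccard index. This is my own toy illustration, not the libindic-spellchecker code; the Soundex similarity the module also uses is left out, and the combined score is an arbitrary choice for the example.

# Python
import marisa_trie

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def jaccard(a, b):
    # Jaccard index over character bigrams.
    sa = {a[i:i + 2] for i in range(len(a) - 1)}
    sb = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

corpus = marisa_trie.Trie([u'അവസ്ഥ', u'അവിടുന്ന്', u'അവിടെയാണ്'])  # toy corpus

def suggest(word, limit=5):
    # Lower Levenshtein and higher Jaccard are better; combine them naively.
    scored = [(levenshtein(word, w) - jaccard(word, w), w) for w in corpus.keys()]
    return [w for _, w in sorted(scored)[:limit]]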

So, as of now, spell checking and suggestion generation works. But, it needs more tweaking to increase efficiency. Also, I need to formulate another comparison algorithm, one tailored for Indic languages and spell checking.

On a side note, I also touched the indicngram module, ported it to support Python 3 and reformatted it to match the proposed directory structure that I have been using for the other modules. A PR has been created and I am waiting for someone to accept it.

July 05, 2016 01:57 PM

June 26, 2016

malayaleecoder

GSoC Progress — Week 4 & 5

Ooh boy, halfway through GSoC and a lot to be done. We finally decided to do the entire project in Android Studio so that the later integration with Indic Keyboard would be easier. As said in the last post, I was about to finish the wrappers for varnam_init() and the rest of the functions when a queue of challenges popped up.

First of all, since we are moving out of the regular "PC" kind of architecture, storing the scheme files in a specific directory is still a problem. We first decided to store them in the internal storage of the phone, which eventually caused a lot of problems because varnam_set_symbols_dir() required a string path to the directory, which was not possible there. We then decided to store them in the external storage of the device. This decision is temporary, because once the user removes the external SD card the Varnam keyboard would not be functional :P

Then came the problem of build architectures. Since my work machine is a Mac, all the built libraries come out as .dylib files, while Android accepts only .so files as jniLibs. After generating the binary on my dual-boot Ubuntu, it turned out that only 32-bit libraries worked in my setup. Using VirtualBox, I finally managed to get the desired files. Now, out of nowhere, the error thrown is:

"Cannot find: libpthread.so.0"

I have currently written wrappers for most of the required methods, but I have to resolve these errors to get testing going smoothly. I will upload a list of references I have gone through (there are tons of them) in the next post, so that anyone working on this topic may find it useful.

//Scala
object Bye extends App {
  println("That's all for now, see you next time!")
}

by Vishnu H Nair at June 26, 2016 09:40 PM

Sreenadh T C

Hours of data piling up!

drifting along, calm and composed! *wink*

Howdy everyone! Well, it's exactly midway through Google Summer of Code 2016, and everything has been going as per the schedule and plan, as I type this looking at the matte screen of the Asus Zenbook that just arrived. No more criticizing the electricity and rain, which I have been doing in my previous posts ( **giggle** ), but the internet connectivity still haunts me.

The week started off with spending a day setting up the new Zenbook with dual boot, installing dependencies on Ubuntu ( sudo apt-get install blah-blah ), setting up git and the repo, and on the other hand hoping that Windows would finish updating… … …one day… …! Ultimately, I decided to turn every automatic thing off ( **duh** ) so that I could squeeze some speed out of my broadband connection ( -___- ).

Anyway, with the dictionary now completely transcribed into its phonetic representation, I can concentrate on collecting the training voices from all the contributors. About 12 of the speakers have completed their quota of sentences and around 8 speakers remain. Once this is completed, I can begin reorganizing the database and then start training on it.

In the meantime, there are other files to set up: the file containing the 'phones' alone ( ml.PHONE ), the file that contains the relative paths to the audio files in the wav directory ( ml.FILEIDS ), e.g. "wav/speaker1/file_1.wav", and the filler file that contains phonetic representations of sounds and disturbances for more accurate recognition ( ml.FILLER ).

Talking about making the ml.FILEIDS file: mapping 4993 sentences across 15+ folders, each with exactly 250 wav files, is not going to be easy. But there is a trick: Notepad++ to the rescue. Its column edit mode ( Alt + Shift + up/down ) and column replace with decimal increment will save the time of writing down each file name.

Note: column edit only works as long as the characters we want to replace are in the same column. Since the file id is of the form speaker/file_#, I can easily select the # column and replace it with the decimal increment option: 1, 2, 3, 4…
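The same list can also be generated by a small script instead of column editing. Here is a sketch of my own (not a project script), assuming the speaker/file_# naming above and the usual CMU Sphinx convention of listing fileids without the .wav extension; the folder names and counts are placeholders.

# Python
speakers = ['speaker1', 'speaker2']        # in reality, 15+ speaker folders
with open('ml.fileids', 'w') as out:
    for speaker in speakers:
        for n in range(1, 251):            # 250 wav files per speaker
            out.write('{}/file_{}\n'.format(speaker, n))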

So, that's how the week has panned out, and I am hoping to continue this good run of form ( * that's the football side of me typing, Euro 2016 commentary style * ).

puts "until then ciao!"

by Sreenadh T C at June 26, 2016 09:59 AM

Srihari,

Are you referring to the bash command used in the experiment I described, or to the Ruby scripts from my previous posts? I used the scripts to extract the sentences and words from the subtitle file. The same script proved useful in many related situations during the course of the project.

I didn't have to sit for a long time to figure out the script, and I was not sure if np++ had an option for that kind of extraction :)

by Sreenadh T C at June 26, 2016 05:56 AM

June 25, 2016

Arushi Dogra

Weekly Blog

I was given the task of detecting whether a language is supported on the phone or not. On my phone Punjabi is not supported, so I did all the testing with that. Whenever a language is not supported, its characters are displayed as blank, and that gave me an idea of how to work on this issue: I create a bitmap from the characters of the language and compare it with an empty bitmap. If the language is not supported, its bitmap is empty, and I declare it as not supported.
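Here is the same render-and-compare idea sketched on a desktop with Pillow, purely for illustration: the actual check in the keyboard works on Android bitmaps, not Pillow, and the font path below is an assumption for the example.

# Python
from PIL import Image, ImageDraw, ImageFont

def renders_visibly(char, font_path='NotoSansGurmukhi-Regular.ttf', size=48):
    # Draw the character on a white canvas and compare with an untouched copy.
    font = ImageFont.truetype(font_path, size)
    blank = Image.new('L', (size * 2, size * 2), color=255)
    canvas = blank.copy()
    ImageDraw.Draw(canvas).text((4, 4), char, font=font, fill=0)
    return canvas.tobytes() != blank.tobytes()   # False: nothing was drawn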

What I have to improve: currently the check runs every time the keyboard is opened. I will change it so that it checks all languages once during the setup wizard and stores the result.

My task for next week is to run this check for all languages in the setup wizard and, in the language list, mark the ones that cannot be rendered as not supported, so that the user knows.


by arushidogra at June 25, 2016 01:35 PM

June 23, 2016

Balasankar C

GSoC Update: Week #3 and #4

Heya,

[Sorry for the delay in the post]

I spent the last two weeks mainly testing the stemmer module and the defined rules. During that, I found that a rule-based model has many issues, because different types of inflections applied to different parts of speech can yield the same inflected form. This can only be solved by a machine learning approach that incorporates a morphological analyzer, and it is hence out of the scope of my proposal. So I decided to move forward with the stemmer as it is.

I tried to incorporate handling of verb inflections, like tense changes, using rules, and was able to cover a subset of them. The remaining forms need more careful analysis, and I've decided to get the system working first and then optimize it.

I've also decided to tag the rules so that a history of stemming can be preserved. The stemmer will now generate the stem as well as the tags of the rules applied. This metadata can be useful for handling the problem of the same letter being inflected into different forms, which I faced while developing VibhakthiGenerator.

I also spent some time cleaning up the code further and setting up some local testing tools, like a CLI and a web interface.

The PR was accepted by Vasudev and the changes are currently a part of the indicstemmer codebase.

BTW, it is time for the Midterm evaluations of GSoC 2016, where the mentors evaluate the progress of the students and give a pass/fail grade to them. Also, the students get to evaluate the mentors, communication with them and their inputs. I have already completed this and am waiting for my mentor to finish it. Hopefully, everything will go well.

June 23, 2016 03:04 PM

June 21, 2016

Anwar N

Bug fixes on Online Braille-Input-Tool

Hi,
The first month is over and the webpage is almost finished; we went through many bugs in the last week. Sathyaseelan mash and Balaram G really helped us find the bugs. One crucial and hard-to-detect bug was with map initialization; it took us a lot of time to find and fix it. Another one was with insertion of text in the middle. The other commits were:

CapsLock(G) and Beginning-Middle switch(Alt)
Simple mode checkbox
Word and letter deletion enabled
Abbreviation enabled

https://github.com/anwar3746/braille-input/commits/gh-pages


by Anwar N (noreply@blogger.com) at June 21, 2016 05:45 AM

June 18, 2016

Sreenadh T C

Milestone 1 : Conquered

We look far ahead and call it 'the future', but fail to realize that the coming step is the closest milestone towards it.

OK, so the week has been very interesting. As usual I started off doing the same thing that I have been doing for the past three weeks, which is transcribing the Malayalam words into their phonetic representation. I kept telling myself that this is getting more boring by the day. With the midterm evaluation starting from Monday of next week, I had to complete the phonetic transcription by Saturday or Sunday, which seemed very unlikely at the pace I was going.

To keep myself busy and not bored of Google Summer of Code (even though I wanted this), I thought of learning a new language which would be useful in the coming future: the Rust language. It took my mind off things for a little while, but it was not as much help as I thought.

In the meantime, the recording of speech was going on among my friends, and there have been some updates regarding that. I have completed recording 115 sentences myself in the midst of the transcription dilemma.

June 16 — update

OK, I think I have an idea. I have to wait a day or so to see whether it will work. Well, today something really helpful and timely happened. I have been working on an Android app to run the model; unfortunately it kept crashing until today, when I finally fixed it. The bug was really, really, really small. In fact, it was embarrassing when I found out what it was: I had to use a capitalized letter (** duh, are you serious! **) while specifying a search name tag. How did I miss that! The model that I developed for my major project now runs absolutely fine in the app. In fact, it happens to have better accuracy there than it did on my PC.

feeling joyful… yayy!

But this was not the idea that I was talking about at the beginning of the paragraph. (** giggles **)

Ok I was not joking.

Well, the idea would have been more beneficial if I had it in mind at the start of the phonetic transcription, but like anybody else's, my brain won't work when I need it to.

Anyway, the idea was to use the find-and-replace option of Notepad++ to batch edit all the words. The catch is that if it works, it should save me about 3 to 4 days; but if the experiment goes in vain, I am going to need more days than I would have needed by following the steps I have used up until now.

I decided to try the experiment anyway, because if I succeed I will save more days than I would lose if I fail, and I would probably finish a couple of days ahead of my schedule.

I think I make some sense!?!

So here is a brief account of what I actually did in my so-called experiment.

  • The b.txt and a.txt files initially contain the raw Malayalam words that need to be represented phonetically.
  • In alphabetical order, I find each character or sound (e.g. കാലം, കാപ്പി, കാറ്റ്: after finding കാ and replacing it with KA, they look like KA ലം, KA പ്പി, KA റ്റ്, and so on) and replace all occurrences with the phonetic representation inside b.txt. This saves a lot of time compared to editing each word line by line, which is what I was doing till now.
  • Once all the replacing is done, the file will only have English characters (phones) ( e.g. KA L1AM, KA PPI, KA T ). But this is not what the file should look like. It should be like കാലം KA L1AM, കാപ്പി KA PPI, കാറ്റ് KA T, and so on.
  • To make it that way, I simply have to join the file with the Malayalam words ( a.txt ) and this new file with just the phonetic representations ( b.txt ). This is where bash's paste command came in, along with find-and-replace again to get rid of newlines (thanks to Aboobacker MK for the quick reply with the bash command; a Python sketch of this join follows below).
paste -d " " a.txt b.txt | tee out.txt
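For completeness, here is a small Python sketch of the same batch idea (my own illustration, not a project script). The phone_map is a tiny toy subset of the real transcription table, and the file names follow the a.txt/b.txt convention above.

# Python
phone_map = {u'കാ': u'KA ', u'ലം': u'L1AM', u'പ്പി': u'PPI'}   # toy subset of the mapping

with open('a.txt', encoding='utf-8') as fa, \
     open('b.txt', encoding='utf-8') as fb, \
     open('out.txt', 'w', encoding='utf-8') as out:
    for word, line in zip(fa, fb):
        phones = line.strip()
        for cluster, phone in phone_map.items():
            phones = phones.replace(cluster, phone)        # batch find-and-replace
        out.write(u'{} {}\n'.format(word.strip(), phones))  # join, like paste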
Milestone Conquered
puts “until then ciao!”

by Sreenadh T C at June 18, 2016 06:59 AM

June 17, 2016

Arushi Dogra

Working with the layouts!

I started with making the designs of the layouts. The task was to make Santali Olchiki and Soni layouts for the keyboard. I looked at the code of the other layouts to get a basic understanding of how they were working.

Soni Layout
It took some time to understand how the transliteration codes were working. I made changes in the ime submodule for the layout. I messed up the locale names but fixed that later. The changes were merged! Then I updated the submodule on the Indic Keyboard branch.

Santali Olchiki Layout

Previously I had made the InScript layout for Santali Olchiki, but after discussion with my mentor it was decided to work on the phonetic layout, as it fits in a smaller layout and is thus easier to type on. I made the design of the keyboard, wrote the code for it and tested it on the device. It is coming out fine too.

After that I explored various keyboard apps to see their setup wizards.

My task for the next week is to detect whether a language is supported by the system or not. I am planning to do it by checking whether a typed character renders as empty or not, and I will look for other ways too. I will post an update on the progress in the next blog.

 


by arushidogra at June 17, 2016 12:18 PM

June 14, 2016

Sreenadh T C

Hi shaun,

I am working with the CMU Sphinx toolkit, which has recognizer libraries written in C. I am focusing on adding a language model and an acoustic model for the Malayalam language. CMUSphinx already supports widely spoken languages with pretty good accuracy.
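As a rough illustration of how such models are used once they exist, here is a sketch along the lines of the pocketsphinx Python examples; the model and file paths are placeholders, and the Malayalam model files themselves are exactly what this project is building.

# Python
from pocketsphinx.pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'ml_acoustic_model')   # acoustic model directory (placeholder)
config.set_string('-lm', 'ml.lm')                # language model (placeholder)
config.set_string('-dict', 'ml.dict')            # phonetic dictionary (placeholder)
decoder = Decoder(config)

decoder.start_utt()
with open('test.raw', 'rb') as audio:            # 16 kHz, 16-bit, mono raw audio
    while True:
        buf = audio.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()
print(decoder.hyp().hypstr if decoder.hyp() else '(no hypothesis)')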

by Sreenadh T C at June 14, 2016 07:02 PM

June 13, 2016

malayaleecoder

GSoC Progress — Week 2 & 3

First of all, apologies for skipping last week's post. The last two weeks were a bit rocky :P

Discussion with my mentor suggested starting off with the varnam_init method. The initial trial build sent in a queue of issues to be resolved, mostly around machine dependencies.

Finally, after resolving those dependencies and a reinstallation, I got it running :) I am now finishing varnam_init and will move on to the whole of libvarnam in the coming week.

//R
cat("That's all for now, see you next time!")

by Vishnu H Nair at June 13, 2016 06:57 PM

June 11, 2016

Anwar N

Basic Online Braille-Input Tool

Hi,

Three weeks have passed. After developing the basic Chrome and Firefox extensions, we moved on to developing the braille-input webpage, where one can type in the six-key way. To achieve this we went through a lot of things such as Ajax, jQuery, JSON, the Apache web server, etc.; the most important reference links are given at the end of this post. Even though my mentor is also new to web-based development, he always suggests keeping things as ideal as possible. Even the concept of switching the map between the beginning, middle and contraction lists was a bit difficult to understand; later I realized that is the way it should be. Finally, when we were requesting space to host the web page, another mentor from my organization, Akshay S Dinesh, gave us a hint about GitHub's own hosting facility. So we got it done with a simple effort, even though we faced a jQuery download problem and an issue with the contraction file listing.

Source Code : https://github.com/anwar3746/braille-input

Now one can try it using the following link
http://anwar3746.github.io/braille-input/

Now we have to implement abbreviations, simple mode, Open, New, Save, and options to change the font, font size, and background and foreground colors, etc., as done in Sharada-Braille-Writer.




Referred links:

http://viralpatel.net/blogs/dynamic-combobox-listbox-drop-down-using-javascript/
http://stackoverflow.com/questions/6116474/how-to-find-if-an-array-contains-a-specific-string-in-javascript-jquery
http://stackoverflow.com/questions/4329092/multi-dimensional-associative-arrays-in-javascript
http://stackoverflow.com/questions/7196212/how-to-create-dictionary-and-add-key-value-pairs-dynamically-in-javascript
http://www.w3schools.com/jquery/jquery_events.asp
http://stackoverflow.com/questions/133310/how-can-i-get-jquery-to-perform-a-synchronous-rather-than-asynchronous-ajax-re
http://stackoverflow.com/questions/351409/appending-to-array
http://stackoverflow.com/questions/952924/javascript-chop-slice-trim-off-last-character-in-string

by Anwar N (noreply@blogger.com) at June 11, 2016 12:11 PM

How would be the browser extensions


Hi All,

Yes, the community bonding period is over and the coding period has started. My mentor and I are really happy to announce that during the community bonding period we made basic Chrome and Firefox extensions that show how it's going to be!! Once again, thanks to my mentor and the Varnam project. The code is hosted on GitHub under the name braille-browser-addons.

Repository URL : https://github.com/anwar3746/braille-browser-addons

To test it in firefox do the following steps
1 - git clone https://github.com/anwar3746/braille-browser-addons.git
2 - cd braille-browser-addons/firefox/
3 - jpm run -b /usr/bin/firefox
4 - Go to google.com and right click on a text entry; from the context menu, select Enable from the braille submenu.
5 - Try typing l(fds), k(fs), a(f), b(fd), c(fj)

To test it in chrome
1 - git clone https://github.com/anwar3746/braille-browser-addons.git
2 - Open chrome browser
3 - Go to settings and select extensions
4 - Check Developer mode
5 - Click Load unpacked extensions
6 - Choose chrome folder from braille-browser-addons
7 - Go to google.com and right click on a text entry; from the context menu, select Enable from the braille submenu.
8 - Try typing l(fds), k(fs), a(f), b(fd), c(fj); see the sketch below for how these chords map to letters.
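To make the six-key idea concrete, here is a small Python sketch of my own (the extensions themselves are written in JavaScript) showing how chords over f, d, s, j, k, l map to braille dots 1-6 and then to letters; only the letters from the examples above are included.

# Python
KEY_TO_DOT = {'f': 1, 'd': 2, 's': 3, 'j': 4, 'k': 5, 'l': 6}

# Standard braille dot patterns for the letters used in the test steps above.
CELLS = {
    frozenset({1}): 'a',          # f
    frozenset({1, 2}): 'b',       # fd
    frozenset({1, 4}): 'c',       # fj
    frozenset({1, 3}): 'k',       # fs
    frozenset({1, 2, 3}): 'l',    # fds
}

def chord_to_letter(keys):
    # Translate a chord like 'fds' into its letter, if the pattern is known.
    dots = frozenset(KEY_TO_DOT[k] for k in keys)
    return CELLS.get(dots, '?')

print(chord_to_letter('fds'))  # l
print(chord_to_letter('fj'))   # c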

References 
  1. https://developer.mozilla.org/en-US/Add-ons/SDK/Tutorials/Getting_Started_(jpm)
  2. https://developer.mozilla.org/en-US/Add-ons/SDK/Tools/jpm
  3. https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts 
  4. https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts/port 
  5. https://developer.chrome.com/extensions/getstarted 
  6. ibus-sharada-braille.blogspot.com/
  7. https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener

An article that I read from OpenSourceForU May 2016
Courtesy  : CHMK Library, University Of Calicut







Thank You,
Anwar

by Anwar N (noreply@blogger.com) at June 11, 2016 12:01 PM

June 10, 2016

Sreenadh T C

Slow as a Snail

“by persistence the snail reached the ark”

That's the pace I am at. Jeez, typing these transcriptions is taking far longer than I expected. Well, I am talking about transcribing the Malayalam words into the previously mentioned phonetic representation. See for yourself.

.
.
.
അവസ്ഥ A V S3T2H
അവിചാരിത A VI CA
അവിടം A VI T1M
അവിടുത്തെ A VI T1U TT2AE1
അവിടുന്ന് A VI T1U NN2
അവിടെത്തന്നെയാണ് A VI T1AE1 TT2 NN2AE1 YA N1U1
അവിടെത്തെ A VI T1AE1 TT2AE1
അവിടെന്ന് A VI T1AE1 NN2
അവിടെയല്ലാ A VI T1AE1 Y LL1AA
അവിടെയാണ് A VI T1AE1 YA N1U1
അവിടെയായിരിക്കും A VI T1AE1 YA YI R2I KKUM
അവിടെയില്ലാത്തത് A VI T1AE1 YI LL1AA TT2 T22
അവിടെയുണ്ടെന്ന് A VI T1AE1 YU N1T1AE1 NN2
അവിടെയുണ്ട് A VI T1AE1 YU N1T1
അവിടെയുമല്ലാ A VI T1AE1 YU M LL1AA
.
.
.

As I transcribe the one thousand two hundred and forty-third word, I realize I haven't shared the week with you guys. Well, to tell the truth, this week was pretty slow and it's been raining. You know what's coming. (** sighs **)

Yup, guessed it right. Again, erratic power failure (** aargghhh **).

After couple of days!

Well, the reason I am writing this at a slow pace is that the past two weeks have been very slow, though things are picking up. Slow in the sense that the work I am doing is kind of repetitive.

Each day, I sit in front of my PC, open up the Phonetic dictionary file, and start writing more phoneme transcriptions.
Blogging this over and over is going to be boring, but some new things are shaping up.

For GSoC to be successful, as every other guy on forums and chats has told me, the main thing to focus on is "how not to get distracted from coding".

With the kind of work I am doing, this is pretty simple. I mean, getting distracted is pretty simple ( ** irony, duh ** ), especially when there is rain happening outside your window ( ** aww…the feel of rain ** ) and you can't do anything except mutter abuse at the state of the power supply during the rainy season.

June 10 — update continues

Jokes aside, that's how the weeks have passed since the last update. But with the midterm evaluation just around the corner, I know that I have to stay focused (** I am! **); plus, the organization and my mentor have been kind enough to remind me of the same through e-mail (** kudos **).

And so, the fire picks up and spreads (** duh **): it's time to gather some contributions towards collecting audio data for training. There are several ways to collect data from the Internet, but to have control over what we are developing, the simplest and most feasible way is to record whatever we need in the correct format ourselves. That way the recorded sound is already in the format the tool requires. The problem with downloading pre-recorded data from the Internet is that the audio format might be different from what we actually need, and even if the format is correct there is no guarantee that it will work for us.
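On the format point: CMUSphinx training data is typically 16 kHz, 16-bit, mono WAV, so recordings that arrive in other formats can be normalized with a small script. Here is a sketch using pydub (my own illustration, not a project script; file names are placeholders).

# Python
from pydub import AudioSegment           # pydub relies on ffmpeg for decoding

clip = AudioSegment.from_file('contribution.m4a')   # whatever format a contributor sent
clip = clip.set_frame_rate(16000).set_channels(1).set_sample_width(2)   # 16 kHz, mono, 16-bit
clip.export('speaker1/file_1.wav', format='wav')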

In light of this conclusion, I decided to take contributions from friends and the community, so that the project itself lives up to the spirit of being open source. (** yaay **)

Since the collected source I mentioned in my previous blog has about 5000 sentences, having around 20 different speakers each record about 250 sentences will produce an ample amount of speech data for training the tool. Some of my friends (Akhil, Sijin, Jithin) have already started contributing towards this through their GitHub profiles.

If you wish to contribute, head over to the repository and fork it. Then ping me. (** wink **)

puts "until then ciao!!"

by Sreenadh T C at June 10, 2016 12:48 PM

June 06, 2016

Arushi Dogra

Weekly Update

This week started with a successful Gradle build of the project on my system. The build worked with Gradle 2.13, SDK version 22 and build tools version 22.0.1. After that, I deployed the app on the emulator and on my phone. I am currently working on the Santali Olchiki and Soni layouts for the keyboard.


by arushidogra at June 06, 2016 11:15 AM

June 05, 2016

Balasankar C

GSoC Update: Week #1 and #2

Heya,

The last two weeks of GSoC mostly involved working on the Stemmer module of the proposal. I had discussions with Hrishi and Vasudev regarding the directory structure that I will be using for the stemmer. I proposed the .. format because it gives more visibility to libindic. Since both of them agreed (I will be converting all the existing modules to this format after GSoC), I first ported the existing indicstemmer module to this directory structure. With directions and suggestions from Vasudev and Hrishi, we (me and Jerin, who is working on the Sandhi Splitter) set up the development environment: pbr as the packaging tool, testtools as the testing framework, Travis CI for continuous integration, and tox for local automation and testing.
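For anyone unfamiliar with the stack, here is a minimal, self-contained example of a testtools-style test case (a toy stand-in, not the project's actual test suite); tox and Travis CI then simply run such tests in clean environments.

# Python
import testtools

def toy_stem(word):
    # Stand-in stemmer used only for this example.
    return word[:-1] if word.endswith('s') else word

class ToyStemmerTest(testtools.TestCase):
    def test_strips_plural_suffix(self):
        self.assertEqual('cat', toy_stem('cats'))

    def test_leaves_root_word_alone(self):
        self.assertEqual('cat', toy_stem('cat'))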

Work on Stemmer

There are several problems with the existing stemmer implementation. One is the high count of false positives. This is because, in Malayalam, there exist root words that match the structure of an inflected word. An example is ആപത്ത്, which looks similar to എറണാകുളത്ത്: the former is a root word whereas the latter is an inflected form. So, based on the stemmer rule ത്ത്=ം that is used to handle എറണാകുളത്ത്, ആപത്ത് will also get stemmed, to ആപം, which is a false positive. What we need is a root word corpus containing the possible root words in Malayalam (well, we will need some crowdsourcing to keep it updated) and to check the input word against it, so as to detect whether it is already a root word.

Another problem with the existing stemmer is that it cannot handle multiple levels of inflection. An example is അതിലേക്ക് (into that): അതിൽ + ഏക്ക്, i.e. അത് (that) + ഇൽ (in) + ഏക്ക് (to). We need a multiple-suffix-stripping algorithm to handle this, so I wrote an iterative suffix stripping algorithm that keeps stripping and transforming until a root word is encountered or no rule matches.

Since a linear list is obviously the least optimal structure for storing a large dataset, and tries are good for storing textual data, I decided to go with a trie for the root word corpus. Tailoring a data structure to suit my needs is one of the last tasks in my GSoC proposal, so for now I used an existing trie implementation, marisa-trie, which is available as a Python library.
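Putting the two ideas together, here is a minimal sketch (my own toy illustration, not the libindic-stemmer code) of iterative suffix stripping with a marisa-trie root word corpus as the stop condition. The corpus and all rules except ത്ത്=ം are made up for the example.

# Python
import marisa_trie

roots = marisa_trie.Trie([u'അത്', u'ആപത്ത്', u'എറണാകുളം'])   # toy root word corpus
rules = {u'ത്ത്': u'ം', u'ിലേക്ക്': u'്'}                      # suffix -> replacement (illustrative only)

def stem(word):
    applied = []                          # history of applied rules, mirroring the tagging idea
    while word not in roots:              # stop as soon as we hit a known root
        for suffix, repl in rules.items():
            if word.endswith(suffix):
                word = word[:-len(suffix)] + repl
                applied.append(suffix)
                break
        else:                             # no rule matched: give up stripping
            break
    return word, applied

print(stem(u'എറണാകുളത്ത്'))   # ('എറണാകുളം', ['ത്ത്'])
print(stem(u'ആപത്ത്'))        # ('ആപത്ത്', []) - root word, left untouched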

I have added tests for all seven vibhakthis, the 'and' (ഉം) form of a word (അവനും, രാമുവും), some of the plural forms (കാളകൾ), etc., and the coverage as of now is 100%.

As always, the code and updates are available in the GitHub repo.

June 05, 2016 04:04 AM

May 29, 2016

malayaleecoder

GSoC Progress — Week 1

The first week of GSoC has come to an end and it's been a wonderful learning experience. My goal for the next couple of weeks is to compile libvarnam for Android, so I decided to play around with the workings and flow of the NDK for using native code from a Java program.

Android NDK code flow

The above image roughly explains the workflow of the NDK in Android. This tutorial explains very well how one can get started with the NDK, and that knowledge will come in very handy for the progress of this project. Another excellent resource is the set of videos by Aleksander Gargenta.

I followed the entire playlist and found it extremely useful; he explains every detail of the process, and I would highly recommend it to people looking to get started with the NDK. The future plan is to implement a very basic application that calls the libvarnam module from the Java side, and then hook that skeleton into the Indic Keyboard app.

//Java
System.out.println("That's all for now, see you next time!");

by Vishnu H Nair at May 29, 2016 05:50 PM

May 27, 2016

Arushi Dogra

Update on work

In the initial weeks of Community Bonding, I learned Java and Android and made a sample app on Android.

Then I started working on building Indic Keyboard on my system. The first problem I faced was with the Gradle version. Then Maven was not working over the proxy network, which slowed down the build process. I am new to Android, so many errors came up and it took some time to resolve them. I am currently stuck on an exception in processDebugResources. My work for the week includes the Santali and Soni keyboard layouts; building Indic Keyboard on my system is taking longer than I expected.


by arushidogra at May 27, 2016 03:50 PM