ICFK is organized by the KaChaTaThaPa Foundation, headed by master calligrapher Narayana Bhattathiri. The event usually takes place on 2–5 October in Kochi. A variety of talks, workshops, demonstration sessions, exhibitions, and above all meeting and learning from exemplary calligraphers are the best parts of the event. The venue always bursts with beauty, energy, and fun; everyone is approachable.
Reconnected with old friends and made new ones. Ashok Parab was traveling pan-India documenting scripts, which led to teaching scripts — including Malayalam — as well. Abhishek Vardhan is doing research on the Nāgarī script. Syam is doing research on Malayalam calligraphy. They promised to share their findings and public/open resources, which would be very interesting to look at. Vinoth Kumar, Michel D’Anastasio, Nikheel Aphale, Muqtar Ahammed, and Shipra Rohtagi gave me souvenirs — thank you! I had the chance for long, interesting chats with Uday Kumar (who asked me about Sayahna Foundation after seeing the t-shirt I wore), Achyut Palav, Sarang Kulkarni, Brody Neuenschwander, and also Shyam Patel of the Kochi Biennale Foundation.
On many occasions delegates approached me and asked about the font development process, complex text shaping, and related topics. It was also too tempting not to buy fountain pens and Bhattathiri’s merchandise on sale as gifts for friends. The dinner with the ICFK team at Boulangerie Art Cafe was delicious. TM Krishna’s Carnatic music concert on Saturday evening was a heavenly experience — Krishna Seth, sitting next to me, was spontaneously drawing in a notebook for the entire duration of the concert.
At the last edition, I presented a talk about font development, font engineering, complex text shaping, and other such back-end tasks that designers generally find difficult. This year, I talked about the ‘Fundamentals of Typography’. I hope the talk succeeded, to some extent, in making everyone unhappy when they look at a badly typeset page.
The slides for the presentation are available here.
The 46th annual TeX Users Group conference (TUG2025) took place in Kerala during July 18–20, 2025. I have attended and presented at a few past TUG conferences, but this time was different, as I was to present a paper and help organize the conference. This is a personal, incomplete (and potentially hazy around the edges) reflection on my experience organizing the event, which had participants from many parts of Europe, the US, and India.
Preparations
The Indian TeX Users Group, led by CVR, had conducted TUG conferences in 2002 and 2011. We, a group of about 18 volunteers led by him, convened as soon as the conference plan was announced in September 2024, and started creating to-do lists and schedules and assigning a responsible person for each item.
STMDocs campus has excellent conference facilities, including a large conference hall, audio/video systems, high-speed internet with fallback, redundant power supply, etc., making it an ideal choice, as in 2011. Yet we prioritized the convenience of the speakers and delegates and wanted to avoid travel to and from a hotel in the city — prior experience showed it is best to locate the conference facility close to where everyone stays. We scouted a few hotels with good conference facilities in Thiruvananthapuram city and finalized the Hyatt Regency, even though we had to take on greater responsibility and coordination, as they had no prior experience organizing a conference with requirements similar to TUG. Travel and visit advisories were published on the conference web site as soon as details were available.
Projector, UPS, display connectors, microphones, WiFi access points and a lot of related hardware were procured. Conference materials such as t-shirt, mug, notepad, pen, tote bag etc. were arranged. Noted political cartoonist E.P. Unny graciously drew the beloved lion sketches for the conference.
Karl Berry, from the US, orchestrated mailing lists for coordination and communication. CVR, Shan and I assumed the responsibility of answering speaker & delegate emails. At the end of the extended deadline for submitting presentations and prerecorded talks, Karl handed over the archive of submissions to us for use with the audio/video system.
Audio/video and live streaming setup
I traveled to Thiruvananthapuram a week ahead of the conference to be present in person for the final preparations. One of my important tasks was to set up the audio/video and live streaming for the workshop and conference. The audio/video team and the volunteers in charge did a commendable job of setting up all the hardware and connectivity on the evening of the 16th, and we tested presentations, video playback, projector, audio in/out, prompt, clicker, microphones and live streaming. There was no prompt at the hotel, so we split the screen-out to two monitors placed on both sides of the podium — this was much appreciated by the speakers later. In addition to the A/V team’s hardware and (primary) laptop, two laptops (running Fedora 42) were used: a hefty one to run the presentations & a backup OBS setup, and another for the video conferencing of remote speakers’ Q&A sessions. The laptop used for presentations had a 4K screen. Thanks to Wayland (specifically, KWin), the connected HDMI out could be independently configured for 1080p resolution, but it failed to drive the monitors when the output was split further for the prompt. Changing the laptop’s built-in display resolution to 1080p as well fixed the issue (maybe changing the refresh rate from 120 Hz to 60 Hz would also have helped, but we didn’t fiddle any further).
I also met Erik Nijenhuis in front of the hotel, where he was hand-rolling a cigarette (a skill that turned out to be quite in demand during and after the conference), and received a copy of the book ‘The Stroke’ by Gerrit Noordzij that he had kindly bought for me — many thanks!
Workshop
The ‘Tagging PDF for accessibility’ workshop was conducted on 17th July at the STMDocs campus — the A/V systems & WiFi had been set up and tested a couple of days prior. Delegates were picked up at the hotel in the morning and dropped off after the workshop. Registration of workshop attendees was done on the spot, and we collected speaker introductions to share with the session chairs. I had interesting discussions with Frank Mittelbach and Boris Veytsman during lunch.
Reception & Registration
There was a reception at the Hyatt on the 17th evening, where almost everyone registered and collected the conference material: a program pre-print, t-shirt, mug, notepad & pen, a handwritten (by N. Bhattathiri) copy of Daiva Daśakam, and a copy of the LaTeX tutorial. All delegates introduced themselves — but I had to step out at that exact moment to get on a video call to prepare for the live Q&A with Norman Gray from the UK, who was presenting remotely on Saturday. There were two more remote speakers — Ross Moore from Australia and Martin J. Osborne from Canada — with whom I conducted the same exercise, albeit at inconvenient times for them. Frank Mittelbach needed to use his own laptop for his presentation, so we tested the A/V & streaming setup with that too. Doris Behrendt had a presentation with videos; its setup was also tested & arranged.
An ode to libre software & PipeWire
We tried to use a recent MacBook for the live video conferencing of remote speakers, but it failed miserably to detect the A/V splitter connected via USB to pick up the audio in and out. I resorted to my old laptop running Fedora 42; the devices were detected automagically, and PipeWire (plus WirePlumber) made them instantly available for use.
With everything organized and tested for A/V & live streaming, I went back to get some sleep, to wake early the next day.
Day 1 — Friday
Woke up at 05:30, reached the hotel by 07:00, and met some attendees during breakfast. By 08:45, the live stream for day 1 started. Boris Veytsman, the outgoing vice-president of TUG, opened TUG2025 and handed over to the incoming vice-president and session chair Erik Nijenhuis, who then introduced Rob Schrauwen to deliver the keynote titled ‘True but Irrelevant’, reflecting on the design of the Elsevier XML DTD for archiving scientific articles. It was quite enlightening, especially when one of the designers of a system looks back at the strengths, shortcomings, and impact of their design decisions, approached with humility and openness. Rob and I had a chat later about the mantra of validating documents and its parallel with the IETF’s robustness principle.
You may see a second stream for day 1; this is entirely my fault, as I accidentally stopped the stream during the tea break and started a new one. The group photo was taken after a few exercises in cat-herding.
All the talks on day 1 were very interesting: many talks about the PDF tagging project (those of Mittelbach, Fischer, & Moore); the state of CTAN by Braun — to which I had a suggestion that the process for inactive package maintainers consider some Linux distributions’ procedures; Vrajarāja explained their use of XeTeX to typeset in multiple scripts; Hufflen’s experience in teaching LaTeX to students; Behrendt & Busse’s talk about the use of LaTeX in CrypTool; and CVR’s talk about the long-running project of archiving Malayalam literary works in TEI XML format using TeX and friends. The session chairs, speakers and audience were all punctual and kept to their allotted time, with many follow-up discussions happening during the coffee breaks, which had ample time so the sessions never felt rushed.
Ross Moore’s talk was prerecorded. As the video played out, he joined via a video conference link. The audio in/out & video out (for projecting on screen and for live streaming) were connected to my laptop: we could hear him through the audio system, and the audience’s questions were relayed to him via microphone with no lag — this worked seamlessly (thanks to PipeWire). We had a small problem with pausing a video, which locked up the computer running the presentation, but we recovered quickly — after the conference, I diagnosed it as a nouveau driver issue (a GPU hang).
By the end of the day, Rahul & Abhilash were accustomed to driving the presentations and live streams, so I could hand over the reins and enjoy the talks. I decided to stay back at the hotel to avoid travel, and went to bed by 22:00, but sleep descended on this poor soul only by 04:30 or so; thanks to that cup of ristretto at breakfast!
Day 2 — Saturday
Judging by the ensuing laughs and questions, it appears not everyone was asleep during my talk. Frank & Ulrike suggested not colouring the underscore glyph in math, but instead properly colouring LaTeX3 macro names (which can have underscores and colons in addition to letters) in the font.
The sessions on the second day were also varied and interesting, in particular Novotný’s talk about static analysis of LaTeX3 macros; Vaishnavi’s fifteen-year-long project of researching and encoding the Tulu-Tigalari script in Unicode; bibliography processing talks separately by Gray and Osborne (both appeared over video conferencing for live Q&A, which worked like a charm); etc.
In the evening, all of us walked (the monsoon rain had relented) to the music and dance concerts, both of which were fantastic cultural & audio-visual experiences.
Veena music, and fusion dance concerts.
Day 3 — Sunday
The morning session of the final day had a few talks: Rishi lamented the eroding typographic beauty in publishing (which Rob concurred with, and which Vrajarāja had earlier pointed out as a reason for choosing TeX); Doris spoke about the LaTeX village at CCC — and about ‘tuwat’ (to take action). The TeX Users Group annual general body meeting, presided over by Boris, was the first session after lunch; he then spoke on his approach to solving the editorial review process of documents in TeX. A couple more talks followed: Rahul’s presentation about PDF tagging used our OpenType font for syntax highlighting (yay!), and the lexer developed by the Overleaf team was interesting. On Veeraraghavan’s presentation about the challenges faced by publishers, I had a comment about the recurrent statement that “LaTeX is complex”: LaTeX is not complex, the scientific content is complex, and LaTeX is still the best tool to capture and represent such complex information.
Two Hermann Zapf fans listening to one who collaborated with Zapf [published with permission].
Calligraphy
For the final session, Narayana Bhattathiri gave us a calligraphy demonstration in four scripts — Latin, Malayalam, Devanagari and Tamil — which was very well received, judging by the applause. I was deputed to explain what he does, and also to translate for the Q&A session. For the next half hour he obliged the audience’s requests to write names: their own, of spouses or children, even a bär, or, as Hàn Thế Thành wanted, Nhà khủng lồ (the house of dinosaurs, the name for their family group).
Bhattathiri signing his calligraphy work for TUG2025.
Nijenhuis was also giving away swag from Xerdi, and I made the difficult choice between a pen and a pendrive, opting for the latter.
The banquet followed, where in between enjoying the delicious food I found time to meet and speak with even more people and to say goodbyes and ‘tot ziens’.
Later, I had some discussions with Frank about generating MathML using TeX.
Many thanks
A number of people shared, during the conference, their appreciation of how well it was organized, which was heartwarming. I would like to express thanks to the many people involved, including the TeX Users Group; the sponsors (who made it fiscally possible to run the event and to support many travels via bursaries); the STMDocs volunteers who handled many other responsibilities of organizing; the audio-video team (who were very thoughtful in placing the headshot of speakers away from the presentation text); the unobtrusive hotel staff; and all the attendees, especially the speakers.
Thanks particularly to those who stayed at and/or visited the campus, for enjoying the spicy food and delicious fruits from the garden, and for surviving the long techno-socio-eco-political discussions. Boris seems to have taken to heart my request for a copy of The TeXbook signed by Don Knuth — I cannot express the joy & thanks in words!
The TeXbook signed by Don Knuth.
The recorded videos were handed over to Norbert Preining, who graciously agreed to make the individual lectures available after processing. The total file size was ~720 GB; so I connected the external SSD to one of the servers and made it available to a virtual machine via USB-passthrough; then mounted and made it securely available for copying remotely.
A special note of thanks to CVR and Karl Berry — whom I suspect is actually a Kubernetes cluster running hundreds of containers, each doing a separate task (with apologies to a thousand gnomes), but there are reported sightings of him, so I sent personal thanks via people who have seen him in the flesh — for leading and coordinating the conference organization. Barbara Beeton and Karl copy-edited our article for the TUGboat conference proceedings, which is gratefully acknowledged. I had a lot of fun and a lot less stress participating in the TUG2025 conference!
In 1978, a commemorative souvenir was published to celebrate the milestone of acting in 400 films by Bahadoor, a celebrated Malayalam movie actor. Artist Namboodiri designed its cover caricature and the lettering.
Cover of Bahadoor souvenir designed by artist Namboodiri in 1978.
Based on this lettering, KH Hussain designed a traditional script Malayalam Unicode font named ‘RIT Bahadur’. I did work on the engineering and production of the font to release it on the 25th death anniversary of Bahadoor, on 22-May-2025.
RIT Bahadur is a display typeface that comes in Bold and BoldItalic variants. It is licensed under Open Font License and can be freely downloaded from Rachana website.
Many participants registered from all over India and shared their initial designs of a few selected characters. Ten submissions were shortlisted, and the selected participants were invited to a two-day in-person workshop conducted at the River Valley campus, Trivandrum. The workshop was led by the jury members — Dr. KH Hussain, who notably designed the Rachana and Meera fonts (among many others); eminent calligrapher and designer of the Sundar, Ezhuthu, Karuna & Chingam fonts, Narayana Bhattathiri; type designer and multi-script expert Vaishnavi Murthy; and yours truly. High-quality sessions & feedback from the speakers and lively interactive sessions enlightened both experienced and non-technical designers about the intricacies of typeface design.
Participants of the font workshop held at River Valley Campus, Trivandrum, in August 2024.
Refinement
To manage the glyph submissions for collaborative font projects, a friend of mine and I built a web service. The designers just need to create each character in SVG format and upload it into their font project. This abstracted away from the designers all the technical complexities, such as assigning the correct Unicode codepoint, following the naming convention, OpenType layout & shaping, etc.
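For illustration, here is a minimal sketch of the idea behind that service (glyph names, the naming scheme and file names are hypothetical, not the actual service code): given the character an uploaded SVG belongs to, FontForge’s Python library can assign the codepoint, derive a consistent glyph name and import the outline, so the designer never deals with those details.

import fontforge
import unicodedata

def import_svg(font, char, svg_path):
    """Create the glyph for `char`, name it consistently and import the SVG outline."""
    codepoint = ord(char)
    # e.g. 'MALAYALAM LETTER KA' -> 'ka-ml'; this naming scheme is illustrative only
    name = unicodedata.name(char).split()[-1].lower() + "-ml"
    glyph = font.createChar(codepoint, name)
    glyph.importOutlines(svg_path)   # the designer's uploaded SVG drawing
    glyph.removeOverlap()            # clean up overlapping contours
    return glyph

font = fontforge.font()
import_svg(font, "ക", "uploads/ka.svg")
font.generate("draft.otf")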
There was a mid-term evaluation of the completed glyph sets in October 2024, and a couple of online sessions where the jury pointed out the necessary corrections and improvements required for each font.
The final submissions were done near the end of December 2024; and further refinements ensued. All the participants were very receptive to the constructive feedback and enthusiastic to improve the fonts. The technical work for final font production was handled by your humble correspondent.
Results
In March 2025, the jury made a final evaluation and adjudged the winners of the competition. All six completed fonts are published as open source and can be downloaded from the Rachana website. See the report for the winning entries, font specimens & posters, prize money, and all other details.
RIT Thaara (താര), calligraphic style, named after Sabdatharavali.
RIT Lekha (ലേഖ), body text font.
RIT Lasya (ലാസ്യ). The Latin glyphs were drawn independently based on Akaya Kannada font, as suggested by a jury member.
RIT Ala (അല).
RIT Keram Bold (കേരം).
RIT Indira Bold (ഇന്ദിര).
I am very happy to have had the chance to collaborate over the course of a year with designers from various backgrounds to develop beautiful traditional-orthography Malayalam fonts and make them all available under a free license. I would like to thank the jury members, who did exemplary work in evaluating the designs and providing constructive feedback & guidance multiple times, which helped refine the fonts; CVR for the work to create the web pages on the Rachana website; and the three foundations for the initiative and funding that made this all possible. Full disclosure: all the jury members worked in a volunteer capacity.
Next competition
The RIT, KaChaTaThaPa and Sayahna foundations have already announced plans for the next open font design competition! This time the focus is on body text fonts.
The 46th annual meeting of the International TeX Users Group (TUG 2025) will take place in Thiruvananthapuram (aka Trivandrum), Kerala, India, on 18–20 July, 2025. The Indian TeX Users Group and TeXFolio (STMDocs) with support from International TeX Users Group and sponsors are organizing the event this time as it comes back to India after a long hiatus of 14 years (the last two instances hosted were in 2011 and 2002).
Details about the registration, venue, travel, accommodation, programme, deadlines and important dates etc. are available at the conference page https://tug.org/tug2025/.
Call for participation
TUG conferences have always enjoyed excellent presentations and talks about TeX, Typefaces/Fonts, Typesetting, Typography and anything related. Please submit interesting papers — see the call for papers and the speaker advice. Note that a visa is required for participants from most countries, and obtaining one is a non-trivial undertaking. Please register and contact the program committee for a visa invitation letter as soon as possible.
The drawings for TUG 2025 are made by notable cartoonist E.P. Unny and the flyer is typeset by CVR.
The third International Calligraphy Festival of Kerala (ICFK) took place in October 2024, and I was invited to run a session. I had the fortune of seeing many exemplary calligraphers from all over the world come together and demonstrate their work over the three days of the festival.
Renowned Malayalam calligrapher Narayana Bhattathiri organizes the festival every year, and it was amazing to witness many of the speakers and calligraphers sharing the responsibilities and taking an active role in the organization and execution of the sessions. The participation and interactions between audience and speakers were warm and welcoming. The demographic was very diverse — students, professionals, calligraphers; of all ages and genders.
An epiphany
During the session by Prof. G.V. Sreekumar, when he asked the participants to write the word ‘സൂര്യൻ’ (the Sun), I took a survey of the writings. What I found was that the older generation all wrote the word in traditional Malayalam orthography, and a large number of the younger generation wrote it in traditional orthography as well. The latter group were not taught to read or write the traditional script in school (the textbooks are all in the broken/reformed script). Intrigued by how they had become familiar with the traditional orthography, I asked how they knew to write in this fashion. The answers fell into these categories:
They saw their parents write in traditional orthography.
They saw their grandparents write in traditional orthography (but their parents write in reformed script).
Most strikingly, some youngsters were sure this was the ‘correct’ way of writing and yet they could not explain how or from where they learnt it.
The response from the third category is very interesting: they learnt the script ‘organically’ and it is imbibed in their identity — which is what the definition of ‘culture’ is. The revelation is that the script belongs to its people, no matter what the government decrees (ref: Kerala govt orders about script reform in 1971 and 2022).
The session
Together with type designer Athul Jayaraman, I had a joint session on typography. Athul focused more on type design, and I elaborated more on the Malayalam script (traditional orthography) and on font engineering & techniques.
The Mathrubhumi news team also interviewed both of us and published a two-part series of the interview on their news site.
Of the many conferences I have been to, ICFK 2024 was one of the best un-conferences in my experience. I met a lot of exemplary, yet humble & approachable, calligraphers (some of them gave me their autographed booklets — thank you!), learnt a lot and enjoyed it thoroughly.
At the International TeX Users Group Conference 2023 (TUG23) in Bonn, Germany, I presented a talk about using Metafont (and its extension MetaPost) to develop traditional orthography Malayalam fonts, on behalf of C.V. Radhakrishnan and K.H. Hussain, who were the co-developers and authors. And I forgot to post about it afterwards — as always, life gets in the way.
In early 2022, CVR started toying with Metafont to create a few complicated letters of the Malayalam script, and he showed us a wonderful demonstration that piqued the interest of many of us. With the same code base, by adjusting the parameters, different variations of the glyphs can be generated, as seen in a screenshot of that demonstration: 16 variations of the same character ഴ generated from the same Metafont source.
Hussain, quickly realizing that the characters could be programmatically assembled from a set of base/repeating components, collated an excellent list of basic shapes for Malayalam script.
Excerpts from the Malayalam character basic shape components documented by K.H. Hussain.
I bought a copy of ‘The Metafontbook’ and started learning and experimenting. We soon found that Metafont, developed by Prof. Knuth in the late 1970s, generates bitmap/raster output; but its extension MetaPost, developed by his Ph.D. student John Hobby, generates vector output (PostScript), which is required for OpenType fonts. We also found that ‘Metatype1’, developed by Bogusław Jackowski et al., has very useful macros and ideas.
We had a lot of fun programmatically generating the character components and assembling them, splicing them, sometimes cutting them short, and transforming them in every useful manner. I developed a new set of tools to generate the font from the vector output (SVG files) produced by MetaPost, which is also used in later projects like the Chingam font.
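As a rough sketch of that pipeline (file names, glyph names and MetaPost options here are illustrative; the real build scripts are more involved): MetaPost renders the programmed shapes to SVG, which FontForge’s Python library then imports into glyphs.

import subprocess
import fontforge

# Render the MetaPost source to SVG; the output name assumes beginfig(1) in zha.mp.
subprocess.run(
    ["mpost", "-s", 'outputformat="svg"', "-s", 'outputtemplate="%j-%c.svg"', "zha.mp"],
    check=True,
)

font = fontforge.font()
glyph = font.createChar(0x0D34, "zha-ml")   # ഴ
glyph.importOutlines("zha-1.svg")           # vector outline produced by MetaPost
glyph.removeOverlap()
font.generate("malayalam-metapost-draft.otf")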
At the annual TUG conference 2023 in Bonn, Germany, I presented our work, and we received good feedback. There were three presentations about Metafont itself at the conference. Among others, I also had the pleasure of meeting Linus Romer, who shared some ideas about designing variable-width reph shapes for Malayalam characters.
The video of the presentation is available on YouTube.
Postscript (no pun intended): after the conference, I visited some of my good friends in Belgium and the Netherlands. En route, my backpack with passport, identity cards, laptop, a phone, money etc. was stolen at Liège. I can’t thank my friends in Belgium and back at home enough for their unbridled care, support and help in the face of a terrible affliction. On the day before my return, the stolen backpack, with everything except the money, was found by the railway authorities and I was able to claim it just in time.
I made yet another visit to the magnificent Plantin–Moretus Museum (it holds the original Garamond types!), where I could myself ink and print a metal-typeset block of a sonnet by Christoph Plantijn from 1575, which now hangs in the office of a good friend.
The open fonts for Malayalam developed by the Rachana Institute of Typography have used our independently developed advanced shaping rules since 2020. A conscious decision was made to support only the revised OpenType specification for Indic scripts (the script tag mlm2, which fixed issues with the v1 specification such as halant shifting). Our shaping rules provide precise and definite shaping for Malayalam Unicode fonts on all major software platforms.
And yet, there are many users who still use old or buggy software/platforms. Hussain and Bhattathiri have expressed angst and displeasure at seeing their beautifully and meticulously designed fonts not shaped correctly in some typeset works and prints (for instance, Ezhuthu as used by the Mathrubhumi newspaper shows detached ു/ൂ-signs). I have received many requests over the years to add support for those obsolete (or sometimes proprietary) platforms, but have always refused.
Fig. 1: Ezhuthu font shaped with detached ൃ/ു-signs. They should conjoin with the base character. Source: Mathrubhumi.
A few weeks ago, CVR and I were trying to generate Malayalam epub content to read on a Kobo ebook reader (which supports loading the user’s own fonts, unlike Kindle). We found that Kobo’s shaping software (quite possibly an old version of Pango) does not support the v2 OpenType specification. That did irk me, and I knew it was going to be a rabbit hole. A little bit of reverse engineering and a day later, we were happy to read Malayalam properly shaped on the Kobo, after adding rudimentary support for the v1 spec.
Fig. 2: RIT Rachana shaped perfectly with Kobo ebook reader (ignore the book title).
Out of curiosity, I checked whether those small additions worked with Windows XP, but they did not (hardly surprising). But the itch had to be scratched, so a bunch of shaping rules were added to support XP-era applications as well (oh, well).
Fig. 3: RIT Rachana shaped perfectly in Windows XP.
A few days later, a user also reported a (known) shaping issue with Adobe InDesign. Though I was inclined to close it as NOTABUG, pointing the user to HarfBuzz instead, the user was willing to help test the few attempts I promised to make. Adobe 2020/2021 (and earlier) products use the Lipika shaper, but recent versions use HarfBuzz natively. Lipika seems to support the v2 OpenType specification, yet it doesn’t work well with our existing shaping rules. Quite some reverse engineering and half a dozen attempts later, I succeeded in writing shaping rules that support Lipika along with the other shapers.
Fig. 4: RIT Rachana shaped perfectly with InDesign 2021 (note: the characters outside the margins are a known issue only with InDesign, and it is fixed with a workaround).
All published (and in-progress) RIT Malayalam fonts are updated with this new set of shaping rules, which means all of them will be shaped exactly, precisely and correctly (barring the well-known limitations of the v1 specification and bugs in legacy shapers) all the way from Windows XP (2002) to HarfBuzz 8.0 (present day) and all applications in between.
Supported shaping engines
With this extra engineering work, RIT fonts are now tested to work well with a wide range of shaping engines/software. Note: old Pango and Qt4 have shaping issues (with below-base ല forms and with ു/ൂ forms of conjuncts, in the respective shapers), but those won’t be fixed. Any shaper other than HarfBuzz (and to a certain extent Uniscribe) is best effort only.
A lot of invaluable work was done by Narayana Bhattathiri, Ashok Kumar and CV Radhakrishnan in testing and verifying the fonts with different platforms and typesetting systems.
End users who reported issues and helped with troubleshooting have also contributed heavily in shaping (pun intended) community software like RIT Malayalam open fonts.
It comes with a regular variant, embellished with stylistic alternates for a number of characters. The default shapes of characters such as D, O, ഠ, ാ etc. are wide, in stark contrast with the other characters, which are designed with a narrow width. The font contains alternate shapes for these characters, more in line with the general narrow-width characteristic.
Users can enable the stylistic alternates in typesetting systems, should they wish.
XeTeX: the stylistic variant can be enabled with the StylisticSet={1} option when defining the font via the fontspec package. For example:
% in the preamble
\newfontfamily\chingam[Ligatures=TeX,Script=Malayalam,StylisticSet={1}]{Chingam}
…
\begin{document}
\chingam{മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും അന്തസ്സോടും സ്വാതന്ത്ര്യത്തോടുംകൂടി ജനിച്ചിട്ടുള്ളവരാണ്…}
\end{document}
Scribus: extra font features are accessible since version 1.6
LibreOffice: extra font features are accessible since version 7.4. Enable it using Format→Character→Language→Features.
InDesign: very similar to Scribus; there should be an option in the text/font properties to choose the stylistic set.
Development
Chingam is designed and drawn by Narayana Bhattathiri. Based on the initial drawings on paper, the glyph shapes were created in vector format (SVG) following the glyph naming convention used in RIT projects. A new build script developed by Rajeesh makes it easier for designers to iterate on & adjust the font metadata & metrics. Review & scrutiny by CVR, Hussain KH and Ashok Kumar improved the font substantially.
Download
Chingam is licensed under the Open Font License. The font can be downloaded from the Rachana website; sources are available on the GitLab page.
I have installed GNU/Linux on many a computer over ~20 years (some automated, most individually). At the university, I used to be woken past midnight by someone knocking at the door — someone who had reinstalled Windows and now couldn’t boot because grub was overwritten. I’d rub my eyes, pick up the latest bunch of Fedora CDs and go rescue the machine. Linux installation, customization and grub recovery were my specialization (no, the course didn’t give credit for that).
Technologies (libre & otherwise) have improved since then. Instead of MBR, there’s GPT (no, not that one). Instead of BIOS, there’s UEFI. Dual booting Windows with GNU/Linux has become mostly painless. Then there’s Secure Boot. Libre software works with that too. You may still run into issues; I ran into one recently and if someone is in the same position I hope this helps:
A friend of mine got an Ideapad 3 Gaming laptop (preinstalled with Windows 11) and we tried to install Fedora 37 on it (remotely, of course; thanks to screen sharing and cameras on mobile phones). The bootable USB pendrive was not being listed in the boot options (F12), so we fiddled with the TPM & Secure Boot settings in the EFI settings (F2). No luck, and troubleshooting eventually concluded that the USB pendrive was faulty. We tried another one, and this time it was detected; Fedora 37 installed happily (in under 15 minutes, because instead of spinning hard disks, there’s an SSD). Fedora boots & works fine.
A day later, the friend selects Windows to boot into (from grub menu) and gets greeted by a BitLocker message: “Enter bitlocker recovery key” because “Secure boot is disabled”.
Dang. I thought we had re-enabled Secure Boot, but apparently not. Go to the EFI settings and turn it back on; save & reboot; select Windows — but BitLocker kept asking for the recovery key, now with a different reason: “Secure Boot policy has unexpectedly changed”.
That led to scrambling & searching, as BitLocker had been enabled not by the user but by the OEM, and thus there was no recovery key in the user’s Microsoft online account (had the user enabled it manually, the key would have been there).
The nature of the error message made me conclude that the Fedora installation with Secure Boot disabled had somehow altered the TPM settings, and Windows (rightfully) refused to boot. The EFI settings have an option to ‘Restore Factory Keys’, which resets the Secure Boot DB. I could try that to remove the Fedora keys, pray Windows boots, and if it works, recover grub (my specialty) or reinstall Fedora in the worst-case scenario.
Enter Matthew Garrett. Matthew was instrumental in making GNU/Linux systems work with Secure Boot (and was awarded the prestigious Free Software Foundation Award). He is a security researcher who frequently writes about computer security.
I sought Matthew’s advice before trying anything stupid, and he suggested the following (reproduced with permission):
First, how are you attempting to boot Windows? If you’re doing this via grub then this will result in the secure boot measurements changing and this error occurring – if you pick Windows from the firmware boot menu (which I think should appear if you hit F12 on an Ideapad?) then this might solve the problem.
If neither of these approaches work, then please try resetting the factory keys, reset the firmware to its default settings, and delete any Fedora boot entries from the firmware (you can recover them later), and with luck that’ll work.
Thankfully, the first option of booting Windows directly via F12 — without involving grub — worked. And the first thing the user did after logging in was back up the recovery keys.
The Malayalam serif font RIT Rachana and its sans-serif counterpart MeeraNew have enjoyed a wide array of improvements in the past months; and are available now for download and use.
Some notable improvements are listed here:
The entire Malayalam character set defined in Unicode 15, including archaic and Vedic characters.
All characters — especially vowel signs — now belong to the proper Unicode category/GDEF class (thanks to Liang Hai for pointing out the correction), removing a workaround put in place just for Adobe InDesign. The workaround is not required when using the HarfBuzz shaping engine (which you should use anyway).
Improved design of old-style figures 0, 1 & 2 in RIT-Rachana.
Standalone dependent glyphs of pre-base ra (reph) and below-base la can be displayed with ‘zwj+് + ര/ല’ respectively, useful for informational purpose (when writing a typography specific article, for instance). These characters otherwise always conjoin with the base character.
Major improvements in the shaping rules to adhere to language rules even better: double consonants are always joined properly in context, even for unusual combinations. Correct shaping for the instance below could be obtained by adding a ZWNJ before ണ, but the advanced shaping rules are smart enough not to require such encoding corrections.
Improved underline position (although thou shalt question thyself why use underline in Indic scripts), which is now also respected by LibreOffice 7.5 thanks to Khaled Hosny. This bug was reported many years ago.
ന് + ് + റ → ന്റ (Unicode 5.1 atomic chillu nta) support added upon request.
… kerning improvements and many more tweaks and fine tuning. As usual, both typefaces are free & open source software, available at Rachana website. They will be available shortly in Fedora 36 & 37 as an update.
FontForge is the long-standing libre font development tool: it can be used to design glyphs, import glyphs in many formats (SVG, PS, PDF, …), write OpenType lookups or integrate Adobe feature files, and produce binary fonts (OTF, TTF, WOFF, …). It has excellent scripting abilities, especially its Python library to manipulate fonts, which I use extensively in producing & testing fonts.
When I wrote the advanced definitive OpenType shaping rules for Malayalam and build scripts based on FontForge, I also wanted to reuse the comprehensive shaping rules in all the fonts RIT develops. The challenge in reusing the larger set of rules in a ‘limited’ character set font was that FontForge would (rightly) throw errors that such-and-such glyph does not exist in the font and thus the lookup is invalid. For instance, the definitive OTL shaping rules for Malayalam have nearly 950 glyphs and lookup rules, but a limited character set font like ‘Ezhuthu’ has only about 740 glyphs.
One fine morning in 2020, I set out to read FontForge’s source code to study whether functionality could be added to safely skip lookups that do not apply to a font (because the glyphs specified in the lookup are not present in the font, for instance). A few days later, I had modified the core functionality and adapted the Python interface (specifically, the Font.mergeFeature method) to do exactly that, preserving backward compatibility.
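A minimal sketch of how the Python side can be used (file names are illustrative, and the exact form of the optional argument is an assumption based on the description above, not documented API):

import fontforge

font = fontforge.open("Ezhuthu-Regular.sfd")      # a 'limited' ~740-glyph font
# Second argument: the new toggle to skip lookups that reference glyphs
# missing from this font (assumed signature; check the FontForge docs/PR).
font.mergeFeature("malayalam-definitive.fea", True)
font.generate("Ezhuthu-Regular.otf")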
Next, the same functionality needed to be exposed in the graphical interface (via the File→Merge Feature Info menu). FontForge uses its own GUI toolkit (neither GTK nor Qt), but with helpful pointers from Fredrick Brennan, I developed the GUI to take a flag (defaulting to ‘off’ to retain backward compatibility) that lets users skip lookup rules that do not apply to the current font. In the process, I had to touch the innards of FontForge’s low-level code and learn about it.
Fig. 1: Fontforge now supports skipping non-existent glyphs when merging a comprehensive OpenType feature file.
This worked fine for our use case, typically ignoring GSUB lookups of the type sub glyph1 glyph2 by glyph3 where glyph3 does not exist in the font. But it did not properly handle the cases where glyph1 or glyph2 was non-existent. I tried to fix the issue but was then unable to spend more time to finish it, as Real Life caught up; c’est la vie. It was later attempted as part of the Free Software Camp mentoring program in 2021, but that didn’t bear fruit.
A couple of weeks ago, Fred followed up, as this functionality has been found to be very useful; so I set aside time again to finish the feature. With fresh eyes, I was able to fix the remaining issues quickly, rebase the changes onto the current master and update the pull request.
Today, on the auspicious day of Malayalam new year (ചിങ്ങം ൧), I am pleased to announce the release of a new libre font for Malayalam script ‘Karuna’ by Rachana Institute of Typography. Karuna is a display typeface suitable for titling and headlines.
Here are some beautiful posters designed in Karuna by Narayana Bhattathiri.
Karuna is designed by renowned calligrapher Narayana Bhattathiri; font development is by KH Hussain, and font engineering is by me (Rajeesh KV) in collaboration with CV Radhakrishnan.
Bhattathiri explains that the font was inspired by the style of CN Karunakaran (1940–2013), an acclaimed painter, illustrator & art director from Kerala. Inspired by, and as a homage to, his style of titling and design, Bhattathiri designed the shapes for Karuna. Karuna brings a unique design to the growing collection of high-quality open fonts maintained by the Rachana Institute of Typography. In KH Hussain’s words (translated from Malayalam):
The change that C.N. Karunakaran brought to Malayalam titling and cover design in the 1970s was revolutionary. Even while being a contemporary of A.S. and Namboodiri, Karunakaran maintained, in his illustrations and letterform design, a clear distance and distinctness from his predecessors.
Half a century later, when Narayana Bhattathiri designs Karuna, it becomes something more than a mere copy. Bhattathiri takes the same liberty with Karunakaran’s letters that Karunakaran took with Malayalam letters. Karuna is becoming the most singular font in Malayalam typography. For every other font in use today, in ASCII or Unicode, one can find relatives in Malayalam or in Roman; for Karuna, one cannot.
The particular qualities of Karunakaran’s calligraphy can be seen in the cover he designed in 1977 for Thadavara Kavithakal (prison poems). That book was a collection of poems written from prison by Naxalites who had endured brutal beatings during the Emergency. The cruelties of the Emergency shadow the letters of that cover as a frozen shudder. The Karuna font becomes a re-enactment of it.
Title designed by CN Karunakaran in 1977. Source: KH Hussain.
Karuna follows the traditional orthography of the Malayalam script (neither the reformed nor the re-reformed script) and has the precise OTL shaping rules required for advanced script layout. The font is licensed and made available for public use under the Open Font License (OFL). You may download it from the Rachana website. Font sources are available at the GitLab repository.
We are living in 2022. And it is now possible to digitally sign a PDF document using libre software. This is a love letter to libre software projects, and also a manual.
For a long time, one of the challenges in using libre software in ‘enterprise’ environments or when working with government documents has been that one is eventually forced to use proprietary software that isn’t even available for a libre platform like GNU/Linux. A notorious use case is digitally signing PDF documents.
Recently, Poppler (the free software library for rendering PDF, used by Evince and Okular), and Okular in particular, have gained a lot of improvements in displaying digital signatures and actually signing a PDF document digitally (see this, this, this, this, this and this). When the main developer Albert asked for feedback on what important functionality the community would like to see incorporated as part of this effort, I asked whether it would be possible to use hardware tokens for digital signatures. It turns out that Poppler uses NSS (Network Security Services, a Mozilla project) for managing the certificates, and if the token is enrolled in the NSS database, Okular should be able to just use it.
This blog post written a couple of years ago about using a hardware token in GNU/Linux is still actively referred to by many users. Trying to make the hardware token work with Okular gave me some more insights. With all the other prerequisites (token driver installation etc.) in place, follow these steps to get everything working nicely.
Howto
There are two options to manage the NSS DB: (i) manually, by setting up $HOME/.pki/nssdb, or (ii) use the one automatically created by Firefox, if you already use it. Assuming the latter, the NSS DB would be located in the default profile directory $HOME/.mozilla/firefox/<random.dirname>/ (check for the existence of the file pkcs11.txt in that directory to be sure).
Open Okular and go to Settings → Configure backend → PDF and choose/set the correct certificate database path, if not already set by default.
Fig. 1: Okular PDF certificate database configuration.
Start the smart card service (usually auto-started, you won’t have to do this): either pcsc_wd.service (for WatchData keys) or pcscd.service.
Plug in the hardware token.
Open a PDF in Okular. Add a digital signature using the menu Tools → Digitally Sign.
This should prompt for the hardware token password.
Fig. 2: Digital token password prompt when adding digital sign in the PDF document.
Click & drag a square area where you need to place the signature and choose the certificate. Note that, since Poppler 22.03, it is also possible to insert signature in a designated field.
Fig. 3: Add digital signature by drawing a rectangle.
The signature will be placed in a new PDF file (with the suffix -signed), which will open automatically.
Fig. 4: Digitally signed document.
You can also see the details of the hardware token in PDF backend settings.
Fig. 5: Signature present in hardware token visible on the PDF backend settings.
Thanks to the free software projects & developers who made this possible.
MeeraNew is the default Malayalam font for Fedora 36. I have just released a much improved version of this libre font, and it is built for Fedora 36 & rawhide, which should reach users within a week. For the impatient, you may enable the updates-testing repository and provide karma/feedback.
Fig. 1: MeeraNew definitive-script Malayalam font version 1.3.
Two major improvements are present in version 1.3:
Improved x-height to match RIT Rachana, the serif counterpart. This should improve readability at default font sizes.
Large improvements to many kerning pairs, including the above-base mark positioning of the dot reph (0D4E) character (e.g. ൎയ്യ), the ി (0D3F) and ീ (0D40) vowel signs (e.g. ന്റി), the post-base signs of Ya and Va (e.g. സ്ത്യ), etc.; a rough sketch of the mark-positioning mechanism follows below.
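For the curious, here is how above-base mark positioning of this kind can be wired up with FontForge’s Python API; glyph names, coordinates and file names are purely illustrative, not the actual MeeraNew sources.

import fontforge

font = fontforge.open("MeeraNew-Regular.sfd")           # hypothetical file name
font.addLookup(
    "abvm marks", "gpos_mark2base", (),
    (("abvm", (("mlm2", ("dflt",)),)),)
)
font.addLookupSubtable("abvm marks", "abvm-1")
font.addAnchorClass("abvm-1", "top")
# The base glyph carries a 'base' anchor and the dot reph a 'mark' anchor;
# the shaper snaps the mark anchor onto the base anchor at layout time.
font["ya-ml"].addAnchorPoint("top", "base", 540, 730)    # made-up coordinates
font["dotreph-ml"].addAnchorPoint("top", "mark", 0, 0)
font.generate("MeeraNew-Regular.otf")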
The font source and OTF/TTF files can be downloaded at Rachana website or at GitLab.
The upcoming Fedora release 36 (due end of April 2022) and beyond, and ELN (Enterprise Linux Next, what would become RHEL), will have RIT Rachana and Meera New as the default Malayalam script fonts. In addition, the Sundar, TNJoy, Panmana and Ezhuthu fonts are now available in the official repositories. This brings Malayalam fonts that are modern (Unicode 13 compatible), well maintained, with perfect complex-script shaping and good metadata, to the users of Fedora, RHEL, CentOS & downstream OSen. I have made all the necessary updates in the upstream projects (which I maintain) and packaged them for Fedora (which I also maintain).
Update: thanks to Norbert Preining, all these fonts are also available for ArchLinux!
RIT Malayalam fonts available in Fedora.
RIT Rachana and Meera New fonts will be default serif and sans-serif fonts for Malayalam. smc-rachana-fonts and smc-meera-fonts are deprecated as they are unmaintained.
All the fonts can be installed from your favourite package managers (GNOME Software, Discover, dnf etc.).
RIT fonts in GNOME software of upcoming Fedora 36.
The packages can be installed using dnf via:
sudo dnf install -y rit-*-fonts
This change in Fedora required many well orchestrated steps:
Packaging & building RIT fonts according to latest font packaging guidelines
Set as default serif/sans-serif fonts for Malayalam in langpacks
Set as default serif/sans-serif fonts for Malayalam in fedora-comps
Propose the ChangeRequest which is then discussed & approved by Fedora Engineering Steering Committee (FESCO).
I would like to especially thank Parag Nemade for coordinating all the changes and relevant engineering procedures, and Pravin Satpute for the initial discussions, in helping to complete these updates in time for Fedora 36.
TL;DR: research and development of a completely new set of OpenType layout rules for Malayalam traditional orthography.
Writing OpenType shaping rules is hard. Writing OpenType shaping rules for advanced (complex) scripts is harder. Writing OpenType shaping rules without causing any undesired ligature formations is even harder.
Background
The shaping rules for SMC fonts abiding by v2 of the Malayalam OpenType specification (the mlm2 script tag) were written and polished in large part by me over many years, fixing shaping errors and undesired ligature formations. That still left some hard-to-fix bugs. Driven by the desire to fix such difficult bugs in RIT fonts, and by the copyright fiasco, I set out to write a simplified set of OpenType shaping rules for Malayalam from scratch. Two major references helped in that quest: (1) a radically different approach I had tried a few years ago with the mlym script tag (aka Windows XP era shaping), which failed; (2) a manuscript by R. Chithrajakumar of Rachana Aksharavedi, who culled and compiled the ‘definitive character set’ for the Malayalam script. The idea of the ‘definitive character set’ is that it contains all the valid characters in a script and does not contain any (invalid) characters not in the script. By that definition, I wanted to create the new shaping rules in such a way that they do not generate any invalid characters (e.g. with a detached u-kar). In short: it shouldn’t be possible to accidentally generate broken reformed-orthography forms.
Fig. 1. Samples of Malayalam definitive character set listing by R. Chithrajakumar, circa 1999. Source: K.H. Hussain.
“Simplify, simplify, simplify!”
Henry David Thoreau
It is my opinion that a lot of the complexity in Malayalam shaping comes from the fact that the Indic OpenType shaping specification largely follows Devanagari, which in turn was adapted from ISCII, which has (in my limited understanding) its roots in the component-wise metal type design of ligature glyphs. Many half, post-base and other shaping rules have their lineage there. I have also heard similar concerns about the complexity expressed by others, including Behdad Esfahbod, the FreeFont maintainer, et al.
Implementation
As K.H. Hussain once rightly noted, the shaping rules were creating many undesired/unnecessary ligature glyphs by default, and additional shaping rules (complex contextual lookups) are written to avoid/undo those. A better, alternate approach would be: simply don’t generate undesired ligatures in the first place.
“Invert, always invert.”
Carl Gustav Jacob Jacobi
Around December 2019, I set out to write a definitive set of OpenType shaping rules for the traditional script set of Malayalam. Instead of relying on many different lookup types such as pref, pstf, blwf, pres, psts and a myriad of complex contextual substitutions, the only type of lookup required was akhn — because the definitive character set contains all the ligatures of Malayalam, and those glyphs are designed in the font as single glyphs; no component-based design.
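As an illustration of how small such a rule set is, a ligature-only akhn lookup can be sketched with FontForge’s Python API; the glyph names here are purely illustrative, and the actual RIT rules are maintained as a feature file rather than built this way.

import fontforge

font = fontforge.open("RIT-Rachana-Regular.sfd")         # hypothetical source file
font.addLookup(
    "akhn ligatures", "gsub_ligature", (),
    (("akhn", (("mlm2", ("dflt",)),)),)                   # mlm2 script tag only
)
font.addLookupSubtable("akhn ligatures", "akhn-1")
# The conjunct ക്ക exists as a single pre-designed glyph; map ക + ് + ക to it.
font["k_ka-ml"].addPosSub("akhn-1", ("ka-ml", "virama-ml", "ka-ml"))
font.generate("RIT-Rachana-Regular.otf")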
The draft rules were written in tandem with the RIT Rachana redesign effort and tested against different shaping engines such as HarfBuzz, Allsorts, XeTeX, LuaHBTeX and DirectWrite/Uniscribe for Windows. Windows, being Windows (and also the maintainer of the OpenType specification), indeed did not work as expected when adhering to the specification: the Windows implementation clearly special-cases the pstf forms of യ (Ya, 0D2F) and വ (Va, 0D35). To make a single set of shaping rules work with all these shaping engines, the draft rules were slightly amended, et voilà — it worked in all applications and OSen that use any of these shaping engines. It was decided to drop support for the mlym script tag, which was deprecated many years ago, and to support only the mlm2 specification, which fixed many irreparable shortcomings of mlym. One notable shaping engine that doesn’t work with these rules is Adobe’s text engine (Lipika?), but Adobe has recently switched to HarfBuzz. That covers all major typesetting applications.
Fig. 2. Samples of the OpenType shaping rules for definitive characters set of Malayalam traditional orthography.
Testing fonts developed using this new set of shaping rules for Malayalam indeed showed that they do not generate any undesired ligatures in the first place. In addition, compared to the previous shaping rules, it gets rid of 70+ lines of complex contextual substitutions and other rules, while remaining easy to read and maintain.
Fig. 3. Old vs new shaping rules in RIT Rachana.
Application support
This new set of OpenType layout rules for Malayalam is tested to work 100% with the shaping engines mentioned earlier (HarfBuzz, Allsorts, XeTeX, LuaHBTeX and DirectWrite/Uniscribe).
In addition, the advantages of the new shaping rules are:
Adheres completely to the concept of the ‘definitive character set’ of the language/script: it generates all valid conjunct characters and does not generate any invalid conjunct character.
The same set of rules works fine, without adjustments or reprogramming, for ‘limited character set’ fonts. A ‘limited character set’ may not contain conjunct characters as extensive as the ‘definitive character set’, yet it will always have characters with reph and u/uu-kars formed correctly.
Reduced complexity and maintenance (no complex contextual lookups, reverse chaining etc.). Write once, use in any fonts.
Open source, libre software.
This new OpenType shaping rules program was released to the public along with RIT Rachana a few months ago, and it is also used in all other fonts developed by RIT. It is licensed under the Open Font License for anyone to use and integrate into their fonts; please ensure the copyright statements are preserved. The shaping rules are maintained at the RIT GitLab repository. Please create an issue in the tracker if you find any bugs, or send a merge request if you make any improvement.
Let’s Encrypt revolutionized SSL certificate management for websites in a short span of time — it directly improved the security of users of the world wide web by (1) making it very simple for administrators to deploy SSL certificates to websites and (2) making the certificates available free of cost. To appreciate their efforts, compare this with the hoops one had to jump through to obtain a certificate from a certificate authority (CA), and how much money and energy one would have to spend on it.
I make use of Let’s Encrypt on all the servers I maintain(ed), and in the past used the certbot tool to obtain & renew certificates. Recent versions of certbot are only available as a snap package, which is not something I want, or am able, to set up in many cases.
Enter acme.sh. It is a shell script that works great. Installing acme.sh also sets up a cron job, which automatically renews the certificate for the domain(s) near its expiration. I recently set up dict.sayahna.org using nginx as a reverse proxy to a lexonomy service, with acme.sh for certificate management. The cron job is supposed to renew the certificate on time.
Except it didn’t. A few days ago I received a notification about the imminent expiry of the certificate. I searched the interweb quite a bit, but didn’t find a simple enough solution (“make the proxy service redirect the request”…). What follows is the troubleshooting and a solution; maybe someone else will find it useful.
Problem
acme.sh was unable to renew the certificate, because the HTTP-01 authentication challenge requests were not answered by the proxy server to which all traffic was being redirected. In short: how does one renew Let’s Encrypt certificates on an nginx reverse-proxy server?
A certificate renewal attempt by acme.sh would result in errors like:
# .acme.sh/acme.sh --cron --home "/root/.acme.sh" -w /var/www/html/
[Sat 08 May 2021 07:28:17 AM UTC] ===Starting cron===
[Sat 08 May 2021 07:28:17 AM UTC] Renew: 'my.domain.org'
[Sat 08 May 2021 07:28:18 AM UTC] Using CA: https://acme-v02.api.letsencrypt.org/directory
[Sat 08 May 2021 07:28:18 AM UTC] Single domain='my.domain.org'
[Sat 08 May 2021 07:28:18 AM UTC] Getting domain auth token for each domain
[Sat 08 May 2021 07:28:20 AM UTC] Getting webroot for domain='my.domain.org'
[Sat 08 May 2021 07:28:21 AM UTC] Verifying: my.domain.org
[Sat 08 May 2021 07:28:24 AM UTC] my.domain.org:Verify error:Invalid response from https://my.domain.org/.well-known/acme-challenge/Iyx9vzzPWv8iRrl3OkXjQkXTsnWwN49N5aTyFbweJiA [NNN.NNN.NNN.NNN]:
[Sat 08 May 2021 07:28:24 AM UTC] Please add '--debug' or '--log' to check more details.
[Sat 08 May 2021 07:28:24 AM UTC] See: https://github.com/acmesh-official/acme.sh/wiki/How-to-debug-acme.sh
[Sat 08 May 2021 07:28:25 AM UTC] Error renew my.domain.org.
Troubleshooting
The key error to notice is
Verify error:Invalid response from https://my.domain.org/.well-known/acme-challenge/Iyx9vzzPWv8iRrl3OkXjQkXTsnWwN49N5aTyFbweJiA [NNN.NNN.NNN.NNN]
Sure enough, the resource .well-known/acme-challenge/… is not accessible. Let us try to make it accessible without going through the proxy server.
Solution
First, create the challenge directory under the web root if it doesn’t exist. Assuming the web root is /var/www/html, that is /var/www/html/.well-known/acme-challenge/.
Then edit /etc/nginx/sites-enabled/my.domain.org and, before the proxy_pass directive, add a .well-known/acme-challenge/ location pointing to the corresponding path in the web root. Do this in both the HTTPS and HTTP server blocks (it didn’t work for me otherwise).
Make sure the configuration is valid and reload the nginx configuration:
nginx -t && systemctl reload nginx.service
Now, try to renew the certificate again:
# .acme.sh/acme.sh --cron --home "/root/.acme.sh" -w /var/www/html/
...
[Sat 08 May 2021 07:45:01 AM UTC] Your cert is in /root/.acme.sh/my.domain.org/dict.sayahna.org.cer
[Sat 08 May 2021 07:45:01 AM UTC] Your cert key is in /root/.acme.sh/my.domain.org/my.domain.org.key
[Sat 08 May 2021 07:45:01 AM UTC] v2 chain.
[Sat 08 May 2021 07:45:01 AM UTC] The intermediate CA cert is in /root/.acme.sh/my.domain.org/ca.cer
[Sat 08 May 2021 07:45:01 AM UTC] And the full chain certs is there: /root/.acme.sh/my.domain.org/fullchain.cer
[Sat 08 May 2021 07:45:02 AM UTC] _on_issue_success
This probably is a very late eulogy. It also means it took me this long to find the nerve to put together words without breaking down or losing my composure.
So here goes the story of two little friends who “are” brothers (from two close families) for a lifetime.
I was almost a year old when he arrived (Hari Krishnan, referred to as Kichus from here on).
We both grew up sharing toys, getting new dresses together for Onam, buying crackers together for Vishu, fighting for penalties and 6s (and ofcz making amends the very next day).
I took for granted that this kid is going to be with me forever, see me graduate high school, class 10th, class 12th, see me become an Engineer. But destiny as we call it had other plans.
Kichu was diagnosed with Blood Cancer at the age of 16.
Little did we know that, we started losing each other way before he even turned 16.
Let me tell you how I remained helpless while everything around me pulled me down into a rabbit hole.
I was off to school that morning, to attend my last half-yearly exam. Something was off that day from the time I woke up. My mom was acting weird and she was in a hurry to push me off to school. Given these were my exams, I felt the urgency was justified. So I walk away from the gate and I could see my mom peeking through the kitchen window, making sure I was not being stopped by anyone to be told what had happened. I somehow reach the bus stop, and the bus that comes on time every day was nowhere to be seen.
I could see my mom outside our house now, neck craned, looking out for me, checking if I had safely got onto the bus. At this point I knew something wasn’t right. We all knew Kichu wasn’t gonna make it, coz I had seen him a month before this day.
He had had his fair share of chemo by then, and had lost all his hair. It was hard for me to face him and look him in the eyes that were searching for a bit of hope, coz he knew it all along.
How did he know you may ask.
Coz he had seen his elder sister take the same path to death when he was probably 6.
Every time I saw him, he was holding on to that smile making sure his mom never saw him suffer the pain he had within. Kichu was strong and he asked me to be strong alongside him and keep my shit together.
He wanted to do so many things, and he had very little time.
We started swimming lessons, we went for painting class, he got himself a gaming PC and we played NFS all day or till he was tired.
He couldn’t play any more for his heart was weak, but he watched me score goals. Even when he was cheering from the sidelines, I was hoping for that one day when I could celebrate another goal with him.
So the bus finally arrived and I am off for school. Mom is relieved for the time being.
I write my exam, thinking about what had happened in the morning. I walk back home in the afternoon, and I open the front door.
I could see my mom had cried the whole day, and her eyes were so red and dry. I could see my grandma numb and looking at me with those helpless eyes.
Mom finally said: “ശ്രീ, നമ്മടെ കിച്ചു പോയെടാ / Kichu is no longer with us”
I don’t remember anything but just one answer. I asked mom if she was hiding this from me in the morning.
Yes
I felt so much anger and pain, I wanted to smash the front window glass. I went straight to my room upstairs, shut the door from behind, grabbed a pillow, and bit it like an angry dog and screamed for a long time as far as I can remember.
I had to be strong.
How can I be strong, when I hear that the friend I thought I’d have for a lifetime has gone away forever.
How can I be strong when the last image of him I had in my head was of the kid who hoped to live a healthy funny life.
How can I be strong when I could not even say a final goodbye. I couldn’t even see his body for one last time.
Days, weeks, and months pass by. I wanted to accept the reality, but till this very day, I wake up on most days empty and have all these thoughts about how we grew up as brothers.
The chemistry paper was out after valuation, and I still remember the then chemistry teacher asking me in front of the whole class what was wrong with me. She wasn’t expecting me to do this poorly in the exam, for she knew my mom, who also happens to teach the same subject.
As much as I wanted to shout to the whole class that I had just lost my best friend, I kept quiet with my head down. I felt so much pain that day that I tore up the answer paper and threw it away on my way back home. I don’t think any of my classmates knew about the whole Kichu scene.
I was scared to talk about it and have only told this to a close friend of mine, once. I am still scared and it hurts hell to write this draft, which I don’t know if I would be able to publish.
Almost 10 years have gone by, and when I look back at my childhood, at least I can still picture the little kid with a bright smile who always had my back.
This is a Eulogy for you buddy:
You have shown me the courage to fight with hope. You are my brother, I miss you very badly and I’ll always carry you with me. I wish you could see me now. I wasn’t ready for you to go yet, but you left me with no other choice. Growing up into adulthood without you was hard, and I’m still finding it hard to believe it’s been 10 years since you left.
I feel so proud and honored to have shared the kind of brotherhood and love we had for each other for 16 years, but how I wish I could get more of those.
I know you tried hard and I understand why you had to give up. I know you faked a lot of smiles towards the end, but I know you did it for a reason. If there is something life has taught me from all that you went through, it’s that “there are some people who always find a reason to make others happy, even when they know that they are dying”. I don’t really believe in the afterlife and stuff. All the people who know me should now know why I gave up on the concept of God, for God wasn’t there when I needed him. I don’t trust someone who doesn’t show up when you need them to. So God, for me, died with Kichu.
Love you, my brother.
-Sree
NB. This post is for remembering my friend and also to help me let go of some of the lingering pain and heaviness of heart. It doesn’t really tell half the pain I still carry, and I could never write something that does. For people who have been in my shoes or are currently in them, please find that strength by holding on to good memories of the ones you lost.
I have been interested in the concept of Freedom, in both the technical and
social ecosystems, for almost a decade now. Even though I am not a hardcore
contributor or anything, I have been involved for a few years now as an
enthusiast, a contributor, a mentor, and above all an evangelist. Since 2019 is
coming to an end, I thought I would note down what I did this past year as a
FOSS person.
GitLab
My job at GitLab is that of a Distribution Engineer. In simple terms, I have to
deal with anything that a user/customer may use to install or deploy GitLab. My
team maintains the omnibus-gitlab packages for various OSs, docker image, AWS
AMIs and Marketplace listings, Cloud Native docker images, Helm charts for
Kubernetes, etc.
My job description essentially covers only the above mentioned tasks, and as
part of my day job I don’t usually have to write any backend Rails/Go code.
However, I also find GitLab a good open source project and have been
contributing a few features to it over the year. The main reasons I started
doing this are:
An opportunity to learn more Rails. GitLab is a pretty good project to do
that, from an engineering perspective.
Most of the features I implemented are the ones I wanted from GitLab, the
product. The rest are technically simpler issues with less complexity (which
relates to the point above, regarding getting better at Rails).
I know the never-ending dilemma our Product team goes through to maintain the
balance of CE v/s EE features in every release, and to prioritize appropriate
issues from a mountain of backlog on each milestone. In my mind, it is easier
for both them and me if I just implement something rather than ask them to
schedule it to be done by a backend team, so that I can enjoy the feature. To
note, most of the issues I tackled already had the Accepting Merge Requests
label on them, which meant Product agreed the feature was worth having, but
there were higher-priority issues to be tackled first.
So, here are the features/enhancements I implemented in GitLab, as an interested
contributor in the selfish interest of improving my Rails understanding and to
get features that I wanted without much waiting:
I have been a volunteer at Swathanthra Malayalam Computing for almost 8 years
now. Most of my contributions are towards various localization efforts that SMC
coordinates. Last year, my major contributions were improving our fonts build
process to help various packaging efforts (well, selfish reason - I wanted my
life as the maintainer of Debian packages to be easier), implementing CI based
workflows for various projects and helping in evangelism.
I have been a Debian contributor for almost 8 years, became a Debian Maintainer
3 years after my first stint with Debian, and have been a Debian Developer for 2
years. My activities as a Debian contributor this year are:
Continuing maintenance of fonts-smc-* and hyphen-indic packages.
Mozilla
Being one of the managers of the Malayalam localization team of Mozilla, I
helped coordinate localization of various projects, interacted with Mozilla
staff on behalf of the community to clarify their concerns, got new projects
added for localization, etc.
Talks
I also gave a few talks during 2019 on various FOSS topics that I am
interested in or knowledgeable about. The list and details can be found on the
talks page.
Overall, I think 2019 was a good year for the FOSS person in me. Next year, I
plan to be more active in Debian because from the above list I think that is
where I didn’t contribute as much as I wanted.
Dockup is a tool that helps engineering teams spin up on-demand environments. We have a UI that talks to several agents which are installed on remote servers. The UI sends commands to these agents over WebSocket connections using Phoenix channels.
What if an agent went down?
The commands to spin up and manage environments are sent over to agents running on remote servers. For this to work, we need to make sure our agents are online and ready to receive the commands. In order to do this, we need to keep track of agents assigned to our users and also show the agent’s online status in the UI.
Our first implementation
In the UI, we show whether the agent for the organization is online and ready to receive commands. It was an old-school synchronous “ping” to the agent behind a Retry module: we ask the agent for a “pong” and relay that back to our UI. This has a problem.
Suppose the agent went down due to some unexpected error on the remote server, or the organization has not yet been configured with a proper agent. If the user now opens the page that shows the agent status, the request would block until the “ping” to the agent times out. Unfortunately this takes a while and would be terrible UX. No user wants to stare at an empty loading screen, only to find out that their agent is actually down!
Using Phoenix Presence
Phoenix Presence is a feature which allows you to register process information on a topic and replicate it transparently across a cluster. It’s a combination of both a server-side and client-side library which makes it simple to implement. A simple use-case would be showing which users are currently online in an application.
If we can track the online statuses of users in chat rooms, it should be possible to track the online statuses of our agents too. That’s exactly what we did, and here’s a step-by-step guide on how to do it.
Firstly, we need to add Phoenix Presence under the App supervision tree as explained in the official docs.
We then configure our agents channel to use Presence to track the agents that connect with Dockup. After this, we can then simply ask Presence if there is a presence of an agent in our app!
By the time the user actually visits the settings page, Presence already knows whether that specific agent has joined the topic or not. Since this is a very basic key-value lookup, it is going to be super quick. We no longer need to play ping-pong with the agent to know the presence!
Earlier, this page would, in the worst-case scenario, take around 30–50 seconds to render, simply because the agent was down.
[info] Received GET /settings
[info] Sent 200 response in 40255.44ms
Using Presence, the response time came down to around 40–50ms, or even lower.
[info] Received GET /settings
[info] Sent 200 response in 17.86ms
[info] Received GET /settings
[info] Sent 200 response in 49.65ms
[info] Received GET /settings
[info] Sent 200 response in 65.62ms
[info] Received GET /settings
[info] Sent 200 response in 31.12ms
[info] Received GET /settings
[info] Sent 200 response in 51.73ms
The most interesting thing about solving this issue for us was that the PR that went in was tiny (just +40/-1), but the impact it had was significant, something we’ve seen time and again with Elixir!
On-demand environments for running end-to-end tests
“Be more confident about your code changes by adding end-to-end tests that run for each deployment you create on Dockup.”
End-to-end testing is a technique used to verify the correctness of an application’s behavior when it works in integration with all its dependencies. Running end-to-end tests has become increasingly complicated over time as companies embrace service-oriented architecture and monoliths turn into microservices.
In this blog post, we’ll see how to use Dockup to automatically spin up on-demand environments to run end-to-end tests for every pull request.
We will explain this based on Cypress.io, but you can follow similar steps to configure your favorite E2E tool to run alongside Dockup deployments.
For ease of understanding, let’s use a simple VueJS app that implements a TodoMVC.
We will keep this source under a common project folder, say todomvc-app/ and also create another folder, say todomvc-app/e2e/. We will write our tests inside this. Cypress test specs are kept under a sub-directory called "cypress" and we will have a cypress.json file inside our e2e folder. See more on how Cypress tests are written from their docs.
Once you have your test specs ready, we need to add a Dockerfile and the whole directory structure would look something like this:
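(layout assumed from the description above)
todomvc-app/
  ...            (the VueJS TodoMVC app source, elided)
  e2e/
    cypress/     (Cypress test specs)
    cypress.json
    Dockerfile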
Since the test cases are to be run for several deployments, we will be keeping the baseUrl config value for cypress as an initial dummy URL, and then we will override it with env variables. This is documented by Cypress here.
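In effect, something like this happens at test time (the endpoint URL is only a placeholder; Dockup injects the real one through the env variable substitution described below):

CYPRESS_BASE_URL="https://<deployment-endpoint>" cypress run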
Container for the actual todo-app
Assuming that you have already added the container for the actual app while creating a Dockup Blueprint for your todomvc-app (as shown in figure above), we will add a new container holding the image source details. If you are new to Dockup Blueprint, head over here to read more about creating one.
Take care with the Dockerfile path here, as this is the one that resides inside our e2e folder.
We will also have to add the CYPRESS_BASE_URL env variable so that Cypress receives a public endpoint for the deployment. This can be done using the Environment Variables Substitution feature (refer to DOCKUP_PORT_ENDPOINT_) in Dockup.
The cypress container would exit with the overall number of errors the tests had.
Container form for e2e
That is all you need to do to have a working cypress end-to-end test running alongside each of the deployments.
Since containers inside a Dockup deployment spin up when they are ready and not sequentially, you will need a shell script that waits for the UI endpoint to be live before you start to run tests. The script can simply fail when the endpoint is not live, upon which the Dockup container would restart.
#!/bin/sh
set -x
set -e
echo "Checking if the endpoint for testing is ready..."
response=$(curl --write-out '%{http_code}' --silent --output /dev/null "$CYPRESS_BASE_URL")
if [ "$response" != 200 ]; then
  # endpoint not live yet: exit so the container restarts and tries again
  exit 1
fi
Cypress has its own docker image configured to run on several CI tools, which you can use on Dockup as well without many changes. All you have to do is put the cypress/ folder at the same level as the Dockerfile, since their images look for it in the root directory; as soon as the container spins up, the cypress run command runs. However, this is not recommended on Dockup for the reason mentioned above. Instead, have the script take care of running the cypress command once the endpoint is up.
Now you can go ahead and deploy this blueprint and have it run the E2E tests for you. Your containers should spin up and the e2e tests should start running. While you wait for them to complete, you can also take a look at the logs.
Image builds are ready
A successful deployment with your e2e tests passed would look something like this.
E2E test has passed, and hence the container has a success check
Checks also send updates to GitHub if the deployments are triggered by PRs.
An example of how Dockup sends updates to GitHub PRs
In the case of Cypress, it exits with a non-zero exit code when there are failures, and thus the container would also fail, indicating that there are failed test cases.
“Spin up on-demand staging environment to test out your custom plugins and themes for WordPress”
Setting up and maintaining a staging environment for every WordPress theme/plugin project can be very daunting. Quite often, when website developers work on design implementations or content creators try to add articles to their website, they seek approval from team members more often than one can imagine. This can be tedious and time-consuming if the team is limited by the availability of staging environments.
Dockup helps you mitigate this problem by providing on-demand staging environments for your WordPress site. Your changes are automatically made available across your team as and when you update them, while letting you concentrate on the design or the article.
In this article, we’ll see how you can use Dockup to automatically spin up on-demand copies of your WordPress site.
How can Dockup help?
Dockup can automatically spin up a staging environment every time you open a PR for your WordPress site. This way, you will have an environment ready at your disposal, with all the changes from the PR. All you have to do is push code, test your changes to the theme, and perhaps show it to your team.
You can also deploy your branches manually on Dockup. This can be super useful when a non-tech team member wants to test how the site looks for any commit or branch. Let’s see how to set this up.
Setting up Dockup
Assuming you have prior knowledge of how and where themes and plugins fit in WordPress, let me quickly set up a sample plugin for the sake of this documentation. If you don’t have a current project, follow along with the next step to get a simple source tree that we can deploy on Dockup to test things.
This WordPress plugin will append a line to each post we create. It can be used to add a thank-you or goodbye message at the end of each post.
Setup the project folder as below:
We have a root project folder called “ending-line-wp-plugin”
A file with the plugin code, “ending-line-wp-plugin/ending-line/ending-line.php”
Dockerfile for building images for Dockup
Project structure
Copy paste the following code in the ending-line.php file:
Now let’s dockerise this one. It’s pretty straightforward here. All you have to do is copy this project folder into the plugins folder inside wp-content.
FROM wordpress:php7.3-apache
WORKDIR /var/www/html
COPY ending-line/ wp-content/plugins/ending-line/
Note that we are not using any scripts to start a MySQL server before we run the actual WordPress server. Dockup lets us spin up both containers separately and we will connect them using the environment variables.
Let’s create a Dockup Blueprint for this project and see how we can stage our plugin project.
Note that we are using GitHub as the source here, but you can also use a pre-built docker image of the plugin/theme you are developing.
Also, double-check the env variables that your project might need. For this one, we need three. To know which env variables are supported by WordPress, head over here.
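These are presumably the database connection variables of the official wordpress image, along the lines of:

WORDPRESS_DB_HOST=<hostname of the MySQL container>
WORDPRESS_DB_USER=<database user>
WORDPRESS_DB_PASSWORD=<database password>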
Now, you have a staging environment ready for you to test the plugin you just wrote!
Don’t forget to activate your plugin from the admin panel of WordPress. If you are following this sample app, remember to add the text to be appended to every post via Settings > Ending Line Plugin
How to create on-demand environments for Jekyll sites
“See how your Jekyll site and articles look like before you publish them.”
Want to see how your site will turn out before you publish? Just open a PR on your repo and Dockup will spin up a live site for you!
Assuming that you have a Jekyll blog in place, let’s see how we can dockerise it and create a Dockup Blueprint. Here’s the Jekyll site we’ll use: Minima.
We’ll add a couple of files to the root directory:
Been quite some time since I wrote about anything. This time, it is Debutsav.
When it comes to full-fledged FOSS conferences, I usually am an attendee or at
most a speaker. I have given sporadic advice and suggestions to a few people in
the past, but that was it. However, this time I played the role of an organizer.
DebUtsav Kochi is the second edition of Debian Utsavam, the celebration of Free
Software by the Debian community. We didn’t name it MiniDebConf because we
wanted the conference to cover not just Debian-specific content but general
FOSS topics too, since our target audience isn’t yet Debian-aware enough for a
Debian-only event. So, DebUtsav Kochi had three tracks: one for general FOSS
topics, one for Debian talks and one for hands-on workshops.
As a disclaimer, the descriptions of the talks below are what I gathered from
my interactions with the speakers and attendees, since I wasn’t able to attend
as many talks as I would’ve liked, being busy with the organizing work.
The event was organized by Free Software Community of India, whom I represented
along with Democratic Alliance for Knowledge Freedom (DAKF) and Student
Developer Society (SDS). Cochin University of Science and Technology were
generous enough to be our venue partners, providing us with necessary
infrastructure for conducting the event as well as accommodation for our
speakers.
The event spanned two days, with around 150 registered participants. Day 1
started with a keynote session by Aruna Sankaranarayanan, affiliated with
OpenStreetMap. She has also been associated with the GNOME Project, Wikipedia
and Wikimedia Commons, and was a lead developer of the Chennai Flood Map that
was widely used during the floods that struck the city of Chennai.
Sruthi Chandran, a Debian Maintainer from Kerala, gave a brief introduction to
the Debian project: its ideologies and philosophies, the people behind it, the
processes involved in the development of the operating system, etc. SDS members
gave an intro about DebUtsav, how it came to be, and the planning and
organization that went into conducting the event.
After these common talks, the event split into two parallel tracks: FOSS and
Debian.
In the FOSS track, the first talk was by Prasanth Sugathan of Software Freedom
Law Centre about the need for Free Software licenses and ensuring license
compliance in projects. In parallel, Raju Devidas discussed the process of
becoming an official Debian Developer, what it means, and why it matters to
have more and more developers from India.
After lunch, Ramaseshan S introduced the audience to Project Vidyalaya, a free
software solution for educational institutions to manage and maintain their
computer labs using FOSS rather than the conventional proprietary solutions.
Shirish Agarwal shared a general idea of the various teams in Debian and how
everyone can contribute to these teams based on their interest and ability.
Subin S introduced some nifty little tools and tricks that make the Linux
desktop cool and improve the productivity of users. Vipin George talked about
the possibility of using Debian as a forensic workstation, and how it can be
made more efficient than its proprietary counterparts.
Ompragash V from Red Hat talked about using Ansible for automation tasks and
its advantages over other similar tools. Day 1 ended with Simran Dhamija
talking about Apache Sqoop and how it can be used for data transformation and
other related use cases.
In the afternoon session of Day 1, two workshops were also conducted in
parallel with the talks. The first one was by Amoghavarsha on reverse
engineering, followed by an introduction to machine learning using Python by
Ditty.
We also had an informal discussion with a few of the speakers and participants
about Free Software Community of India, the services it provides, how to make
more people aware of those services, and how to get more maintainers for them.
We also discussed the necessity of self-hosted services, onboarding users to
them smoothly, and evangelizing these services as alternatives to their
proprietary and privacy-abusing counterparts.
Day 2 started with a keynote session by Todd Weaver, founder and CEO of
Purism, which aims at developing privacy-focused laptops and phones. Purism
also develops PureOS, a Debian derivative consisting of Free Software only,
with further privacy-enhancing modifications.
On day 2, the Debian track focused on a hands-on packaging workshop by Pirate
Praveen and Sruthi Chandran that covered the basic packaging workflow, the flow
of packages through suites like Unstable, Testing and Stable, and the structure
of packages. It then moved to the actual process of packaging by guiding the
participants through packaging a JavaScript module used by the GitLab package
in Debian. Participants were introduced to tools like npm2deb, lintian and
sbuild/pbuilder, and to the various Debian-specific files and their functions.
In the FOSS track, Biswas T shared his experience developing keralarescue.in,
a website that was heavily used during the Kerala floods for effective
collaboration between authorities, volunteers and the public. It was followed
by Amoghavarsha’s talk on his journey from Dinkoism to Debian. Abhijit AM of
COEP talked about how Free Software may be losing against Open Source and why
that may be a problem. Ashish Kurian Thomas shed some light on a few *nix tools
and tricks that can be productivity boosters for GNU/Linux users. Raju and
Shivani introduced Hamara Linux to the audience, along with the development
process and the focus of the project.
The event ended with a panel discussion on how Debian India should move forward
to organize itself properly to conduct more events, spread awareness about
Debian and other FOSS projects out there, prepare for a potential DebConf in
India in the near future etc.
The number of registrations and the enthusiasm of the attendees are positive
signs for the chances of having a proper MiniDebConf in Kerala, followed by a
possible DebConf in India, which we have bid for.
Thanks to all the participants and speakers for making the event a success.
Thanks to FOSSEE, Hamara Linux and GitLab for being sponsors of the event and
thus enabling us to actually do this. And also to all my co-organizers.
A very special thanks to Kiran S Kunjumon, who literally did 99% of the work
needed for the event to happen (as you may recall, I am good at sitting on a
chair and planning, not actually doing anything. :D ).
So I attended my first international FOSS conference - FOSSAsia 2018
at the Lifelong Learning Institute, Singapore. I presented a
talk titled “Omnibus - Serve your dish on all the tables”
(slides, video)
about the tool Chef Omnibus, which I use on a daily basis for my job at
GitLab.
The conference was a 4-day-long one and my main aim was to network with as many
people as I could. Well, I planned to attend sessions, but unlike earlier times
when I attended all the sessions, these days I am more focused on certain
topics and technologies and tend to attend sessions on those (for example,
DevOps is an area I focus on, blockchain isn’t).
One additional task I had was to attend to the Debian booth at the exhibition
from time to time. It was mainly handled by Abhijith (who is a DM). I also met
two other Debian Developers there - Andrew Lee (alee) and Héctor Orón Martínez
(zumbi).
I also met some other wonderful people at FOSSAsia, like Chris Aniszczyk of
CNCF, Dr Graham Williams of Microsoft, Frank Karlitschek of NextCloud,
Jean-Baptiste Kempf and Remi Denis-Courmont of VideoLAN, Stephanie Taylor of
Google, Philip Paeps (trouble) of FreeBSD, Harish Pillai of RedHat, Anthony,
Christopher Travers, Vasudha Mathur of KDE, Adarsh S of CloudCV (who is from
MEC College, which is quite familiar to me), Tarun Kumar of Melix, Roy Peter of
Go-Jek (with whom I am familiar, thanks to the Ruby conferences I attended),
Dias Lonappan of Serv, and many more. I also met some whom I previously knew
only digitally, like Sana Khan, who was (yet another, :D) Debian contributor
from COEP. I also met some friends like Hari, Cherry, Harish and Jackson.
My talk went ok without too much stuttering and I am kinda satisfied with it.
The only thing I forgot to mention during the talk was that I had stickers
(well, I later placed them on the sticker table and they disappeared within
minutes, so that was ok. ;))
PS: Well, I had to cut down quite a lot of my explanation and drop my demo due
to limited time. This made me miss many important topics like omnibus-ctl and
the cookbooks that we use at GitLab. But a few participants came up and met me
after the talk with doubts regarding Omnibus and its similarity to Flatpak, its
relevance in the times of Docker, etc., which was good.
Some photos are here:
Abhijith in Debian Booth
Abhijith with VLC folks
Andrew's talk
With Anthony and Harish: Two born-and-brought-up-in-SG-Malayalees
It has been a while since I wrote anything here. In the last year I attended some events, like FOSSMeet, DeccanRubyConf and GitLab’s summit, and didn’t write anything about them. The truth is, I forgot I used to write about all these and never got the motivation to do so.
Anyway, last week, I conducted a workshop on Git basics for the students of CUSAT. My real plan, as always, was to do a bit of FOSS evangelism too. Since the timespan of workshop was limited (10:00 to 13:00), I decided to keep everything to bare basics.
Started with an introduction to what a VCS is and how it became necessary. As a prerequisite, I talked about FOSS, concept of collaborative development, open source development model etc. It wasn’t easy as my audience were not only CS/IT students, but those from other departments like Photonics, Physics etc. I am not sure if I was able to help them understand the premise clearly. However, then I went on to talk about what Git does and how it helps developers across the world.
IIRC, this was the first talk/workshop I did without a slide show. I was too lazy and busy to create one. I just had one page saying “Git Workshop” and my contact details. So guess what? I used a whiteboard! I went over the basic concepts like repositories, commits, the staging area etc., and then started the hands-on session. In short, I talked about the following (a condensed command-line version is given after the list):
Initializing a repository
Adding files to it
Adding files to the staging area
Committing
Viewing commit logs
Viewing what a specific commit did
Viewing a file’s contents at a specific commit
Creating a GitLab account (Well, use all opportunity to talk about your employer. :P)
Creating a project in GitLab
Adding it as a remote repository to your local one
Pushing your changes to remote repository
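Roughly, the commands covered were (file, project and commit names here are placeholders):

git init
git add notes.txt
git commit -m "Add notes"
git log
git show <commit-id>
git show <commit-id>:notes.txt
git remote add origin https://gitlab.com/<username>/<project>.git
git push -u origin master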
I wanted to talk about clone, fork, branch and MRs, but time didn’t permit. We wound up the session with Athul and Kiran talking about how they want the students to join the FOSSClub of CUSAT, help organize similar workshops, and how it can help them as well. I too did a bit of “motivational talk” about how community activities can help them get a job, based on my personal experience.
Here are a few photos, courtesy of Athul and Kiran:
So, M.Tech is coming to an end and I should probably start searching for a job soon. Still, it seems I will have a bit of free time from mid-September. I have some plans about the areas I should contribute to in SMC/Indic Project. As of now, the bucket list is as follows:
Properly tag versions of fonts in SMC GitLab repo - I had taken over the package fonts-smc from Vasudev, but haven’t done any update on that yet. The main reason was fontforge being old in Debian. Also, I was waiting for some kind of official release of new versions by SMC. Since the new versions are already available in the SMC Fonts page, I assume I can go ahead with my plans. So, as a first step I have to tag the versions of fonts in the corresponding GitLab repo. Need to discuss whether to include TTF file in the repo or not.
Restructure LibIndic modules - Those who were following my GSoC posts will know that I made some structural changes to the modules I contributed in LibIndic. (Those who don’t can check this mail I sent to the list). I plan to do this for all the modules in the framework, and to co-ordinate with Jerin to get REST APIs up.
GNOME Localization - GNOME Localization has been dead for almost two years now. Ashik has shown interest in re-initiating it and I plan to do that. I first have to get my committer access back.
Documentation - Improve documentation about SMC and IndicProject projects. This will be a troublesome and time consuming task but I still like our tools to have proper documentation.
High Priority Projects - Create a static page about the high priority projects so that people can know where and how to contribute.
Die Wiki, Die - Initiate porting Wiki to a static site using Git and Jekyll (or any similar tool). Tech people should be able to use git properly.
Knowing myself better than anyone else does, I understand there is every chance of this becoming a never-implemented plan (അതായത് ആരംഭശൂരത്വം / that is, all enthusiasm at the start :D), but I still intend to work on this in an easiest-first order.
So finally it’s over. Today is the last date for submission of the GSoC project. This entire ride was informative as well as full of experiences. I thank the Indic Project organisation for accepting my GSoC project, and my mentors Navaneeth K N and Jishnu Mohan for helping me out throughout this project.
The project kicked off with the aim of incorporating the native libvarnam shared library by writing JNI wrappers. Unfortunately, that method came to a stall when we were unable to import the libraries correctly due to the lack of sufficient official documentation. So my mentor suggested an alternative approach making use of the Varnam REST API. This has been successfully incorporated for 13 languages, with the caveat that the app requires an internet connection. Along with it, the suggestions that come up are the ones returned by Varnam, in priority order. I will be contributing further to Indic Project to make the library method work. Apart from that, see the useful links below:
this and this are related to adding a new keyboard with a “qwerty” layout.
this is adding a new SubType value and a method to identify TransliterationEngine enabled keyboards.
this is adding the Varnam class and setting the TransliterationEngine.
this and this deals with applying the transliteration by Varnam and returning it back to the keyboard.
this is the patch to resolve the issue of the program crashing when switching keyboards.
this makes sure that after each key press, the displayed word is refreshed and the transliteration of the entire word is shown.
this makes sure that on pressing delete, the new word is displayed.
this creates a template such that more keyboards can be added easily.
this makes sure that the suggestions appearing are directly from the Varnam engine and not from the inbuilt library.
The list of commits can be seen here, which includes the addition of layouts for different keyboards and nit fixes.
The project as a whole is almost complete. The only thing left to do is to incorporate the libvarnam library into the APK, so that we can call it instead of the Varnam class given here. The ongoing work for that can be seen below.
Hi,
First of all, my thanks to Indic Project and Swathanthra Malayalam Computing (SMC) for accepting this project. Hats off to my mentors Nalin Sathyan and Samuel Thibault. The project was awesome and I believe I have done my best, even without any prior experience.
Now let me outline what we have done during this period.
Braille-Input-Tool (the online version)
Just like Google transliteration or Google Input Tools online. This is required because it’s completely operating system independent, and it’s a modern method which never forces the user to install an additional plugin or a specific browser. The user might use this from temporary places like an internet cafe. It is written using jQuery and HTML, and works well on GNU/Linux, Microsoft Windows, Android, etc.
See Picture of Ibus-Braille preferences given below
2. 8-dot braille enabled: Yes. There are languages with more than 64 characters, which can’t be handled with the 64 combinations of 6 dots. Music notations like “Abreu” and LAMBDA (Linear Access to Mathematics for Braille Device and Audio Synthesis) use the 8-dot braille system, and Unicode supports 8-dot braille.
Commit 1 : https://gitlab.com/anwar3746/ibus-braille/commit/54d22c0acbf644709d72db076bd6de00af0e20b9
See key/shortcut page picture of ISB preferences dot setting
3. Dot-4 issue solved: In IBus-Braille, when typing Bharati braille (Malayalam, Hindi, etc.), we had to use 13-4-13 to get the letter ക്ക (kka). But according to the braille standard, one should press 4-13-13 to get ക്ക. This forced beginners to do extra learning to start typing. Through this project we solved this issue, and a conventional-braille-mode switch is provided in the preferences to switch between the two behaviours.
4. Facility to write Braille Unicode directly: one can now use IBus-Braille to type braille dot notation directly with key combinations. The output may be sent to a braille embosser, an impact printer that renders text in braille characters as tactile braille cells.
5. Three-to-six mode for people who can use only one hand: a three-key implementation which uses a delay factor between key presses. For example, 13 followed by 13 with a delay less than the delay factor (e.g. 0.2 s) will give X; if the delay is more, the output would be KK. If one wants to type a letter whose combination involves only dots 4, 5 and 6, the “t” key has to be pressed first. The keys and the conversion delay can be adjusted from the preferences.
Remaining work:
1. Announce extra information through the screen reader: when the user expands an abbreviation, or a contraction of more than 2 letters is substituted, the screen reader does not announce it. We have to write an Orca (screen reader) plugin for IBus-Braille.
2. A UI for creating and editing Liblouis tables.
3. Add support for more Indic languages and mathematical operators via liblouis.
Braille-input-tool (online version)
Liblouis integration
Conventional Braille, Three Dot mode and Table Type selection
“To be successful, the first thing to do is to fall in love with your work — Sister Mary Lauretta”
Well, Google Summer of Code 2016 is reaching its final week as I get ready to submit my work. It has been three to four months of serious effort and commitment. To be frank, this has to be one of those things for which I was fully motivated and put in my 100%.
Well, at first the results of training weren’t that promising and I was actually let down. But then my mentor and I had a series of discussions on submitting, during which she suggested retraining the model excluding the data sets (audio files) of the speakers that produced the most errors. After completing the batch test, I noticed that four of the data sets had the worst accuracy, shockingly below 20%. This was dragging the overall accuracy down.
So, I decided to delete those four data sets and retrain the model. It was not that big of a deal, so I thought it wasn’t going to be a drastic change from the current model. But the result put me into a state of shock for about 2–3 seconds. It said:
TOTAL Words: 12708 Correct: 12375 Errors: 520
TOTAL Percent correct = 97.38% Error = 4.09% Accuracy = 95.91%
TOTAL Insertions: 187 Deletions: 36 Substitutions: 297
SENTENCE ERROR: 9.1% (365/3993)
WORD ERROR RATE: 4.1% (519/12708)
Now, this looks juicy and near perfect. But the thing is, the sentences are tested as they were trained. So if we change the structure of the sentences we ultimately give it to recognize, it will still have issues putting out the correct hypothesis. Nevertheless, it was far better than the previous model.
So I guess I will settle for this for now, as the aim of the GSoC project was to start the project and show proof that this can be done, but I will keep training better models in the near future.
Google Summer of Code 2016 — Submission
Since the whole project was carried out under my personal GitHub repository, I will link the commits here: Commits
Well, I have been documenting my way through the project over here at Medium starting from the month of May. The blogs can be read from here.
What can be done in near future?
Well, this model is still in its early stage and is not yet one that can be used error-free, let alone applied in applications.
The data set is still buggy and has to be improved with better, cleaner audio data and a more finely tuned language model.
Speech recognition development is rather slow and is obviously community-based. All of this is possible with collaborative work towards achieving a user-acceptable level of practical accuracy rather than quoting a statistical, theoretical accuracy.
All necessary steps and procedure have been documented in the README sections of the repository.
It’s almost the end of the GSoC internship. From zero knowledge of Android to writing a proposal, the proposal getting selected, and finally 3 months working on the project, it was a great experience for me! I have learned a lot and I am really thankful to Jishnu Mohan for mentoring me throughout.
All the tasks mentioned in the proposal were discussed and worked upon.
Layouts
I started with making the designs of the layouts. The task was to make Santali Olchiki and Soni layouts for the keyboard. I looked at the code of the other layouts to get a basic understanding of how phonetic and inscript layouts work. A snapshot of one of the views of the Santali keyboard:
Language Support Feature
While configuring languages, the user is prompted about the locales that might not be supported by the phone.
Adding Theme Feature
A feature was added to the setup flow to let the user select the keyboard theme.
Merging AOSP code
After looking at everything mentioned in the proposal, Jishnu gave me the job of merging AOSP source code into the keyboard, as the current keyboard doesn’t have the changes that were released along with the Android M code drop, because of which the target SDK is not 23. There are a few errors yet to be resolved and I am working on that.
Overall, it was a wonderful journey and I will always want to be a contributor to the organisation as it introduced me to the world of open source and opened a whole new area to work upon and learn more.
Link to the discourse topic: https://discourse.indicproject.org/t/indic-keyboard-project/45
It is finally time to wind up the GSoC work in which I have been buried for the past three months. First of all, let me thank Santhosh, Hrishi and Vasudev for their help and support. I seem to have implemented, or at least proved, the concepts that I mentioned in my initial proposal. A spell checker that can handle inflections of a root word, generate suggestions in the same inflected form, and differentiate between spelling mistakes and intended modifications has been implemented. The major contributions I made were the following.
My initial work was on improving the existing stemmer available as part of LibIndic. The existing implementation was a rule-based one capable of handling a single level of inflection. The main problems of this stemmer were:
General incompleteness of rules - Plurals (പശുക്കൾ), Numerals(പതിനാലാം), Verbs (കാണാം) are missing.
Unable to handle multiple levels of inflections - (പശുക്കളോട്)
Unnecessarily stemming root words that look like inflected words - (ആപത്ത് -> ആപം following the rule of എറണാകുളത്ത് -> എറണാകുളം)
The above-mentioned issues were fixed. The remaining category is verbs, which need more detailed analysis.
I decided to retain the rule-based approach for the lemmatizer (actually, what we are designing is halfway between a stemmer and a lemmatizer; since it is more inclined towards a lemmatizer, I am going to call it that), mainly because implementing any ML or AI technique requires sufficient training data, without which the efficiency would be very poor. It felt better to gain higher efficiency with the available rules than to try out ML techniques with no guarantee (the “known devil is better” logic).
The basic logic behind the multi-level inflection handling lemmatizer is iterative suffix stripping. At each iteration, a suffix is identified from the word and it is transformed to something else based on predefined rules. When no more suffixes are found that have a match on the rule set, we assume the multiple levels of inflection have been handled.
To keep root words that look like inflected words (hereafter called ‘exceptional words’) from being stemmed unnecessarily, we obviously have to use a root-word corpus. I used the Datuk dataset, openly made available by Kailash, as the root-word corpus. A corpus comparison is performed before the iterative suffix stripping starts, so as to handle root words without any inflection. Thus, the word ആപത്ത് gets handled even before the iteration begins. However, what if the input word is an inflected form of an exceptional word, like ആപത്തിലേക്ക്? This makes it necessary to introduce the corpus comparison step after each iteration as well.
At each iteration, suffix stripping happens from left to right. The initial suffix has the 2nd character as its starting point and the last character as its end point. At each inner iteration, the starting point moves rightwards, making the suffix shorter and shorter. Whenever a suffix is found that has a transformation rule defined in the rule set, it is replaced with the corresponding transformation. This continues until the suffix becomes empty.
Multi-level inflection is handled on the logic that each match in the rule set suggests there may be one more inflection present. So, before each iteration, a flag is set to False. Whenever a match in the rule set occurs during that iteration, it is set to True. If, at the end of an iteration, the flag is True, the loop repeats; else, we assume all inflections have been handled.
Since this lemmatizer is also used along with a spell checker, we need a history of the inflections identified so that the lemmatization process can be reversed. For this purpose, I tagged the rules unambiguously. Each time an inflection is identified, that is, the extracted suffix finds a match in the rule set, the associated tag is pushed to a list in addition to applying the transformation. As the result, the stem along with this list of tags is returned to the user. The list of tags can be used to reverse the lemmatization process, for which I wrote an inflector function.
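A condensed sketch of this loop (the rule table, tags and corpus here are made up for illustration; the actual LibIndic code differs in detail):

def lemmatize(word, rules, root_corpus):
    # rules maps suffix -> (replacement, tag); root_corpus is a set of known root words
    tags = []
    while True:
        if word in root_corpus:              # exceptional/root word reached, stop
            return word, tags
        matched = False
        for start in range(1, len(word)):    # suffix starts at the 2nd character and shrinks
            suffix = word[start:]
            if suffix in rules:
                replacement, tag = rules[suffix]
                word = word[:start] + replacement
                tags.append(tag)
                matched = True               # a match hints at one more level of inflection
                break
        if not matched:                      # no rule fired: all inflections handled
            return word, tags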
A demo screencast of the lemmatizer is given below.
So, compared with the existing stemmer algorithm in LibIndic, the one I implemented as part of GSoC shows considerable improvement.
Future work
Add more rules to increase grammatical coverage.
Add more grammatical details - Handling Samvruthokaram etc.
Use this to generate sufficient training data that can be used for a self-learning system implementing ML or AI techniques.
2. Spell Checker
TLDR
The second phase of my GSoC work involved making the existing spell checker module better. The problems I could identify in the existing spell checker were:
It could not handle inflections in an intelligent way.
It used a corpus that needed inflections in them for optimal working.
It used only levenshtein distance for finding out suggestions.
As part of GSoC, I incorporated the lemmatizer developed in phase one to the spell checker, which could handle the inflection part. Three metrics were used to detect suggestion words - Soundex similarity, Levenshtein Distance and Jaccard Index. The inflector module that was developed along with lemmatizer was used to generate suggestions in the same inflected form as that of original word.
There were some general assumptions and facts which I inferred and collected while working on the spell checker. They are
Malayalam is a phonetic language, where a word is written just as it is pronounced. This is the opposite of English, where letters have different pronunciations in different words. An example is the English letter “a”, which is pronounced differently in “apple” and “ate”.
Spelling mistakes in Malayalam, hence, are also phonetic. The mistakes involve a character with a similar pronunciation, usually from the same varga. For example, അദ്ധ്യാപകൻ may mistakenly be written as അദ്യാപകൻ, but not as അച്യാപകൻ.
A spelling mistake does not simply mean a word that is not present in the dictionary. The user has to be considered intelligent and should be trusted not to make mistakes; a word not present in the dictionary can also be an intentional modification. A “mistake” is something which is not in the dictionary AND which is very similar to a valid word. If a word is not found in the dictionary and no similar words are found, it has to be considered an intentional change the user made, and hence should be deemed correct. This often solves the issue of foreign words being deemed incorrect.
Spelling mistakes in inflected words usually happen at the lemma of the word, not the suffix. This is also because most commonly used suffix parts are pronounced differently and mistakes have a smaller chance to be present there.
The first phase, obviously, is a corpus comparison to check whether the input word is actually a valid word. If it is not, suggestions are generated. For this, a range of words has to be selected. Following the logic of Malayalam having phonetic spelling mistakes, the words starting with the linguistic successor and predecessor of the word’s first character are selected. That is, for the input word ബാരതം, which has ബ as its first character, the words selected will be the ones starting with ഫ and ഭ. Out of these words, the top N (defaulting to 5) words most similar to the input word have to be found.
Three metrics were used for finding the similarity between two words. For Malayalam, a phonetic language, soundex similarity was given top priority. To handle words that are similar but not phonetically similar, because of a difference in a single character that defines the phonetic similarity, Levenshtein distance was also used; it gives the distance between two words, i.e. the number of operations needed to transform one word into the other. To handle the remaining words, the Jaccard index was also used. The priority was assigned as soundex > levenshtein > jaccard. Weights were assigned to each possible suggestion based on the values of these three metrics, along the following lines:
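Roughly like this (the exact weights in LibIndic differ; soundex, levenshtein and jaccard are assumed helpers wrapping the respective similarity modules):

def weight(word, candidate):
    w = 0
    if soundex(word) == soundex(candidate):    # phonetic match carries the most weight
        w += 50
    w += max(0, 30 - 10 * levenshtein(word, candidate))
    w += int(20 * jaccard(word, candidate))    # jaccard here is a 0..1 overlap score
    return w                                   # 0..100; only candidates above 50 count as suggestions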
To differentiate between spelling “mistakes” and intended modifications, the logic used was that if a word does not have N suggestions with weight > 50, it is most probably an intended word and not a spelling mistake. So such words were deemed correct.
A demo screencast of the spell checker is given below.
3. Package structure
The existing modules of LibIndic had an inconsistent package structure that gave no visibility to the project. Also, the package names were too general and didn’t convey the fact that they were meant for Indic languages. So, I suggested and implemented the following changes:
Package names (of the ones I worked on) were changed to libindic-<module>. Examples would be libindic-stemmer, libindic-ngram and libindic-spellchecker. So, users will easily understand that a package is part of the LibIndic framework, and thus meant for Indic text.
Namespace packages (PEP 420) were used, so that import statements of LibIndic modules are of the form from libindic.<module> import <language>. This increases the visibility of the ‘libindic’ project quite a bit.
“Now that the basic aim was fulfilled, what more can we work on, given there is almost half a month to GSoC Submission!”
Well, as of now the phoneme transcription was done purely based on the manner in which a word is written, and not entirely on the speech pattern. What I mean is that there are exceptions where we write a word one way and pronounce it differently. This was pointed out by Deepa ma’am. She also asked if I could convert some of the existing linguistic rules (algorithms) that were made with Malayalam TTS in mind, so that they could be used to redesign the phoneme transcription. This could also turn out to be helpful in the future, for example for a fully intelligent phoneme transcriber for Malayalam language modeling.
This is what we are working on right now, and I am literally scratching my head over some loops in Python!
juzzzz jokinnn
The basic idea is to iterate over each line in the ‘ml.dic’ file and validate the transcription I made earlier against the set of rules, correcting entries (if found invalid) as it goes.
Seems pretty straight forward! Will see how it goes!
Update — 4th August
Whew, this is going nuts! OK, so I first tried using lists to classify the different types of phones. It was all good until I reached a point in the algorithm where I have to check whether the current phoneme in the transcription is a member of a particular class of phonemes (when I say class of phonemes, I just mean the classification, not a class). Of course I can search a list for the presence of an element, and that is sufficient for small comparisons. Our case is different: we are talking about around 7000 words in a file, on top of which each line will go through a significant number of if-elif clauses.
This could slow things down and make the script less efficient (we will eventually see the difference). So I went back to the Python documentation and read about the set types (set and frozenset).
A set object is an un-ordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. — said the Python doc.
This is exactly what I wanted. I mean, I don’t have to do any manipulation of the phoneme classes, so there is no real point in using a list. Furthermore, a set supports ‘in’, with which membership can be checked without any additional search procedure. How cool is that!
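For example (the class contents here are only illustrative, not the actual phoneme classes from the script):

VOWELS = frozenset({"a", "aa", "i", "ii", "u", "uu"})
NASALS = frozenset({"m", "n", "ng", "nj"})

def is_vowel(phoneme):
    return phoneme in VOWELS    # O(1) membership test, no list scan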
So, after some tests of the script, I generated the dictionary file once again, this time applying some of the TTS rules. Now SphinxTrain is running with this dictionary file. Hopefully, there will be some change in the accuracy!
Left panel with new dictionary, right panel with old dictionary
This might as well be the last development phase update if all goes well. Then it is submission time.
Hi, this week I was fighting with my final semester exams, and now they are over! Also within this week I added the facility for typing braille Unicode directly.
CMake was giving me some trouble in the beginning. After clearing all the dependency issues with the CMake example, I was able to successfully run the endless-tunnel sample on my phone. Following the pattern of how modules are incorporated in the CMake app, we tried to incorporate the varnam module. The code for the attempt is given here.
Now there comes a problem :| I have documented the issue here.
After 9 days, there has still not been a single response :( So, as an alternative, we have decided to use the varnam API. I have completed the class for this, and it is yet to be linked to the keyboard input of the Indic Keyboard app. This part is the agenda for next week.
//Pascal program HelloWorld(output); begin writeln('That''s all for now, see you next time!') end.
Alright, for the past two weeks, my mentor and I have been trying hard to call the varnam library from Java. First we tried loading the prebuilt library into Android Studio and then using its methods from Java, which didn't work :(
Now we are on a different route: compiling varnam ourselves using CMake. For this we are following the CMake example given here. Another thing to note is that CMake support requires the canary build of Android Studio, which can be downloaded here. It all started off well, until it turned out that OS X has a problem running it.
Now I am getting it all set up on Linux as well as Windows (just in case :P). Sorry for not writing any technical details; I will make it up next week.
//Rust fn main() { println!("That's all for now, see you next time!"); }
Ooh boy, halfway through GSoC and a lot still to be done. We finally decided to do the entire project in Android Studio so that the later integration with Indic Keyboard would be easier. As said in the last post, I was about to complete the wrappers for varnam_init() and the rest of the functions when a queue of challenges popped up.
First of all, since we are moving away from the regular “PC” kind of environment, storing the scheme files in a specific directory is still a problem. We first decided to store them in the internal storage of the phone, which eventually caused a lot of problems because varnam_set_symbols_dir() requires a string path to the directory, and that was not possible there. We later decided to store them in the external storage of the device. This decision is temporary, because once the user removes the external SD card, the Varnam keyboard would stop working :P
Then came the problem of build architectures. Since my work machine is a Mac, all the built libraries come out as .dylib files, while Android accepts only .so files as jniLibs. After generating the binary on my dual-boot Ubuntu, it turned out that Android accepts only 32-bit libraries here. Using VirtualBox, I finally managed to get the desired files. Now, out of nowhere, the error thrown is:
"Cannot find: libpthread.so.0"
I have currently written wrappers for most of the required methods, but have to resolve these errors to get testing going smoothly. I will upload a list of references I have gone through (there are tons of them) in the next post, so that anyone working on this topic may find it useful.
//Scala object Bye extends Application { println("That's all for now, see you next time!") }
Well, the title says it all. The computer just recognized what I said in my mother tongue! A major step in the right direction.
For this to happen, I had to complete the Acoustic Model training. So then!
What is an Acoustic Model?
Well, it is a set of statistical parameters used to learn the language by representing the relation between the audio signal and the linguistic features that make up the speech (phonemes and transcription!).
To produce these, we need to set up a database structure as documented by the CMU SphinxTrain team. Some of these files are shared with the language model preparation, like the phoneme transcription file. After setting up the database, it should look like this irrespective of the language!
The training is straightforward if you get the database error-free, which was not my case! Thank you! (** If you get it error-free on the first run, you are probably doing it wrong! **)
I had to solve two issues (1 and 2) before I could run the training without any hiccups! It took a day to patch up the files. The documentation didn't mention that the phone set should contain a maximum of 255 phones due to a practical limitation, even though theoretically there is no problem (I found this out myself on the CMU help forums). That was the issue: Reduce phoneset to a max of 255 #31. I successfully reduced it to what is found in the current repository version.
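A quick sanity check along those lines, assuming the phone list lives in a file like etc/ml.phone with one phone per line (the path and format are assumptions about the SphinxTrain database layout):

```python
# Count the phones in the SphinxTrain phone list and warn if we exceed
# the practical limit of 255 mentioned above.
PHONE_FILE = "etc/ml.phone"   # assumed path; adjust to the real database layout
MAX_PHONES = 255

with open(PHONE_FILE, encoding="utf-8") as f:
    phones = [line.strip() for line in f if line.strip()]

print(f"{len(phones)} phones defined")
if len(phones) > MAX_PHONES:
    print(f"Warning: more than {MAX_PHONES} phones; training may fail.")
```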
Update — July 27
Acoustic Model is ready for testing!
How??!!
$ sphinxtrain -t ml setup
This command will set up the ‘etc’ and ‘wav’ folders as mentioned above. Now we need to set up the sphinx_train.cfg config file, which is excellently documented by the team.
Once that is out of the way, run the training.
$ cd ml
and,
$ sphinxtrain run
and,
wait!!
..
…
….
still waiting!!
.
..
…
Finally it's done! It took quite a lot of time!
Not only that, my ZenBook finally started heating up, with fan noise to match. That sixth-gen Intel needed some extra air! (** nice! **)
Update — July 29
Well, this means the GSoC 2016 aim has been achieved, which was to develop the language model and the acoustic model. The only thing left is to keep testing.
The discussion with Deepa mam brought out a possible way of improving the accuracy, which I am working on in a branch, in parallel with the testing.
With that in mind for the coming week, that's it for this week.
This week I forked the IBus-Braille project from the SMC GitLab repository and added two things.
1. Eight-dot braille enabled. Now one can add languages with 8 dots. The default keys are Z for dot 7 and period for dot 8. These can be remapped using preferences.
My next task was to filter the layouts shown on the basis of language, instead of showing all of them. My first option was to do the filtering based on locale: instead of ACTION_INPUT_METHOD_SUBTYPE_SETTINGS we could use ACTION_LOCALE_SETTINGS, but the problem was that it gave a list of all the locales in the system instead of the locales in our app, so I skipped this idea. Then I decided to create a list and let users select from it, but there was no way to connect that to the enabled system subtypes. I was stuck on this for quite some time. We ditched the plan and moved on to the “Theme selection” task.
I am currently working on the theme selection task. I have successfully added the step, and now I am working on adding a fragment instead of the whole activity. After I am done with this, I will move on to adding images of the themes. I will hopefully complete this task by the weekend.
Also, after a meeting with the mentor, it was decided that after this task I will work on merging AOSP source code into the keyboard, as the current keyboard doesn't have the changes released along with the Android M code drop, because of which the target SDK is not 23. So my next task will be merging AOSP code, which will give the benefit of runtime permissions.
One of the things I love doing is teaching what I know to others. Though it is a cliché, I know from experience that when we teach others, our own knowledge expands. From 10 students, you often get 25 different doubts, and at least 5 of them will be ones you haven't even thought of yourself. In that way, teaching drives our curiosity to find out more.
I was asked to conduct LaTeX training for B.Tech students as a bridge course (happening during their semester break. Poor kids!). The usual scenario is faculty taking the class, with us PG students assisting them. But since all the faculty members were busy with their own subjects' bridge courses, and LaTeX was more of an additional skill the students would need next semester for report preparation, I was asked to take it with the assistance of my classmates. At first, I was asked to take a two-day session for third-year IT students. But later the HOD decided that both CS and IT students should have the class, and guess what - I had to teach for four days. Weirdly, the IT class was split across two non-consecutive dates - Monday and Wednesday. So I didn't have to take class for four consecutive days, only three. :D
The syllabus I followed is as follows:
Basic LaTeX – Session I
Brief introduction about LaTeX, general document structure, packages etc.
Text Formatting
Lists – Bullets and Numbering
Graphics and Formulas – Session II
Working with Images
Tables
Basic Mathematical Formulas
Academic Document Generation (Reports and Papers) – Session III
Sectioning and Chapters
Header and Footer
Table of Contents
Adding Bibliography and Citations
IEEETran template
Presentations using Beamer – Session IV
As I (not the faculty) expected, only half of the students came (classes during the semester break - I was surprised even half came!). Both workshops - for CS and IT - went smoothly, without many issues or hindrances. The students didn't hesitate to ask doubts or request tips on doing things I hadn't taught (unfortunately, I didn't have time to go off-syllabus, so I directed them to the Internet. :D). Comparing the groups, the CS students were more lively and interactive but took some time to grasp the concepts, while the IT students, though rather silent, learned fast.
By Friday, I had completed 4 days and around 22 hours of teaching, pretty much non-stop. I was tired each day after class, but it was still fun to share what I know. I would love to get this chance again.
Finally, I can start the work towards Milestone 2, which is completing the development of the language model for Malayalam. Time to completely switch to Ubuntu from here on. Why?
Well, the forums related to CMU Sphinx keep saying that they won't look at reports from Windows anyway, and since all the commands and code mentioned in the documentation lean towards Linux, let's just stick with it. After all, when it comes to open source, why should I develop on Microsoft Windows? (** Giggle **)
What is a Statistical Language Model?
Statistical language models describe more complex language, which in our case is Malayalam. They contain the probabilities of words and word combinations. Those probabilities are estimated from sample data (the sentence file) and automatically have some flexibility.
This means every combination of words from the vocabulary is possible, though the probability of each combination varies.
Say you create a statistical language model from a bare list of words, which is what I did for my major project work; it will still allow decoding word combinations (phrases or sentences, for that matter), even though that might not be the intent.
Overall, statistical language models are recommended for free-form input, where the user could say anything in natural language, and they require far less engineering effort than grammars: you just list possible sentences using the words from the vocabulary.
Let me explain this with a traditional Malayalam example:
Suppose we have these two sentences: “ഞാനും അവനും ഭക്ഷണം കഴിച്ചു” and “ചേട്ടൻ ഭക്ഷണം കഴിച്ചില്ലേ”.
If we build a statistical language model from this set of sentences, it becomes possible to derive more sentences from the words (the vocabulary).
ഞാനും (1), അവനും (1), ഭക്ഷണം (2), കഴിച്ചു (1), ചേട്ടൻ (1), കഴിച്ചില്ലേ (1)
That is, we can have sentences like “ഞാനും കഴിച്ചു ഭക്ഷണം”, or maybe “ഭക്ഷണം കഴിച്ചില്ലേ”, or “അവനും കഴിച്ചില്ലേ”, and so on. It's a bit like the transitive property of equality, but more complex: here it's about the probability of a given word occurring after another word, calculated from the sample data that we provide as the database.
Now, you might be wondering what the numbers inside the parentheses mean. They are nothing but the number of occurrences of each word in the complete set of sentences. This is calculated by the C tools provided by a toolkit that I will introduce shortly.
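Just to make that concrete, here is a tiny Python sketch (not part of the toolkit) that computes the same word counts and the word-after-word probabilities for the two example sentences:

```python
from collections import Counter

sentences = [
    "ഞാനും അവനും ഭക്ഷണം കഴിച്ചു",
    "ചേട്ടൻ ഭക്ഷണം കഴിച്ചില്ലേ",
]

# Word frequencies: this reproduces the counts shown above, e.g. ഭക്ഷണം -> 2.
words = [w for s in sentences for w in s.split()]
unigrams = Counter(words)
print(unigrams)

# Bigram counts: how often one word follows another in the sample data.
bigrams = Counter()
for s in sentences:
    tokens = s.split()
    for a, b in zip(tokens, tokens[1:]):
        bigrams[(a, b)] += 1

# Conditional probability P(b | a) estimated from the counts.
for (a, b), c in bigrams.items():
    print(f"P({b} | {a}) = {c / unigrams[a]:.2f}")
```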
Update — July 18
Okay!
Let's start building. If you remember my previous blog posts, I wrote about extracting words and then transcribing them to a phonetic representation. Those words are nothing but the vocabulary I just described.
For building a language model over such a large vocabulary, you need specialized tools and algorithms. One such set of tools is provided as C libraries under the name “CMU-Cambridge Statistical Language Modeling Toolkit”, or CMU-CLMTK for short. You can head over to their official page to learn more. I have already installed it, so we are ready to go.
So according to the documentation,
The first step is to find out the number of occurrences. (text2wfreq)
cat ml.txt | text2wfreq > ml.wfreq
Next, we convert the .wfreq file to a .vocab file, dropping the counts and keeping just the words.
cat ml.wfreq | wfreq2vocab -top 20000 > ml.vocab
Oops, there are some issues with the generated vocab file: repetitions, and extra words here and there that are not required. This might have happened when I filtered the sentences file but forgot to update, or skipped updating, the transcription file. Some delay in the rest of the process. It's already late at night! I need to sleep!
With this, it's easy to compare everything and make changes simultaneously. It should be done today!
.
.
Done!
Okay, now that the issue has been handled, we are getting somewhere. It should be pretty much straightforward now.
Next, we need to find the list of every id n-gram that occurred in the text, along with its number of occurrences, i.e. generate a binary id 3-gram file from the training text (ml.txt) based on this vocabulary (ml.vocab).
By default, the id n-gram file is written out as a binary file, unless the -write_ascii switch is used in the command.
The -temp ./ switch can be used if you want to run the command without root permission and use the current working directory as the temp folder. Or you can just run it as root, which by default will use /usr/tmp as the temp folder.
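For intuition, this is roughly what the id 3-gram step computes; the sketch below (plain Python, not the toolkit, and ignoring the toolkit's context cues and header handling) maps words to vocabulary ids and counts each 3-gram:

```python
from collections import Counter
from itertools import islice

# Load the vocabulary and give each word an integer id, roughly as the
# toolkit does internally (comment/header lines are skipped).
with open("ml.vocab", encoding="utf-8") as f:
    words = [line.strip() for line in f if line.strip() and not line.startswith("#")]
vocab = {w: i for i, w in enumerate(words)}

# Count every 3-gram of ids that occurs in the training text.
trigrams = Counter()
with open("ml.txt", encoding="utf-8") as f:
    for line in f:
        ids = [vocab[w] for w in line.split() if w in vocab]
        for i in range(len(ids) - 2):
            trigrams[tuple(ids[i:i + 3])] += 1

# Each entry corresponds to one id 3-gram and its number of occurrences.
for gram, count in islice(trigrams.items(), 5):
    print(gram, count)
```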
Last Saturday, 16th July, I attended a meeting regarding the upcoming Kerala State IT Policy. It was a stakeholder consultation organized by DAKF, the Software Freedom Law Centre, and the Ernakulam Public Library Infopark branch. The program was presided over by Prasanth Sugathan of SFLC (I had met him during Swatanthra, when I helped Praveen with the privacy track) and was inaugurated by M. P. Sukumaran Nair, advisor to the Minister of Industries. The agenda of the meeting was to discuss the suggestions that need to be submitted to the Government before they draft the official IT policy, which will be in effect for the next few years. I attended the meeting representing Swathanthra Malayalam Computing. Even though the meeting had a small audience, some key topics were brought into the mix.
Professor Jyothi John, retired principal of Model Engg. College, discussed MOOCs as a way to improve the education standards of the state. He also talked about improving the industry-academia-research relationship, which is in a pathetic state as of now. I was asked to say a few words. But since SMC hadn't taken any official stand or prepared points for the meeting, I talked about my own views on the issue. Naturally, my topics were focused on language computing, digital empowerment of the language, and how FOSS should be the keystone of the IT policy. I also mentioned the e-waste problem that Anivar had discussed the other day on the WhatsApp group.
Me Talking | PC: Sivahari
Mr. Joseph Thomas, the president of FSMI, also talked about the importance of FOSS in the IT policy (Kiran Thomas had some pretty strong disagreements with it. :D). Following that, Babu Dominic from BSNL talked about their success stories with FOSS and how the project was scrapped by the government. There were some brilliant insights from Satheesh, who is a social entrepreneur now and once ran an IT-based company.
Following that, the meeting took the form of a round-table discussion, where interesting points regarding e-waste and the money-saving nature of FOSS (Microsoft has been targeting institutions, not home users, over pirated copies) were raised by Mr. Bijumon, Assistant Professor at Model Engg. College. Mr. Jayasreekumar, a journalist, talked about the important issue that downtrodden people, those in the lower socio-economic belt, were not part of the discussion, and the digital divide that this carves. We have to seriously increase the diversity of participants in these meetings, as a large part of the population has no representation in them. Such meetings will only be fruitful if the sidelined communities, who should also benefit from this policy, are brought in to participate.
The general theme of the meeting pointed towards how the IT policy should focus more on the internal market, and how it should help entrepreneurs compete with foreign players, at least in the domestic market.
News Coverage | PC: Deshabhimani
More and more meetings of this nature are a must, if the state is to advance in the domain of IT.
Hi,
About two months have passed. We did a lot of testing on the online braille-input tool, and rearranged some widgets for user comfort. In recent weeks we made good progress on both the Firefox and Chrome browser addons. But we still suffer from a big problem with these addons: the plugins are not working in Google chat and Facebook chat entries. We are seeking a solution...
The last two weeks saw less coding and more polishing. I was fixing the LibIndic modules to use namespace packages (PEP 420) to obtain the libindic.module structure. In the stemmer module, I introduced the namespace package concept and it worked well. I also made the inflector a part of the stemmer itself: since the inflector's functionality depends heavily on the output of the stemmer, it made more sense to include it there rather than keep it as an individual package. Also, I made the inflector language-agnostic, so that it accepts a language parameter during initialization and selects the appropriate rules file.
In the spellchecker too, I implemented the namespace concept and removed the bundled copies of the stemmer and inflector. Other modifications were needed to make the tests run with namespace packages, to fix coverage to pick up the change, etc. On the coding side, I added weights to the three metrics so as to generate suggestions more effectively. I am thinking about formulating an algorithm to make the comparison of suggestions and root words more efficient. Also, I may try handling spelling mistakes in the suffixes.
This week, I met with Hrishi and discussed the project. He is yet to go through the algorithm and comment on it. However, he made a suggestion to split the languages into separate files and make __init__.py cleaner (just importing these split language files). He was OK with the work so far, having tried out the web version of the stemmer.
Hrishi testing out the spellchecker
“In open source, we feel strongly that to really do something well, you have to get a lot of people involved.” — Linus Torvalds
I have always loved the idea of open source and have been fortunate enough to be participating in one of the world's best platforms for a student to develop, grow, and learn. Google Summer of Code 2016 has gone past its mid-term evaluation, and so have I. The last couple of weeks have been at a slower pace compared to the weeks in June.
contribution graph — May-June-July
This is simply because I was ahead of my schedule during the mid-term evaluation period, and I didn't want to rush things and screw them up. But I thought this is the right time to mention the contributions that have been coming in towards this open source project.
Gathering recordings or speech data for training means that a lot of people have to individually record their part and then send it to me. Now, this might seem simple enough to some of you out there, but believe me, carefully recording 250 lines or sentences in Malayalam is not that interesting a task.
Nonetheless, pull requests have been piling up on my repository since the early days of the project. The contributions have been really awesome.
What more can you ask for than your little brother, who has absolutely no idea what the heck I am doing, deciding to record 250 sentences in his own voice so that I could complete the project! (** aww... you little prankster... **)
And he did all this without making many mistakes or even complaining about the steps I instructed him to follow. He was so careful that he decided to save only after I confirmed each and every sentence as he recorded it (** giggles **). For those interested in what he contributed, take a look at this commit and this one. Oh, and by the way, he is just 11 years old :).
It would be unfair not to mention other friends along with this.
So here is a big shout-out to the 18 other guys and gals without whom this would not have come this far.
I know this blog post was not much about the project from one point of view, but looked at another way, this is one of the most important parts of my GSoC work.
With the final evaluation coming up in 4.5 weeks or so, it is time to start wrapping up my work and putting together a final submission, such that someone with the same enthusiasm (or even more) can take it up and continue working on it to make it better.
I guess that’s it for this week’s update. More to follow as I near the completion of this awesome experience.
The week started with continuing the task of detecting supported locales. I was facing some problems initially: I first tried to change the contents of a static file at runtime, which I later realised couldn't be done. So, as directed by my mentor, I changed the approach and decided to prompt the user at setup time about which languages might not be supported by the phone.
It looks something like this:
Unfortunately my system crashed, and the later part of my time went into formatting the laptop, taking backups, installing the OS, and setting up the project again. Then I went home for my parents' wedding anniversary for 3 days.
My next task: improving the setup wizard. Since the user might not be interested in all the languages, instead of showing all the layouts at once, we are planning to first ask the user to choose the language and then the corresponding layout. I have to discuss this task more with Jishnu.
In these two weeks we have done a lot of testing with users and made many additions according to their needs. The first one is key reassigning: as you know, there are many keyboard variants, and users may like to set their own keys instead of using f, d, s, j, k and l. But this makes it necessary to save user preferences, so we did this using jStorage. It's working fine: https://github.com/anwar3746/braille-input/commit/9e8bb0b5ef9a54d61dfa5081d0966ec9d10f01a0
The last two weeks were spent mostly in getting the basic spellchecker module to work. In the first week, I tried to polish the stemmer module by organizing tags for different inflections in an unambiguous way. These tags are to be used in the spellchecker module to recreate the inflected forms of the suggestions. For this purpose, an inflector module was added; it takes the output of the stemmer module and reverses its operations. Apart from that, I spent time testing the stemmer module and made many tiny modifications, like converting everything to a single encoding, always using Unicode, and above all changing the library name to an unambiguous one - libindic-stemmer (the old name was 'stemmer', which was way too general).
In the second week, I forked the spellchecker module, converted the directory structure to match the one I've been using for other modules, and added a basic build-test-integration setup with the pbr-testtools-Travis combination. I also implemented the basic spell checking and suggestion generation system. Like the stemmer, marisa_trie is used to store the corpus. Three metrics are used to generate suggestions - Soundex similarity, Levenshtein distance, and Jaccard index. With that, I got my MVP (Minimum Viable Product) up and running.
So, as of now, spell checking and suggestion generation work, but they need more tweaking to increase efficiency. Also, I need to formulate another comparison algorithm, one tailored to Indic languages and spell checking.
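The actual implementation lives in the LibIndic repository; just to illustrate the idea of weighting and combining such metrics, here is a rough standalone sketch (Soundex is omitted for brevity, and the helper functions, candidate list, and weights are made up for the example, not the module's code):

```python
# Hypothetical sketch of weighted suggestion scoring.
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def jaccard(a, b):
    # Similarity of the character sets of the two words.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def score(word, candidate, w_lev=0.5, w_jac=0.5):
    # Higher score = better suggestion. The weights here are made-up values.
    lev_sim = 1 - levenshtein(word, candidate) / max(len(word), len(candidate))
    return w_lev * lev_sim + w_jac * jaccard(word, candidate)

candidates = ["കഴിച്ചു", "കഴിച്ചില്ലേ", "ഭക്ഷണം"]   # stand-in corpus words
misspelt = "കഴിചു"
print(sorted(candidates, key=lambda c: score(misspelt, c), reverse=True))
```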
On a side note, I also touched the indicngram module, ported it to support Python 3, and reformatted it to match the proposed directory structure I have been using for other modules. A PR has been created and I am waiting for someone to accept it.
I was given the task of detecting whether a language is supported by the keyboard or not. On my phone, Punjabi is not supported, so I did all the testing with that. Whenever a language is not supported, it is displayed as blank, and that gave me an idea of how to work on this issue. I created a bitmap for the characters of the language and compared it with an empty bitmap: if the language was not supported, its bitmap was empty, and I declared it as not supported.
What I have to improve: currently it checks every time the keyboard opens. I will change it so that it checks all languages during the setup wizard and stores the information.
My task for next week is to do this check for all languages in the setup wizard, and to mark the languages that cannot be supported as unsupported in the list, so that the user knows.
Hi,
The first month is over and the webpage is almost finished; we went through many bugs in the last week. Sathyaseelan mash and Balaram G really helped us find the bugs. One crucial and hard-to-detect bug was with the map initialization; we took a lot of time to find and fix it. Another one was with the insertion of text in the middle. The following are the names of the other commits:
CapsLock(G) and Beginning-Middle switch(Alt)
Simple mode checkbox
Word and letter deletion enabled
Abbreviation enabled
I started with making the designs of the layouts. The task was to make Santali Olchiki and Soni layouts for the keyboard. I looked at the code of the other layouts to get a basic understanding of how they were working.
Soni Layout
It took some time to understand how the transliteration code worked. I made changes in the ime submodule for the layout. I messed up the locale names but fixed that later. The changes were merged! Then I updated the submodule on the Indic Keyboard branch.
Santali Olchiki Layout
Previously I had made the InScript layout for Santali Olchiki, but after discussion with the mentor, it was decided to work on the phonetic layout, as it fits in a smaller layout and is thus easier to type with. I made the design of the keyboard, wrote the code for it, and tested it on the device. It is coming out fine too.
After that I explored various keyboard apps to see their setup wizards.
My task for next week is to detect whether a language is supported by the system or not. I am planning to do it by checking whether a typed character gives an empty result or not. I will look for other ways too. I will post an update on the progress in the next blog.
First of all, apologies for skipping last week's post. The last two weeks were a bit rocky :P
Discussion with my mentor suggested starting off with the varnam_init method. The initial trial build brought in a queue of issues to be resolved. The following are the errors, in order:
Finally, after resolving a few machine dependencies and reinstalling, I got it running :) I am now finishing varnam_init and will move on to the whole of libvarnam in the coming week.
Three weeks have passed. After developing the basic Chrome and Firefox extensions, we moved on to developing the braille-input webpage, where one can type in the six-key way. To achieve this, we went through a lot of things such as Ajax, jQuery, JSON, the Apache web server, etc. The most important links we referred to are given at the end of this post. Even though my mentor is also new to web-based development, he always suggests I keep it as clean as possible. Even the concept of switching the map between the beginning, middle, and contraction lists was a bit difficult to understand; later I realized that's the way it should be. Finally, when we were looking for a space to host the web page, another mentor from my organization, Akshay S Dinesh, gave us a hint about GitHub's own facility for hosting. So we did it with little effort, even though we faced a jQuery download problem and an issue with listing the contraction files.
Now we have to implement abbreviations, simple mode, open, new, save, and options to change the font, font size, and background and foreground colors, etc., as done in Sharada-Braille-Writer.
Hi All,
Yes, the community bonding period is over and the coding period has started. My mentor and I are really happy to announce that during the community bonding period we made basic Chrome and Firefox extensions that show how it's going to be!! Once again, thanks to my mentor and the varnam project. The code is hosted on GitHub under the name braille-browser-addons.
To test it in firefox do the following steps
1 - git clone https://github.com/anwar3746/braille-browser-addons.git
2 - cd braille-browser-addons/firefox/
3 - jpm run -b /usr/bin/firefox
4 - Go to google.com, right-click on a text entry, and from the context menu select Enable from the Braille submenu.
5 - Try typing l(fds), k(fs), a(f), b(fd), c(fj)
To test it in chrome
1 - git clone https://github.com/anwar3746/braille-browser-addons.git
2 - Open chrome browser
3 - Go to settings and select extensions
4 - Check Developer mode
5 - Click Load unpacked extensions
6 - Choose chrome folder from braille-browser-addons
7 - Go to google.com, right-click on a text entry, and from the context menu select Enable from the Braille submenu.
8 - Try typing l(fds), k(fs), a(f), b(fd), c(fj)
This week started with a successful Gradle build of the project on my system. The build succeeded with Gradle version 2.13, SDK version 22, and build tools version 22.0.1. After that, I deployed the app on the emulator and on my phone. I am currently working on making the Santali Olchiki and Soni layouts for the keyboard.
The first week of GSoC has come to an end, and it's been a wonderful learning experience. My goal for the next couple of weeks is to compile libvarnam for Android. So I decided to play around with the workings and flow of the NDK for using native code from a Java program.
Android NDK code flow
The above image roughly explains the workflow of the NDK in Android. This tutorial explains very well how one can get started with the NDK. The NDK knowledge will come in very handy for the future progress of this project. Another excellent resource is the set of videos by Aleksander Gargenta, which can be found here.
I followed the entire playlist and found it extremely useful. He explains every detail of the process, and I would highly recommend it to people looking to get started with the NDK. So the future plan is to implement a very basic application that calls the libvarnam module from Java, and then hook the skeleton of that program into the Indic Keyboard app.
//Java System.out.println("That's all for now, see you next time!");