blog dds

2009.06.25

Greek Numerals in OpenOffice.org

OpenOffice.org doesn't support Greek numerals, which is a problem for its Greek localization, because such numerals are often used for section and list numbering. As an exercise in large-scale code reading, and in writing the kind of code I'm supposed to teach to undergraduate students, I decided to contribute an implementation to OpenOffice.org.

My first step was to download and unpack the OpenOffice.org 3.1.0 source code. The best way to add a feature to existing code is to locate code that does something similar, and I've found that searching for related words is often the most productive technique. I thus started by searching for the word "roman", reasoning that this rare word would occur only in the code implementing Roman numerals, which would in turn lead me to the place where I should add Greek numeral support. I first searched through all the files, but this produced far too many results to be useful.

find . -type f | xargs grep -i roman

I then examined the files a bit, observed that the code was mostly written in C++ (such was my ignorance of the source code at that point), and saw that the most common source file extension was cxx. I modified my search accordingly:

find . -name '*.cxx' | xargs grep -i roman

Sure enough, this search led me to the file defaultnumeringprovider.cxx. I was expecting the localization code to be separate from the default implementation, but this was not the case: to my surprise, the file also contained the localized code for various languages, so this was my target file.

My next step was to implement the Greek numeral conversion. After some fascinating reading in the Wikipedia entry on Greek numerals and in Thomas Heath's excellent book A History of Greek Mathematics, I understood how Greek numerals work. Counting up to 9999 was relatively easy, though I still needed to consult the Unicode Greek and Coptic documentation to find details of some more exotic characters and symbols: stigma, koppa, sampi, and the left and right keraia. Interestingly, larger numbers are counted using a base-10,000 system, whose unit is the myriad (myrias). Formatting these numbers on a single line proved to be tricky, because the exponent is often written above the base. Fortunately, Diophantus, perhaps anticipating the limitations of console terminals, devised a dot-based notation that can be written left to right, and this is the one I adopted.
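To make the "up to 9999" case concrete, here is a minimal sketch of how such a conversion might look in Java. It is not the code in GreekNumeral.java; the class name GreekNumeralSketch and the method toGreek are hypothetical names chosen for illustration. The sketch assumes lowercase letters, always appends the right keraia (even after a bare thousands digit), and leaves out the myriad notation for larger numbers entirely.

```java
class GreekNumeralSketch {
    // Letters for 1-9, 10-90 and 100-900, spelled as Unicode escapes so the
    // source stays ASCII. Index 0 of each string stands for digit 1.
    // Units use stigma for 6, tens end in koppa (90), hundreds end in sampi (900).
    private static final String UNITS =
        "\u03b1\u03b2\u03b3\u03b4\u03b5\u03db\u03b6\u03b7\u03b8";
    private static final String TENS =
        "\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03df";
    private static final String HUNDREDS =
        "\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03e1";

    // Left keraia (Greek lower numeral sign) marks a thousands digit.
    private static final char LEFT_KERAIA = '\u0375';
    // Right keraia (Greek numeral sign) closes the numeral.
    private static final char RIGHT_KERAIA = '\u0374';

    /** Converts n (1 to 9999) to a Greek numeral string. */
    static String toGreek(int n) {
        if (n < 1 || n > 9999) {
            throw new IllegalArgumentException("Out of range (1-9999): " + n);
        }
        StringBuilder sb = new StringBuilder();
        int thousands = n / 1000;
        if (thousands > 0) {
            // Thousands reuse the unit letters, prefixed by the left keraia.
            sb.append(LEFT_KERAIA).append(UNITS.charAt(thousands - 1));
        }
        appendDigit(sb, HUNDREDS, n / 100 % 10);
        appendDigit(sb, TENS, n / 10 % 10);
        appendDigit(sb, UNITS, n % 10);
        // Assumption of this sketch: the right keraia is always appended.
        sb.append(RIGHT_KERAIA);
        return sb.toString();
    }

    // Appends the letter for a decimal digit; a zero digit has no letter.
    private static void appendDigit(StringBuilder sb, String letters, int digit) {
        if (digit > 0) {
            sb.append(letters.charAt(digit - 1));
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {6, 90, 241, 2009, 9999}) {
            System.out.println(n + " = " + toGreek(n));
        }
    }
}
```

With these tables, 2009 comes out as ͵βθʹ: a left keraia and beta for 2000, nothing for the empty hundreds and tens positions, theta for 9, and the closing right keraia. Whether the console displays the result correctly depends on the terminal's encoding.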

Although I've written orders of magnitude more C++ code than Java, I decided to implement the code first in Java, because I thought that handling and testing Unicode I/O in C++ would be too complex. Once I had a stable implementation, I would port it to the OpenOffice.org Unicode-based C++ conventions. The file GreekNumeral.java contains my Java-based Greek numeral implementation, and on this page you can see a table of some generated conversions. If porting and integrating the code into OpenOffice.org proves sufficiently interesting, I will detail it in a future blog entry.



Last modified: Thursday, June 25, 2009 9:27 pm
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.