Archive for April, 2007

Khmer New Year, 2007

Thursday, April 12th, 2007

These days, Cambodians all over the world are preparing themselves for the three-day New Year holiday. It is a very important holiday for Cambodians. It starts this Saturday and lasts three days. Phnom Penh will become an almost deserted city. The area around Psar Thmei and the new Soriya shopping mall will be closed during these three days.

As is the new year’s tradition, everyone will be driving up and down the riverfront road, dousing each other with water.

We here in Europe are heading to Paris (next time Phnom Penh?). Near the ‘Bois de Vincennes’ there is a Buddhist temple where Cambodians gather to pray, make music, eat, talk,… And the weather forecast is promising for this weekend.

Happy New Year!


សួស្តី​ឆ្នាំ​ថ្មី

Posting in Khmer, Part 3

Thursday, April 12th, 2007

Why are there so many different web browsers? Why are there different versions of the same web browser around? They call it ‘progress’!

As mentioned previously, search engines work with words. Khmer does not use word separators. Therefore, when you write in Khmer, you add ZWSPs (zero-width spaces) to your text. They are there, but they are not visible (zero width). This character has the Unicode code point U+200B.
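To make this concrete, here is a minimal Python sketch of how a program could use the ZWSP as a word separator. The function name `split_words` and the sample greeting are only illustrations, not something a search engine is known to do in exactly this way.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the invisible word separator


def split_words(text: str) -> list[str]:
    """Split a ZWSP-separated string into words, dropping empty pieces."""
    return [word for word in text.split(ZWSP) if word]


# "Happy New Year" in Khmer, with a ZWSP placed between the three words.
greeting = "សួស្តី" + ZWSP + "ឆ្នាំ" + ZWSP + "ថ្មី"

print(unicodedata.name(ZWSP))   # official Unicode name of the character
print(f"U+{ord(ZWSP):04X}")     # its code point in the usual notation
print(split_words(greeting))    # the three separate words
```

Note that the split has to name the ZWSP explicitly: Python does not treat it as whitespace, so a plain `greeting.split()` would return the whole greeting as one piece.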

But it seems that not all browsers, in this case my older version of Internet Explorer 6, ‘know’ about this character. These browsers show this character as a ‘not so zero width space’!

I’m getting frustrated! Why do other sites, like Wikipedia, not have the problem I am looking at? After several hours of hacking, I found that the difference was in the style sheet. I had to add the ‘Lucida Sans Unicode’ font to the font-family list. The ZWSP is now a real ZWSP.

Posting in Khmer, Part 2

Wednesday, April 11th, 2007

In my previous post, I wrote down the lyrics of a song in Khmer. Before I could post it on this weblog, there were some technical issues to resolve, such as fonts. Some of these were resolved with the help of some of you out there. Thank you all for that.

The last few days, I was struggling with the following: a text written in Khmer appears as long sentences without word separators. In a western language, we put ‘whitespace’ characters between the words. ‘Whitespace characters’ is the technical term for blanks, tabs, newlines, etc… At first sight, written Khmer does not have them. How would search engines like Google handle this? The basic elements a search engine works with are words (also called terms). Do they break these long sentences into words? Are there any rules one can use to develop some piece of software?

I found a Java program (khwrdbrk.jar) that can do the job, or at least tries to. If you give this program a Unicode-encoded file with a text in Khmer, you get an output file in Unicode. But when I opened the output file, the text looked exactly like the original text. The output file, on the other hand, was bigger than the original?! The reason is that this program does indeed insert a word break character between each pair of words. This word break character is called the ZWSP. It is not visible: it has zero width! I tested the program on one of my texts. The result is not 100% correct. Some word boundaries were not found.
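The size difference can be reproduced with a small Python sketch. This is not how khwrdbrk.jar works: finding the word boundaries is the hard part, and here the two words are simply segmented by hand. The helper name `insert_breaks` is made up for this illustration.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def insert_breaks(words: list[str]) -> str:
    """Join already-segmented words with an invisible ZWSP between them."""
    return ZWSP.join(words)


# Two words segmented by hand ("shirt", "new"); a real word breaker
# would have to find this boundary itself.
words = ["អាវ", "ថ្មី"]
original = "".join(words)
segmented = insert_breaks(words)

# Removing the ZWSPs gives back the original, which is why both
# versions look identical on screen.
visible = segmented.replace(ZWSP, "")

# Each ZWSP costs 3 bytes in UTF-8 and 2 bytes in UTF-16.
extra_utf8 = len(segmented.encode("utf-8")) - len(original.encode("utf-8"))
extra_utf16 = len(segmented.encode("utf-16-le")) - len(original.encode("utf-16-le"))

print(visible == original)
print(extra_utf8, extra_utf16)
```

So a text with a few hundred words grows by a few hundred invisible characters, which matches an output file that looks the same but is bigger.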

I was so proud of my previous post, but now I have to admit that it is NOT what it should be. Time for a second try!

The lyrics from the previous post are from a song sung by a boy. This song is based on an older song, sung by a girl and made popular by the singer Oeun Sreymom. Her song is in turn the Khmer version of an even older Khmer Surin song.

The song is about a girl who saved some money and sold three chickens (without telling her mother), just to buy herself a new shirt (shirt: អាវ) which she (and the boys) likes so much that she does not want to take it off (undress: ដោះ). It is this last word, which also means breast (ដោះ), that led others to put new lyrics to the same music. That sets up a classic word play on the double meaning of “ឃើញ​ដោះ”. These two words can be read as “wearing a shirt that no one has seen her take off” or “wearing a shirt that never shows her boobs”.

The text of the song, now with the necessary ZWSPs, goes as follows:

​អាវ​ថ្មី​មិន​ខ្ចី​ដោះ

មាន​អាវ​មួយ​សន្សំ​លុយ​យូរ​ខៃ
ស្រលាញ់​ម្លេះ​ទេ    អាវ​ថ្មី​ចេញ​ម៉ូត​ស្រស់
​ពាក់​អោយ​កេ​ដឹង    ថា​មិន​ដែល​ឃើញ​ដោះ
​ទៅ​នេះ​មក​នោះ   ពាក់​តែ​អាវ​មួយ​ហ្នឹង

​អាវ​ជិត​ក​ល្អ​ត្រូវ​ចិត្ត​ស្រី
​លក់​មេ​មាន់​បី មិន​អោយ​ម៉ែ​គត់​ដឹង
​ពាក់​ដើរ​រាល់​ថ្ងៃ ប្រុស​លួច​សម្លឹង
​ខ្លះ​ស្ទើរ​ភ្លឹក​ព្រលឹង សរសើរ​ស្រលាញ់​ខ្ញុំ

​ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ
​គិត​អីឬ​វាយ៉ា​ង​ណា យ៉ាង​ម៉េច​ស្រីង៉ា ​បង​មិន​ដែល​ឃើញ​ដោះ
​អាវ​ថ្មី​ខ្ញុំ​មិន​ខ្ចី​ដោះ អាវ​ថ្មី​ខ្ញុំ​មិន​ខ្ចី​ដោះ
​កំលោះ​នាំ​គ្នា​ចោម​រោម បើ​សរសើរ​ខ្ញុំ ខ្ញុំ​រិត​តែ​លែង​ដោះ

​រូប​រាង​ស្រី​ទាំ​ង​សម្ដី​វាចា
​ឬ​កពា​ចរិយា អាវ​ថ្មី​ឆើត​ស្រស់
​មាន​អាវ​ថ្មី​មួយ ពាក់​មិន​ខ្ចី​ដោះ
​ប្រុសណា​ស្រណោះ ចូល​ដល់​យាយ​តា

​រូប​បង​ប្រុស​មិន​យល់​សោះ​ចិត្ត​ស្រី
​ស្រុក​សីវីល័យ ស្រី​តែង​ខ្លូន​សង្ហា
​ពាក់​មិន​ឃើញ​ដោះ អាវ​នោះ​យ៉ាង​ណា
​បើ​ប្រុស​សង្ហា ស្វែង​យល់​ខ្លូន​ឯង

​ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ
​កើត​អី​ឬ​វាយ៉ាង​ណា យ៉ាង​ម៉េច​ស្រីង៉ា ​បង​មិន​ដែល​ឃើញ​ដោះ
​អាវ​ថ្មី​ខ្ញុំ​មិន​ខ្ចី​ដោះ អាវ​ថ្មី​ខ្ញុំ​មិន​ខ្ចី​ដោះ
​កំលោះ​នាំ​គ្នា​ចោម​រោម បើ​សរសើរ​ខ្ញុំ ខ្ញុំ​រិត​តែ​លែង​ដោះ

I checked it carefully: all ZWSPs are there! I will keep an eye on Google to see whether a search for one or more of the above words finds this weblog.
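Checking invisible characters by eye is tedious, so a small helper can make them visible. The Python sketch below is only one possible way to do such a check; the names `reveal` and `count_breaks` and the `|` marker are arbitrary choices for this example.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def reveal(text: str, marker: str = "|") -> str:
    """Replace every ZWSP with a visible marker for proofreading."""
    return text.replace(ZWSP, marker)


def count_breaks(text: str) -> int:
    """Count the ZWSP characters in a piece of text."""
    return text.count(ZWSP)


# The title line of the song, built from its five words so that the
# ZWSPs are guaranteed to be present in this example.
title = ZWSP.join(["អាវ", "ថ្មី", "មិន", "ខ្ចី", "ដោះ"])

print(reveal(title))        # the same line with | at each word break
print(count_breaks(title))  # five words, so four breaks
```

A line where the count is lower than expected, or where the marked-up version shows two words glued together, is a line that still needs a ZWSP.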

I will correct the previous post asap.

My First Post in Khmer

Thursday, April 5th, 2007

And finally, here is my first post in Khmer.

It took me a long time to find out that there are currently two ways in use to publish a piece of text written in Khmer.

The older system uses special fonts that display what is really garbage-looking text as nice Khmer. This system is still in use on, for example, the Khosanthepeap website. Their page starts with a message saying that Firefox users should first download some fonts. Internet Explorer users, on the other hand, have no issues (héhé Microsoft). With an extra entry in the cascading style sheet, the same fonts are downloaded to the cache automagically.

The newer system uses Unicode. Gone are all these character encodings. One encoding for everyone, the dream of a programmer. This is how the Khmer version of Wikipedia works.
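As a small illustration of what "one encoding for everyone" means in practice, the Python sketch below looks at a single Khmer letter. In Unicode, the main Khmer block runs from U+1780 to U+17FF; the helper name `is_khmer` is just a label chosen for this example.

```python
import unicodedata


def is_khmer(ch: str) -> bool:
    """True if ch lies in the main Khmer Unicode block (U+1780..U+17FF)."""
    return 0x1780 <= ord(ch) <= 0x17FF


ka = "\u1780"  # the first letter of the Khmer block

print(unicodedata.name(ka))     # the letter's official Unicode name
print(is_khmer(ka))             # inside the Khmer block
print(is_khmer("a"))            # a Latin letter is not
print(len(ka.encode("utf-8")))  # number of bytes this letter takes in UTF-8
```

Because every Khmer letter has its own code point, the text stays Khmer whatever font is used to draw it, which is the opposite of the older font-based trick.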

I installed the Khmer Unicode Support, and…nothing happened. The small boxes were still on the screen. A quick visit to M|O|N|G|K|O|L to verify whether the Khmer text was visible, but no luck. A new test: I went to my daughter’s computer, installed the Khmer Unicode Support, and…bingo, Wikipedia ok, M|O|N|G|K|O|L ok.

So, what’s the difference between her computer and mine? Internet Explorer 6 (hers) ok, Internet Explorer 7 (mine) not ok. Google helped: with IE7 you have to specify which fonts to use for pages in the Khmer language (héhé Microsoft).

And now a word about the text. It took me almost 2 hours to get it into a document. But don’t laugh, this was my first try and I am still learning. The text is from a Khmer song I found in a karaoke video on YouTube. In fact I found two versions: the first one a Khmer Surin version, the second a ‘normal’ Khmer version. So, I sat down, watched it, paused it, typed out what I heard, continued it, paused it again, typed out the next piece, copy/pasted some sentences, etc…

Cambodians like word games (at least that is my experience), and this song is one. It’s all about the word ដោះ (dah or doh) which has two meanings…

ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ

​មាន​អាវ​មួយ ពាក់​មិន​ដែល​ឃើញ​ដោះ មាន​មួយ​ប៉ុណ្ណោះ ឬ​យ៉ាង​ណា​ស្រី​ង៉ា
​ឃើញ​សព្វដង មិន​ថា​ង៉ៃ​ណា ឃើញ​កល់​ណា ពាក់​តា​អាវ​ដែល​ៗ

​អាវ​យឺត​ក្រហម ល្មម​ត្រូវ​ចិត្ត​ស្រី ថា​មើល​ស្រដី មិន​ដែល​ដោះ​ម៉ាក់​ម្តង
​ប្រមាណ​ង៉ៃ​ហើយ ក្រមុំ​ផន់​បង ដល់​ណា​ដោះ​ម៉ាក់​ម្តង ព្រោះ​តា​បង​ចង់​ឃើញ

​ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ កើត​អុយ ឬ​មួយ​យ៉ាង​ណា
​រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ

​មុខ​មាត់​ស្រី មើល​ទៅ​ស្នំស្នួន តែ​អាវ​ក្នុង​ខ្លួន ពាក់​មក​ច្រើន​ង៉ៃ
​អាវ​ដែល​ៗ ក្អែល​ចាប់​ហើយ​ស្រី ប្តូ​អាវ​ថ្មី បាន​ទេ​ស្រី​ង៉ា

​បង​សង្ស័យ មក​ច្រើន​ង៉ៃ​ហើយ មិន​ដែល​ដោះ​ទ្បើយ អូន​គិត​យ៉ាង​ណា
​បង​ចង់​ដិញ ទំ​និយ​ស្រី​ង៉ា ឃើញ​កល់​ណា ពាក់​មិន​ដែល​ឃើញ​ដោះ

​ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ កើត​អុយ ឬ​មួយ​យ៉ាង​ណា
​រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ

​បង​សង្ស័យ មក​ច្រើន​ង៉ៃ​ហើយ មិន​ដែល​ដោះ​ទ្បើយ អូន​គិត​យ៉ាង​ណា
​បង​ចង់​ដិញ ទំនិយ​ស្រី​ង៉ា ឃើញ​កល់​ណា ពាក់​មិន​ដែល​ឃើញ​ដោះ

​ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ ពាក់​អាវ​មិន​ដែល​ឃើញ​ដោះ កើត​អុយ ឬ​មួយ​យ៉ាង​ណា
​រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ រម៉េច​ស្រី​ង៉ា បង​មិន​គើយ​ឃើញ​ដោះ

Sorry to those who cannot read Khmer. But don’t worry, stay tuned! One day, I will post a reading course…

(The translation will follow in a few days)