Found In Translation

Based on some feedback from IFComp, for the current version, the cover art was dezombified and a serif font was chosen.

This year, I learned a few things about implementing text games in languages other than English, or more specifically, about porting a single game across three languages. My IFComp game, “En Garde”, was originally written in French for the 2018 Francophone IF Competition. Subsequently, I translated it into English for IFComp, and the game is now part of the 2018 Russian KRIL Competition thanks to a translation by Valentin Kopeltsev. I would like to share some practical experience from this effort and some of the solutions that I found along the way.

Background

The game was written in the most recent release of Inform 7 and the preview version of Vorple 3. I thought that this would be a good game to experiment with in terms of translation because of three major constraints: the amount and complexity of the text is limited (mostly because of my French language skills); it is a limited parser game, with essentially ten single-word commands; and finally, the interface involves clicking on hyperlinks that issue commands rather than typing. These constraints reduced complexity and bypassed some technical issues related to porting the game between languages.

Minimized Grammar

For this project, I did not employ language-specific versions of Inform 7. While there is a Russian-localized version of Inform 6, to my knowledge Inform 7 has not been ported (and probably won’t be for structural reasons discussed below). There are, however, French versions of both Inform 6 and Inform 7. The French version of Inform 7 is implemented as an extension to Inform 7 release 6G60. In it, the Inform programming language itself is rendered in French, and of course it generates all library responses in French as well; with minor tweaks, this version is compatible with the current release of Vorple.

To keep things simple for myself, I stuck with my familiar English version of Inform 7 for all three implementations (sources for the French, English, and Russian versions are on GitHub). The game consists of doing simple things and some canned conversational text between turns. Since the conversation text is fixed, I only had to worry about responses generated by doing the ten things implemented as commands (go a direction, eat something, open a door, press something, etc.), plus any implicit action that Inform would generate, such as looking when entering a room.

Rather than implement true grammar for each language, I took some shortcuts. A number of the commands needed to be defined as new actions since the default commands such as open, unlock, etc., require a direct object. Since I defined these actions from scratch, it was trivial to bake in customized responses. I also added a customized “you can’t go that way” message and a rule for implicitly taking items, i.e., “(first taking the …)”.

The only point where the parser had free rein was in describing the location, where it would list the items present, whether they are open or closed, and describe the containment relationships. For French, I found it helpful to add a male/female flag to objects so that they would get the appropriate definite or indefinite article. I also wrote a customized rule for listing nondescript items in the location. The rule made all sorts of assumptions for the sake of simplifying the coding, but since it only needed to work in the context of this specific game, I thought that was a reasonable way to go.
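To make the gender-flag idea concrete, here is a minimal sketch in Python rather than Inform 7. It is not the game's code: the function name, the vowel list, and the sample nouns are all invented for illustration, and like the rule described above it makes simplifying assumptions (singular nouns only, no handling of mute “h”).

```python
# Sketch only: a gender flag driving French article choice for singular nouns.
VOWELS = "aeiouàâéèêëîïôù"  # simplification: ignores mute "h" and other exceptions


def with_article(noun: str, feminine: bool, definite: bool) -> str:
    """Return the noun prefixed with le/la/l' (definite) or un/une (indefinite)."""
    if not definite:
        return ("une " if feminine else "un ") + noun
    if noun[0].lower() in VOWELS:
        return "l'" + noun  # elision before a vowel, whatever the gender
    return ("la " if feminine else "le ") + noun


# Hypothetical objects, not taken from the game.
print(with_article("porte", feminine=True, definite=True))
print(with_article("coffre", feminine=False, definite=False))
print(with_article("arbre", feminine=False, definite=True))
```

A per-object flag like this is enough when the set of objects is small and known in advance, which is the situation described here.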

One comment I received in the French comp was that the way locations were described was authentically parser-like, in the old school tradition: the “You see an X, on which is a Y, containing a Z” style. Of course, this was the case; that is exactly what was going on under the hood.

So far, so good. When I translated to English, I removed the lines related to grammar customization and fell back to the default library responses. Knowing that the Russian translation was coming, though, I further limited the parser’s role in generating output by creating some rules for “printing a locale paragraph about” items of interest in each location. I thought that improved the way the room descriptions scanned.

Russian is more of a challenge because the language is inflected for gender, number, and case. On the other hand, there are no articles to worry about. I addressed this by putting the printed name of objects in the accusative case. That turned out to work for all the parser-generated messages encountered in this game, for example, the list of objects that you can see.

I had to be careful to avoid expressions that would require objects to be in a different case, like negated expressions where the direct object would be in the genitive case, containment expressions with items in or on other items in the prepositional case, or the need for the instrumental case after the word “with”, and so on. My work-around for these situations was almost entirely handled with rules for printing a locale paragraph about the item. I had some of this in the back of my head as I was writing the English translation, knowing that the Russian version was coming next.

Character Encoding

Inform 7 has no issues with French diacritical marks, but there are some significant gotchas in going beyond the extended Latin character set, for example with Cyrillic characters. The issue is more complicated on the input side, so let me begin with output.

In terms of mechanics, there is no problem typing in Russian within the I7 IDE, at least on the platform I’m using, a Mac Pro. Printed names can be given in Cyrillic characters without a problem. Similarly, you can mix Russian, Greek, Latin, etc., characters within a “say” phrase, but only up to a certain length. I forget the limit that I hit, but it was something like 270 characters, about two and a half lines of text for my setup. Any longer and the IDE spits out an error message during compilation:

The cause is obvious: during an early compilation step, every non-Latin character is replaced by a Unicode substitution of the form “[unicode 1077]”. So, what was a single character inflates to 14 characters (brackets included), and a long passage quickly exceeds the amount of text permitted between quotation marks in Inform 7.
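The inflation is easy to check with a few lines of Python. This is only a sketch of the arithmetic, not Inform's actual compilation step, and the cut-off used here (substituting everything above U+00FF) is my assumption; the post only says that French diacritics survive and Cyrillic does not.

```python
def expand(text: str) -> str:
    """Mimic the substitution: characters above U+00FF become "[unicode N]".

    Assumption: the U+00FF threshold is a guess made for illustration.
    """
    return "".join(c if ord(c) <= 0xFF else f"[unicode {ord(c)}]" for c in text)


one_letter = "\u0435"  # Cyrillic small letter ie, decimal code point 1077
print(expand(one_letter), len(expand(one_letter)))

word = "Привет"  # six Cyrillic letters
print(len(word), "characters become", len(expand(word)))
```

Every letter in the basic Cyrillic block has a four-digit decimal code point, so each one costs 14 characters after substitution, while spaces and ASCII punctuation still cost one.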

That in itself is not a game breaker. You can always break up text into several “say” phrases, but that would be unsightly and kind of inefficient. I settled on making a “tell” phrase to replace large “say” phrases.

The “tell” phrase operates on a list of text. In practice, I just drop the text I want to say between some curly braces and break it up into comma-separated bits of quoted text about two lines long. Kind of kludgy, but it does keep all the text together as a unit, for example:

I think it is also relatively efficient as invariant phrases can be stored as literal text. Where this doesn’t work great is in text with embedded [one of] options. In that case, it does make sense to use multiple say statements, like “[one of][phrase1][or][phrase2][or][phrase3][at random].”
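Breaking a long passage into list entries by hand gets tedious, so the splitting could be scripted. The following Python sketch is a hypothetical helper, not part of the game's source: it splits at spaces so that each piece, after the “[unicode N]” expansion, stays within a budget. The budget is a parameter because I don't know the compiler's real limit, and the U+00FF threshold is again an assumption.

```python
from typing import List


def expanded_len(text: str) -> int:
    """Length of text once each character above U+00FF becomes "[unicode N]"."""
    return sum(1 if ord(c) <= 0xFF else len(f"[unicode {ord(c)}]") for c in text)


def chunk(text: str, budget: int) -> List[str]:
    """Split single-spaced text into pieces whose expanded length fits the budget.

    A single word longer than the budget is kept whole as its own piece.
    """
    pieces: List[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        # Reserve one character for the trailing space kept with each full piece.
        if current and expanded_len(candidate) + 1 > budget:
            pieces.append(current + " ")
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


print(chunk("Привет мир", budget=100))
```

Each returned piece would become one quoted entry in the list handed to the “tell” phrase, and joining the pieces gives back the original text.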

Beyond that, things get hairy. Cyrillic characters won’t work for grammar tokens, so you cannot say:

Sorry for the picture here rather than text. WordPress keeps murdering non-Latin characters. I think it’s the database settings.

The non-Latin characters themselves can’t be stored in dictionary words; the underlying data structures were not designed to accommodate Unicode characters. The only way around this that I’ve seen is the I7 extension “Unicode Parser” by Andrew Plotkin, which performs dark magic at the I6 level, replacing bits and pieces of code with Unicode-friendly versions. The most recent version I see in the public library is version 7, which was built for Inform 7 release 6K92. It probably works with the current release, 6M62, but hasn’t specifically been tested against it. There are some caveats and limitations mentioned in the documentation for this extension, but it looks like it was a ton of work to write, so hopefully work on it will continue.

However, reading the characters is not the whole story: additional processing would be required to match the entered text against a token. For verbs, commands are traditionally entered in the infinitive (except when the imperative is used to give instructions to another actor), so that is not a huge problem. However, nouns would be declined according to their function in the input phrase. Those input nouns would then need to match up against the corresponding grammar token. Perhaps some sort of happy medium could be struck using regular expressions to match noun stems, trading off grammatical precision versus ease of implementation.
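As a rough illustration of that happy medium, here is a Python sketch of stem matching with regular expressions. Nothing like this exists in the game; the stems, the internal English tokens, and the “up to three ending letters” rule are all assumptions picked for the example.

```python
import re
from typing import Optional

# Hypothetical stems mapped to the English token used behind the scenes.
STEMS = {
    "книг": "book",  # книга, книгу, книги, книгой ...
    "ламп": "lamp",  # лампа, лампу, лампы, лампой ...
}

# Accept the stem followed by at most three more word characters (the case ending).
PATTERNS = [(re.compile(re.escape(stem) + r"\w{0,3}"), token)
            for stem, token in STEMS.items()]


def match_noun(word: str) -> Optional[str]:
    """Return the internal token for a declined noun, or None if no stem fits."""
    word = word.lower()
    for pattern, token in PATTERNS:
        if pattern.fullmatch(word):
            return token
    return None


print(match_noun("книгу"), match_noun("Лампой"), match_noun("дверь"))
```

The imprecision is the trade-off mentioned above: the pattern accepts any ending, grammatical or not, and it would miss nouns whose stem changes between cases.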

For my game, I got around all this by using English behind the scenes. All of the commands are issued by clicking hyperlinks, but the command that is issued is divorced from the label that appears on the hyperlink. To provide the appearance that commands were being issued in Russian, I echoed a prompt and the corresponding Russian command to the screen before processing each command.

A General Vorple-based Solution To Unicode Text Entry

I have found a way to allow entry of (I think) any language into an Inform 7 game without using the Unicode Parser extension. My solution is to leverage Vorple such that the parser only sees transliterated characters that fall within the Latin character set.

Running in the Vorple interpreter as part of a web page, the command prompt is a text input box and the text entered is piped over to the Inform parser each time the input line form is submitted. Between hitting “return” and Inform seeing the text, there is an opportunity to transform it.
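The transformation itself is small. The real hook lives in Vorple's JavaScript; the Python sketch below only shows the shape of the idea. The mapping is one possible romanization chosen for illustration (it is lossy: the soft and hard signs are dropped, and two letters map to “e”), and whatever scheme is used, the grammar tokens in the Inform source would have to be spelled the same way.

```python
# One possible Cyrillic-to-Latin table; an assumption, not the author's actual mapping.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(command: str) -> str:
    """Lower-case the typed command and replace each mapped character.

    Characters without an entry (spaces, digits, Latin letters) pass through unchanged.
    """
    return "".join(TRANSLIT.get(c, c) for c in command.lower())


# The parser would receive only the Latin text on the right.
print(transliterate("Взять книгу"))
```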

I poked through the underlying JavaScript, found that spot, and spliced in some of my own code. I’m a doctor, not a JavaScript programmer, dammit, but it works. The way I did it is decidedly not pretty: the right way would have been to rebuild all of Vorple with this additional bit of code, but I wanted a quick and dirty proof of principle.

The code is in my GitHub repository, and the demo game can be played online. Don’t get too excited: the game consists of taking items and putting them in a box.

While this is a humble beginning and the game isn’t much fun to play, this solution could facilitate projects in languages not previously attempted on the Inform 7 platform. While my implementation is hacky, it wouldn’t be too hard to incorporate a feature into some subsequent release of Vorple where a user could optionally provide a text file with a transliteration mapping. That would avoid any monkeying with the internals of Vorple, but allow pretty much any language to be used along with Inform 7.
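To give a feel for what such a mapping file might look like, here is a Python sketch with an invented format: one source character per line, followed by whitespace and its Latin replacement, with “#” starting a comment line. No such feature or format exists in Vorple; both function names are hypothetical.

```python
from typing import Dict


def parse_mapping(source: str) -> Dict[str, str]:
    """Parse lines of the form "<character> <replacement>" into a lookup table.

    A line holding only a character maps it to the empty string.
    """
    mapping: Dict[str, str] = {}
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=1)
        mapping[parts[0]] = parts[1] if len(parts) > 1 else ""
    return mapping


def apply_mapping(text: str, mapping: Dict[str, str]) -> str:
    """Replace each single character that has an entry; leave the rest untouched."""
    return "".join(mapping.get(c, c) for c in text)


SAMPLE = """# Greek sample (hypothetical)
α a
β v
γ g
"""
print(apply_mapping("αβγ", parse_mapping(SAMPLE)))
```

Because the table is data rather than code, switching to another alphabet would mean swapping the file and nothing else.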

Punctuation

One of the things that required more attention than I would have expected was punctuation. Here is the same passage in French, English, and Russian:

Starting with the French, a subtle but important difference relative to English is that there is space around some punctuation: after an opening quotation mark and before a closing one, and to each side of question marks, colons, semicolons, and percentage signs, but not commas and periods. Not only are there spaces where I wouldn’t ordinarily put them, but they are non-breaking spaces (espaces insécables) to ensure that a question mark does not get stranded after a line wrap. I used [unicode 160] for this purpose, although it’s not entirely correct: the space should be not only non-breaking but also narrower than a typical inter-word space. Believe me, it is very easy to dive down a rat hole researching all the different kinds of space characters in Unicode, but the limiting factor is really which code points are implemented on user systems. I didn’t want to take a chance that some exotic albeit more correct space character would show up as a question mark or black diamond, so I went with what seems to be a well-supported non-breaking space.
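These spacing rules are mechanical enough to script. The Python sketch below is a hypothetical pre-processing helper, not something the game uses: it puts U+00A0 (the character written as [unicode 160] above) before the listed marks and just inside guillemets. I have added the exclamation mark, which French typography treats the same way, and the function deliberately ignores complications such as colons inside times or URLs.

```python
import re

NBSP = "\u00a0"  # non-breaking space; a narrow variant could be swapped in here
SPACES = "[ \u00a0]*"  # any run of ordinary or non-breaking spaces already present


def french_spacing(text: str) -> str:
    """Insert non-breaking spaces where French punctuation expects them."""
    # Before ? ! : ; and %, replacing whatever spacing was typed.
    text = re.sub(SPACES + r"([?!:;%])", NBSP + r"\1", text)
    # Just after an opening guillemet and just before a closing one.
    text = re.sub("«" + SPACES, "«" + NBSP, text)
    text = re.sub(SPACES + "»", NBSP + "»", text)
    return text


print(repr(french_spacing("« Qui est là ? »")))
```

Commas and periods are left alone, matching the description above, and running the function twice does not double the spaces.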

French uses guillemets « and » rather than quotation marks, and these characters are widely supported. The real issue is how to format quotations. French is very efficient when it comes to dialogue-heavy passages: a single guillemet opens the quotation and then each speaker’s dialogue is set off with a long dash. To my eye, [unicode 8212] looked nice for this purpose. Finally, another guillemet closes the dialogue block. This style lent itself to very concise code, and was much more pleasant to type than in English where every line begins and ends with [quotation mark]. For internal quotations, I used English-style double quotation marks.

Guillemets are traditional for quoted material in Russian, although they also use inverted double commas. For internal quotations, they often use German-style quotation marks, the initial one below the line, the closing one above. The placement of punctuation is just different enough from English to require some adjustment, particularly when there is a speaker attribution in mid-sentence or after dialogue, as seen in the second example, below:

Help

A final word on this project, if anyone has remained awake through the above tract on grammar and punctuation: I had a lot of help. The French IF community was very supportive and even before beta-testing, a few folks did proofing passes to help iron out some of the more major gaffes. When it came to Russian, Valentin was a huge help, and not only with the translation: because he was working with the output of the English game, his suggestions made the English version much stronger as well. He gets double billing as beta-tester on the English version and translator on the Russian version.
