Most modern operating systems have a notion of the
“current locale”: the region or
country whose localization conventions are honored. These
conventions, typically chosen by some runtime
configuration mechanism on the computer, affect the way
in which programs present data to the user, as well as the way
in which they accept user input.
On most Unix-like systems, you can check the values of the
locale-related runtime configuration options by running the
locale command:
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
The output is a list of locale-related environment
variables and their current values. In this example,
LANG is empty and the other variables all
resolve to the default C
locale, but users can set these variables to specific
country/language code combinations. For example, if one were
to set the LC_TIME variable to
fr_CA, then programs would know to present
time and date information formatted according to a
French-speaking Canadian's expectations. And if one were to
set the LC_MESSAGES variable to
zh_TW, then programs would know to present
human-readable messages in Traditional Chinese. Setting the
LC_ALL variable has the effect of changing
every locale variable to the same value. The value of
LANG is used as a default value for any
locale variable that is unset. To see the list of available
locales on a Unix system, run the command locale
-a.
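The way these variables interact can be sketched as a small shell function. This is an illustration only: the name effective_locale is hypothetical, and the function assumes the POSIX lookup order, in which a non-empty LC_ALL wins, then the category's own variable, then LANG, with C as the final fallback.

```shell
# effective_locale CATEGORY
# Print the locale a program would use for CATEGORY (for example LC_TIME).
# Hypothetical helper; CATEGORY must be a plain variable name.
effective_locale() {
    category=$1
    # Read the variable whose name is stored in $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"        # LC_ALL overrides every category
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"      # the category's own variable
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"          # LANG is the default for unset categories
    else
        printf '%s\n' "C"              # nothing set: the default C locale
    fi
}

# Example call: show the locale currently in effect for dates and times.
effective_locale LC_TIME
```

With LC_TIME set to fr_CA and LC_ALL unset, the call prints fr_CA; setting LC_ALL afterward changes the answer for every category at once.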
On Windows, locale configuration is done via the
“Regional and Language Options” control panel
item. There you can view and select the values of individual
settings from the available locales, and even customize (at a
sickening level of detail) several of the display formatting
conventions.
Subversion's use of locales
The Subversion client, svn, honors the
current locale configuration in two ways. First, it notices
the value of the LC_MESSAGES variable and
attempts to print all messages in the specified language. For
example:
$ export LC_MESSAGES=de_DE
$ svn help cat
cat: Gibt den Inhalt der angegebenen Dateien oder URLs aus.
Aufruf: cat ZIEL[@REV]…
…
This behavior works identically on both Unix and Windows
systems. Note, though, that while your operating system might
have support for a certain locale, the Subversion client still
may not be able to speak the particular language. In order to
produce localized messages, human volunteers must provide
translations for each language. The translations are written
using the GNU gettext package, which results in translation
modules that end with the .mo filename
extension. For example, the German translation file is named
de.mo. These translation files are
installed somewhere on your system. On Unix, they typically
live in /usr/share/locale/, while
on Windows they're often found in the
\share\locale\ folder in Subversion's
installation area. Once installed, a module is named after
the program it provides translations for. For example, the
de.mo file may ultimately end up
installed as
/usr/share/locale/de/LC_MESSAGES/subversion.mo.
By browsing the installed .mo files, you
can see which languages the Subversion client is able to
speak.
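Browsing for installed translations can be scripted. The sketch below assumes the Unix layout described above, with one subversion.mo file per language under a LC_MESSAGES directory. The function name list_svn_languages is hypothetical, and the default directory may differ on your system.

```shell
# list_svn_languages [LOCALE_DIR]
# Print one language code per line for every installed subversion.mo file.
# LOCALE_DIR defaults to /usr/share/locale (an assumption; adjust as needed).
list_svn_languages() {
    locale_dir=${1:-/usr/share/locale}
    for mo in "$locale_dir"/*/LC_MESSAGES/subversion.mo; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$mo" ] || continue
        # Strip the trailing path to leave .../<language>, then keep <language>.
        lang_dir=${mo%/LC_MESSAGES/subversion.mo}
        printf '%s\n' "${lang_dir##*/}"
    done
}

list_svn_languages
```

If the German translation is installed as /usr/share/locale/de/LC_MESSAGES/subversion.mo, the output includes a line reading de.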
The second way in which the locale is honored involves how
svn interprets your input. The repository
stores all paths, filenames, and log messages in Unicode,
encoded as UTF-8. In that sense, the repository is
internationalized; that is, the
repository is ready to accept input in any human language.
This means, however, that the Subversion client is responsible
for sending only UTF-8 filenames and log messages into the
repository. In order to do this, it must convert the data
from the native locale into UTF-8.
For example, suppose you create a file named
caffè.txt, and then when committing the
file, you write the log message as “Adesso il caffè è
più forte”. Both the filename and log message contain
non-ASCII characters, but because your locale is set to
it_IT, the Subversion client knows to
interpret them as Italian. It uses an Italian character set
to convert the data to UTF-8 before sending it off to the
repository.
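The effect of such a conversion can be seen with the standard iconv and od utilities. This sketch is not a description of how svn performs the conversion internally. It assumes, purely for illustration, that the native character set is ISO-8859-1, and the helper name to_utf8 is hypothetical.

```shell
# to_utf8 CHARSET
# Convert standard input from CHARSET to UTF-8 (hypothetical helper).
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# In ISO-8859-1 (assumed here), "è" is the single byte 0xE8, octal 350.
# After conversion it becomes the two-byte UTF-8 sequence c3 a8.
printf 'caff\350.txt' | to_utf8 ISO-8859-1 | od -An -tx1
```

The hexadecimal dump shows the ASCII letters unchanged, while the one non-ASCII byte e8 is replaced by c3 a8.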
Note that while the repository demands UTF-8 filenames and
log messages, it does not pay attention
to file contents. Subversion treats file contents as opaque
strings of bytes, and neither client nor server makes an
attempt to understand the character set or encoding of the
contents.