On Sun, May 29, 2005 at 01:56:26PM -0400, Micah Carrick wrote:
> Is there a routine I can use to determine the character encoding of a
> text file so I can then convert it to UTF-8 for display in a gtkTextView?
Generally, no. For a short text in an arbitrary language
and arbitrary encoding, even humans may not be able to tell.
It's quite easy to tell legacy 8-bit encodings apart from the
Unicode variants UTF-8, UTF-16, and UCS-4. Quite a few programs
can do it (e.g., file), although there's no such routine in
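
To illustrate the easy part, here is a minimal sketch in plain C of how such a
check might look. It is not the method used by file or any particular program.
The names EncodingGuess, is_valid_utf8 and guess_encoding are made up for this
example. It assumes UTF-16 and UTF-32 text starts with a byte order mark, so
BOM-less UTF-16 is not recognized, and anything that is not well-formed UTF-8 is
simply reported as "some 8-bit encoding".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    GUESS_UTF8,   /* well-formed UTF-8 (this includes pure ASCII) */
    GUESS_UTF16,  /* starts with a UTF-16 byte order mark */
    GUESS_UTF32,  /* starts with a UTF-32 byte order mark */
    GUESS_8BIT    /* not valid UTF-8: some legacy 8-bit encoding, not identified */
} EncodingGuess;

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;     /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < n) return 0;  /* sequence cut off at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n + 1;
    }
    return 1;
}

static EncodingGuess guess_encoding(const unsigned char *buf, size_t len)
{
    /* 4-byte BOMs first: the UTF-32LE BOM begins with the UTF-16LE one. */
    if (len >= 4 && (memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0 ||
                     memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0))
        return GUESS_UTF32;
    if (len >= 2 && (memcmp(buf, "\xFF\xFE", 2) == 0 ||
                     memcmp(buf, "\xFE\xFF", 2) == 0))
        return GUESS_UTF16;
    if (is_valid_utf8(buf, len))
        return GUESS_UTF8;
    return GUESS_8BIT;
}
```

The UTF-8 test works because legacy 8-bit text with accented characters almost
never happens to form valid multi-byte UTF-8 sequences. Pure ASCII passes as
UTF-8, which is harmless for display purposes.
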
But if you need to recognize legacy 8-bit encodings, you are
in trouble. I've written a program, Enca, that does it for
some East European languages, but that's probably of little
help here. Various detection routines for Asian languages
can be found on the web, and methods that determine both
language and encoding exist too, but they need fairly
long, typical text. If it's reasonable to assume the text
is somehow related to the current locale, you can simply try
nl_langinfo(CODESET) from the non-Unicode version of that
locale, or something like that, depending on the situation.
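
A rough sketch of the locale trick, assuming a POSIX system. The helpers
strip_codeset and codeset_for_locale are hypothetical names for this example.
Only setlocale() and nl_langinfo(CODESET) are real library calls. The sketch
assumes the "non-Unicode version" of a locale can be obtained by dropping the
".codeset" part of its name, and that this locale is actually installed. If it
is not, setlocale() fails and you are back to asking the user.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Drop the ".codeset" part of a locale name:
 * "cs_CZ.UTF-8" -> "cs_CZ", "de_DE.UTF-8@euro" -> "de_DE@euro". */
static void strip_codeset(const char *name, char *out, size_t out_size)
{
    const char *dot = strchr(name, '.');
    if (dot == NULL) {
        snprintf(out, out_size, "%s", name);
        return;
    }
    const char *mod = strchr(dot, '@');  /* keep an optional @modifier */
    snprintf(out, out_size, "%.*s%s", (int)(dot - name), name, mod ? mod : "");
}

/* Copy the codeset of the named locale into out.
 * Returns 0 on success, -1 if the locale is not available. */
static int codeset_for_locale(const char *locale_name, char *out, size_t out_size)
{
    char saved[128] = "";
    const char *old = setlocale(LC_CTYPE, NULL);  /* query only, no change */
    if (old != NULL)
        snprintf(saved, sizeof saved, "%s", old);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                                /* locale not installed */
    snprintf(out, out_size, "%s", nl_langinfo(CODESET));

    if (saved[0] != '\0')
        setlocale(LC_CTYPE, saved);               /* restore previous locale */
    return 0;
}
```

A caller would take the current locale name from setlocale(LC_CTYPE, ""), pass
it through strip_codeset(), and feed the result to codeset_for_locale(). On a
typical glibc system a cs_CZ.UTF-8 user would probably get ISO-8859-2 this way,
but the exact answer depends on the installed locale data.
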
In all cases, if the file is user-supplied, allow the user to
choose the encoding.
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?