How to get character encoding...

How to get character encoding...

Micah Carrick
Is there a routine I can use to determine the character encoding of a
text file so I can then convert it to UTF-8 for display in a GtkTextView?

Micah
_______________________________________________
gtk-list mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/gtk-list

Re: How to get character encoding...

David Nečas (Yeti)-2
On Sun, May 29, 2005 at 01:56:26PM -0400, Micah Carrick wrote:
> Is there a routine I can use to determine the character encoding of a
> text file so I can then convert it to UTF-8 for display in a GtkTextView?

Generally, no.  For a short text in an arbitrary language
and an arbitrary encoding, even humans may not be able to
determine it.

It's quite easy to tell legacy 8-bit encodings apart from the
Unicode variants UTF-8, UTF-16, and UCS-4.  Quite a few programs
can do it (e.g., file), although there's no such routine in
GLib AFAIK.
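
A rough sketch of that kind of check in plain C, in case it helps:
look for a byte-order mark first, then test whether the bytes are
well-formed UTF-8.  The enum and function names are made up for this
example (they are not from GLib, file, or Enca), and the sketch
assumes UTF-16/UCS-4 input starts with a BOM; BOM-less UTF-16 is not
recognized.  Pure ASCII is reported as UTF-8, since it is a subset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a GLib or Enca API. */
typedef enum {
    ENC_UTF8,
    ENC_UTF16_LE,
    ENC_UTF16_BE,
    ENC_UTF32_LE,      /* UCS-4, little endian */
    ENC_UTF32_BE,      /* UCS-4, big endian */
    ENC_UNKNOWN_8BIT   /* not Unicode: some legacy encoding, or binary */
} GuessedEncoding;

/* Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t n, k;   /* n = number of continuation bytes expected */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                     /* stray continuation or bad lead */

        if (len - i <= n) return 0;        /* sequence truncated */
        for (k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (n == 1 && cp < 0x80) return 0;
        if (n == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return 0;
        if (n == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        i += n + 1;
    }
    return 1;
}

GuessedEncoding guess_unicode_encoding(const unsigned char *buf, size_t len)
{
    /* The UTF-32 BOMs come first: FF FE 00 00 also starts with FF FE. */
    if (len >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0) return ENC_UTF32_LE;
    if (len >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0) return ENC_UTF32_BE;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0) return ENC_UTF16_LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0) return ENC_UTF16_BE;
    if (is_valid_utf8(buf, len)) return ENC_UTF8;
    return ENC_UNKNOWN_8BIT;
}
```

A result of ENC_UNKNOWN_8BIT only says "not Unicode"; it says nothing
about which legacy encoding the file actually uses.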

But if you need to recognize legacy 8-bit encodings, you are
in trouble.  I've written a program, Enca, that does it for
some East European languages, but that's probably of little
help here.  Various detection routines for Asian languages
can be found on the web, and methods that determine both
language and encoding exist too, but they need fairly long,
typical text.  If it's reasonable to assume the text is
somehow related to the current locale, you can simply try
nl_langinfo(CODESET) from the non-Unicode version of that
locale.  Or something like that, depending on the situation.
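
The nl_langinfo(CODESET) idea could look roughly like this.  The
helper name is made up for the example, and how you derive the
"non-Unicode version" of the user's locale is left open: a name such
as "cs_CZ.ISO-8859-2" is only an illustration and may not be
installed on a given system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies the codeset name of the locale
 * `locale_name` into `out`.  Returns 0 on success, -1 if the locale
 * could not be selected.  setlocale() changes process-wide state,
 * so this is not safe to call while other threads are running.
 */
int codeset_for_locale(const char *locale_name, char *out, size_t out_size)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);

    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);            /* remember the active locale */

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                     /* locale not available */

    /* nl_langinfo() may reuse its buffer, so copy the name out. */
    snprintf(out, out_size, "%s", nl_langinfo(CODESET));

    setlocale(LC_CTYPE, saved);        /* restore the previous locale */
    return 0;
}
```

Passing "" as locale_name picks the locale from the environment, which
on a UTF-8 desktop will just report UTF-8; that is why a non-Unicode
variant of the locale is the interesting one here.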

In all cases, if the file is user-supplied, allow the user to
choose the encoding.
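
Once an encoding name is known (guessed or chosen by the user), the
conversion to UTF-8 itself is the easy part.  Below is a minimal
sketch using POSIX iconv(); the helper name is invented for the
example, and the 4x output buffer is an assumption that is generous
for 8-bit encodings.  In a GLib program g_convert() would be the more
natural choice.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: converts in[0..in_len) from `from_codeset`
 * to UTF-8.  Returns a malloc()ed, NUL-terminated string that the
 * caller must free(), or NULL if the encoding name is unknown or
 * the input is not valid in that encoding.
 */
char *to_utf8(const char *in, size_t in_len, const char *from_codeset)
{
    iconv_t cd = iconv_open("UTF-8", from_codeset);
    char *out, *out_ptr, *in_ptr;
    size_t in_left, out_left;

    if (cd == (iconv_t)-1)
        return NULL;                   /* conversion not supported */

    out = malloc(in_len * 4 + 1);      /* assumed upper bound, +1 for NUL */
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    in_ptr = (char *)in;               /* iconv() takes a non-const pointer */
    in_left = in_len;
    out_ptr = out;
    out_left = in_len * 4;

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        free(out);                     /* invalid or incomplete input */
        iconv_close(cd);
        return NULL;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

A NULL result is a useful hint in itself: if the user's chosen
encoding fails to convert, the choice was probably wrong and it makes
sense to ask again.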

Yeti


--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?