Re: several messages about Netscape and charsets


From: The Radio Prague Staff of Highly Skilled Experts <barry@RADIO.CZ>
Subject: Re: several messages about Netscape and charsets
Date: Wed, 20 Dec 1995 16:10:47 +0100



Another weekend, another followup...
 
On Fri, 15 Dec 1995, kremžská HOŘČICE wrote:
 
> > >  (But wouldn't these tags confuse
> > > browsers, which use non-ISO-Latin2 fonts, like Netscape for MS-Windows?
>
> [...M]y opinion is that Netscape should recognize
> this encoding and be able to translate from this to CP1250
 
    I thought I would pass along part of the following message, which
was sent to the IETF (Internet Engineering Task Force) SMTP discussion
group recently, as it discusses HTTP charsets, even though the focus of
the message was slightly different...
 
 
    From: david_goldsmith@taligent.com (David Goldsmith)
    Newsgroups: info.ietf.smtp
    Subject: Re: Character set registration
    Date: 19 Dec 95 04:10:35 GMT
 
 
The following sections from the latest (version of September 5, 1995)
HTTP 1.0 spec seem to be relevant:
-----------------------------------------
[*snip*]
HTTP also redefines the default character set for text media in an entity
body. If a textual media type defines a charset parameter with a registered
default value of "US-ASCII", HTTP changes the default to be "ISO-8859-1".
Since the ISO-8859-1 [18] character set is a superset of US-ASCII [17], this
has no effect upon the interpretation of entity bodies which only contain
octets within the US-ASCII set (0 - 127). The presence of a charset
parameter value in a Content-Type header field overrides the default.
 
It is recommended that the character set of an entity body be labelled as
the lowest common denominator of the character codes used within a
document, with the exception that no label is preferred over the labels
US-ASCII or ISO-8859-1.
---------------------------
 
and (from 3.4):
--------------------
HTTP character sets are identified by case-insensitive tokens. The
complete set of tokens are defined by the IANA Character Set registry [15].
However, because that registry does not define a single, consistent token
for each character set, we define here the preferred names for those
character sets most likely to be used with HTTP entities. These character
sets include those registered by RFC 1521 [5] -- the US-ASCII [17] and
ISO-8859 [18] character sets -- and other names specifically recommended
for use within MIME charset parameters.
 
     charset = "US-ASCII"
             | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
             | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
             | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
             | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
             | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
             | token
----------------------
 
In other words, HTTP specifically allows the use of multibyte character
sets which do not use the CRLF sequence, more specifically 16-bit Unicode
(unicode-1-1). It also recognizes that this differs from the behavior
specified by MIME.
 
 
David Goldsmith
Senior Scientist
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA 95014-2233
david_goldsmith@taligent.com
 
 
 
    What this means is that Slovak and Czech documents should be served
with the charset label ISO-8859-2 and no other, and that if no charset
label is provided, the browser should assume ISO-8859-1.
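
    As a rough illustration of the rule quoted from the spec, here is a
minimal Python sketch of how a client might pick the charset for a text
body: use the charset parameter of the Content-Type header if there is
one, otherwise fall back to the HTTP default of ISO-8859-1.  The names
effective_charset and decode_body are my own, hypothetical ones, not
taken from any browser or from the spec, and the parameter parsing is
deliberately simplified (it does not cope with quoted values that
contain semicolons).

```python
HTTP_DEFAULT_CHARSET = "ISO-8859-1"  # HTTP's default for text media


def effective_charset(content_type=None):
    """Return the charset a client should assume for a text entity body.

    Hypothetical helper: a charset parameter in Content-Type overrides
    the default; with no parameter, the HTTP default applies.
    """
    if content_type:
        # Parameters follow the media type, separated by semicolons.
        for param in content_type.split(";")[1:]:
            name, sep, value = param.partition("=")
            if sep and name.strip().lower() == "charset":
                value = value.strip().strip('"')
                if value:
                    # Charset tokens are case-insensitive; normalize them.
                    return value.upper()
    return HTTP_DEFAULT_CHARSET


def decode_body(body, content_type=None):
    """Decode the raw bytes of a text body using the effective charset."""
    return body.decode(effective_charset(content_type))


if __name__ == "__main__":
    raw = b"krem\xbesk\xe1"
    print(decode_body(raw, "text/plain; charset=ISO-8859-2"))
    print(decode_body(raw, "text/plain"))
```

    With the label present, the bytes above decode to "kremžská".
Without it, the ISO-8859-1 default turns the same bytes into "krem¾ská",
which is exactly the kind of damage an unlabelled Czech or Slovak
document suffers.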
 
    This also means that effort should not go into serving HTTP
documents in a variety of different encodings.  It should instead be
focused on making the browser support the ISO-8859-2 charset even if
the display charset is something else (such as Windows CP1250, DOS
CP852, or whatever a Mac uses).  So Netscape 2.0 should map from the
ISO-8859-2 charset to the Windows or Mac display charset instead of
expecting the document to be supplied with the unregistered X-MAC-CE
charset, or with the CP1250 charset which, from the documentation, it
seems not to recognize.
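
    To make the idea of browser-side mapping concrete, here is a small
Python sketch of the translation step: decode the ISO-8859-2 bytes the
server sent, then re-encode them in whatever charset the local fonts
use.  This is only an illustration, not how Netscape works.  The
function name to_display_bytes is hypothetical, and the codec names
cp1250, cp852 and mac_latin2 are Python's own names (I take mac_latin2
to be the Mac Central European set), not charset labels from the HTTP
spec.

```python
def to_display_bytes(body, display_charset):
    """Map ISO-8859-2 bytes from the wire to a local display charset.

    Hypothetical helper.  Characters the display charset cannot
    represent are replaced with '?' rather than raising an error.
    """
    text = body.decode("iso-8859-2")
    return text.encode(display_charset, errors="replace")


if __name__ == "__main__":
    wire = b"krem\xbesk\xe1"  # "kremžská" in ISO-8859-2
    for charset in ("cp1250", "cp852", "mac_latin2"):
        print(charset, to_display_bytes(wire, charset).hex())
```

    The point is that the same letter sits at a different byte value in
each display charset (ž is BE in ISO-8859-2 but 9E in CP1250), so the
server can keep sending one registered encoding while each client does
its own mapping.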
 
    Now that I have made reference to this, can somebody point me to
some resources about Central European support for the Macintosh, such
as the font encoding, and sources for Mac fonts with ISO-8859-2
encoding, if such exist, for an application which does not perform
the translation to the internal display charset?  I would appreciate it.
 
 
Diky,
Barry Bouwsma
<barryb@tuke.sk>
veselé vánoce
