Re: Wordia

From: Petr Lampa <lampa@FEE.VUTBR.CZ>
Subject: Re: Wordia
Date: Sun, 22 Oct 1995 19:54:40 +0100
>
> > > even in the localized Word. Perhaps someone managed to convince the
> > > development team that here we write HTML pages without diacritics,
> > > and also in English, for which ISO Latin 1 suffices.
> >
> > I am not sure whether this is meant seriously: who was convincing the
> > team of that? Speaking for myself, I prefer Czech with diacritics when
> > I have the choice :-). Yes, I "also" write in English, but precisely
> > "also", not "only".
> >
> > Honza Vejvalka
> >
> It is meant seriously: Czech is not supported in HTML, only ISO Latin 1.
> ISO Latin 2 will arrive only with the Unicode version, date still
> unknown, hopefully soon.
>
 
If this means that HTML 2.0 and 3.0 define no entity codes for the
ISO 8859-2 characters (i.e. &ecaron, &iacute, etc.), then it is of course
true. Otherwise, however, the HTML language definition restricts the use of
arbitrary eight-bit codes (i.e. not multibyte ones) only as follows:
 
   Character sets
       The charset parameter (as defined in section 7.1.1 of RFC 1521)
       may be used with the text/html content type to specify the
       encoding used to represent the HTML document as a sequence of
       bytes. Normally, text/* media types specify a default of
       US-ASCII for the charset parameter. However, for text/html, if
       the byte stream contains data that is not in the 7-bit US-ASCII
       set, the HTML interpreting agent should assume a default charset
       of ISO-8859-1.
 
       When an HTML document is encoded using US-ASCII, the mechanisms
       of numeric character references and character entity references
       may be used to encode additional characters from ISO-8859-1.
       Character entity references are needed for symbols such as math
       and greek characters from other unspecified character sets.
 
       Other values for the charset parameter are not defined in this
       specification, but may be specified in future versions of HTML.
       It is envisioned that HTML will use the charset parameter to
       allow support for non-Latin characters such as Arabic, Hebrew,
       Cyrillic and Japanese, rather than relying on any SGML mechanism
       for doing so.
 
This quotation says only that the default encoding is ISO 8859-1 and that no
charset value other than 8859-1 is defined in HTML 2.0 (3.0), i.e. a client
is not required to understand one. It does not, however, forbid the charset
value from being something else, nor the client from understanding it.
The HTTP 1.0 protocol specification allows these values for charset:
 
     charset = "US-ASCII"
             | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
             | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
             | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
             | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
             | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
             | token
 
So if the server supplies the headers
 
Content-Type: text/html; charset=ISO-8859-2
Content-Language: cs
 
nothing prevents the text from being Czech and the client from displaying
it correctly (but which client can do that?). This does not solve
everything, of course; that is addressed only by the HTML 2.1 draft
(see e.g. ftp://pub/WWW/draft-ietf-html-i18n-01.txt). But since a full
implementation of Unicode 1.1 interpretation on the client and server side
will take some time, it is more sensible to use what is available now.
Has anyone here looked into making some client understand the charset
parameter? (Or into making it generate the Accept-Charset and
Accept-Language headers?)
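 
To illustrate what "understanding the charset parameter" would involve on
the client side, here is a minimal Python sketch. It is only an
illustration under stated assumptions: the helper names
charset_from_content_type and decode_body are hypothetical and do not come
from any existing client, and the sample text is made up. The sketch reads
the charset parameter from a Content-Type value, falls back to the
ISO-8859-1 default quoted above when the parameter is absent, and decodes
the body accordingly.

```python
# Sketch only: helper names are hypothetical, not taken from any real client.
from email.message import Message

# Default the quoted HTML text prescribes for text/html with 8-bit data.
DEFAULT_HTML_CHARSET = "iso-8859-1"


def charset_from_content_type(content_type: str) -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset", DEFAULT_HTML_CHARSET).lower()


def decode_body(content_type: str, body: bytes) -> str:
    """Decode the body bytes using the charset declared by the server."""
    return body.decode(charset_from_content_type(content_type))


# Server side: Czech text sent as ISO-8859-2 bytes with a matching header.
text = "Čeština s diakritikou"
headers = {"Content-Type": "text/html; charset=ISO-8859-2"}
body = text.encode("iso-8859-2")

# Client side: honouring the charset parameter recovers the text ...
decoded = decode_body(headers["Content-Type"], body)
# ... while falling back to the ISO-8859-1 default garbles the accents.
garbled = decode_body("text/html", body)
```

The same bytes decode correctly only when the client honours the declared
charset; with the ISO-8859-1 default the accented letters come out as
different characters, which is the practical problem discussed above.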
 
                                                        Petr Lampa
 
--
Technical University of Brno                     E-mail: lampa@fee.vutbr.cz
Faculty of El. Engineering and Comp. Science     Phone: (+42 5) 7275/111,225
Department of Computer Science and Engineering   Fax:  (+42 5) 41211141
Bozetechova 2, 612 66 Brno, Czech Republic