Discussion: UTF-8 docs


UTF-8 docs

From: Alexander Law
Hello,
I've just seen a discussion about docs encoding in pgsql-hackers.

https://www.postgresql.org/message-id/20160822.141645.655870136709055853.t-ishii%40sraoss.co.jp
Can we continue the discussion in this mailing list?
We at Postgres Pro have developed a complete build chain (with l10n
support), so we can simply share it.

Best regards,
Alexander



Re: UTF-8 docs

From: Tatsuo Ishii
From: Alexander Law <exclusion@gmail.com>
Subject: UTF-8 docs
Date: Mon, 22 Aug 2016 16:36:14 +0300
Message-ID: <7fbf2e80-9507-0521-d0e9-913ab81a58df@gmail.com>

> Hello,
> I've just seen a discussion about docs encoding in pgsql-hackers.
>
> https://www.postgresql.org/message-id/20160822.141645.655870136709055853.t-ishii%40sraoss.co.jp
> Can we continue the discussion in this mailing list?
> We (at Postgres Pro) have developed the whole build chain (with
> support for l10n) so we can just share it.

I have just subscribed to the pgsql-docs list.
Here is the last exchange with Peter on pgsql-hackers.

> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
>
> Those translations are using DocBook XML.

But in the meantime, I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml
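A minimal sketch (Python, standard library only) for checking that HTML built this way really is UTF-8. It assumes the pages land in the `html/` directory created above; the helper name `non_utf8_files` is hypothetical and not part of the PostgreSQL build.

```python
from pathlib import Path


def non_utf8_files(directory):
    """Return the .html files under `directory` whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(directory).rglob("*.html")):
        try:
            # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

For example, calling `non_utf8_files("html")` after `make html` should return an empty list if every page decodes as UTF-8. Strict decoding catches invalid byte sequences, but it does not check that each page's declared charset also says UTF-8.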

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: UTF-8 docs

From: Jürgen Purtz

In the previous mails we have seen some statements concerning the source format of the PostgreSQL documentation and others concerning the formats derived from it. In the following, I am speaking only about the source format. With that premise, I want to second Victor Wagner, who wrote on pgsql-hackers:

> Really, what change we need, it is conversion from SGML to XML format.
> It would solve some real problems, such as ability to include diagrams
> in the docs, and also let everyone explicitly specify encoding in
> XML declaration (and probably cause switch to UTF-8 as side effect,
> because most XML-based tools use UTF-8 as default).
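To illustrate the point about the XML declaration, here is a small Python sketch using the standard `xml.etree.ElementTree` parser. Without a declaration (and without a byte-order mark), XML bytes are read as UTF-8. A file in a legacy encoding still parses correctly as long as its declaration names that encoding. The sample strings and the `para_text` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# No XML declaration and no byte-order mark: the parser reads the bytes as UTF-8.
utf8_doc = "<para>Привет, café</para>".encode("utf-8")

# A legacy-encoded file is still read correctly because its declaration says so.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<para>café</para>"
).encode("iso-8859-1")


def para_text(raw_bytes):
    """Parse raw bytes, letting the parser pick the encoding from the declaration."""
    return ET.fromstring(raw_bytes).text
```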

The truly fundamental step is the switch from SGML to XML. It consists not only of a change in the markup format (omittag, shorttag). We must also replace the SGML tools for parsing, validating and generating output formats such as HTML or PDF with modern XML tools. And we need additional XSLT steps or modifications of the CSS files to replace the DSSSL scripts. This work is in progress.
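The markup change is what makes existing SGML sources unreadable to XML tools. The following Python sketch (standard library only; the sample fragments are invented for illustration) checks well-formedness. An XML parser rejects the SHORTTAG empty end tag `</>` and omitted end tags, but accepts the fully tagged version.

```python
import xml.etree.ElementTree as ET

# SHORTTAG: SGML may close the innermost open element with an empty end tag "</>".
sgml_shorttag = "<para>Run <command>psql</> now.</para>"

# OMITTAG: SGML may leave out end tags that the DTD marks as omissible
# (illustrative only; which tags are omissible depends on the DTD).
sgml_omittag = "<itemizedlist><listitem><para>One<listitem><para>Two</itemizedlist>"

# The same kind of content written as well-formed XML: every element is closed by name.
xml_version = (
    "<itemizedlist>"
    "<listitem><para>Run <command>psql</command> now.</para></listitem>"
    "</itemizedlist>"
)


def is_well_formed_xml(text):
    """Return True if an XML parser accepts `text`, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```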

Once all SGML-related parts are gone, we can profit from current XML tools and standards, e.g.:

- DocBook itself is moving from 4.x to 5.x, which is XML-based. (At the moment I don't recommend this additional step because of some incompatibilities in the migration to 5.x; see: https://lists.oasis-open.org/archives/docbook/201606/msg00007.html )

- The common attribute "xml:lang" for translations

- Extensions like XInclude, SVG, MathML, ...

- ...
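As an example of what `xml:lang` offers: the attribute is inherited by descendant elements, so a tool can determine the language of every element from a few annotations. A minimal Python sketch, assuming a made-up DocBook-like fragment; `effective_languages` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = """<book xml:lang="en">
  <title>PostgreSQL Documentation</title>
  <chapter xml:lang="ja">
    <title>はじめに</title>
    <para xml:lang="en">psql</para>
  </chapter>
</book>"""


def effective_languages(elem, inherited=None):
    """Yield (tag, language) pairs; a child inherits xml:lang unless it overrides it."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_languages(child, lang)
```

For instance, `list(effective_languages(ET.fromstring(doc)))` reports the nested `<title>` as Japanese because it inherits from `<chapter>`, while the `<para>` overrides it back to English.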





On 23.08.2016 00:51, Tatsuo Ishii wrote:



Re: UTF-8 docs

From: Tatsuo Ishii
>> Really, what change we need, it is conversion from SGML to XML format.
>> It would solve some real problems, such as ability to include diagrams
>> in the docs,

This argument sounds weak to me. The last time I proposed including
diagrams in the docs on the pgsql-hackers list, some developers were
against the idea because a binary diagram is hard to maintain in git.
However, there is still no consensus on which text-based diagram source
format (one from which real diagrams can be generated) suits our
purpose. I don't see how just migrating to XML solves that problem.

(the discussion on diagrams stopped in 2011, as far as I know)
https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2

Don't get me wrong. I am not against migrating to XML. I just want to
say: let's not pretend that migrating to XML would solve all the
problems we have.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: UTF-8 docs

From: Alexander Law
Hello,
If I understand you correctly, your goal in changing the docs encoding is
to allow the docs to be translated.
But I don't think that the translation should be done that way (by
replacing the SGML/XML/whatever content).
Just as we don't translate server messages by copying all the source
files and replacing strings in them, we shouldn't translate the docs by
replacing the SGML content.
At Postgres Pro we use gettext technologies for this, and we have a
complete working toolchain (including a modified Makefile) for
translating the docs.
(If anyone is interested, we can share our results and provide
everything needed to get started.)
Projects that used to translate SGML/XML sources directly are moving to
gettext/PO (the FreeBSD Documentation Project is the most notable
example) or similar approaches.
So I think a broader view of the evolution of the PostgreSQL
documentation is needed.
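The gettext model referred to here keeps the source untouched and stores translations in catalogs of `msgid`/`msgstr` pairs. A lookup that finds no translation, or an empty one, falls back to the original string. Below is a deliberately simplified Python sketch of that lookup. It assumes single-line entries only; real PO files also have headers, plural forms and multi-line strings, which the real gettext tools handle. The catalog content and helper names are invented for illustration.

```python
import re

# Hand-written PO fragment; the translation is illustrative, the second entry is untranslated.
PO_TEXT = """
msgid "Server Setup and Operation"
msgstr "Настройка и эксплуатация сервера"

msgid "Client Interfaces"
msgstr ""
"""

# One single-line msgid followed by one single-line msgstr (simplification).
ENTRY = re.compile(r'msgid "((?:[^"\\]|\\.)*)"\s*msgstr "((?:[^"\\]|\\.)*)"')
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(s):
    """Undo the C-style escapes used inside PO strings (common subset only)."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), s)


def load_catalog(po_text):
    """Map each msgid to its msgstr, skipping untranslated (empty) entries."""
    catalog = {}
    for msgid, msgstr in ENTRY.findall(po_text):
        if msgstr:
            catalog[unescape(msgid)] = unescape(msgstr)
    return catalog


def translate(catalog, text):
    """gettext-style lookup: return the original text when no translation exists."""
    return catalog.get(text, text)
```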

Best regards,
Alexander

On 23.08.2016 17:43, Tatsuo Ishii wrote:
>>> Really, what change we need, it is conversion from SGML to XML format.
>>> It would solve some real problems, such as ability to include diagrams
>>> in the docs,
> This argument sounds weak to me. Last time when I proposed to include
> diagrams in the docs in the pgsql-hackers list, some developers were
> against the idea because if the diagram is binary, it's hard to
> maintain in git. However up to now, there's no consensus that which
> text base diagram source (which allows to generate real diagrams from
> it) is good for our purpose. I don't see why just migrating to XML
> solves the problem.
>
> (the discussion on diagrams stopped in 2011, as far as I know)
> https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2
>
> Don't get me wrong. I am not against migrating to XML. I just want to
> say that let's not pretend that migrating to XML would solve all the
> problems we have.



Re: UTF-8 docs

From: Jürgen Purtz
Arguments for and against diagrams are not the central focus of the SGML to
XML conversion; nevertheless, "diagrams" never meant a binary format:
only SVG or another text format is acceptable. And if the SVG source
is generated by a program like Inkscape, it tends to become unreadable.
But if we develop an SVG library with our own predefined graphical
elements, the SVG source stays very clear. The 2011 discussion
mentioned below was continued in 2016:
https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.
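A rough Python sketch of the "library of predefined graphical elements" idea. Shapes are declared once inside `<defs>` as `<symbol>` elements, and each diagram only places short `<use>` references, which keeps the SVG source small and readable. The `table` symbol, the layout numbers and the `diagram` helper are hypothetical. The sketch uses the SVG 2 `href` attribute; older renderers may need `xlink:href` instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG elements without a prefix

# Hypothetical shared library: each shape is defined once as a <symbol>.
LIBRARY = """
<defs xmlns="http://www.w3.org/2000/svg">
  <symbol id="table" viewBox="0 0 80 40">
    <rect x="1" y="1" width="78" height="38" rx="4" fill="white" stroke="black"/>
  </symbol>
</defs>
"""


def diagram(labels):
    """Build an SVG in which every box is a one-line <use> reference plus a label."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", width="400", height="100")
    svg.append(ET.fromstring(LIBRARY.strip()))
    for i, label in enumerate(labels):
        x = 10 + i * 100
        ET.SubElement(svg, f"{{{SVG_NS}}}use", href="#table",
                      x=str(x), y="20", width="80", height="40")
        text = ET.SubElement(svg, f"{{{SVG_NS}}}text", x=str(x + 40), y="45")
        text.set("text-anchor", "middle")  # hyphenated name, so set it explicitly
        text.text = label
    return ET.tostring(svg, encoding="unicode")
```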

Regards, Jürgen Purtz


On 23.08.2016 16:43, Tatsuo Ishii wrote:
>>> Really, what change we need, it is conversion from SGML to XML format.
>>> It would solve some real problems, such as ability to include diagrams
>>> in the docs,
> This argument sounds weak to me. Last time when I proposed to include
> diagrams in the docs in the pgsql-hackers list, some developers were
> against the idea because if the diagram is binary, it's hard to
> maintain in git. However up to now, there's no consensus that which
> text base diagram source (which allows to generate real diagrams from
> it) is good for our purpose. I don't see why just migrating to XML
> solves the problem.
>
> (the discussion on diagrams stopped in 2011, as far as I know)
> https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2
>
> Don't get me wrong. I am not against migrating to XML. I just want to
> say that let's not pretend that migrating to XML would solve all the
> problems we have.



Re: UTF-8 docs

From: Tatsuo Ishii
> Arguments pro and contra diagrams are not the central focus of SGML to
> XML conversion, nevertheless: "Diagrams" didn't mean any binary format
> - only SVG or any other text-format is acceptable. And: if the SVG
> source is generated by any program like Inkscape it tends to get
> unreadable. But if we develop an SVG-library with our own predefined
> graphical elements, the SVG source gets very clear. The discussion of
> 2011 mentioned below was continued in 2016:
> https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.

Oh, I didn't know that. Thank you for pointing it out. I hope this work
will be included in 10.0.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: UTF-8 docs

From: Tatsuo Ishii
> Hello,
> If I understand you right, your goal to change the docs encoding is to
> allow for the docs translation.
> But I don't think that the translation should be done that way (with
> replacing sgml/xml/whatever).
> Just as we don't translate server messages by copying all the source
> files and replacing strings in them, we shouldn't translate the docs
> by replacing sgml contents.
> We at Postgres Pro using the gettext technologies for this. And we
> have complete working toolchain (including modified Makefile) for
> translation the docs.

Sounds great, but I am not sure whether the technique can be applied to
every language, including Japanese.

> (If someone is interested in, we can share our results and provide
> everything needed to get started).
> Those who translated sgmls/xml before are moving to gettext/po (The
> FreeBSD Documentation Project is most remarkable example) or similar
> approaches.

As far as I know, FreeBSD's Japanese documentation project does not use
gettext technologies.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


Re: UTF-8 docs

From: Ioseph Kim

Hi.

I welcome this discussion. The Korean PgDoc encoding is UTF-8 too.

The Korean SGML sources have been in UTF-8 for 10 years.


Currently, PgDoc is very inefficient for L10N work,

but we have no clean idea for a workaround.


I think the encoding of the main English documentation should stay as it is for now.

As for the encoding of the documents in other languages, I would like to leave that to the translators.
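If each translation team chooses its own encoding, converting a file to UTF-8 later is mechanical. A minimal Python sketch, assuming the translator knows the source encoding (EUC-KR is used here only as an example). For XML files it also rewrites the encoding named in the XML declaration; SGML files have no such declaration and are only re-encoded. `to_utf8` is a hypothetical helper.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration, if present.
XML_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a document from `source_encoding` to UTF-8.

    Decoding is strict: bytes that are invalid in `source_encoding` raise
    UnicodeDecodeError instead of being silently replaced.
    """
    converted = raw.decode(source_encoding).encode("utf-8")
    # Keep an XML declaration truthful; a no-op for SGML or declaration-less files.
    return XML_DECL_ENCODING.sub(rb"\1\2UTF-8\2", converted, count=1)
```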


Regards, Ioseph.

On 2016-08-23 07:51, Tatsuo Ishii wrote:



Re: UTF-8 docs

From: Alexander Law
Hello,
On 24.08.2016 02:19, Tatsuo Ishii wrote:
>> We at Postgres Pro using the gettext technologies for this. And we
>> have complete working toolchain (including modified Makefile) for
>> translation the docs.
> Sounds great but I am not sure if the technique can be applied to any
> language including Japanese.
I don't think that gettext is Russian-focused. At
https://babel.postgresql.org/ I see a number of languages, including
Japanese.
Please look at the attached .pot for one of the doc files. You can use
any translation software that supports .po (e.g. poedit, Lokalize,
...) to translate it into any language.
And you can look at all the docs converted to .po for translation:
http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

>
>> (If someone is interested in, we can share our results and provide
>> everything needed to get started).
>> Those who translated sgmls/xml before are moving to gettext/po (The
>> FreeBSD Documentation Project is most remarkable example) or similar
>> approaches.
> As far as I know, FreeBSD's Japanese document project does not use
> the gettext technologies.
Unfortunately I can't read Japanese and don't understand how the Japanese
FreeBSD team works (and why it can't use gettext technologies), but
the main FreeBSD Documentation team is moving to PO (see
https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).

Best regards,
Alexander Lakhin
https://postgrespro.com/



Attachments

Re: UTF-8 docs

From: Tatsuo Ishii
> Hello,
> On 24.08.2016 02:19, Tatsuo Ishii wrote:
>>> We at Postgres Pro using the gettext technologies for this. And we
>>> have complete working toolchain (including modified Makefile) for
>>> translation the docs.
>> Sounds great but I am not sure if the technique can be applied to any
>> language including Japanese.
> I don't think that gettext is Russian-focused. At
> https://babel.postgresql.org/ I see a number of languages, including
> Japanese.

Yes, I know gettext can be used for translating Japanese messages in
PostgreSQL. I just couldn't imagine how the technique could be extended
to the docs.

> Please look at the attached .pot for one of the doc files. You can use
> any translation software (which supports .po (e.g. poedit, Lokalize,
> ...)) to translate it to any language.
> And you can look at all the docs converted to .po for translation:
> http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

OK, so the idea is that each sentence in the doc is treated as a "very long
message". Interesting. That looks better than "replacing SGML contents with
translated contents" (which the Japanese doc project is doing now).

>>> (If someone is interested in, we can share our results and provide
>>> everything needed to get started).
>>> Those who translated sgmls/xml before are moving to gettext/po (The
>>> FreeBSD Documentation Project is most remarkable example) or similar
>>> approaches.
>> As far as I know, FreeBSD's Japanese document project does not use
>> the gettext technologies.
> Unfortunately I can't read Japanese and don't understand how Japanese
> FreeBSD team works (and why cant it use the gettext technologies) ,
> but the main FreeBSD Documentation team is moving to PO (see
> https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).


Re: UTF-8 docs

From: Alexander Law
On 24.08.2016 09:58, Tatsuo Ishii wrote:
>> Please look at the attached .pot for one of the doc files. You can use
>> any translation software (which supports .po (e.g. poedit, Lokalize,
>> ...)) to translate it to any language.
>> And you can look at all the docs converted to .po for translation:
>> http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download
> Ok, the idea is, each sentence in the doc are regarded as "very long
> message". Interesting. Looks better than "replacing SGML contents with
> translated contents" (which Japanese doc project is doing for now).
Yes, the DocBook format has natural blocks such as <para>, <title>,
<indexterm>, ..., which can be translated as separate text
fragments. So it's a question of mapping those blocks to .po fragments
and back as accurately as possible. (We use a customized xml2po from
gnome-doc-utils for this, but there are other programs too.) All other
text processing can be done with the rich gettext toolset.
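A simplified Python sketch of the block-to-PO mapping described above. It assumes that only `<title>`, `<para>` and `<primary>` elements containing plain text are translation units. It is not xml2po, which also carries inline markup such as `<command>` through into the msgid; here, blocks with child elements are simply skipped. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these block elements are translation units when they hold plain text only.
TRANSLATABLE = {"title", "para", "primary"}


def po_quote(s):
    """Escape a string as a quoted PO msgid/msgstr value."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def units(root):
    """Yield translatable elements in document order (no child elements, non-empty text)."""
    for elem in root.iter():
        if elem.tag in TRANSLATABLE and len(elem) == 0 and elem.text and elem.text.strip():
            yield elem


def normalize(text):
    """Collapse whitespace so that reflowed source lines map to the same msgid."""
    return " ".join(text.split())


def extract_pot(xml_text):
    """Turn each unique translatable block into a PO entry with an empty msgstr."""
    seen, entries = set(), []
    for elem in units(ET.fromstring(xml_text)):
        msgid = normalize(elem.text)
        if msgid not in seen:
            seen.add(msgid)
            # "#." is a PO extracted comment; here it records the source element.
            entries.append(f"#. <{elem.tag}>\nmsgid {po_quote(msgid)}\nmsgstr \"\"\n")
    return "\n".join(entries)


def merge_translations(xml_text, catalog):
    """Write translations back into the same blocks; untranslated text is kept as-is."""
    root = ET.fromstring(xml_text)
    for elem in units(root):
        elem.text = catalog.get(normalize(elem.text), elem.text)
    return ET.tostring(root, encoding="unicode")
```

Round-tripping through PO files means translators never edit the markup itself; only the text inside each block changes.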

> Unfortunately I can't read Japanese and don't understand how Japanese
> FreeBSD team works (and why cant it use the gettext technologies) ,
> but the main FreeBSD Documentation team is moving to PO (see
> https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).
Please look at the presentation and at
http://wonkity.com/~wblock/translation/translation.pdf; these
documents explain very well the challenges of the pre-existing
approach and their solutions.



Re: UTF-8 docs

From: Oleg Bartunov
On Wed, Aug 24, 2016 at 2:04 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Arguments pro and contra diagrams are not the central focus of SGML to
>> XML conversion, nevertheless: "Diagrams" didn't mean any binary format
>> - only SVG or any other text-format is acceptable. And: if the SVG
>> source is generated by any program like Inkscape it tends to get
>> unreadable. But if we develop an SVG-library with our own predefined
>> graphical elements, the SVG source gets very clear. The discussion of
>> 2011 mentioned below was continued in 2016:
>> https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.
>
> Oh I didn't know that. Thank you for pointing it out. I hope this work
> will be included in 10.0.

We discussed diagrams in the docs at PGCon 2016. Heikki's attempt is here:
https://wiki.postgresql.org/wiki/Figures_%26_Pics_in_Docs
and Emre did even better using Markdeep:
https://casual-effects.com/markdeep/
I attached his version; it looks very interesting (open it in Firefox).

Diagrams made this way don't require learning any new tools.

Oleg

Attachments