Discussion: UTF-8 docs
Hello,
I've just seen a discussion about docs encoding in pgsql-hackers.
https://www.postgresql.org/message-id/20160822.141645.655870136709055853.t-ishii%40sraoss.co.jp
Can we continue the discussion in this mailing list?
We (at Postgres Pro) have developed the whole build chain (with support for l10n) so we can just share it.

Best regards,
Alexander
From: Alexander Law <exclusion@gmail.com>
Subject: UTF-8 docs
Date: Mon, 22 Aug 2016 16:36:14 +0300
Message-ID: <7fbf2e80-9507-0521-d0e9-913ab81a58df@gmail.com>

> Hello,
> I've just seen a discussion about docs encoding in pgsql-hackers.
>
> https://www.postgresql.org/message-id/20160822.141645.655870136709055853.t-ishii%40sraoss.co.jp
> Can we continue the discussion in this mailing list?
> We (at Postgres Pro) have developed the whole build chain (with
> support for l10n) so we can just share it.

I have just subscribed to the pgsql-docs list. Here is the last conversation with Peter at pgsql-hackers.

> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
>
> Those translations are using DocBook XML.

But in the meantime I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
The previous mails contained some statements about the source format of the PostgreSQL documentation and others about the formats derived from it. In the following I am speaking only about the source format. With that premise, I want to second Victor Wagner, who wrote on pgsql-hackers:
> Really, what change we need, it is conversion from SGML to XML format.
> It would solve some real problems, such as ability to include diagrams
> in the docs, and also let everyone to explicitly specify encoding in
> XML declaration (and probably cause switch to UTF-8 as side effect,
> because most XML-based tools use UTF-8 as default).
The really fundamental step is the switch from SGML to XML. It involves more than a change of the markup format (omittag, shorttag). We must also replace the SGML tools for parsing, validating and generating the various output formats such as HTML or PDF with modern XML tools. And we need additional XSLT steps or modifications of the CSS files to replace the DSSSL scripts. This work is in progress.
Once we have got rid of all SGML-related parts, we can profit from current XML tools and standards, e.g.:
- DocBook itself is moving from 4.x to 5.x on the basis of XML. (Actually I don't recommend this additional step because of some incompatibilities in the migration to 5.x, see: https://lists.oasis-open.org/archives/docbook/201606/msg00007.html )
- The common attribute "xml:lang" for translations
- Extensions like XInclude, SVG, MathML, ...
- ...
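As a rough illustration of two of the points above (an explicit encoding in the XML declaration and the common "xml:lang" attribute), here is a minimal Python sketch. The DocBook fragment and its Japanese text are made up for this example and are not taken from the PostgreSQL sources; only the standard-library xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment (not from the PostgreSQL docs).
# The XML declaration states the encoding explicitly.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<chapter xml:lang="ja"><title>概要</title>'
    '<para>PostgreSQL のドキュメント</para></chapter>'
)
raw = doc.encode("utf-8")  # the bytes as they would be stored in a file

# The parser reads the declared encoding from the declaration itself.
root = ET.fromstring(raw)

# The "xml:" prefix is bound to this fixed namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"
lang = root.get(f"{{{XML_NS}}}lang")
title = root.find("title").text

print(lang, title)
```

A translated file would carry a different xml:lang value on its root element, so generic XML tools can tell the languages apart without any project-specific convention.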
>> Really, what change we need, it is conversion from SGML to XML format.
>> It would solve some real problems, such as ability to include diagrams
>> in the docs,

This argument sounds weak to me. The last time I proposed including diagrams in the docs on the pgsql-hackers list, some developers were against the idea because a binary diagram is hard to maintain in git. However, up to now there is no consensus on which text-based diagram source (one from which real diagrams can be generated) is good for our purpose. I don't see why just migrating to XML solves the problem.

(the discussion on diagrams stopped in 2011, as far as I know)
https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2

Don't get me wrong. I am not against migrating to XML. I just want to say: let's not pretend that migrating to XML would solve all the problems we have.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello,
If I understand you right, your goal in changing the docs encoding is to allow for translation of the docs.
But I don't think that the translation should be done that way (by replacing sgml/xml/whatever).
Just as we don't translate server messages by copying all the source files and replacing strings in them, we shouldn't translate the docs by replacing SGML contents.
We at Postgres Pro are using gettext technologies for this, and we have a complete working toolchain (including a modified Makefile) for translating the docs.
(If someone is interested, we can share our results and provide everything needed to get started.)
Those who translated SGML/XML before are moving to gettext/PO (the FreeBSD Documentation Project is the most remarkable example) or similar approaches.
So I think a broader view of the evolution of the PostgreSQL documentation is needed.

Best regards,
Alexander

23.08.2016 17:43, Tatsuo Ishii wrote:
>>> Really, what change we need, it is conversion from SGML to XML format.
>>> It would solve some real problems, such as ability to include diagrams
>>> in the docs,
> This argument sounds weak to me. Last time when I proposed to include
> diagrams in the docs in the pgsql-hackers list, some developers were
> against the idea because if the diagram is binary, it's hard to
> maintain in git. However up to now, there's no consensus that which
> text base diagram source (which allows to generate real diagrams from
> it) is good for our purpose. I don't see why just migrating to XML
> solves the problem.
>
> (the discussion on diagrams stopped in 2011, as far as I know)
> https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2
>
> Don't get me wrong. I am not against migrating to XML. I just want to
> say that let's not pretend that migrating to XML would solve all the
> problems we have.
Arguments for and against diagrams are not the central focus of the SGML to XML conversion. Nevertheless: "diagrams" did not mean any binary format; only SVG or another text format is acceptable. And if the SVG source is generated by a program like Inkscape, it tends to become unreadable. But if we develop an SVG library with our own predefined graphical elements, the SVG source becomes very clear. The discussion of 2011 mentioned below was continued in 2016: https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.

Regards,
Jürgen Purtz

On 23.08.2016 16:43, Tatsuo Ishii wrote:
>>> Really, what change we need, it is conversion from SGML to XML format.
>>> It would solve some real problems, such as ability to include diagrams
>>> in the docs,
> This argument sounds weak to me. Last time when I proposed to include
> diagrams in the docs in the pgsql-hackers list, some developers were
> against the idea because if the diagram is binary, it's hard to
> maintain in git. However up to now, there's no consensus that which
> text base diagram source (which allows to generate real diagrams from
> it) is good for our purpose. I don't see why just migrating to XML
> solves the problem.
>
> (the discussion on diagrams stopped in 2011, as far as I know)
> https://www.postgresql.org/message-id/1307972167.2862.518.camel@core2
>
> Don't get me wrong. I am not against migrating to XML. I just want to
> say that let's not pretend that migrating to XML would solve all the
> problems we have.
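To make the idea of a small library of predefined graphical elements more concrete, here is a minimal Python sketch that emits hand-readable, text-based SVG (Scalable Vector Graphics). The helper names (box, connector, diagram) and the example labels are hypothetical and invented for this illustration; they are not part of any existing proposal or patch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"

def box(x, y, w, h, label):
    """Predefined element: a rectangle with a centred text label."""
    cx, cy = x + w // 2, y + h // 2
    return (
        f'  <rect x="{x}" y="{y}" width="{w}" height="{h}" fill="none" stroke="black"/>\n'
        f'  <text x="{cx}" y="{cy}" text-anchor="middle">{escape(label)}</text>'
    )

def connector(x1, y1, x2, y2):
    """Predefined element: a straight line between two points."""
    return f'  <line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="black"/>'

def diagram(width, height, elements):
    """Wrap the elements in an <svg> root; one element per line keeps diffs small."""
    body = "\n".join(elements)
    return f'<svg xmlns="{SVG_NS}" width="{width}" height="{height}">\n{body}\n</svg>\n'

svg = diagram(320, 80, [
    box(10, 20, 100, 40, "backend"),
    connector(110, 40, 210, 40),
    box(210, 20, 100, 40, "WAL writer"),
])

# Sanity check: the generated text is well-formed XML with two rectangles.
root = ET.fromstring(svg)
rect_count = len(root.findall(f"{{{SVG_NS}}}rect"))
print(svg)
```

Because every element is produced by a short, fixed template, the resulting source stays line-oriented and diff-friendly in git, unlike the output of a general-purpose drawing program.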
> Arguments pro and contra diagrams are not the central focus of SGML to
> XML conversion, nevertheless: "Diagrams" didn't mean any binary format
> - only SVG or any other text-format is acceptable. And: if the SVG
> source is generated by any program like Inkscape it tends to get
> unreadable. But if we develop a SVG-library with our own predefined
> graphical elements, the SVG source gets very clear. The discussion of
> 2011 mentioned below was continued in 2016:
> https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.

Oh, I didn't know that. Thank you for pointing it out. I hope this work will be included in 10.0.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hello,
> If I understand you right, your goal to change the docs encoding is to
> allow for the docs translation.
> But I don't think that the translation should be done that way (with
> replacing sgml/xml/whatever).
> Just as we don't translate server messages by copying all the source
> files and replacing strings in them, we shouldn't translate the docs
> by replacing sgml contents.
> We at Postgres Pro using the gettext technologies for this. And we
> have complete working toolchain (including modified Makefile) for
> translation the docs.

Sounds great, but I am not sure whether the technique can be applied to any language, including Japanese.

> (If someone is interested in, we can share our results and provide
> everything needed to get started).
> Those who translated sgmls/xml before are moving to gettext/po (The
> FreeBSD Documentation Project is most remarkable example) or similar
> approaches.

As far as I know, FreeBSD's Japanese documentation project does not use the gettext technologies.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi.
I welcome this discussion. The Korean PgDoc encoding is UTF-8 too.
The Korean SGML encoding has been UTF-8 for 10 years.
Currently, PgDoc is very inefficient for L10N work,
but we have no clear idea for a workaround.
I think the encoding of the main English documentation should stay as it is now.
The encoding of the documentation in each other language I would like to leave to its translators.
Regards, Ioseph.
Hello,
24.08.2016 02:19, Tatsuo Ishii wrote:
>> We at Postgres Pro using the gettext technologies for this. And we
>> have complete working toolchain (including modified Makefile) for
>> translation the docs.
> Sounds great but I am not sure if the technique can be applied to any
> language including Japanese.

I don't think that gettext is Russian-focused. At https://babel.postgresql.org/ I see a number of languages, including Japanese.
Please look at the attached .pot for one of the doc files. You can use any translation software that supports .po (e.g. poedit, Lokalize, ...) to translate it into any language.
And you can look at all the docs converted to .po for translation:
http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

>> (If someone is interested in, we can share our results and provide
>> everything needed to get started).
>> Those who translated sgmls/xml before are moving to gettext/po (The
>> FreeBSD Documentation Project is most remarkable example) or similar
>> approaches.
> As far as I know, FreeBSD's Japanese document project does not use
> the gettext technologies.

Unfortunately I can't read Japanese and don't understand how the Japanese FreeBSD team works (and why it can't use the gettext technologies), but the main FreeBSD Documentation team is moving to PO (see https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).

Best regards,
Alexander Lakhin
https://postgrespro.com/
> Hello,
> 24.08.2016 02:19, Tatsuo Ishii wrote:
>>> We at Postgres Pro using the gettext technologies for this. And we
>>> have complete working toolchain (including modified Makefile) for
>>> translation the docs.
>> Sounds great but I am not sure if the technique can be applied to any
>> language including Japanese.
> I don't think that gettext is Russian-focused. At
> https://babel.postgresql.org/ I see a number of languages, including
> Japanese.

Yes, I know gettext can be used for translating messages into Japanese in PostgreSQL. I just couldn't imagine how the technique could be extended to the docs.

> Please look at the attached .pot for one of the doc files. You can use
> any translation software (which supports .po (e.g. poedit, Lokalize,
> ...)) to translate it to any language.
> And you can look at all the docs converted to .po for translation:
> http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

OK, so the idea is that each sentence in the docs is regarded as a "very long message". Interesting. It looks better than "replacing SGML contents with translated contents" (which the Japanese doc project is doing for now).

>>> (If someone is interested in, we can share our results and provide
>>> everything needed to get started).
>>> Those who translated sgmls/xml before are moving to gettext/po (The
>>> FreeBSD Documentation Project is most remarkable example) or similar
>>> approaches.
>> As far as I know, FreeBSD's Japanese document project does not use
>> the gettext technologies.
> Unfortunately I can't read Japanese and don't understand how Japanese
> FreeBSD team works (and why cant it use the gettext technologies) ,
> but the main FreeBSD Documentation team is moving to PO (see
> https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).
>
> Best regards,
> Alexander Lakhin
> https://postgrespro.com/
24.08.2016 09:58, Tatsuo Ishii wrote:
>> Please look at the attached .pot for one of the doc files. You can use
>> any translation software (which supports .po (e.g. poedit, Lokalize,
>> ...)) to translate it to any language.
>> And you can look at all the docs converted to .po for translation:
>> http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download
> Ok, the idea is, each sentence in the doc are regarded as "very long
> message". Interesting. Looks better than "replacing SGML contents with
> translated contents" (which Japanese doc project is doing for now).

Yes, the DocBook format has natural blocks such as <para>, <title>, <indexterm>, ... which can be translated as separate text fragments. So it is a question of mapping those blocks to .po fragments and back as accurately as possible. (We use a customized xml2po from gnome-doc-utils for this, but there are other programs too.) All the other text processing can be done with the rich gettext toolset.

> Unfortunately I can't read Japanese and don't understand how Japanese
> FreeBSD team works (and why cant it use the gettext technologies) ,
> but the main FreeBSD Documentation team is moving to PO (see
> https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).

Please look at the presentation and at http://wonkity.com/~wblock/translation/translation.pdf -- these documents explain very well all the challenges of the pre-existing approach and the solutions.
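The mapping from DocBook blocks to .po fragments described above can be sketched roughly as follows. This is only an illustration in Python using the standard library; it is not the customized xml2po mentioned in the mail, the function names (extract_messages, to_pot) and the sample fragment are invented for the example, and it flattens inline markup, which a real tool would have to preserve.

```python
import xml.etree.ElementTree as ET

# Block-level elements whose text becomes one translatable message each.
# Assumption for this sketch: only <title> and <para> are handled.
TRANSLATABLE = {"title", "para"}

def po_escape(text):
    """Escape backslashes and double quotes for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def extract_messages(docbook_xml):
    """Collect the normalized text of each translatable block, in document order."""
    root = ET.fromstring(docbook_xml)
    messages = []
    for elem in root.iter():
        if elem.tag in TRANSLATABLE:
            # Join text across inline children and collapse whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text and text not in messages:
                messages.append(text)
    return messages

def to_pot(messages):
    """Render the messages as PO template entries with empty translations."""
    return "\n".join(f'msgid "{po_escape(m)}"\nmsgstr ""\n' for m in messages)

sample = """<sect1>
  <title>Character Set Support</title>
  <para>PostgreSQL supports <acronym>UTF-8</acronym> and
  other encodings.</para>
</sect1>"""

messages = extract_messages(sample)
pot = to_pot(messages)
print(pot)
```

A translator then fills in the msgstr lines with any PO editor, and a reverse step (not shown here) substitutes the translated strings back into the same blocks to produce the localized DocBook file.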
On Wed, Aug 24, 2016 at 2:04 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Arguments pro and contra diagrams are not the central focus of SGML to
>> XML conversion, nevertheless: "Diagrams" didn't mean any binary format
>> - only SVG or any other text-format is acceptable. And: if the SVG
>> source is generated by any program like Inkscape it tends to get
>> unreadable. But if we develop a SVG-library with our own predefined
>> graphical elements, the SVG source gets very clear. The discussion of
>> 2011 mentioned below was continued in 2016:
>> https://www.postgresql.org/message-id/5690218B.9060103%40purtz.de.
>
> Oh I didn't know that. Thank you for pointing it out. I hope this work
> will be included in 10.0.

We discussed diagrams in the docs at PGCon 2016. Heikki's attempt is here:
https://wiki.postgresql.org/wiki/Figures_%26_Pics_in_Docs
and Emre made an even better one using markdeep:
https://casual-effects.com/markdeep/
I attached his version and it looks very interesting (open it in Firefox). Diagrams made this way don't require learning any new tools.

Oleg