Thread: Converting README documentation to Markdown

Converting README documentation to Markdown

From
Daniel Gustafsson
Date:
Over in [0] I asked whether it would be worthwhile converting all our README
files to Markdown, and since it wasn't met with pitchforks I figured it would
be an interesting exercise to see what it would take (my honest gut feeling
was that it would be way too intrusive).  Markdown does bring a few key
features however so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors new *as well as* old like reading documentation
  nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

    "The idea is that a Markdown-formatted document should be publishable
    as-is, as plain text, without looking like it’s been marked up with
    tags or formatting instructions."

This translates to making the least amount of changes to achieve a) retained
plain text readability at today's level, b) proper Markdown rendering, not
looking like text files in an HTML window, and c) absolutely no reflows and
minimal impact on git blame.

Turns out we've been writing Markdown for quite some time, so it really didn't
take much at all.  I renamed all the files .md and with almost just changing
whitespace achieved what I think is pretty decent results.  The rendered
versions can be seen by browsing the tree below:

    https://github.com/danielgustafsson/postgres/tree/markdown

The whitespace changes are mostly making sure that code (anything which is to
be rendered without styling really) is indented from column 0 with tab or 4
spaces (depending on what was already used in the file) and has a blank line
before and after.  This is the bulk of the changes.  The non-whitespace changes
introduced are:

* Section/subsection markers: Basically all our files underline the main
  section with ==== and subsections with ----.  This renders perfectly well with
  Markdown so I added these to the few that didn't have them.

* The SSL readme starts a sentence with ">" which renders as quote, removing
  that fixes rendering and makes the plain text version better IMHO.

* In the regex README there are two file references using * as a wildcard, but
  the combination of the two makes Markdown render the text between them in
  italics.  Wrapping these in backticks solves it, but I'm not a fan since we
  don't do that elsewhere.  A solution which avoids backticks would be nice.

* Some bullet list characters are changed to match the syntax, which also makes
  them more consistent with all the other README files in the tree.  In one
  case (SSL test readme) there were no bullets at all which is both
  inconsistent and renders poorly.

* Anything inside <> is rendered as a link if it matches, so in cases where <X>
  is used to indicate "replace with X" I added whitespace like "< X >" which
  might be a bit ugly, but works.  When referencing header files with <time.h>
  the <> are removed to just say the header name, which seemed like the least bad
  option there.

* Text quoted with backticks, like `foo' is replaced with 'foo' to keep it from
  rendering like code.

* Rather than indenting the whole original README for bsd_indent I added ``` to
  make it a code block, i.e. render without formatting.
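
As an illustration of the `foo' to 'foo' replacement in the list above, here
is a minimal Python sketch.  It is not part of the patchset; the function name
and the restriction to simple identifier-like words are assumptions made for
the example, chosen so that properly closed `code` spans are left alone.

```python
import re

# A backtick-opened, apostrophe-closed quote such as `foo'.  Limiting the
# quoted text to word characters and dots is an assumption of this sketch.
OLD_STYLE_QUOTE = re.compile(r"`([A-Za-z0-9_.]+)'")

def requote(text: str) -> str:
    """Replace `foo' with 'foo' so it no longer starts a code span."""
    return OLD_STYLE_QUOTE.sub(r"'\1'", text)

print(requote("set `foo' before calling `bar'"))
```

Quotes that span several words would need a wider pattern and manual review.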

The README files in doc/ are left untouched as they contain lots of <foo> XML
tags which all would need to be wrapped in backticks at the cost of plain text
readability.  Might not be controversial and in that case they can be done too,
but I left them for now since they deviated from the least-changes-possible
plan for the patchset.  It can probably be argued that lots of other READMEs
can be skipped as well, like all the ones in test modules which have 4 lines
saying the directory contains a test for the thing which the name of the
directory already gave away.  For completeness I left those in though, they for
the most part go untouched.

It's not perfect by any stretch, there are still for example cases where a * in
the text turns on italic rendering which wasn't the intention of the author.
Resisting the temptation to go overboard with changes is however a design goal,
these are after all work documents and should be functional and practical.

In order to make review a bit easier I've split the patch into two, one for the
file renaming and one for the changes.  Inspecting the 0002 diff by skipping
whitespace shows the above discussed changes.

Thoughts?

--
Daniel Gustafsson

[0] 20240405000935.2zujjc5t5e2jai4k@awork3.anarazel.de
[1] CAG6XLEmGE95DdKqjk+Dd9vC8mfN7BnV2WFgYk_9ovW6ikN0YSg@mail.gmail.com
[2] https://daringfireball.net/projects/markdown/


Re: Converting README documentation to Markdown

From
Erik Wienhold
Date:
On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:
> Over in [0] I asked whether it would be worthwhile converting all our README
> files to Markdown, and since it wasn't met with pitchforks I figured it would
> be an interesting exercise to see what it would take (my honest gut feeling
> was that it would be way too intrusive).  Markdown does bring a few key
> features however so IMHO it's worth attempting to see:
> 
> * New developers are very used to reading/writing it
> * Using a defined format ensures some level of consistency
> * Many users and contributors new *as well as* old like reading documentation
>   nicely formatted in a browser
> * The documentation now prints really well
> * pandoc et al. can be used to render nice looking PDFs
> * All the same benefits as discussed in [0]
> 
> The plan was to follow Gruber's original motivation for Markdown closely:
> 
>     "The idea is that a Markdown-formatted document should be publishable
>     as-is, as plain text, without looking like it’s been marked up with
>     tags or formatting instructions."

+1 for keeping the plaintext readable.

> This translates to making the least amount of changes to achieve a) retained
> plain text readability at today's level, b) proper Markdown rendering, not
> looking like text files in an HTML window, and c) absolutely no reflows and
> minimal impact on git blame.
> 
> Turns out we've been writing Markdown for quite some time, so it really didn't
> take much at all.  I renamed all the files .md and with almost just changing
> whitespace achieved what I think is pretty decent results.  The rendered
> versions can be seen by browsing the tree below:
> 
>     https://github.com/danielgustafsson/postgres/tree/markdown
> 
> The whitespace changes are mostly making sure that code (anything which is to
> be rendered without styling really) is indented from column 0 with tab or 4
> spaces (depending on what was already used in the file) and has a blank line
> before and after.  This is the bulk of the changes.

I've only peeked at a couple of those READMEs, but they look alright so
far (at least on GitHub).  Should we settle on a specific Markdown
flavor[1]?  Because I'm never sure if some markups only work on
specific code-hosting sites.  Maybe also a guide on writing Markdown
that renders properly, especially with regard to escaping that may be
necessary (see below).

> The non-whitespace changes introduced are:
> 
> [...]
> 
> * In the regex README there are two file references using * as a wildcard, but
>   the combination of the two makes Markdown render the text between them in
>   italics.  Wrapping these in backticks solves it, but I'm not a fan since we
>   don't do that elsewhere.  A solution which avoids backticks would be nice.

Escaping does the trick: regc_\*.c

> [...]
> 
> * Anything inside <> is rendered as a link if it matches, so in cases where <X>
>   is used to indicate "replace with X" I added whitespace like "< X >" which
>   might be a bit ugly, but works.  When referencing header files with <time.h>
>   the <> are removed to just say the header name, which seemed like the least bad
>   option there.

Can be escaped as well: \<X>
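
The two escapes suggested above (regc_\*.c and \<X>) can be applied
mechanically.  The following is a minimal sketch only: the function name is
made up for the example, and it escapes every unescaped '*' and '<' it sees,
which is more aggressive than a real conversion would want.

```python
import re

def escape_markdown(text: str) -> str:
    """Backslash-escape '*' and '<' so Markdown renders them literally."""
    # The lookbehind skips characters that already have a backslash in front.
    return re.sub(r"(?<!\\)([*<])", r"\\\1", text)

print(escape_markdown("regc_*.c"))
print(escape_markdown("<X>"))
```

Running it on text that is already escaped leaves that text unchanged, so the
function can be applied repeatedly without stacking backslashes.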

[1] https://markdownguide.offshoot.io/extended-syntax/#lightweight-markup-languages

-- 
Erik



Re: Converting README documentation to Markdown

From
Daniel Gustafsson
Date:
> On 8 Apr 2024, at 22:30, Erik Wienhold <ewie@ewie.name> wrote:
> On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:

> I've only peeked at a couple of those READMEs, but they look alright so
> far (at least on GitHub).  Should we settle on a specific Markdown
> flavor[1]?  Because I'm never sure if some markups only work on
> specific code-hosting sites.

Probably, but if we strive to maintain textual readability by avoiding most
of the creative markup then we're probably close to the original version.
But I agree, it should be evaluated.

> Maybe also a guide on writing Markdown
> that renders properly, especially with regard to escaping that may be
> necessary (see below).

That's a good point, if we opt for an actual format there should be some form
of documentation about that format, especially if we settle for using a
fraction of the capabilities of the format.

>> * In the regex README there are two file references using * as a wildcard, but
>>  the combination of the two makes Markdown render the text between them in
>>  italics.  Wrapping these in backticks solves it, but I'm not a fan since we
>>  don't do that elsewhere.  A solution which avoids backticks would be nice.
>
> Escaping does the trick: regc_\*.c

Right, but that makes the plaintext version less readable than the backticks I
think.

> Can be escaped as well: \<X>

..and same with this one. It's all very subjective though.

--
Daniel Gustafsson




Re: Converting README documentation to Markdown

From
Peter Eisentraut
Date:
On 08.04.24 21:29, Daniel Gustafsson wrote:
> Over in [0] I asked whether it would be worthwhile converting all our README
> files to Markdown, and since it wasn't met with pitchforks I figured it would
> be an interesting exercise to see what it would take (my honest gut feeling
> was that it would be way too intrusive).  Markdown does bring a few key
> features however so IMHO it's worth attempting to see:
> 
> * New developers are very used to reading/writing it
> * Using a defined format ensures some level of consistency
> * Many users and contributors new *as well as* old like reading documentation
>   nicely formatted in a browser
> * The documentation now prints really well
> * pandoc et al. can be used to render nice looking PDFs
> * All the same benefits as discussed in [0]
> 
> The plan was to follow Gruber's original motivation for Markdown closely:
> 
>     "The idea is that a Markdown-formatted document should be publishable
>     as-is, as plain text, without looking like it’s been marked up with
>     tags or formatting instructions."
> 
> This translates to making the least amount of changes to achieve a) retained
> plain text readability at today's level, b) proper Markdown rendering, not
> looking like text files in an HTML window, and c) absolutely no reflows and
> minimal impact on git blame.

I started looking through this and immediately found a bunch of tiny 
problems.  (This is probably in part because the READMEs under 
src/backend/access/ are some of the more complicated ones, but then they 
are also the ones that might benefit most from better rendering.)

One general problem is that original Markdown and GitHub-flavored 
Markdown (GFM) are incompatible in some interesting aspects.  For 
example, the line

     A split initially marks the left page with the F_FOLLOW_RIGHT flag.

is rendered by GFM as you'd expect.  But original Markdown converts it to

     A split initially marks the left page with the F<em>FOLLOW</em>RIGHT
     flag.

This kind of problem is pervasive, as you'd expect.

Another incompatibility is that GFM accepts "1)" as a list marker (which 
appears to be used often in the READMEs), but original Markdown does 
not.  This then also affects surrounding formatting.
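
To get a feel for how widespread these two incompatibilities are, a small
scan along the following lines may help.  This is only a sketch: the function
name, the regular expressions and the labels are invented for the example, it
does not skip code blocks, and it makes no attempt to model what any
particular Markdown implementation does.

```python
import re

# Identifiers with two or more underscores, e.g. F_FOLLOW_RIGHT, where
# original Markdown may treat the middle part as emphasis.
INTRAWORD_UNDERSCORES = re.compile(r"\b[A-Za-z0-9]+(?:_[A-Za-z0-9]+){2,}\b")
# Lines that start with a "1)"-style list marker.
PAREN_LIST_MARKER = re.compile(r"^\s*(\d+\))\s")

def find_flavor_sensitive(lines):
    """Return (line_number, kind, text) tuples for constructs to review."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for match in INTRAWORD_UNDERSCORES.finditer(line):
            findings.append((number, "underscores", match.group()))
        marker = PAREN_LIST_MARKER.match(line)
        if marker:
            findings.append((number, "list-marker", marker.group(1)))
    return findings

sample = [
    "A split initially marks the left page with the F_FOLLOW_RIGHT flag.",
    "1) first item",
    "plain text",
]
for finding in find_flavor_sensitive(sample):
    print(finding)
```

The output is a list of places to look at by hand, not a verdict on whether a
given line renders incorrectly.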

Also, the READMEs often do not indent lists in a non-ambiguous way.  For 
example, if you look into src/backend/optimizer/README, section "Join 
Tree Construction", there are two list items, but it's not immediately 
clear which paragraphs belong to the list and which ones follow the 
list.  This also interacts with the previous point.  The resulting 
formatting in GFM is quite misleading.

src/port/README.md is a similar case.

There are also various places where whitespace is used for ad-hoc 
formatting.  Consider for example in src/backend/access/gin/README

   the "category" of the null entry.  These are the possible categories:

     1 = ordinary null key value extracted from an indexable item
     2 = placeholder for zero-key indexable item
     3 = placeholder for null indexable item

   Placeholder null entries are inserted into the index because otherwise

But this does not preserve the list-like formatting, it just flows it 
together.

There is a similar case with the authors list at the end of 
src/backend/access/gist/README.md.

src/test/README.md wasn't touched by your patch, but it also needs 
adjustments for list formatting.


In summary, I think before we could accept this, we'd need to go through 
this with a fine-toothed comb line by line and page by page to make sure 
the formatting is still sound.  And we'd need to figure out which 
Markdown flavor to target.




Re: Converting README documentation to Markdown

From
Daniel Gustafsson
Date:
> On 13 May 2024, at 09:20, Peter Eisentraut <peter@eisentraut.org> wrote:

> I started looking through this and immediately found a bunch of tiny problems.  (This is probably in part because the
> READMEs under src/backend/access/ are some of the more complicated ones, but then they are also the ones that might
> benefit most from better rendering.)

Thanks for looking!

> One general problem is that original Markdown and GitHub-flavored Markdown (GFM) are incompatible in some interesting
> aspects.

That's true, but virtually every implementation of Markdown in practical use
today is incompatible with Original Markdown.

Reading my email I realize I failed to mention the markdown platforms I was
targeting (and thus flavours), and citing Gruber made it even more confusing.
For online reading I verified with Github and VS Code since they have a huge
market presence.  For offline work I targeted rendering with pandoc since we
already have a dependency on it in the tree.  I don't think targeting the
original Markdown implementation is useful, or even realistic.

Another aspect of platform/flavour was to make the markdown version easy to
maintain for hackers writing content.  Requiring the minimum amount of markup
seems like the developer-friendly way here to keep productivity as well as
document quality high.

Most importantly though, I targeted reading the files as plain text without any
rendering.  We keep these files in text format close to the code for a reason,
and maintaining readability as text was a north star.

>  For example, the line
>
>    A split initially marks the left page with the F_FOLLOW_RIGHT flag.
>
> is rendered by GFM as you'd expect.  But original Markdown converts it to
>
>    A split initially marks the left page with the F<em>FOLLOW</em>RIGHT
>    flag.
>
> This kind of problem is pervasive, as you'd expect.

Correct, but I can't imagine that we'd like to wrap every instance of a name
with underscores in backticks like `F_FOLLOW_RIGHT`.  There are very few
Markdown implementations which don't support underscores like this (testing
just now on the top online editors and sites providing markdown editing I
failed to find a single one).

> Also, the READMEs often do not indent lists in a non-ambiguous way.  For example, if you look into
> src/backend/optimizer/README, section "Join Tree Construction", there are two list items, but it's not immediately clear
> which paragraphs belong to the list and which ones follow the list.  This also interacts with the previous point.  The
> resulting formatting in GFM is quite misleading.

I agree that the rendered version exacerbates this problem.  Writing a bullet
point list where each item spans multiple paragraphs indented the same way as
the paragraphs following the list is not helpful to the reader.  In these cases
both the markdown and the text version will be improved by indentation.

> There are also various places where whitespace is used for ad-hoc formatting.  Consider for example in
> src/backend/access/gin/README
>
>  the "category" of the null entry.  These are the possible categories:
>
>    1 = ordinary null key value extracted from an indexable item
>    2 = placeholder for zero-key indexable item
>    3 = placeholder for null indexable item
>
>  Placeholder null entries are inserted into the index because otherwise
>
> But this does not preserve the list-like formatting, it just flows it together.

That's the kind of sublists which need to be found as part of this work, and
the items prefixed with a list identifier.  In this case, prefixing each row in
the sublist with '-' yields the correct result.
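
A minimal sketch of that change, assuming the rows look like the "N = text"
lines quoted above.  The pattern and function name are made up for the
example; it keeps the existing indentation and does not check how the result
renders in any given flavor.

```python
import re

# Rows such as "     1 = ordinary null key value ..." (assumed shape).
CATEGORY_ROW = re.compile(r"^(\s*)(\d+ = .*)$")

def bullet_category_rows(lines):
    """Prefix each "N = ..." row with '- ', leaving other lines untouched."""
    return [CATEGORY_ROW.sub(r"\1- \2", line) for line in lines]

rows = [
    "the \"category\" of the null entry.  These are the possible categories:",
    "",
    "  1 = ordinary null key value extracted from an indexable item",
    "  2 = placeholder for zero-key indexable item",
    "  3 = placeholder for null indexable item",
]
for line in bullet_category_rows(rows):
    print(line)
```

The plain text stays readable since only a "- " is added in front of each row.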

> src/test/README.md wasn't touched by your patch, but it also needs adjustments for list formatting.

I didn't re-indent that one in order to keep the changes to the absolute
minimum, since I considered the rendered version passable even if not
particularly good.  Re-indenting files like this will for sure make the end
result better, as long as the changes keep the text version readability.

> In summary, I think before we could accept this, we'd need to go through this with a fine-toothed comb line by line
> and page by page to make sure the formatting is still sound.

Absolutely.  I've been over every file to ensure they aren't blatantly wrong,
but I didn't want to spend the time if this was immediately shot down as
something the community doesn't want to maintain.

> And we'd need to figure out which Markdown flavor to target.

Absolutely, and as I mentioned above, we need to pick based on both the final
result (text and rendered) as well as the developer experience for maintaining
this.

--
Daniel Gustafsson