Discussion: Converting README documentation to Markdown
Over in [0] I asked whether it would be worthwhile converting all our README files to Markdown, and since it wasn't met with pitchforks I figured it would be an interesting exercise to see what it would take (my honest gut feeling was that it would be way too intrusive). Markdown does bring a few key features however, so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors new *as well as* old like reading documentation nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

    "The idea is that a Markdown-formatted document should be publishable
    as-is, as plain text, without looking like it’s been marked up with
    tags or formatting instructions."

This translates to making the fewest changes needed to achieve a) retained plain text readability at today's level, b) proper Markdown rendering, not looking like text files in an HTML window, and c) absolutely no reflows and minimal impact on git blame.

Turns out we've been writing Markdown for quite some time, so it really didn't take much at all. I renamed all the files .md and with almost just changing whitespace achieved what I think are pretty decent results. The rendered versions can be seen by browsing the tree below:

https://github.com/danielgustafsson/postgres/tree/markdown

The whitespace changes are mostly making sure that code (anything which is to be rendered without styling really) is indented from column 0 with tab or 4 spaces (depending on what was already used in the file) and has a blank line before and after. This is the bulk of the changes.

The non-whitespace changes introduced are:

* Section/subsection markers: Basically all our files underline the main section with ==== and subsections with ----. This renders perfectly well with Markdown, so I added these to the few that didn't have them.

* The SSL readme starts a sentence with ">" which renders as a quote; removing that fixes rendering and makes the plain text version better IMHO.

* In the regex README there are two file references using * as a wildcard, but the combination of the two makes Markdown render the text between them in italics. Wrapping these in backticks solves it, but I'm not a fan since we don't do that elsewhere. A solution which avoids backticks would be nice.

* Some bullet list characters are changed to match the syntax, which also makes them more consistent with all the other README files in the tree. In one case (SSL test readme) there were no bullets at all, which is both inconsistent and renders poorly.

* Anything inside <> is rendered as a link if it matches, so in cases where <X> is used to indicate "replace with X" I added whitespace like "< X >" which might be a bit ugly, but works. When referencing header files with <time.h> the <> are removed to just say the header name, which seemed like the least bad option there.

* Text quoted with backticks, like `foo' is replaced with 'foo' to keep it from rendering like code.

* Rather than indenting the whole original README for bsd_indent I added ``` to make it a code block, i.e. render without formatting.

The README files in doc/ are left untouched as they contain lots of <foo> XML tags which would all need to be wrapped in backticks at the cost of plain text readability. That might not be controversial, and in that case they can be done too, but I left them for now since they deviated from the least-changes-possible plan for the patchset.

It can probably be argued that lots of other READMEs can be skipped as well, like all the ones in test modules which have 4 lines saying the directory contains a test for the thing which the name of the directory already gave away. For completeness I left those in though; for the most part they go untouched.
It's not perfect by any stretch; there are still, for example, cases where a * in the text turns on italic rendering which wasn't the intention of the author. Resisting the temptation to go overboard with changes is however a design goal; these are after all work documents and should be functional and practical.

In order to make review a bit easier I've split the patch into two, one for the file renaming and one for the changes. Inspecting the 0002 diff by skipping whitespace shows the above discussed changes.

Thoughts?

--
Daniel Gustafsson

[0] 20240405000935.2zujjc5t5e2jai4k@awork3.anarazel.de
[1] CAG6XLEmGE95DdKqjk+Dd9vC8mfN7BnV2WFgYk_9ovW6ikN0YSg@mail.gmail.com
[2] https://daringfireball.net/projects/markdown/
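The rendering hazards listed in the message above (paired `*` wildcards turning into italics, `<X>` being treated as a link or tag) could be hunted for mechanically before a line-by-line review. The following is a minimal Python sketch under stated assumptions: `find_hazards` and its hazard names are hypothetical, the patterns are rough heuristics rather than a Markdown parser, and `foo_*.c` is an illustrative file name, not one from the tree. It also flags intentional emphasis, so its output is a list of candidates to inspect, not a list of errors.

```python
import re

# Heuristic patterns only; real Markdown parsers apply more detailed rules.
CODE_SPAN = re.compile(r"`[^`]*`")          # inline code is rendered verbatim
BULLET = re.compile(r"^\s*\*\s+")           # a leading "* " is a list marker
UNESCAPED_STAR = re.compile(r"(?<!\\)\*")   # a "*" not preceded by a backslash
ANGLE_TOKEN = re.compile(r"(?<!\\)<[A-Za-z][^<>\s]*>")  # e.g. <X>, <time.h>


def find_hazards(line):
    """Return names of possible rendering hazards in one README line."""
    # Assumption: lines indented by a tab or 4 spaces are code blocks.
    # (Nested list content can be indented the same way; this sketch ignores that.)
    if line.startswith(("\t", "    ")):
        return []
    text = CODE_SPAN.sub("", line)
    text = BULLET.sub("", text)
    hazards = []
    # Two or more unescaped asterisks may pair up as emphasis delimiters.
    if len(UNESCAPED_STAR.findall(text)) >= 2:
        hazards.append("paired-asterisks")
    if ANGLE_TOKEN.search(text):
        hazards.append("angle-brackets")
    return hazards


if __name__ == "__main__":
    for sample in ["see regc_*.c and foo_*.c", "include <time.h> first"]:
        print(sample, "->", find_hazards(sample))
```

Running such a check over the renamed files would only narrow down where to look; the rendered output on the targeted platforms remains the real test.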
On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:
> Over in [0] I asked whether it would be worthwhile converting all our README files to Markdown, and since it wasn't met with pitchforks I figured it would be an interesting exercise to see what it would take (my honest gut feeling was that it would be way too intrusive). Markdown does bring a few key features however, so IMHO it's worth attempting to see:
>
> * New developers are very used to reading/writing it
> * Using a defined format ensures some level of consistency
> * Many users and contributors new *as well as* old like reading documentation nicely formatted in a browser
> * The documentation now prints really well
> * pandoc et al. can be used to render nice looking PDFs
> * All the same benefits as discussed in [0]
>
> The plan was to follow Gruber's original motivation for Markdown closely:
>
>     "The idea is that a Markdown-formatted document should be publishable
>     as-is, as plain text, without looking like it’s been marked up with
>     tags or formatting instructions."

+1 for keeping the plaintext readable.

> This translates to making the fewest changes needed to achieve a) retained plain text readability at today's level, b) proper Markdown rendering, not looking like text files in an HTML window, and c) absolutely no reflows and minimal impact on git blame.
>
> Turns out we've been writing Markdown for quite some time, so it really didn't take much at all. I renamed all the files .md and with almost just changing whitespace achieved what I think are pretty decent results. The rendered versions can be seen by browsing the tree below:
>
> https://github.com/danielgustafsson/postgres/tree/markdown
>
> The whitespace changes are mostly making sure that code (anything which is to be rendered without styling really) is indented from column 0 with tab or 4 spaces (depending on what was already used in the file) and has a blank line before and after. This is the bulk of the changes.
I've only peeked at a couple of those READMEs, but they look alright so far (at least on GitHub). Should we settle on a specific Markdown flavor[1]? Because I'm never sure if some markups only work on specific code-hosting sites. Maybe also a guide on writing Markdown that renders properly, especially with regard to escaping that may be necessary (see below).

> The non-whitespace changes introduced are:
>
> [...]
>
> * In the regex README there are two file references using * as a wildcard, but the combination of the two makes Markdown render the text between them in italics. Wrapping these in backticks solves it, but I'm not a fan since we don't do that elsewhere. A solution which avoids backticks would be nice.

Escaping does the trick: regc_\*.c

> [...]
>
> * Anything inside <> is rendered as a link if it matches, so in cases where <X> is used to indicate "replace with X" I added whitespace like "< X >" which might be a bit ugly, but works. When referencing header files with <time.h> the <> are removed to just say the header name, which seemed like the least bad option there.

Can be escaped as well: \<X>

[1] https://markdownguide.offshoot.io/extended-syntax/#lightweight-markup-languages

--
Erik
> On 8 Apr 2024, at 22:30, Erik Wienhold <ewie@ewie.name> wrote:
> On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:

> I've only peeked at a couple of those READMEs, but they look alright so far (at least on GitHub). Should we settle on a specific Markdown flavor[1]? Because I'm never sure if some markups only work on specific code-hosting sites.

Probably, but if we strive to maintain textual readability by avoiding most of the creative markup then we're probably close to the original version. But I agree, it should be evaluated.

> Maybe also a guide on writing Markdown that renders properly, especially with regard to escaping that may be necessary (see below).

That's a good point. If we opt for an actual format there should be some form of documentation about that format, especially if we settle for using a fraction of the capabilities of the format.

>> * In the regex README there are two file references using * as a wildcard, but the combination of the two makes Markdown render the text between them in italics. Wrapping these in backticks solves it, but I'm not a fan since we don't do that elsewhere. A solution which avoids backticks would be nice.
>
> Escaping does the trick: regc_\*.c

Right, but that makes the plaintext version less readable than the backticks I think.

> Can be escaped as well: \<X>

..and same with this one. It's all very subjective though.

--
Daniel Gustafsson
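To make the readability trade-off in this exchange concrete, here is a small Python sketch that produces both plain-text forms of the same reference: the backslash-escaped form Erik suggests and the backtick-wrapped form Daniel compares it to. The helper names `escape_inline` and `wrap_code` are hypothetical, and the escaping is deliberately simplified (it only handles `*` and `<`, and it treats any character preceded by a backslash as already escaped).

```python
import re


def escape_inline(text):
    """Backslash-escape '*' and '<' that are not already escaped."""
    # In the replacement, "\\" emits one literal backslash and "\1" the matched char.
    return re.sub(r"(?<!\\)([*<])", r"\\\1", text)


def wrap_code(text):
    """Wrap text in backticks so it renders verbatim as inline code."""
    return "`" + text + "`"


if __name__ == "__main__":
    for ref in ["regc_*.c", "<X>"]:
        print("escaped:", escape_inline(ref))
        print("wrapped:", wrap_code(ref))
```

Printing the two variants side by side in a terminal is a quick way to judge which one reads better as plain text, which is the subjective point both authors are making.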
On 08.04.24 21:29, Daniel Gustafsson wrote:
> Over in [0] I asked whether it would be worthwhile converting all our README files to Markdown, and since it wasn't met with pitchforks I figured it would be an interesting exercise to see what it would take (my honest gut feeling was that it would be way too intrusive). Markdown does bring a few key features however, so IMHO it's worth attempting to see:
>
> * New developers are very used to reading/writing it
> * Using a defined format ensures some level of consistency
> * Many users and contributors new *as well as* old like reading documentation nicely formatted in a browser
> * The documentation now prints really well
> * pandoc et al. can be used to render nice looking PDFs
> * All the same benefits as discussed in [0]
>
> The plan was to follow Gruber's original motivation for Markdown closely:
>
>     "The idea is that a Markdown-formatted document should be publishable
>     as-is, as plain text, without looking like it’s been marked up with
>     tags or formatting instructions."
>
> This translates to making the fewest changes needed to achieve a) retained plain text readability at today's level, b) proper Markdown rendering, not looking like text files in an HTML window, and c) absolutely no reflows and minimal impact on git blame.

I started looking through this and immediately found a bunch of tiny problems. (This is probably in part because the READMEs under src/backend/access/ are some of the more complicated ones, but then they are also the ones that might benefit most from better rendering.)

One general problem is that original Markdown and GitHub-flavored Markdown (GFM) are incompatible in some interesting aspects. For example, the line

    A split initially marks the left page with the F_FOLLOW_RIGHT flag.

is rendered by GFM as you'd expect. But original Markdown converts it to

    A split initially marks the left page with the F<em>FOLLOW</em>RIGHT flag.

This kind of problem is pervasive, as you'd expect.
Another incompatibility is that GFM accepts "1)" as a list marker (which appears to be used often in the READMEs), but original Markdown does not. This then also affects surrounding formatting.

Also, the READMEs often do not indent lists in a non-ambiguous way. For example, if you look into src/backend/optimizer/README, section "Join Tree Construction", there are two list items, but it's not immediately clear which paragraphs belong to the list and which ones follow the list. This also interacts with the previous point. The resulting formatting in GFM is quite misleading.

src/port/README.md is a similar case.

There are also various places where whitespace is used for ad-hoc formatting. Consider for example in src/backend/access/gin/README

    the "category" of the null entry.  These are the possible categories:

        1 = ordinary null key value extracted from an indexable item
        2 = placeholder for zero-key indexable item
        3 = placeholder for null indexable item

    Placeholder null entries are inserted into the index because otherwise

But this does not preserve the list-like formatting, it just flows it together.

There is a similar case with the authors list at the end of src/backend/access/gist/README.md.

src/test/README.md wasn't touched by your patch, but it also needs adjustments for list formatting.

In summary, I think before we could accept this, we'd need to go through this with a fine-toothed comb line by line and page by page to make sure the formatting is still sound. And we'd need to figure out which Markdown flavor to target.
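The F_FOLLOW_RIGHT difference described above can be reproduced with a toy model. The Python sketch below is an illustration only: the first pattern is loosely modeled on how the original Markdown implementation pairs underscores as emphasis delimiters (an assumption based on its documented behaviour, not a port of it), and the second approximates the GFM rule that underscores surrounded by alphanumerics do not start or end emphasis. The function names are hypothetical, and neither function is a real renderer.

```python
import re

# Toy model of original Markdown: any pair of underscores around
# non-space-delimited text becomes emphasis, even inside a word.
ORIGINAL_EM = re.compile(r"_(?=\S)(.+?)(?<=\S)_")

# Toy model of GFM: an underscore only delimits emphasis when it is not
# glued to an alphanumeric character on its outer side.
GFM_LIKE_EM = re.compile(
    r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])"
)


def render_original_like(text):
    """Apply intraword-friendly underscore emphasis (original-style model)."""
    return ORIGINAL_EM.sub(r"<em>\1</em>", text)


def render_gfm_like(text):
    """Apply underscore emphasis only at word boundaries (GFM-style model)."""
    return GFM_LIKE_EM.sub(r"<em>\1</em>", text)


if __name__ == "__main__":
    line = "A split initially marks the left page with the F_FOLLOW_RIGHT flag."
    print(render_original_like(line))
    print(render_gfm_like(line))
```

Under this model, identifiers such as F_FOLLOW_RIGHT survive only in the second variant, which is why the choice of target flavor decides whether such names need backticks or escaping.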
> On 13 May 2024, at 09:20, Peter Eisentraut <peter@eisentraut.org> wrote:

> I started looking through this and immediately found a bunch of tiny problems. (This is probably in part because the READMEs under src/backend/access/ are some of the more complicated ones, but then they are also the ones that might benefit most from better rendering.)

Thanks for looking!

> One general problem is that original Markdown and GitHub-flavored Markdown (GFM) are incompatible in some interesting aspects.

That's true, but virtually every implementation of Markdown in practical use today is incompatible with original Markdown. Reading my email I realize I failed to mention the Markdown platforms I was targeting (and thus flavours), and citing Gruber made it even more confusing. For online reading I verified with GitHub and VS Code since they have a huge market presence. For offline work I targeted rendering with pandoc since we already have a dependency on it in the tree. I don't think targeting the original Markdown implementation is useful, or even realistic.

Another aspect of platform/flavour was to make the Markdown version easy to maintain for hackers writing content. Requiring the minimum amount of markup seems like the developer-friendly way here to keep productivity as well as document quality high.

Most importantly though, I targeted reading the files as plain text without any rendering. We keep these files in text format close to the code for a reason, and maintaining readability as text was a north star.

> For example, the line
>
>     A split initially marks the left page with the F_FOLLOW_RIGHT flag.
>
> is rendered by GFM as you'd expect. But original Markdown converts it to
>
>     A split initially marks the left page with the F<em>FOLLOW</em>RIGHT flag.
>
> This kind of problem is pervasive, as you'd expect.

Correct, but I can't imagine that we'd like to wrap every instance of a name with underscores in backticks like `F_FOLLOW_RIGHT`.
There are very few Markdown implementations which don't support underscores like this (testing just now on the top online editors and sites providing Markdown editing I failed to find a single one).

> Also, the READMEs often do not indent lists in a non-ambiguous way. For example, if you look into src/backend/optimizer/README, section "Join Tree Construction", there are two list items, but it's not immediately clear which paragraphs belong to the list and which ones follow the list. This also interacts with the previous point. The resulting formatting in GFM is quite misleading.

I agree that the rendered version exacerbates this problem. Writing a bullet point list where each item spans multiple paragraphs indented the same way as the paragraphs following the list is not helpful to the reader. In these cases both the Markdown and the text version will be improved by indentation.

> There are also various places where whitespace is used for ad-hoc formatting. Consider for example in src/backend/access/gin/README
>
>     the "category" of the null entry.  These are the possible categories:
>
>         1 = ordinary null key value extracted from an indexable item
>         2 = placeholder for zero-key indexable item
>         3 = placeholder for null indexable item
>
>     Placeholder null entries are inserted into the index because otherwise
>
> But this does not preserve the list-like formatting, it just flows it together.

That's the kind of sublist which needs to be found as part of this work, with its items prefixed by a list identifier. In this case, prefixing each row in the sublist with '-' yields the correct result.

> src/test/README.md wasn't touched by your patch, but it also needs adjustments for list formatting.

I didn't re-indent that one in order to keep the changes to the absolute minimum, since I considered the rendered version passable even if not particularly good.
Re-indenting files like this will for sure make the end result better, as long as the changes keep the text version readable.

> In summary, I think before we could accept this, we'd need to go through this with a fine-toothed comb line by line and page by page to make sure the formatting is still sound.

Absolutely. I've been over every file to ensure they aren't blatantly wrong, but I didn't want to spend the time if this was immediately shot down as something the community doesn't want to maintain.

> And we'd need to figure out which Markdown flavor to target.

Absolutely, and as I mentioned above, we need to pick based on both the final result (text and rendered) and the developer experience for maintaining this.

--
Daniel Gustafsson
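The sublist fix mentioned in this reply (prefixing each row of an ad-hoc "N = description" list with '-') is mechanical enough to sketch. The Python below is a minimal illustration, assuming rows always have the exact shape "digits, space, equals sign, space, text"; `bullet_rows` is a hypothetical helper and not part of the patch. Leading whitespace is preserved as found, so whether the result renders as a list or as a code block still depends on the file's existing indentation.

```python
import re

# Matches rows such as "1 = ordinary null key value ...", keeping any indent.
ROW = re.compile(r"^(\s*)(\d+ = .*)$")


def bullet_rows(lines):
    """Prefix ad-hoc 'N = text' rows with '- ' and leave other lines alone."""
    return [ROW.sub(r"\1- \2", line) for line in lines]


if __name__ == "__main__":
    sample = [
        "These are the possible categories:",
        "",
        "1 = ordinary null key value extracted from an indexable item",
        "2 = placeholder for zero-key indexable item",
        "3 = placeholder for null indexable item",
    ]
    print("\n".join(bullet_rows(sample)))
```

A transformation like this keeps the plain-text version nearly unchanged, which fits the stated goal of minimal edits, but each converted list would still need to be checked in the rendered output.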