Discussion: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

"GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/textsearch-indexes.html
Description:

Hey,

when you google for "postgresql gist gin index" you will most probably see
this page (or an older version of it) as #1 and the only result from
postgresql.org:
https://www.postgresql.org/docs/current/textsearch-indexes.html This led me
and others in our team to initially assume that GiST and GIN indexes
are purely a full text search thing in PostgreSQL. They are of course so
much more, but from this page you would not be able to discover that. (It is
interesting that even searching for `GiST` on postgresql.org lists that page
first, and that for example https://www.postgresql.org/docs/14/sql.html only
lists that page if you Ctrl+F for `gin` or `gist`.)

It would probably be a good idea to link to
https://www.postgresql.org/docs/14/gin.html and
https://www.postgresql.org/docs/14/gist.html (or whatever are the best pages
to explain GIN and GiST indexes) in the introduction of this article to lead
people in the right direction. (Bonus points if this can be added to older
versions of the docs as well, as those are ranking on Google and not
everyone clicks through to `current` I guess - including me sometimes.)

Even more effective would be to update the page title and/or headline to
make clear that it is about using GIN and GiST indexes in context of full
text search only.

For the page content itself, it might be beneficial to highlight that the
code example itself is a shorthand that skips the (implied via the type)
definition of an operator class (although it is possible that I do not
understand the full picture here right now - docs are pretty scarce or hard
to find after all).
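
A minimal sketch of that shorthand, assuming a hypothetical table `docs`
with a `tsvector` column `body_tsv` (neither name comes from the
documentation page). As far as I know, `tsvector_ops` is the default
operator class for `tsvector` under both GIN and GiST, so each pair of
statements below should build equivalent indexes:

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE docs (
    id       bigint PRIMARY KEY,
    title    text,
    body_tsv tsvector
);

-- Shorthand: the operator class is implied by the column type.
CREATE INDEX docs_body_gin ON docs USING GIN (body_tsv);

-- The same index with the default operator class spelled out.
CREATE INDEX docs_body_gin_explicit ON docs USING GIN (body_tsv tsvector_ops);

-- The GiST variants follow the same pattern.
CREATE INDEX docs_body_gist ON docs USING GIST (body_tsv);
CREATE INDEX docs_body_gist_explicit ON docs USING GIST (body_tsv tsvector_ops);
```

Naming the operator class only changes behavior when a non-default class
exists for the column type; for `tsvector` the explicit form mainly serves
as documentation.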

Let me know if there is a public GH repo where I could send PRs to suggest
these changes of course.

Best
Jan Piotrowski

Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 12:12 PM PG Doc comments form
<noreply@postgresql.org> wrote:
> Even more effective would be to update the page title and/or headline to
> make clear that it is about using GIN and GiST indexes in context of full
> text search only.

I agree that the overall structure is unclear, and seems to be more of
an accident than a deliberate choice.

The page in question is "12.9. GIN and GiST Index Types", but it's
really supplementary information for "12.2.2. Creating Indexes". The
fact that the former has greater prominence than the latter (a general
discussion of FTS indexing) seems like a problem in itself.

At one point GiST was competitive with GIN for full text search
performance (or at least more competitive). These days use of GiST for
FTS should be rare. So the title should suggest that GiST FTS indexing
is the nonstandard choice.

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> The page in question is "12.9. GIN and GiST Index Types", but it's
> really supplementary information for "12.2.2. Creating Indexes". The
> fact that the former has greater prominence than the latter (a general
> discussion of FTS indexing) seems like a problem in itself.

> At one point GiST was competitive with GIN for full text search
> performance (or at least more competitive). These days use of GiST for
> FTS should be rare. So the title should suggest that GiST FTS indexing
> is the nonstandard choice.

I think we should take the index type names out of the section title
entirely, and name it something generic like "Preferred Index Types for
Full Text Search".  Unfortunately, with the EOL'd documentation versions
being pretty much frozen in time, it's not clear that we can prevent
Google from continuing to find that 9.1 page when the search terms
include GIN and GIST.  I suspect it's keying off those terms appearing
in the page title :-(

After the recent changes discussed on the -www list, it's possible
that Google will eventually stop indexing the 9.1 page altogether,
but I'm not holding my breath.

            regards, tom lane



Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 12:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think we should take the index type names out of the section title
> entirely, and name it something generic like "Preferred Index Types for
> Full Text Search".

Agreed.

> After the recent changes discussed on the -www list, it's possible
> that Google will eventually stop indexing the 9.1 page altogether,
> but I'm not holding my breath.

There is always the extreme option of excluding older versions in
robots.txt. I bet that would work. Do you see any downside with that
solution, Jonathan?

--
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Apr 12, 2022 at 12:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we should take the index type names out of the section title
>> entirely, and name it something generic like "Preferred Index Types for
>> Full Text Search".

> Agreed.

Proposed patch attached.  The existing text already says "GIN indexes are
the preferred text search index type", so I'm not sure we need to go
further than that about guiding people which one to use.  In particular,
since GIN can't support included columns, we can't really deprecate GiST
altogether here.
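
To illustrate the included-columns point, here is a rough sketch assuming
PostgreSQL 12 or later and a hypothetical table `articles` (the table,
column, and index names are made up for this example):

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE articles (
    id       bigint PRIMARY KEY,
    title    text,
    body_tsv tsvector
);

-- GiST accepts non-key payload columns via INCLUDE.
CREATE INDEX articles_body_gist
    ON articles USING GIST (body_tsv) INCLUDE (title);

-- The GIN equivalent is expected to be rejected, since the GIN access
-- method does not support included columns. Left commented out so the
-- script runs cleanly.
-- CREATE INDEX articles_body_gin
--     ON articles USING GIN (body_tsv) INCLUDE (title);
```

This is the case where GiST remains the only option of the two, which is
why the documentation cannot simply steer everyone to GIN.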

> There is always the extreme option of excluding older versions in
> robots.txt. I bet that would work.

Yeah, I was wondering about that too.  It's sort of the nuclear option,
but if we don't want to modify EOL'd versions then we may not have any
other way to keep Google from glomming onto them.

            regards, tom lane

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 20db7b7afe..6afaf9e62c 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3618,7 +3618,7 @@ SELECT plainto_tsquery('supernovae stars');
  </sect1>
 
  <sect1 id="textsearch-indexes">
-  <title>GIN and GiST Index Types</title>
+  <title>Preferred Index Types for Text Search</title>
 
   <indexterm zone="textsearch-indexes">
    <primary>text search</primary>
@@ -3627,10 +3627,16 @@ SELECT plainto_tsquery('supernovae stars');
 
   <para>
    There are two kinds of indexes that can be used to speed up full text
-   searches.
+   searches:
+   <link linkend="gin"><acronym>GIN</acronym></link> and
+   <link linkend="gist"><acronym>GiST</acronym></link>.
    Note that indexes are not mandatory for full text searching, but in
    cases where a column is searched on a regular basis, an index is
    usually desirable.
+  </para>
+
+  <para>
+   To create such an index, do one of:
 
    <variablelist>


Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose

From
Peter Geoghegan
Date:
On Tue, Apr 12, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Proposed patch attached.  The existing text already says "GIN indexes are
> the preferred text search index type", so I'm not sure we need to go
> further than that about guiding people which one to use.  In particular,
> since GIN can't support included columns, we can't really deprecate GiST
> altogether here.

LGTM.

> > There is always the extreme option of excluding older versions in
> > robots.txt. I bet that would work.
>
> Yeah, I was wondering about that too.  It's sort of the nuclear option,
> but if we don't want to modify EOL'd versions then we may not have any
> other way to keep Google from glomming onto them.

I think that our recent decision to just live with the downsides that
go with making the most recent stable release docs canonical was a
wise one, on balance. The reality is that we have very few ways of
influencing search results from Google.

I don't know enough about the topic to be able to claim that the
robots.txt solution would also work out well, in about the same way.
But I suspect that it might, and know that it's a reversible process.

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Apr 12, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Proposed patch attached.  The existing text already says "GIN indexes are
>> the preferred text search index type", so I'm not sure we need to go
>> further than that about guiding people which one to use.  In particular,
>> since GIN can't support included columns, we can't really deprecate GiST
>> altogether here.

> LGTM.

Done that way, then.

> I don't know enough about the topic to be able to claim that the
> robots.txt solution would also work out well, in about the same way.
> But I suspect that it might, and know that it's a reversible process.

Yeah, it's outside my expertise too.

            regards, tom lane