Обсуждение: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

Поиск
Список
Период
Сортировка

pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
momjian@postgresql.org (Bruce Momjian - CVS)
Дата:
CVSROOT:    /cvsroot
Module name:    pgsql-server
Changes by:    momjian@postgresql.org    02/08/10 16:43:46

Modified files:
    doc/src/sgml/ref: cluster.sgml
    src/backend/commands: cluster.c
Added files:
    src/test/regress/output: cluster.out
    src/test/regress/sql: cluster.sql

Log message:
    Major improvement in CLUSTER which preserves table characteristics using
    relfilenode.

    I sent the CLUSTER patch a few days ago and I think it was missed.  I
    append it again, this time including the regression test files.  For the
    committer, please note that you have to cvs add the files as they don't
    exist.  Maybe add to the parallel and serial schedules also, but I don't
    know such stuff.

    Alvaro Herrera (<alvherre[a]atentus.com>)


Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Tom Lane
Дата:
momjian@postgresql.org (Bruce Momjian - CVS) writes:
>     Major improvement in CLUSTER which preserves table characteristics using
>     relfilenode.

This patch is still a few bricks shy of a load.  In particular,
it completely destroys TOASTed data.

regression=# create table f11 (f1 int unique, f2 text);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index 'f11_f1_key' for table 'f11'
CREATE TABLE
regression=# insert into f11 values(0, repeat('xyzzy', 100000));
INSERT 691177 1
regression=# select f1,length(f2) from f11;
 f1 | length
----+--------
  0 | 500000
(1 row)

regression=# cluster f11_f1_key on f11;
CLUSTER
regression=# select f1,length(f2) from f11;
ERROR:  Relation 691181 does not exist


As-is the patch is entirely unacceptable.  Ideally we should find a way
to move tuples into the new table without invoking the TOAST code at
all, but I'm not sure what that will entail.

            regards, tom lane

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Bruce Momjian
Дата:
Tom Lane wrote:
> momjian@postgresql.org (Bruce Momjian - CVS) writes:
> >     Major improvement in CLUSTER which preserves table characteristics using
> >     relfilenode.
>
> This patch is still a few bricks shy of a load.  In particular,
> it completely destroys TOASTed data.
>
> regression=# create table f11 (f1 int unique, f2 text);
> NOTICE:  CREATE TABLE / UNIQUE will create implicit index 'f11_f1_key' for table 'f11'
> CREATE TABLE
> regression=# insert into f11 values(0, repeat('xyzzy', 100000));
> INSERT 691177 1
> regression=# select f1,length(f2) from f11;
>  f1 | length
> ----+--------
>   0 | 500000
> (1 row)
>
> regression=# cluster f11_f1_key on f11;
> CLUSTER
> regression=# select f1,length(f2) from f11;
> ERROR:  Relation 691181 does not exist
>
>
> As-is the patch is entirely unacceptable.  Ideally we should find a way

                     ^^^^^^^^^^^^^^^^^^^^^
> to move tuples into the new table without invoking the TOAST code at
> all, but I'm not sure what that will entail.

OK, Tom, you found a problem.  No need to make the submitter feel any
worse by adding critical language to your comments.

The basic question is should be back it out or put it on the open items
list for 7.3 and keep working on it?  Either way, it will be working for
7.3, I am sure.

Myself, I thought the toast table would come along with the heap table
during the relfilenode change, and I am not sure why it didn't.

Also, do we still need the FlushRelationBuffers() calls now that we
handle the local buffer differently?  Would clustering temp table still
require the call?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Tom Lane
Дата:
I said:
> Ideally we should find a way
> to move tuples into the new table without invoking the TOAST code at
> all, but I'm not sure what that will entail.

I was too tired last night to really think about this.  What I had in
mind when I wrote the above was to somehow disable the TOAST machinery
while copying tuples, so that any toasted values in the new heap would
still reference the old toast table.  That would avoid "unnecessary"
copying of the toasted datums ... but it looks ugly to do.

This morning, however, I prefer the idea of letting the system build
a new TOAST table along with the new heap, and then simply swapping the
toast-table links along with the relfilenode links.  This would be a
trivial addition to Alvaro's patch.  The net effect would be that the
contents of the TOAST table would also be re-sorted into the clustered
order, which seems at least potentially a win for access time.

The main downside of this approach is that you'd need enough free disk
space for a new copy of the TOAST table, not only the main table and its
indexes.  In some scenarios this will mean a very large increase in the
disk space needed for CLUSTER, since the TOAST table may well out-bulk
its parent by large amounts.

I'm engaged in code review for Alvaro's patch today, and will add the
toast-table link swap as a quick-and-dirty solution.  If anyone is
sufficiently swayed by the disk-space issue to investigate doing it the
other way, go right ahead --- we could remove the link swap again if
another solution is offered.

            regards, tom lane

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Alvaro Herrera
Дата:
Bruce Momjian dijo:

> Tom Lane wrote:
> > momjian@postgresql.org (Bruce Momjian - CVS) writes:
> > >     Major improvement in CLUSTER which preserves table characteristics using
> > >     relfilenode.
> >
> > This patch is still a few bricks shy of a load.  In particular,
> > it completely destroys TOASTed data.

> The basic question is should be back it out or put it on the open items
> list for 7.3 and keep working on it?  Either way, it will be working for
> 7.3, I am sure.

Don't back it out just yet.  I think I'm finding why this happens and
how to correct it.  I'm working on it.

One question (that is unrelated to the fix I'm currently working on): do
each TOASTed tuple have a pointer to the TOAST table?  Not to the tuple
in the table, but to the table itself?

--
Alvaro Herrera (<alvherre[a]atentus.com>)
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos / con todos los humanos acabaré (Bender)


Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Alvaro Herrera
Дата:
Tom Lane dijo:

> I'm engaged in code review for Alvaro's patch today, and will add the
> toast-table link swap as a quick-and-dirty solution.

I already did that.  I was bitten by the dependency tracking thing that
drops the new TOAST table; I just got up to fix it.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")


Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@atentus.com> writes:
> I already did that.  I was bitten by the dependency tracking thing that
> drops the new TOAST table; I just got up to fix it.

Yeah, I'm on that now.  I got distracted by the problems the code
induced in the relcache --- didn't you get any
WARNING: trying to delete a rd_node reldesc that does not exist.
messages in your testing?  They popped up for me anytime I modified
cluster.c at all.

Anyway I think I'm close to a version that actually works ;-)

            regards, tom lane

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Alvaro Herrera
Дата:
Tom Lane dijo:

> Alvaro Herrera <alvherre@atentus.com> writes:
> > I already did that.  I was bitten by the dependency tracking thing that
> > drops the new TOAST table; I just got up to fix it.
>
> Yeah, I'm on that now.  I got distracted by the problems the code
> induced in the relcache --- didn't you get any
> WARNING: trying to delete a rd_node reldesc that does not exist.
> messages in your testing?  They popped up for me anytime I modified
> cluster.c at all.

No, I didn't.  But I didn't came down to writing even one line about the
dependency fixing.

> Anyway I think I'm close to a version that actually works ;-)

Ok, I leave it in your hands then.  Thank you.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"La Primavera ha venido. Nadie sabe como ha sido" (A. Machado)


TOAST & DROP COLUMN (Was: RE: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ... )

От
"Christopher Kings-Lynne"
Дата:
> I was too tired last night to really think about this.  What I had in
> mind when I wrote the above was to somehow disable the TOAST machinery
> while copying tuples, so that any toasted values in the new heap would
> still reference the old toast table.  That would avoid "unnecessary"
> copying of the toasted datums ... but it looks ugly to do.

Incidentally - will a dropped column definitely have its TOAST data
reclaimed when you do the "update set col=col;" ?

Chris


Re: TOAST & DROP COLUMN (Was: RE: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ... )

От
Tom Lane
Дата:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Incidentally - will a dropped column definitely have its TOAST data
> reclaimed when you do the "update set col=col;" ?

The code is there to do it, but I must admit I didn't test that it
followed through.  Please check it ...

            regards, tom lane

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Bruce Momjian
Дата:
Tom, I am glad you added this repeat()/TOAST test to the cluster
regression test.  It is a nice way to test TOAST.  Folks, are there
other areas where we should testing TOAST in the regression tests?

---------------------------------------------------------------------------

Tom Lane wrote:
> momjian@postgresql.org (Bruce Momjian - CVS) writes:
> >     Major improvement in CLUSTER which preserves table characteristics using
> >     relfilenode.
>
> This patch is still a few bricks shy of a load.  In particular,
> it completely destroys TOASTed data.
>
> regression=# create table f11 (f1 int unique, f2 text);
> NOTICE:  CREATE TABLE / UNIQUE will create implicit index 'f11_f1_key' for table 'f11'
> CREATE TABLE
> regression=# insert into f11 values(0, repeat('xyzzy', 100000));
> INSERT 691177 1
> regression=# select f1,length(f2) from f11;
>  f1 | length
> ----+--------
>   0 | 500000
> (1 row)
>
> regression=# cluster f11_f1_key on f11;
> CLUSTER
> regression=# select f1,length(f2) from f11;
> ERROR:  Relation 691181 does not exist
>
>
> As-is the patch is entirely unacceptable.  Ideally we should find a way
> to move tuples into the new table without invoking the TOAST code at
> all, but I'm not sure what that will entail.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/ oc/src/sgml/ref/cluster.sgml rc/ ...

От
Joe Conway
Дата:
Bruce Momjian wrote:
> Tom, I am glad you added this repeat()/TOAST test to the cluster
> regression test.  It is a nice way to test TOAST.  Folks, are there
> other areas where we should testing TOAST in the regression tests?
>

Possibly strings.sql for the text substring tests (think toast slicing).
I'm working in this area right now, so I'll add some tests to cover
toasted data (and bytea substrings as well) before I'm done.

Joe