Обсуждение: Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Поиск
Список
Период
Сортировка

Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Gregory Stark
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Ok, this removes what should be most if not all of the call sites where we're
>> detoasting text or byteas. In particular it gets all the regexp/like functions
>> and all the trim/pad functions. It also gets hashtext and hash_any.
>
> Applied with some fixes --- you'd missed like_match.c, which doubtless
> explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

Separately:

> Although I'd misdiagnosed the reason for the recent failures on buildfarm
> member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> made in varlena.c. Instead, tweak the code a little bit to get more
> advantage out of that.

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.


This all brings up the question of what other files should be considered for
fixing. There is the possibility that some of the other sites could turn out
to be performance critical for a given application. In particular I'm
primarily concerned with calls which could be invoked during a data load or
index build.

Of the following list it seems to me the calls in adt/acl.c are probably
interesting to fix. Especially since it seems we could just replace all those
text* parameters with Datum parameters since they're only going to be passed
to textin anyways.

After that the tsearch and xml data types are probably interesting, and the
various data type input functions and contrib gist/gin operator classes.

I'm fine doing the drudge work but wanted to check before I do it that I'm not
doing something we don't think is worth doing at this point in time.

src/backend/access/transam/xlog.c:3
src/backend/catalog/pg_conversion.c:2
src/backend/commands/sequence.c:1
src/backend/libpq/be-fsstubs.c:2
src/backend/tsearch/dict.c:3
src/backend/tsearch/to_tsany.c:6
src/backend/tsearch/wparser.c:6
src/backend/utils/adt/acl.c:61
src/backend/utils/adt/char.c:1
src/backend/utils/adt/date.c:3
src/backend/utils/adt/dbsize.c:2
src/backend/utils/adt/encode.c:1
src/backend/utils/adt/formatting.c:14
src/backend/utils/adt/genfile.c:3
src/backend/utils/adt/not_in.c:2
src/backend/utils/adt/quote.c:2
src/backend/utils/adt/regproc.c:1
src/backend/utils/adt/ruleutils.c:6
src/backend/utils/adt/tid.c:1
src/backend/utils/adt/timestamp.c:8
src/backend/utils/adt/tsquery_rewrite.c:1
src/backend/utils/adt/tsvector_op.c:3
src/backend/utils/adt/xml.c:24
src/backend/utils/mb/mbutils.c:1
src/tutorial/funcs_new.c:3

contrib/adminpack/adminpack.c:6
contrib/chkpass/chkpass.c:2
contrib/dblink/dblink.c:41
contrib/fuzzystrmatch/dmetaphone.c:2
contrib/fuzzystrmatch/fuzzystrmatch.c:6
contrib/hstore/hstore_gin.c:1
contrib/hstore/hstore_gist.c:1
contrib/hstore/hstore_op.c:6
contrib/intarray/_int_op.c:1
contrib/ltree/ltree_op.c:3
contrib/pageinspect/btreefuncs.c:3
contrib/pageinspect/rawpage.c:1
contrib/pg_trgm/trgm_gin.c:2
contrib/pg_trgm/trgm_gist.c:1
contrib/pg_trgm/trgm_op.c:3
contrib/pgcrypto/pgcrypto.c:10
contrib/pgcrypto/pgp-pgsql.c:1
contrib/pgrowlocks/pgrowlocks.c:1
contrib/pgstattuple/pgstatindex.c:2
contrib/pgstattuple/pgstattuple.c:1
contrib/sslinfo/sslinfo.c:2
contrib/tablefunc/tablefunc.c:14
contrib/tsearch2/dict.c:3
contrib/tsearch2/dict_ex.c:1
contrib/tsearch2/dict_ispell.c:1
contrib/tsearch2/dict_snowball.c:3
contrib/tsearch2/dict_syn.c:1
contrib/tsearch2/dict_thesaurus.c:1
contrib/tsearch2/query.c:4
contrib/tsearch2/query_rewrite.c:1
contrib/tsearch2/ts_cfg.c:1
contrib/tsearch2/ts_stat.c:2
contrib/tsearch2/tsvector.c:2
contrib/tsearch2/wparser.c:9
contrib/uuid-ossp/uuid-ossp.c:2
contrib/xml2/xpath.c:25
contrib/xml2/xslt_proc.c:3


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Andrew Dunstan
Дата:

Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>   
>> Gregory Stark <stark@enterprisedb.com> writes:
>>     
>>> Ok, this removes what should be most if not all of the call sites where we're
>>> detoasting text or byteas. In particular it gets all the regexp/like functions
>>> and all the trim/pad functions. It also gets hashtext and hash_any.
>>>       
>> Applied with some fixes --- you'd missed like_match.c, which doubtless
>> explains Guillame's complaint that it didn't work ...
>>     
>
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
>
>   

No. like_match.c contains the template for all the various incarnations 
of LIKE and ILIKE functions. It is included multiple times with 
different sets of #defines in like.c to create those functions 
(currently there are 4 of them). It also supplies the template for the 
like_escape functions, and this is where the macros are used. Those 
functions are apparently only called if there is an explicit ESCAPE 
clause. Some of our regression tests do have this, so I'm not sure what 
happened.

cheers

andrew




Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.

The particular regression tests we have for those functions seem to pass
constants to them, not table columns, and so they don't see packed inputs.

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

> This all brings up the question of what other files should be considered for
> fixing.

I'm very much against such a wholesale edit as you seem to have in mind
here.  We already had some destabilization from the limited patch that
went in; now when we're trying to get to beta is not the time for more.
Maybe at the beginning of 8.4 devel cycle would be a reasonable time
to consider touching a lot of files.
        regards, tom lane


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Gregory Stark
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>
> (It might be interesting to make textin produce a packed result when
> possible, just to see what breaks; but I would be afraid to try to do
> that for production...)
>
>> This all brings up the question of what other files should be considered for
>> fixing.
>
> I'm very much against such a wholesale edit as you seem to have in mind
> here.  We already had some destabilization from the limited patch that
> went in; now when we're trying to get to beta is not the time for more.
> Maybe at the beginning of 8.4 devel cycle would be a reasonable time
> to consider touching a lot of files.

Well I did expect that sort of concern if I went ahead and did all of them, or
nearly all of them. That's why I'm asking if any of the list seem like they
might be important enough to do now.

For 8.4 I'm starting to think it would make sense to make the distinction
between a "real" varlena and a possibly-unaligned pointer so text* wouldn't be
an ambiguous type which might not be what it appears to be.


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Gregory Stark
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> (It might be interesting to make textin produce a packed result when
> possible, just to see what breaks; but I would be afraid to try to do
> that for production...)

Reassuringly all checks pass with a hack like that in place. (attached)

I think the right approach here is to define a new type text_packed * (which
would just be a char* or varattrib_1b* or something like that). Then
PG_GETARG_*_PP would return one of these new pointers.

This does leave us with warnings whenever an old-style function calls a
new-style function but I think there would be relatively few of those since
we'll tackle the higher traffic areas first which will be the lower level
functions.

The benefit is that it will give us a warning if we try to pass a pointer from
a new-style function to an old-style function which isn't prepared to receive
it.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Вложения

Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Gregory Stark
Дата:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>> (It might be interesting to make textin produce a packed result when
>> possible, just to see what breaks; but I would be afraid to try to do
>> that for production...)
>
> Reassuringly all checks pass with a hack like that in place. (attached)

For the record I've been doing some more testing and found one place that
could be a problem down the road. I'm not sure why it didn't show up
previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
parser Const nodes and from the histogram arrays without detoasting them.

Currently this is safe as array elements are not packed and parser nodes
contain values read using textin and never stored in a tuple. But down the
road I expect we'll want to pack array element so this code would need to
detoast the elements or prepare to handle packed elements.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> For the record I've been doing some more testing and found one place that
> could be a problem down the road. I'm not sure why it didn't show up
> previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
> parser Const nodes and from the histogram arrays without detoasting them.

> Currently this is safe as array elements are not packed and parser nodes
> contain values read using textin and never stored in a tuple. But down the
> road I expect we'll want to pack array element so this code would need to
> detoast the elements or prepare to handle packed elements.

Hmmm ... I think this should be fixed now, actually.  I'm far from
convinced that a Const could never contain a toasted datum.  Consider
constant-folding in the planner --- it just stuffs the result of a
function into a Const node.

In fact, it seems there's a different risk here: if such a datum were
toasted out-of-line, the reference in a cached plan might live longer
than the underlying toast-table data.  Maybe we need a forcible detoast
in evaluate_function().
        regards, tom lane


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Tom Lane
Дата:
I wrote:
> In fact, it seems there's a different risk here: if such a datum were
> toasted out-of-line, the reference in a cached plan might live longer
> than the underlying toast-table data.  Maybe we need a forcible detoast
> in evaluate_function().

Sure enough, it seems we do.  The attached script fails in every release
back to 7.3.  It's a rather contrived case, because a function marked
immutable probably shouldn't be reading from a table at all, especially
not one you are likely to change or drop.  That's probably why we've not
heard reports of this before.  But it's definitely a bug.
        regards, tom lane

create table big(f1 text);
insert into big values(repeat('xyzzy',100000));

create or replace function getbig() returns text as
'select f1 from big' language sql immutable;

create or replace function usebig(text) returns bool as '
begin return $1 ~ ''xyzzy''; end
' language plpgsql;

prepare foo as select usebig(getbig()) from int4_tbl;

execute foo;

drop table big;

execute foo;


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Bruce Momjian
Дата:
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
> > Gregory Stark <stark@enterprisedb.com> writes:
> >> Ok, this removes what should be most if not all of the call sites where we're
> >> detoasting text or byteas. In particular it gets all the regexp/like functions
> >> and all the trim/pad functions. It also gets hashtext and hash_any.
> >
> > Applied with some fixes --- you'd missed like_match.c, which doubtless
> > explains Guillame's complaint that it didn't work ...
> 
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
> 
> Separately:
> 
> > Although I'd misdiagnosed the reason for the recent failures on buildfarm
> > member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> > made in varlena.c. Instead, tweak the code a little bit to get more
> > advantage out of that.
> 
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.
> 
> 
> This all brings up the question of what other files should be considered for
> fixing. There is the possibility that some of the other sites could turn out
> to be performance critical for a given application. In particular I'm
> primarily concerned with calls which could be invoked during a data load or
> index build.
> 
> Of the following list it seems to me the calls in adt/acl.c are probably
> interesting to fix. Especially since it seems we could just replace all those
> text* parameters with Datum parameters since they're only going to be passed
> to textin anyways.
> 
> After that the tsearch and xml data types are probably interesting, and the
> various data type input functions and contrib gist/gin operator classes.
> 
> I'm fine doing the drudge work but wanted to check before I do it that I'm not
> doing something we don't think is worth doing at this point in time.
> 
> src/backend/access/transam/xlog.c:3
> src/backend/catalog/pg_conversion.c:2
> src/backend/commands/sequence.c:1
> src/backend/libpq/be-fsstubs.c:2
> src/backend/tsearch/dict.c:3
> src/backend/tsearch/to_tsany.c:6
> src/backend/tsearch/wparser.c:6
> src/backend/utils/adt/acl.c:61
> src/backend/utils/adt/char.c:1
> src/backend/utils/adt/date.c:3
> src/backend/utils/adt/dbsize.c:2
> src/backend/utils/adt/encode.c:1
> src/backend/utils/adt/formatting.c:14
> src/backend/utils/adt/genfile.c:3
> src/backend/utils/adt/not_in.c:2
> src/backend/utils/adt/quote.c:2
> src/backend/utils/adt/regproc.c:1
> src/backend/utils/adt/ruleutils.c:6
> src/backend/utils/adt/tid.c:1
> src/backend/utils/adt/timestamp.c:8
> src/backend/utils/adt/tsquery_rewrite.c:1
> src/backend/utils/adt/tsvector_op.c:3
> src/backend/utils/adt/xml.c:24
> src/backend/utils/mb/mbutils.c:1
> src/tutorial/funcs_new.c:3
> 
> contrib/adminpack/adminpack.c:6
> contrib/chkpass/chkpass.c:2
> contrib/dblink/dblink.c:41
> contrib/fuzzystrmatch/dmetaphone.c:2
> contrib/fuzzystrmatch/fuzzystrmatch.c:6
> contrib/hstore/hstore_gin.c:1
> contrib/hstore/hstore_gist.c:1
> contrib/hstore/hstore_op.c:6
> contrib/intarray/_int_op.c:1
> contrib/ltree/ltree_op.c:3
> contrib/pageinspect/btreefuncs.c:3
> contrib/pageinspect/rawpage.c:1
> contrib/pg_trgm/trgm_gin.c:2
> contrib/pg_trgm/trgm_gist.c:1
> contrib/pg_trgm/trgm_op.c:3
> contrib/pgcrypto/pgcrypto.c:10
> contrib/pgcrypto/pgp-pgsql.c:1
> contrib/pgrowlocks/pgrowlocks.c:1
> contrib/pgstattuple/pgstatindex.c:2
> contrib/pgstattuple/pgstattuple.c:1
> contrib/sslinfo/sslinfo.c:2
> contrib/tablefunc/tablefunc.c:14
> contrib/tsearch2/dict.c:3
> contrib/tsearch2/dict_ex.c:1
> contrib/tsearch2/dict_ispell.c:1
> contrib/tsearch2/dict_snowball.c:3
> contrib/tsearch2/dict_syn.c:1
> contrib/tsearch2/dict_thesaurus.c:1
> contrib/tsearch2/query.c:4
> contrib/tsearch2/query_rewrite.c:1
> contrib/tsearch2/ts_cfg.c:1
> contrib/tsearch2/ts_stat.c:2
> contrib/tsearch2/tsvector.c:2
> contrib/tsearch2/wparser.c:9
> contrib/uuid-ossp/uuid-ossp.c:2
> contrib/xml2/xpath.c:25
> contrib/xml2/xslt_proc.c:3
> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Eliminate more detoast copies for packed varlenas

От
Bruce Momjian
Дата:
Added to TODO:

* Research reducing deTOASTing in more places
 http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php


---------------------------------------------------------------------------

Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
> > Gregory Stark <stark@enterprisedb.com> writes:
> >> Ok, this removes what should be most if not all of the call sites where we're
> >> detoasting text or byteas. In particular it gets all the regexp/like functions
> >> and all the trim/pad functions. It also gets hashtext and hash_any.
> >
> > Applied with some fixes --- you'd missed like_match.c, which doubtless
> > explains Guillame's complaint that it didn't work ...
> 
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
> 
> Separately:
> 
> > Although I'd misdiagnosed the reason for the recent failures on buildfarm
> > member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> > made in varlena.c. Instead, tweak the code a little bit to get more
> > advantage out of that.
> 
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.
> 
> 
> This all brings up the question of what other files should be considered for
> fixing. There is the possibility that some of the other sites could turn out
> to be performance critical for a given application. In particular I'm
> primarily concerned with calls which could be invoked during a data load or
> index build.
> 
> Of the following list it seems to me the calls in adt/acl.c are probably
> interesting to fix. Especially since it seems we could just replace all those
> text* parameters with Datum parameters since they're only going to be passed
> to textin anyways.
> 
> After that the tsearch and xml data types are probably interesting, and the
> various data type input functions and contrib gist/gin operator classes.
> 
> I'm fine doing the drudge work but wanted to check before I do it that I'm not
> doing something we don't think is worth doing at this point in time.
> 
> src/backend/access/transam/xlog.c:3
> src/backend/catalog/pg_conversion.c:2
> src/backend/commands/sequence.c:1
> src/backend/libpq/be-fsstubs.c:2
> src/backend/tsearch/dict.c:3
> src/backend/tsearch/to_tsany.c:6
> src/backend/tsearch/wparser.c:6
> src/backend/utils/adt/acl.c:61
> src/backend/utils/adt/char.c:1
> src/backend/utils/adt/date.c:3
> src/backend/utils/adt/dbsize.c:2
> src/backend/utils/adt/encode.c:1
> src/backend/utils/adt/formatting.c:14
> src/backend/utils/adt/genfile.c:3
> src/backend/utils/adt/not_in.c:2
> src/backend/utils/adt/quote.c:2
> src/backend/utils/adt/regproc.c:1
> src/backend/utils/adt/ruleutils.c:6
> src/backend/utils/adt/tid.c:1
> src/backend/utils/adt/timestamp.c:8
> src/backend/utils/adt/tsquery_rewrite.c:1
> src/backend/utils/adt/tsvector_op.c:3
> src/backend/utils/adt/xml.c:24
> src/backend/utils/mb/mbutils.c:1
> src/tutorial/funcs_new.c:3
> 
> contrib/adminpack/adminpack.c:6
> contrib/chkpass/chkpass.c:2
> contrib/dblink/dblink.c:41
> contrib/fuzzystrmatch/dmetaphone.c:2
> contrib/fuzzystrmatch/fuzzystrmatch.c:6
> contrib/hstore/hstore_gin.c:1
> contrib/hstore/hstore_gist.c:1
> contrib/hstore/hstore_op.c:6
> contrib/intarray/_int_op.c:1
> contrib/ltree/ltree_op.c:3
> contrib/pageinspect/btreefuncs.c:3
> contrib/pageinspect/rawpage.c:1
> contrib/pg_trgm/trgm_gin.c:2
> contrib/pg_trgm/trgm_gist.c:1
> contrib/pg_trgm/trgm_op.c:3
> contrib/pgcrypto/pgcrypto.c:10
> contrib/pgcrypto/pgp-pgsql.c:1
> contrib/pgrowlocks/pgrowlocks.c:1
> contrib/pgstattuple/pgstatindex.c:2
> contrib/pgstattuple/pgstattuple.c:1
> contrib/sslinfo/sslinfo.c:2
> contrib/tablefunc/tablefunc.c:14
> contrib/tsearch2/dict.c:3
> contrib/tsearch2/dict_ex.c:1
> contrib/tsearch2/dict_ispell.c:1
> contrib/tsearch2/dict_snowball.c:3
> contrib/tsearch2/dict_syn.c:1
> contrib/tsearch2/dict_thesaurus.c:1
> contrib/tsearch2/query.c:4
> contrib/tsearch2/query_rewrite.c:1
> contrib/tsearch2/ts_cfg.c:1
> contrib/tsearch2/ts_stat.c:2
> contrib/tsearch2/tsvector.c:2
> contrib/tsearch2/wparser.c:9
> contrib/uuid-ossp/uuid-ossp.c:2
> contrib/xml2/xpath.c:25
> contrib/xml2/xslt_proc.c:3
> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +