Re: [HACKERS] Custom compression methods

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] Custom compression methods
Дата
Msg-id 20210315225835.2toe2ceanp7gi6fw@alap3.anarazel.de
обсуждение исходный текст
Ответ на [HACKERS] Custom compression methods  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
Ответы Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2021-03-15 15:29:05 -0400, Robert Haas wrote:
> On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > In the attached patches I have changed this, ...
>
> OK, so just looking over this patch series, here's what I think:
>
> - 0001 and 0002 are now somewhat independent of the rest of this work,
> and could be dropped, but I think they're a good idea, so I'd like to
> commit them. I went over 0001 carefully this morning and didn't find
> any problems. I still need to do some more review of 0002.

I don't particularly like PG_RETURN_HEAPTUPLEHEADER_RAW(). What is "raw"
about it?  It also seems to me like there needs to at least be a
sentence or two explaining when to use which of the functions.

I think heap_copy_tuple_as_raw_datum() should grow an assert checking
there are no external columns?

The commit messages could use a bit more explanation about motivation.


I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm()
contain a nearly identical copy of the same code.  And
make_tuple_from_row() also is similar. It seem that there should be a
heap_form_tuple() version doing this for us?


> - 0003 through 0005 are the core of this patch set. I'd like to get
> them into this release, but I think we're likely to run out of time.


Comments about 0003:
- why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be
  comparable to HIDE_TABLEAM?

- (you comment on this later): toast_get_compression_method() needing to
  fetch some of the data to figure out the compression method is pretty
  painful. Especially because it then goes and throws away that data!

- Adding all these indirect function calls via toast_compression[] just
  for all of two builtin methods isn't fun either.

- I guess NO_LZ4_SUPPORT() is a macro so it shows the proper
  file/function name?

- I wonder if adding compression to the equalTupleDesc() is really
  necessary / won't cause problems (thinking of cases like the
  equalTupleDesc() call in pg_proc.c).

- Is nodeModifyTable.c really the right place for the logic around
  CompareCompressionMethodAndDecompress()?  And is doing it in every
  place that does "user initiated" inserts really the right way? Why
  isn't this done on the tuptoasting level?

- CompareCompressionMethodAndDecompress() is pretty deeply
  indented. Perhaps rewrite a few more of the conditions to be
  continue;?


Comments about 0005:
- I'm personally not really convinced tracking the compression type in
  pg_attribute the way you do is really worth it (. Especially given
  that it's right now only about new rows anyway.  Seems like it'd be
  easier to just treat it as a default for new rows, and dispense with
  all the logic around mismatching compression types etc?


> The biggest thing that jumps out at me while looking at this with
> fresh eyes is that the patch doesn't touch varatt_external.va_extsize
> at all. In a varatt_external, we can't use the va_rawsize to indicate
> the compression method, because there are no bits free, because the 2
> bits not required to store the size are used to indicate what type of
> varlena we've got.

Once you get to varatt_external, you could also just encode it via
vartag_external...


> But, that means that the size of a varlena is limited to 1GB, so there
> are 2 bits free in varatt_external.va_extsize, just like there are in
> va_compressed.va_rawsize. We could store the same two bits in
> varatt_external.va_extsize that we're storing in
> va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
> then toast_get_compression_method() doesn't have to call
> toast_fetch_datum_slice() any more, which is a rather large savings.
> If it's only impacting pg_column_compression() then whatever, but
> that's not the case: we've got calls to
> CompareCompressionMethodAndDecompress in places like intorel_receive()
> and ExecModifyTable() that look pretty performance-critical.

Yea, I agree, that does seem problematic.


> There's another, rather brute-force approach to this problem, too. We
> could just decide that lz4 will only be used for external data, and
> that there's no such thing as an inline-compressed lz4 varlena.
> deotast_fetch_datum() would just notice that the value is lz4'd and
> de-lz4 it before returning it, since a compressed lz4 datum is
> impossible.

That seems fairly terrible.


> I'm open to being convinced that we don't need to do either of these
> things, and that the cost of iterating over all varlenas in the tuple
> is not so bad as to preclude doing things as you have them here. But,
> I'm afraid it's going to be too expensive.

I mean, I would just define several of those places away by not caring
about tuples in a different compressino formation ending up in a
table...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..
Следующее
От: Andres Freund
Дата:
Сообщение: Re: New IndexAM API controlling index vacuum strategies