Re: Compressed pluggable storage experiments

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Compressed pluggable storage experiments
Дата
Msg-id 20191018102505.b67rcudveao7fwyd@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Compressed pluggable storage experiments  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: Compressed pluggable storage experiments  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hi,

On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> On 2019-Oct-10, Ildar Musin wrote:
> 
> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> > "begin modify table" and "end modify table" and there is no shared
> > state between insert/update/delete calls.
> 
> Hmm.  I think adding a begin/end to modifytable is a reasonable thing to
> do (it'd be a no-op for heap and zheap I guess).

I'm fairly strongly against that. Adding two additional "virtual"
function calls for something that's rarely going to be used, seems like
adding too much overhead to me.


> > 2. It looks like I cannot implement custom storage options. E.g. for
> > compressed storage it makes sense to implement different compression
> > methods (lz4, zstd etc.) and corresponding options (like compression
> > level). But as i can see storage options (like fillfactor etc) are
> > hardcoded and are not extensible. Possible solution is to use GUCs
> > which would work but is not extremely convinient.
> 
> Yeah, the reloptions module is undergoing some changes.  I expect that
> there will be a way to extend reloptions from an extension, at the end
> of that set of patches.

Cool.


> > 3. A bit surprising limitation that in order to use bitmap scan the
> > maximum number of tuples per page must not exceed 291 due to
> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
> > 8kb page size. In case of 1mb page this restriction feels really
> > limiting.
> 
> I suppose this is a hardcoded limit that needs to be fixed by patching
> core as we make table AM more pervasive.

That's not unproblematic - a dynamic limit would make a number of
computations more expensive, and we already spend plenty CPU cycles
building the tid bitmap. And we'd waste plenty of memory just having all
that space for the worst case.  ISTM that we "just" need to replace the
TID bitmap with some tree like structure.


> > 4. In order to use WAL-logging each page must start with a standard 24
> > byte PageHeaderData even if it is needless for storage itself. Not a
> > big deal though. Another (acutally documented) WAL-related limitation
> > is that only generic WAL can be used within extension. So unless
> > inserts are made in bulks it's going to require a lot of disk space to
> > accomodate logs and wide bandwith for replication.
> 
> Not sure what to suggest.  Either you should ignore this problem, or
> you should fix it.

I think if it becomes a problem you should ask for an rmgr ID to use for
your extension, which we encode and then then allow to set the relevant
rmgr callbacks for that rmgr id at startup.  But you should obviously
first develop the WAL logging etc, and make sure it's beneficial over
generic wal logging for your case.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: [PATCH] Race condition in logical walsender causes longpostgresql shutdown delay
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: v12.0: segfault in reindex CONCURRENTLY