Re: [HACKERS] GSOC - TOAST'ing in slices

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] GSOC - TOAST'ing in slices
Дата
Msg-id 19235.1489586039@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] GSOC - TOAST'ing in slices  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] GSOC - TOAST'ing in slices
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
> <gpapadrosou@gmail.com> wrote:
>> The project’s idea is implement different slicing approaches according to
>> the value’s datatype. For example a text field could be split upon character
>> boundaries while a JSON document would be split in a way that allows fast
>> access to it’s keys or values.

> Hmm.  So if you had a long text field containing multibyte characters,
> and you split it after, say, every 1024 characters rather than after
> every N bytes, then you could do substr() without detoasting the whole
> field.  On the other hand, my guess is that you'd waste a fair amount
> of space in the TOAST table, because it's unlikely that the chunks
> would be exactly the right size to fill every page of the table
> completely.  On balance it seems like you'd be worse off, because
> substr() probably isn't all that common an operation.

Keep in mind also that slicing on "interesting" boundaries rather than
with the current procrustean-bed approach could save you at most one or
two chunk fetches per access.  So the upside seems limited.  Moreover,
how are you going to know whether a given toast item has been stored
according to your newfangled approach?  I doubt we're going to accept
forcing a dump/reload for this.

IMO, the real problem here is to be able to predict which chunk(s) to
fetch at all, and I'd suggest focusing on that part of the problem rather
than changes to physical storage.  It's hard to see how to do anything
very smart for text (except in the single-byte-encoding case, which is
already solved).  But the JSONB format was designed with some thought
to this issue, so you might be able to get some traction there.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] GUC for cleanup indexes threshold.
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: [HACKERS] Allow pg_dumpall to work without pg_authid