On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Hi hackers,
>
> Vertical (columnar) storage is best suited to analytics, which is why it is widely used in OLAP-oriented
> databases such as Vertica, HyPer, KDB, ...
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because
> the executor lacks support for vector operations.
> The situation could change if we had a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in
> horizontal format).
> This is why most existing systems keep data in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ...
> and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that most of the query execution time (about 17%) is spent in
> heap_deform_tuple.
> The new format will significantly reduce the time spent deforming heap tuples, because each tile holds just
> one column of any particular record.
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum
> computers.
>
> Please find attached a patch with a first prototype implementation. It improves performance by about
> 3.14 times on most TPC-H queries.
You're sure it's not 3.14159265358979323...?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate