Re: [PATCH]Tablesample Submission

Поиск
Список
Период
Сортировка
От Hitoshi Harada
Тема Re: [PATCH]Tablesample Submission
Дата
Msg-id CAP7QgmkMhjoUi3_RetR7nH4ZOm=B_M7BWdAkv5cf-rekYU5H0w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH]Tablesample Submission  (Qi Huang <huangqiyx@outlook.com>)
Ответы Re: [PATCH]Tablesample Submission  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [PATCH]Tablesample Submission  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Tue, Aug 21, 2012 at 8:08 AM, Qi Huang <huangqiyx@outlook.com> wrote:
>> Please add your patch here:
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
> Hi, Robert
>     I added it under "Miscellaneous".
>     https://commitfest.postgresql.org/action/patch_view?id=918
>
>

Patch does not apply cleanly against latest master.  outfuncs.c,
allpath.c and cost.h have rejected parts.  The make check failed in a
lot of cases up to 26 out of 133.  I didn't look into each issue but I
suggest rebasing on the latest master and making sure the regression
test passes.

Some of the patch don't follow our coding standard.  Please adjust
brace positions, for example.  For the header include list and
Makefile, place a new files in alphabetical order.

The patch doesn't include any documentation.  Consider add some doc
patch for such a big feature like this.

You should update kwlist.h for REPEATABLE but I'm not sure if
REPEATABLE should become a reserved keyword yet.

I don't see why you created T_TableSampleInfo.  TableSampleInfo looks
fine with a simple struct rather than a Node.

If we want to disable a cursor over a sampling table, we should check
it in the parser not the planner.  In the wiki page, one of the TODOs
says about cursor support, but how much difficult is it?  How does the
standard say?

s/skiped/skipped/

Don't we need to reset seed on ExecReScanSampleScan?  Should we add a
new executor node SampleScan?  It seems everything about random
sampling is happening under heapam.

It looks like BERNOULLI allocates heap tuple array beforehand, and
copy all the tuples into it.  This doesn't look acceptable when you
are dealing with a large number of rows in the table.

As wiki says, BERNOULLI relies on the statistics of the table, which
doesn't sound good to me.  Of course we could say this is our
restriction and say good-bye to users who hadn't run ANALYZE first,
but it is too hard for a normal users to use it.  We may need
quick-and-rough count(*) for this.

That is pretty much I have so far.  I haven't read all the code nor
the standard, so I might be wrong somewhere.

Thanks,
-- 
Hitoshi Harada



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Etsuro Fujita"
Дата:
Сообщение: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown