Re: Optimization for updating foreign tables in Postgres FDW

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Optimization for updating foreign tables in Postgres FDW
Дата
Msg-id CA+Tgmob_X7gsU+Frmok28F=WErye5wAi3p10LT2ULEPzHUDh1A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimization for updating foreign tables in Postgres FDW  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: Optimization for updating foreign tables in Postgres FDW  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers
On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Another option to avoid such a hazard would be to remove the two changes
> from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in
> the pushdown case.  I made the changes because we won't use ExecAuxRowMarks
> in that case since we don't need to do EvalPlanQual rechecks and because we
> won't use junk filters in that case since we do UPDATE/DELETE in the
> subplan.  But the creating cost is enough small, so simply removing the
> changes seems like a good idea.

Sure, that works.

>> This issue crops up elsewhere as well.  The changes to
>> ExecModifyTable() have the same problem -- in that case, it might be
>> wise to move the code that's going to have to be indented yet another
>> level into a separate function.   That code also says this:
>>
>> +                       /* No need to provide scan tuple to
>> ExecProcessReturning. */
>> +                       slot = ExecProcessReturning(resultRelInfo,
>> NULL, planSlot);
>>
>> ...but, uh, why not?  The comment says what the code does, but what it
>> should do is explain why it does it.
>
> As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that
> is that in the pushdown case it's the IterateDMLPushdown's responsiblity to
> get actually inserted/updated/deleted tuples and make those tuples available
> to the ExecProcessReturning.  I'll add comments.

Comments are good things to have.  :-)

>> On a broader level, I'm not very happy with the naming this patch
>> uses.  Here's an example:
>>
>> +    <para>
>> +     If an FDW supports optimizing foreign table updates, it still needs
>> to
>> +     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
>> +     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
>> +     described below.
>> +    </para>
>>
>> "Optimizing foreign table updates" is both inaccurate (since it
>> doesn't only optimize updates) and so vague as to be meaningless
>> unless you already know what it means.  The actual patch uses
>> terminology like "fdwPushdowns" which is just as bad.  We might push a
>> lot of things to the foreign side -- sorts, joins, aggregates, limits
>> -- and this is just one of them.  Worse, "pushdown" is itself
>> something of a term of art - will people who haven't been following
>> all of the mammoth, multi-hundred-email threads on this topic know
>> what that means?  I think we need some better terminology here.
>>
>> The best thing that I can come up with offhand is "bulk modify".  So
>> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
>> EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
>> ResultRelInfo flag could be ri_usesFDWBulkModify.
>
> I'm not sure that "bulk modify" is best.  Yeah, this would improve the
> performance especially in the bulk-modification case, but would improve the
> performance even in the case where an UPDATE/DELETE modifies just a single
> row.  Let me explain using an example.  Without the patch, we have the
> following plan for an UPDATE on a foreign table that updates a single row:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>    Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
>    ->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
>          Output: (a + 1), b, ctid
>          Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR
> UPDATE
> (5 rows)
>
> The plan requires two queries, SELECT and UPDATE, to do the update.
> (Actually, the plan have additional overheads in creating a cursor for the
> SELECT and establishing a prepared statement for the UPDATE.)  But with the
> patch, we have:
>
> postgres=# explain verbose update foo set a = a + 1 where a = 1;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>    ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
>          Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
> (3 rows)
>
> The optimized plan requires just a single UPDATE query to do that!  So, even
> in the single-row-modification case the patch could improve the performance.
>
> How about "Direct Modify"; PlanDirectModify, BeginDirectModify,
> IterateDirectModify, EndDirectModify, ExplainDirectModify, and
> ri_usesFDWDirectModify.

Works for me!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: psql completion for ids in multibyte string
Следующее
От: Andres Freund
Дата:
Сообщение: Re: checkpointer continuous flushing - V16