Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Pavan Deolasee
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date:
Msg-id: CABOikdOD-ejBeT0rhEBXZ+nJm2wMBf6_xfr2U2+b3the=TKUcg@mail.gmail.com
In reply to: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Thu, Mar 8, 2018 at 10:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> However, there's no such thing as a free lunch.  We can't use the CTID
> field to point to a CTID in another table because there's no room to
> include the identity of the other table in the field.  We can't widen
> it to make room because that would break on-disk compatibility and
> bloat our already-too-big tuple headers.  So, we cannot make it work
> like it does when the updates are confined to a single partition.
> Therefore, the only options are (1) ignore the problem, and let a
> cross-partition update look entirely like a delete+insert, (2) try to
> throw some error in the case where this introduces user-visible
> anomalies that wouldn't be visible otherwise, or (3) revert update
> tuple routing entirely.  I voted for (1), but the consensus was (2).
> I think that (3) will make a lot of people sad; it's a very good
> feature.
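To make the space argument concrete, here is a rough Python model of the 6-byte ctid (PostgreSQL's ItemPointerData: a 32-bit block number stored as two 16-bit halves, followed by a 16-bit line-pointer offset). The helper names and the little-endian, unpadded packing are assumptions made for illustration only; the point is that the field identifies a block and a slot, never a relation, so pointing at a tuple in another partition would need extra bytes in every tuple header.

```python
import struct
from typing import Tuple

# Illustrative model of PostgreSQL's ItemPointerData (the ctid): two 16-bit
# halves of a 32-bit block number (bi_hi, bi_lo), then a 16-bit offset
# (ip_posid). Little-endian, unpadded packing is an assumption made here so
# the sizes are deterministic; the server itself uses native byte order.
CTID_FORMAT = "<HHH"


def pack_ctid(block: int, offset: int) -> bytes:
    """Encode (block, offset) into the 6-byte ctid layout."""
    if not 0 <= block < 2**32:
        raise ValueError("block number must fit in 32 bits")
    if not 0 <= offset < 2**16:
        raise ValueError("offset number must fit in 16 bits")
    return struct.pack(CTID_FORMAT, block >> 16, block & 0xFFFF, offset)


def unpack_ctid(raw: bytes) -> Tuple[int, int]:
    """Decode a 6-byte ctid back into (block, offset)."""
    hi, lo, offset = struct.unpack(CTID_FORMAT, raw)
    return (hi << 16) | lo, offset


# The field as it is, and its size if a 4-byte relation OID were appended
# so that the pointer could name another partition.
CTID_SIZE = struct.calcsize(CTID_FORMAT)                # 6 bytes
WIDENED_CTID_SIZE = struct.calcsize(CTID_FORMAT + "I")  # 10 bytes
```

For example, `unpack_ctid(pack_ctid(70000, 5))` round-trips to `(70000, 5)`. Growing the field from 6 to 10 bytes in every tuple header is the on-disk break and header bloat described above.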

I am definitely not suggesting doing #3, though I agree with Tom that the option is on the table. Maybe it is the two back-to-back bugs in this area that worry me and raise questions about how much testing the feature has had. In addition, making such a significant on-disk change for one corner case, for which even #1 might be acceptable, seems like a lot. If we want to go in that direction at all, I would suggest considering a patch I wrote last year (as part of WARM) to free up additional bits from the ctid field. I know Tom did not like that either, but at the very least it gives us a lot more room for future work, with the same amount of risk.
 
> If we want to have (2), then we've got to have some way to
> mark a tuple that was deleted as part of a cross-partition update, and
> that requires a change to the on-disk format.
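As a toy illustration of what option (2) needs, the sketch below borrows one otherwise-unused bit of the 16-bit ctid offset as a hypothetical "moved to another partition" marker, and shows how a blocked UPDATE/DELETE could use it once the concurrent transaction commits. Everything here (`MOVED_PARTITION_BIT`, `TupleVersion`, `recheck_after_wait`, the exception class) is invented for illustration and is not how any posted patch works; it only shows why the deleted tuple itself must carry the mark, which is the on-disk format change in question.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker bit borrowed from the 16-bit ctid offset. Even with
# the largest (32 kB) block size a page has at most 8192 line pointers,
# so a genuine offset number never sets the 0x8000 bit.
MOVED_PARTITION_BIT = 0x8000


class ConcurrentPartitionMoveError(Exception):
    """Raised when the row being updated was moved to another partition."""


@dataclass
class TupleVersion:
    deleted: bool
    ctid_block: int
    ctid_offset: int  # may carry MOVED_PARTITION_BIT


def mark_moved(tup: TupleVersion) -> None:
    """Delete half of a cross-partition UPDATE: delete and flag the old version."""
    tup.deleted = True
    tup.ctid_offset |= MOVED_PARTITION_BIT


def recheck_after_wait(tup: TupleVersion) -> Optional[TupleVersion]:
    """Decide what a blocked UPDATE/DELETE does once the other transaction commits.

    A live version is returned unchanged. A plain DELETE means the row is
    gone, so it is skipped (None). A flagged version means the row lives on
    in another partition, so option (2) raises instead of silently skipping.
    """
    if not tup.deleted:
        return tup
    if tup.ctid_offset & MOVED_PARTITION_BIT:
        raise ConcurrentPartitionMoveError(
            "tuple was moved to another partition by a concurrent update")
    return None
```

Without such a mark the last two cases are indistinguishable, which is exactly option (1): the cross-partition update looks like a plain delete+insert.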

I think the question is: isn't there an alternative way to achieve the same result? One way would be what I suggested above, i.e. free up more bits and use one of those. Another would be to add a hidden column to a partition table, either when it is created or when it is attached as a partition. This penalises only partition tables and keeps the rest of the system out of it. Obviously, if the column is added when the table is attached as a partition rather than at table creation time, old tuples may not have room to store the additional field. Maybe we can handle that by updating the tuple twice? That seems bad, but it only affects the case where a partition key is updated, and we can clearly document the performance implications of that operation. I am not sure how common this case is going to be anyway. With this hidden column, we could even store a pointer to another partition and do something with it, if needed.
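A minimal sketch of the hidden-column idea, assuming a hypothetical nullable column (called `moved_to` here) that holds the new partition's OID plus the new tuple's (block, offset). Unlike the 6-byte ctid, an ordinary column has room to name the other partition. The names and the OID value in the usage note are made up, and the "update twice" path for tuples that predate the column is not modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

# (partition OID, (block, offset)) of the row's new home, or None.
MovedTo = Optional[Tuple[int, Tuple[int, int]]]


@dataclass
class PartitionRow:
    values: Dict[str, Any]
    deleted: bool = False
    moved_to: MovedTo = None  # hypothetical hidden column, NULL for normal rows


def move_row(old: PartitionRow, updates: Dict[str, Any],
             new_partition_oid: int, new_ctid: Tuple[int, int]) -> PartitionRow:
    """Cross-partition UPDATE: build the new row and record where it went."""
    new_row = PartitionRow(values={**old.values, **updates})
    old.deleted = True
    old.moved_to = (new_partition_oid, new_ctid)
    return new_row
```

For instance, after `move_row(old, {"region": "us"}, 16385, (0, 1))` the deleted version carries `moved_to == (16385, (0, 1))`, so a concurrent updater could either raise an error (option 2) or follow the pointer.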

That's just one idea. Of course, I haven't thought about it for more than 10 minutes, so I have most likely missed some details and it may well be a stupid idea after all. But there could be other ideas too. And even if we can't find one, my vote would be to settle for #1 instead of trying to do #2.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
