Обсуждение: Reuse the dead item on unique index.
When _bt_check_unique finds a dead item that has same data as new item, LP_DEAD is set to the item. Can we reuse this dead item instead of inserting new item? If it is possible, the growth of btree index can be reduced. I think it is effective in the table like branches table of pgbench to which the same item is updated many times. An attached patch is test implementation of this idea. This patch is not complete, but it might be useful to show this idea. regards, --- Atsushi Ogawa
Вложения
Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > When _bt_check_unique finds a dead item that has same data as new > item, LP_DEAD is set to the item. Can we reuse this dead item instead > of inserting new item? This strikes me as a pretty bad idea for the same reason pointed out recently in other threads: the notion of equality embodied in a btree opclass' equals function may have little or nothing to do with true identity. So your assumption that it's the "same" data is faulty. Also, I'm dubious about the assumption that "can be marked LP_DELETED" is the same as "can be physically removed right now". The side-effects on indexscans happening concurrently with yours could be bad. At the very least you'd need to obtain super-exclusive lock (cf btbulkdelete) before doing the replacement. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us>: > Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > > When _bt_check_unique finds a dead item that has same data as new > > item, LP_DEAD is set to the item. Can we reuse this dead item instead > > of inserting new item? > > This strikes me as a pretty bad idea for the same reason pointed out > recently in other threads: the notion of equality embodied in a btree > opclass' equals function may have little or nothing to do with true > identity. So your assumption that it's the "same" data is faulty. Thanks, I understand the problem. When the size of new item and dead item is the equal, the new item can be overwrited at the position of the dead item. > Also, I'm dubious about the assumption that "can be marked LP_DELETED" > is the same as "can be physically removed right now". The side-effects > on indexscans happening concurrently with yours could be bad. At the > very least you'd need to obtain super-exclusive lock (cf btbulkdelete) > before doing the replacement. I agree. I will add code that checks the refcount of buffer. If refcount is 1, current process has super-exclusive lock, and we can overwrite the dead item. If refcount > 1, I use _bt_insertonpg. regards, --- Atsushi Ogawa