Discussion: Getting rid of excess lseeks()

Getting rid of excess lseeks()

From
Tom Lane
Date:
We've known for a long time that Postgres does a lot of
redundant-seeming "lseek(fd,0,SEEK_END)" kernel calls while inserting
data; one for each inserted tuple, in fact.  This is coming from
RelationGetBufferForTuple() in src/backend/access/heap/hio.c, which does
RelationGetNumberOfBlocks() to ensure that it knows the currently last
page of the relation to insert into.  That results in the lseek() call,
which is the only way to be sure we know the current file EOF exactly,
given that other backends might be extending the file too.
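To make the cost concrete, here is a minimal sketch of how a block count can be derived from the file EOF. It is not the PostgreSQL implementation: the function name file_nblocks and the BLOCK_SIZE constant are assumptions for illustration (8192 is PostgreSQL's default page size). The point is only that every call pays for one lseek().

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed page size, for illustration only */

/*
 * Hypothetical helper: return the number of whole blocks in the file
 * behind fd, or -1 on error.  Each call issues one lseek() kernel call,
 * which is the per-tuple cost discussed above.
 */
long file_nblocks(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);

    if (end < 0)
    {
        perror("lseek");
        return -1;
    }
    return (long) (end / BLOCK_SIZE);
}
```

Because another process may extend the file at any moment, the value is only exact at the instant of the call, which is why it gets re-read so often.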

We have talked about avoiding this kernel call by keeping an accurate
EOF location somewhere in shared memory.  However, I just had what is
either a brilliant or foolish idea: who says that we absolutely must
insert the new tuple on the very last page of the table?  If it fits on
a page that's not-quite-the-last-one, why shouldn't we put it there?
If that works, we could just use "rel->rd_nblocks-1" as our initial
guess of the page to insert onto, and skip the lseek.  It doesn't
matter if rd_nblocks is slightly out of date.  The logic in 
RelationGetBufferForTuple would then be something like:

    /*
     * First, use cached rd_nblocks to guess which page to put tuple
     * on.
     */
    if (rel->rd_nblocks > 0)
    {
        see if tuple will fit on page rel->rd_nblocks-1;
        if so, put it there and return.
    }
    /*
     * Before extending relation, make sure no one else has done
     * so more recently than our last rd_nblocks update.  (If we
     * blindly extend the relation here, then probably most of the
     * page the other guy added will end up going to waste.)
     */
    newlastblock = RelationGetNumberOfBlocks(relation);
    if (newlastblock > rel->rd_nblocks)
    {
        /*
         * Someone else has indeed extended the rel.
         * Update my idea of the rel length, and see if
         * I can fit my tuple on the page he made.
         */
        rel->rd_nblocks = newlastblock;
        see if tuple will fit on page rel->rd_nblocks-1;
        if so, put it there and return.
    }
    /*
     * Otherwise, extend the rel by one block and put our tuple
     * there, same as before.  (Be sure to update rel->rd_nblocks
     * for next time...)
     */

An additional small win is that we'd not have to do the

    if (!relation->rd_myxactonly)
        LockPage(relation, 0, ExclusiveLock);

bit unless the first insertion attempt fails.  This lock is only needed
to ensure that just one backend extends the rel at a time, so as long as
we are adding a tuple to a pre-existing page there's no need to grab it.
That would improve concurrency some more, since the majority of tuple
insertions will succeed in adding to an existing page.
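The proposed logic can be tried out in a toy, single-process model. Everything in the sketch below is hypothetical (ToyRel, ToyBackend, toy_insert, the tiny PAGE_CAPACITY); it is not PostgreSQL code, it ignores the extension lock entirely, and it only counts how often the simulated EOF lookup would be needed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_CAPACITY 4      /* toy value: tuples that fit on one page */
#define MAX_PAGES     1024   /* toy bound on the relation size */

typedef struct
{
    int used[MAX_PAGES];     /* tuples stored on each page */
    int nblocks;             /* true length, stands in for the file EOF */
    int lseek_calls;         /* how often the true length was looked up */
} ToyRel;

typedef struct
{
    ToyRel *rel;
    int     cached_nblocks;  /* plays the role of rel->rd_nblocks */
} ToyBackend;

/* Stand-in for RelationGetNumberOfBlocks(): costs one simulated lseek. */
static int true_nblocks(ToyRel *rel)
{
    rel->lseek_calls++;
    return rel->nblocks;
}

/* Put one tuple on the page if there is room; return 1 on success. */
static int try_page(ToyRel *rel, int page)
{
    if (rel->used[page] < PAGE_CAPACITY)
    {
        rel->used[page]++;
        return 1;
    }
    return 0;
}

/* Insert one tuple; return the page used, or -1 if the toy rel is full. */
int toy_insert(ToyBackend *b)
{
    assert(b != NULL && b->rel != NULL);
    ToyRel *rel = b->rel;

    /* First, try the page suggested by the cached block count. */
    if (b->cached_nblocks > 0 && try_page(rel, b->cached_nblocks - 1))
        return b->cached_nblocks - 1;

    /* Before extending, check whether someone else already did. */
    int newlast = true_nblocks(rel);
    if (newlast > b->cached_nblocks)
    {
        b->cached_nblocks = newlast;
        if (try_page(rel, newlast - 1))
            return newlast - 1;
    }

    /* Otherwise extend by one page and remember the new length. */
    if (rel->nblocks >= MAX_PAGES)
        return -1;
    rel->used[rel->nblocks] = 1;
    rel->nblocks++;
    b->cached_nblocks = rel->nblocks;
    return rel->nblocks - 1;
}
```

With one backend and four tuples per page, eight inserts cost two simulated lseeks instead of eight. A second ToyBackend sharing the same ToyRel starts with a stale cached count of zero and catches up on its first insert, which is the "someone else has extended the rel" branch.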

So the question is, is it safe to insert on non-last pages?  AFAIK,
the only aspect of the system that really makes assumptions about tuple
positioning is that sequential scans stop when they reach rel->rd_nblocks
(which they update at the beginning of the scan).  They are assuming
that tuples appearing on pages added after a scan starts are
uninteresting because they can't be committed from the point of view of
the scanning transaction.  But that assumption is not violated by
placing new tuples in pages earlier than the last possible place.

Comments?  Is there a hole in my reasoning?
        regards, tom lane


RE: Getting rid of excess lseeks()

From
Mike Mascari
Date:
Just curious (and without having looked at a line of code),

If your idea works, would it be possible, or even a good idea, to 
have PostgreSQL extend the relation in a non-linear fashion? So, for 
a given statement, the second time it finds itself extending the 
relation it does so by 2 x pagesize, the third time, now having 
exhausted 3 pages, it extends the relation by 4 x pagesize, etc. 
Oracle has its STORAGE clause of the CREATE TABLE statement which 
allows for tuning of such things, but I'm wondering if PostgreSQL 
can/should do some adaptive allocation of disk space. Perhaps it 
might cut down on large bulk load times?
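As a rough illustration of the growth schedule described here (1 page, then 2, then 4, and so on), the sketch below computes the size of the n-th extension. The names extend_pages and total_pages_after are hypothetical, and the MAX_EXTEND_PAGES cap is an added assumption that does not appear in the message.

```c
#include <assert.h>

#define MAX_EXTEND_PAGES 64   /* hypothetical cap, not from the thread */

/* Pages to add on the n-th extension (n starts at 1): 1, 2, 4, ... capped. */
unsigned extend_pages(unsigned n)
{
    assert(n >= 1);
    unsigned pages = 1;

    while (--n > 0 && pages < MAX_EXTEND_PAGES)
        pages *= 2;
    return pages;
}

/* Total pages allocated after n extensions under this schedule. */
unsigned total_pages_after(unsigned n)
{
    unsigned total = 0;

    for (unsigned i = 1; i <= n; i++)
        total += extend_pages(i);
    return total;
}
```

After two extensions the schedule has allocated 3 pages, matching the "now having exhausted 3 pages" step before the 4-page extension.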

Just curious,

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From:    Tom Lane [SMTP:tgl@sss.pgh.pa.us]

We have talked about avoiding this kernel call by keeping an accurate
EOF location somewhere in shared memory.  However, I just had what is
either a brilliant or foolish idea: who says that we absolutely must
insert the new tuple on the very last page of the table?  If it fits on
a page that's not-quite-the-last-one, why shouldn't we put it there?
If that works, we could just use "rel->rd_nblocks-1" as our initial
guess of the page to insert onto, and skip the lseek.  


Re: Getting rid of excess lseeks()

From
Tom Lane
Date:
Mike Mascari <mascarm@mascari.com> writes:
> If your idea works, would it be possible, or even a good idea, to 
> have PostgreSQL extend the relation in a non-linear fashion?

The trick would be to ensure that the extra blocks actually got used
for something ... without more logic than is there now, all the backends
would glom onto the last new page and ignore the possibility of putting
tuples into the other pages you'd added.

The hack I've proposed (and am currently testing) doesn't really do
anything to reduce the per-page overhead of extending the relation.
What it does do is reduce the per-tuple overhead of adding tuples
to an extant last page.  Basically we are down to an lseek per block
instead of an lseek per tuple ...
        regards, tom lane