On Tue, 2006-12-05 at 13:25 -0500, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >> Sure, it should hang around for awhile, and will. The problem is that
> >> its lifetime will be artificially inflated, so that the seqscan ends up
> >> kicking out other blocks that are really of greater importance, rather
> >> than recycling its own old blocks as it should.
>
> > I thought you had switched this all to a clock sweep algorithm.
>
> Yeah ... it's a clock sweep with counter. A buffer's counter is
> incremented by each access and decremented when the sweep passes over
> it. So multiple accesses allow the buffer to survive longer. For a
> large seqscan you really would rather the counter stayed at zero,
> because you want the buffers to be recycled when the sweep comes back
> the first time.
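
To make the behaviour described above concrete, here is a minimal Python
sketch of a clock sweep with a usage counter. It is a toy model, not the
actual bufmgr code: the class name, the `bump` flag, the cap of 5 and the
pool size are all assumptions made for illustration. It shows how a scan
whose blocks are each touched by three backends pushes out hot blocks when
every access bumps the counter, and leaves them alone when the scan keeps
its counters at zero.

```python
class ClockSweepPool:
    """Toy buffer pool: clock sweep with a per-buffer usage counter."""

    def __init__(self, nbuffers, max_usage=5):
        self.blocks = [None] * nbuffers   # block held by each slot
        self.usage = [0] * nbuffers       # usage counter per slot
        self.where = {}                   # block -> slot index
        self.hand = 0                     # clock hand position
        self.max_usage = max_usage        # counter cap (assumed value)

    def _find_victim(self):
        # Advance the hand, decrementing counters, until one is at zero.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.blocks)
            if self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1

    def access(self, block, bump=True):
        """Touch a block; return True on a hit, False on a miss."""
        slot = self.where.get(block)
        hit = slot is not None
        if not hit:
            slot = self._find_victim()
            old = self.blocks[slot]
            if old is not None:
                del self.where[old]
            self.blocks[slot] = block
            self.where[block] = slot
            self.usage[slot] = 0
        if bump:
            # Each access increments the counter, up to the cap.
            self.usage[slot] = min(self.usage[slot] + 1, self.max_usage)
        return hit


def hot_blocks_survive(bump_scan):
    pool = ClockSweepPool(nbuffers=4)
    for _ in range(3):                    # two hot blocks, three accesses each
        pool.access("h1")
        pool.access("h2")
    for blk in range(4):                  # scan of four blocks
        for _ in range(3):                # three backends touch each block
            pool.access(("scan", blk), bump=bump_scan)
    return "h1" in pool.where and "h2" in pool.where


print(hot_blocks_survive(bump_scan=False))  # True: scan recycles its own buffers
print(hot_blocks_survive(bump_scan=True))   # False: inflated counters evict h1, h2
```

In the second case the scan blocks reach the same counter value as the hot
blocks, so the sweep has to wear everything down together and the hot
blocks are the first to hit zero under the hand.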

If you focus the backends together by synchronizing them, you definitely
then also need to solve the problem of false cache reinforcement.

I envisaged that we would handle the problem by having a large SeqScan
reuse its previous buffers so it would avoid polluting the cache. If we
kept track of how many backends were in lock-step together (a "Conga")
we would be able to check that a block had not been used by anyone but
the Conga members.
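
As a rough illustration of that check, here is a small Python sketch. It
assumes we can tell which backends have touched a buffer, which is more
bookkeeping than just counting Conga members; the function name and the
backend ids are hypothetical.

```python
def can_recycle(buffer_users, conga_members):
    """True if every backend that touched the buffer belongs to the Conga."""
    return set(buffer_users) <= set(conga_members)


conga = {"backend_1", "backend_2", "backend_3"}

# Only Conga members touched this buffer: the scan may reuse it.
print(can_recycle({"backend_1", "backend_3"}, conga))   # True

# An outside backend also used it: leave it to the normal sweep.
print(can_recycle({"backend_2", "backend_9"}, conga))   # False
```

A cheaper approximation might compare the buffer's usage counter with the
number of Conga members, at the cost of not knowing who did the touching.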

So we'd need rules about:
- when to allow a Conga to form (if the file is very big we check, if not
  we don't; no real need for exact synchronisation in all cases)
- how to join a Conga
- how to leave a Conga if you fall behind
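
One way those three rules could look, as a Python sketch only. The
thresholds, the names and the "start at the current head" join policy are
all assumptions for illustration, and wrap-around at the end of the file is
ignored.

```python
from dataclasses import dataclass, field

BIG_TABLE_BLOCKS = 1000   # hypothetical: only synchronise scans this large
MAX_LAG_BLOCKS = 16       # hypothetical: how far behind a member may fall


@dataclass
class Conga:
    table_blocks: int
    head: int = 0                                   # furthest block reached
    positions: dict = field(default_factory=dict)   # backend id -> block no.


def should_form(table_blocks):
    """Rule 1: only bother with a Conga for very big files."""
    return table_blocks >= BIG_TABLE_BLOCKS


def join(conga, backend):
    """Rule 2: a newcomer starts at the current head of the Conga."""
    conga.positions[backend] = conga.head
    return conga.head


def advance(conga, backend, nblocks=1):
    """Move one member forward and update the head."""
    pos = conga.positions[backend] + nblocks
    conga.positions[backend] = pos
    conga.head = max(conga.head, pos)
    return pos


def drop_stragglers(conga):
    """Rule 3: members that fall too far behind leave the Conga."""
    dropped = [b for b, p in conga.positions.items()
               if conga.head - p > MAX_LAG_BLOCKS]
    for b in dropped:
        del conga.positions[b]
    return dropped


conga = Conga(table_blocks=5000)
assert should_form(conga.table_blocks)
join(conga, "a")
advance(conga, "a", 100)
start_b = join(conga, "b")        # "b" starts at block 100
advance(conga, "a", 20)           # "a" pulls ahead to block 120
print(start_b, drop_stragglers(conga))   # 100 ['b']
```

A backend that is dropped would simply carry on as an ordinary,
unsynchronised scan from wherever it was.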

The cost of synchronisation (i.e. LWLocks) is much less than the cost of
non-synchronisation (i.e. lots more I/O).

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com