Re: Eager page freeze criteria clarification

From: Melanie Plageman
Subject: Re: Eager page freeze criteria clarification
Date:
Msg-id: CAAKRu_b8BTWKxtTyayRmdjbh+3VQ-fnFYc2kDRKH62wyp8HUGA@mail.gmail.com
In reply to: Re: Eager page freeze criteria clarification  (Melanie Plageman <melanieplageman@gmail.com>)
Responses: Re: Eager page freeze criteria clarification  (Robert Haas <robertmhaas@gmail.com>)
Re: Eager page freeze criteria clarification  (Peter Geoghegan <pg@bowt.ie>)
Re: Eager page freeze criteria clarification  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, Jul 28, 2023 at 3:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > Is this test meant to guard against unnecessary freezing or to avoid
> > > freezing when the cost is too high? That is, are we trying to
> > > determine how likely it is that the page has been recently modified
> > > and avoid eager freezing when it would be pointless (because the page
> > > will soon be modified again)?
> >
> > Sort of. This cost of freezing over time is weirdly nonlinear, so it's
> > hard to give a simple answer.
> >
> > The justification for the FPI trigger optimization is that FPIs are
> > overwhelmingly the cost that really matters when it comes to freezing
> > (and vacuuming in general) -- so we might as well make the best out of
> > a bad situation when pruning happens to get an FPI. There can easily
> > be a 10x or more cost difference (measured in total WAL volume)
> > between freezing without an FPI and freezing with an FPI.
> ...
> > In 16 VACUUM just "makes the best
> > out of a bad situation" when an FPI was already required during
> > pruning. We have already "paid for the privilege" of writing some WAL
> > for the page at that point, so it's reasonable to not squander a
> > window of opportunity to avoid future FPIs in future VACUUM
> > operations, by freezing early.
> >
> > We're "taking a chance" on being able to get freezing out of the way
> > early when an FPI triggers freezing. It's not guaranteed to work out
> > in each individual case, of course, but even if we assume it's fairly
> > unlikely to work out (which is very pessimistic) it's still very
> > likely a good deal.
> >
> > This strategy (the 16 strategy of freezing eagerly because we already
> > got an FPI) seems safer than a strategy involving freezing eagerly
> > because we won't get an FPI as a result. If for no other reason than
> > this: with the approach in 16 we already know for sure that we'll have
> > written an FPI anyway. It's hard to imagine somebody being okay with
> > the FPIs, but not being okay with the other extra WAL.
>
> I see. I don't have an opinion on the "best of a bad situation"
> argument. Though, I think it is worth amending the comment in the code
> to include this explanation.
>
> But, ISTM that there should also be some independent heuristic to
> determine whether or not it makes sense to freeze the page. That could
> be related to whether or not it will be cheap to do so (in which case
> we can check if we will have to emit an FPI as part of the freeze
> record) or it could be related to whether or not the freezing is
> likely to be pointless (we are likely to update the page again soon).
>
> It sounds like it was discussed before, but I'd be interested in
> revisiting it and happy to test out various ideas.

Hi, in service of "testing various ideas", I've benchmarked the
heuristics proposed in this thread, as well as a few others that Andres and
I considered, for determining whether or not to opportunistically freeze
a page during vacuum. Note that this heuristic would be in addition to
the existing criterion that we only opportunistically freeze pages that
can be subsequently set all frozen in the visibility map.

I believe that there are two goals that should dictate whether or not we
should perform opportunistic freezing:

  1. Is it cheap? For example, if the buffer is already dirty, then no
  write amplification occurs, since it must be written out anyway.
  Freezing is also less expensive if we can do it without emitting an
  FPI.

  2. Will it be effective; that is, will it stay frozen?
  Opportunistically freezing a page that will immediately be modified is
  a waste.

The current heuristic on master meets neither of these goals: it freezes
a page if pruning emitted an FPI for it. This doesn't evaluate whether
or not freezing itself would be cheap, but rather attempts to hide
freezing behind an expensive operation. Furthermore, it often fails to
freeze cold data and may indiscriminately freeze hot data.

For the second goal, I've relied on past data to predict future
behavior, so I tried several criteria to estimate the likelihood that a
page will not be imminently modified. What was most effective was
Andres' suggestion of comparing the page LSN to the insert LSN at the
end of the last vacuum of that table; this approximates whether the page
has been recently modified, which is a decent proxy for whether it'll be
modified in the future. To do this, we need to save that insert LSN
somewhere. In the attached WIP patch, I saved it in the table stats, for
now -- knowing that those are not crash-safe.
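To make the LSN comparison concrete, here is a minimal sketch of one
plausible reading of that heuristic. The names (lsn_t, page_is_cold,
target_frac) and the exact cutoff arithmetic are illustrative, not taken
from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for XLogRecPtr. */
typedef uint64_t lsn_t;

/*
 * A page whose LSN falls outside the newest "target_frac" of the WAL
 * written since the last vacuum of this table has not been modified
 * recently, so we guess it is cold and worth freezing.
 */
static bool
page_is_cold(lsn_t page_lsn,
             lsn_t last_vacuum_insert_lsn, /* saved in table stats */
             lsn_t current_insert_lsn,
             double target_frac)           /* e.g. 0.10 or 0.33 */
{
    /* WAL distance covered since the last vacuum of this table */
    lsn_t since_vacuum = current_insert_lsn - last_vacuum_insert_lsn;
    lsn_t cutoff = current_insert_lsn -
        (lsn_t) (target_frac * (double) since_vacuum);

    return page_lsn < cutoff;
}
```

With target_frac = 0.10, a page is considered cold only if it was last
modified before the most recent 10% of WAL generated since the table's
last vacuum.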

Other discarded heuristic ideas included comparing the next transaction
ID at the end of the vacuum of a relation to the visibility cutoff xid
in the page -- but that wasn't effective for freezing data from bulk
loads.

The algorithms I evaluated all attempt to satisfy goal (1) by freezing
only if the buffer is already dirty and also by considering whether or
not an FPI would be emitted. Those that attempt to satisfy goal (2) do
so using the LSN comparison with varying thresholds. I ended up testing
master and the following five alternatives:

  1. Dirty buffer, no FPI required

  2. Dirty buffer, no FPI required OR page LSN is older than 10% of the
  LSNs since the last vacuum of the table.

  3. Dirty buffer, no FPI required AND page LSN is older than 10% of the
  LSNs since the last vacuum of the table.

  4. Dirty buffer, no FPI required OR page LSN is older than 33% of the
  LSNs since the last vacuum of the table.

  5. Dirty buffer, no FPI required AND page LSN is older than 33% of the
  LSNs since the last vacuum of the table.
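Since all five alternatives gate on a dirty buffer and then differ only
in how they combine the FPI check with the LSN age check, the two most
promising variants can be sketched as boolean predicates. This is my
reading of the criteria above, with hypothetical parameter names; the
actual patch may structure the checks differently:

```c
#include <stdbool.h>

/*
 * Algorithm (4): buffer already dirty, AND (freezing emits no FPI OR
 * the page LSN is older than 33% of the LSNs since the last vacuum).
 */
static bool
should_freeze_alg4(bool buffer_dirty, bool freeze_would_emit_fpi,
                   bool page_older_than_33pct)
{
    return buffer_dirty &&
        (!freeze_would_emit_fpi || page_older_than_33pct);
}

/*
 * Algorithm (5): all three conditions must hold, so freezing never
 * emits an FPI and only targets pages that look cold.
 */
static bool
should_freeze_alg5(bool buffer_dirty, bool freeze_would_emit_fpi,
                   bool page_older_than_33pct)
{
    return buffer_dirty && !freeze_would_emit_fpi &&
        page_older_than_33pct;
}
```

The OR in (4) lets page age alone justify paying for an FPI; the AND in
(5) refuses to freeze unless it is both cheap and likely to stick.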

I ran several benchmarks and compared these based on two metrics:

  1. Percentage of pages frozen at the end of the benchmark. For
  workloads with a working set much smaller than their data set, this
  metric should be high. Conversely, for workloads with a working set
  that is more-or-less their entire data set, this metric should be low.

  2. Page freezes per page frozen at the end of the benchmark. This
  should be as low as possible. Since each benchmark starts with zero
  frozen pages, a metric of 1 indicates that each frozen page was frozen
  only once.
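For clarity, metric (2) is just the ratio of cumulative freeze
operations to pages left frozen; a sketch with illustrative counter
names (these are not the actual statistics columns):

```c
/*
 * Page freezes per page frozen: how many freeze operations were spent
 * per page that actually remained frozen at the end of the run.
 * A value of 1.0 means no page was ever re-frozen.
 */
static double
freezes_per_frozen(long total_page_freezes, long pages_frozen_at_end)
{
    if (pages_frozen_at_end == 0)
        return 0.0;             /* shown as N/A in the tables below */
    return (double) total_page_freezes / (double) pages_frozen_at_end;
}
```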

Some of the benchmarks were run for a fixed number of transactions.
Those that do not specify a number of transactions were run for 45
minutes. I collected metrics from OS utilities and Postgres statistics
to examine throughput, FPIs emitted, and many other performance metrics
over the course of the benchmark. Below, I've summarized the results and
pointed out any notable negative performance impacts.

Overall, the two algorithms that seem to strike the best balance are (4)
and (5).

The OR condition in algorithm (4), as you might expect, results in
freezing much more of the cold data in workloads with a smaller working
set than data set. It tends to cause more FPIs to be emitted, since
the age criterion alone can trigger freezing -- even when the freeze
record would contain an FPI. Though, these FPIs may simply be incurred
now instead of later, when the cold data would eventually be frozen by
a wraparound vacuum. That is, the absence of FPIs tracks the absence of
frozen data quite closely.

For a workload in which only 10% of the data is being updated, master
often freezes the wrong data and still emits FPIs. For this kind of
workload, algorithms 1 and 2 also did not perform well and emitted more
FPIs than the other algorithms. In my examples, I found that the 10%
cutoff froze data too aggressively -- freezing data that was modified
soon after.

The workloads I benchmarked were as follows:

A. gaussian tpcb-like + select-only:
   pgbench scale 600 (DB < SB) with indexes on updated columns
   WL 1: 2 clients tpcb-like pgbench with gaussian access distribution
   WL 2: 16 clients select-only pgbench
   freezing more is better

B. tpcb-like
   pgbench scale 600, 16 clients
   freezing less is better

C. shifting hot set, autovacuum off, vacuum at end
   1 client inserting a single row and updating an indexed column of that
   row. 2 million transactions.
   freezing more is better

D. shifting hot set, delete old data
   10 MB table with index on updated column
   WL 1: 1 client inserting one row, updating that row
   WL 2: 1 client, rate limited to 0.02 TPS, delete old data keeping
   table at 5000 rows
   freezing less is better

E. shifting hot set, delete new data, access new data
   WL 1: 1 client cycling through 2 inserts of a single row each,
   updating an indexed column of the most recently inserted row, and then
   deleting that row
   WL 2: rate-limited to 0.2 TPS, selecting data from the last 300
   seconds
   freezing more is better

F. many COPYs, autovacuum on
   1 client, copying a total of 50 GB of data, autovacuum will run ~2x,
   ~2 checkpoints
   freezing more is better

G. several COPYs, autovacuum off, vacuum at end
   1 client, copying a total of 10 GB of data, no checkpoints
   freezing more is better

H. append only table, autovacuum off, vacuum at end
   1 client, inserting a single row at a time for 3 million transactions
   freezing more is better

I. work queue
   1 client, inserting a row, sleep for half a second, delete that row
   for 5000 transactions.
   freezing less is better

Note that the page freezes/page frozen metric can be misleading when the
overall number of page freezes is low. This is the case for master. It
did few page freezes but those tended to be pages that were modified
again soon after.


           Page Freezes/Page Frozen (less is better)

|   | Master |     (1) |     (2) |     (3) |     (4) |     (5) |
|---+--------+---------+---------+---------+---------+---------|
| A |  28.50 |    3.89 |    1.08 |    1.15 |    1.10 |    1.10 |
| B |   1.00 |    1.06 |    1.65 |    1.03 |    1.59 |    1.00 |
| C |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
| E |   7.90 |    3.21 |    2.73 |    2.70 |    2.69 |    2.43 |
| F |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| G |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| H |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| I |    N/A |   42.00 |   42.00 |     N/A |   41.00 |     N/A |


           % Frozen at end of run

|   | Master | (1) | (2) | (3) |  (4) | (5) |
|---+--------+-----+-----+-----+------+-----|
| A |      0 |   1 |  99 |   0 |   81 |   0 |
| B |     71 |  96 |  99 |   3 |   98 |   2 |
| C |      0 |   9 | 100 |   6 |   92 |   5 |
| D |      0 |   1 |   1 |   1 |    1 |   1 |
| E |      0 |  63 | 100 |  68 |  100 |  67 |
| F |      0 |   5 |  14 |   6 |   14 |   5 |
| G |      0 | 100 | 100 |  92 |  100 |  67 |
| H |      0 |  11 | 100 |   9 |   86 |   5 |
| I |      0 | 100 | 100 |   0 |  100 |   0 |


I can provide exact pgbench commands, configurations, or detailed
results upon request.

- Melanie


