Discussion: reorderbuffer: memory overconsumption with medium-size subxacts
Hello

Found this on Postgres 9.6, but I think it affects back to 9.4.

I've seen a case where reorderbuffer keeps very large amounts of memory
in use, without spilling to disk, if the main transaction makes few or
no changes and many subtransactions each execute changes just below the
threshold to spill to disk.

The particular case we've seen is the main transaction does one UPDATE,
then a subtransaction makes somewhere between 300 and 4000 changes.
Since all of these are below max_changes_in_memory, nothing gets
spilled to disk.  (To make matters worse: even if some subxacts do more
than max_changes_in_memory changes, only those subxacts are spilled,
not the whole transaction.)  This was causing a 16 GB machine to die,
unable to process the long transaction; we had to add another 16 GB of
physical RAM for the machine to be able to process the transaction.

I think there's a one-line fix, attached: just add the number of
changes in a subxact to nentries_mem when the transaction is assigned
to the parent.  Since a WAL ASSIGNMENT record is emitted once every 32
subxacts, this accumulates only that many subxacts' changes in memory
before spilling, which is much more reasonable.  (Hmm, I wonder why
this happens every 32 subxacts, when the code seems to be using
PGPROC_MAX_CACHED_SUBXIDS, which is 64.)

Hmm, while writing this I wonder whether this affects cases with many
levels of subtransactions.  I'm not sure how nested subxacts are
handled by reorderbuffer.c, but reading the code I think it is okay.

Of course, there's Tomas' logical_work_mem patch too, but that's too
invasive to backpatch.

-- 
Álvaro Herrera                PostgreSQL Expert, https://www.2ndQuadrant.com/
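[For readers without the attachment: a minimal sketch of what a
one-liner along those lines could look like, assuming it goes where
ReorderBufferAssignChild() in reorderbuffer.c links the subxact to its
toplevel transaction (field names as in 9.6; the actual attached patch
may differ):]

	/*
	 * Sketch only, not the attached patch.  When the subxact is
	 * linked to its toplevel transaction, also charge its in-memory
	 * changes to the parent, so that the spill-to-disk threshold
	 * check sees a running total rather than per-subxact counts.
	 */
	subtxn->is_known_as_subxact = true;

	/* add to subtransaction list */
	dlist_push_tail(&txn->subtxns, &subtxn->node);
	txn->nsubtxns++;

	/* proposed one-liner: accumulate subxact changes in the parent */
	txn->nentries_mem += subtxn->nentries_mem;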
Attachments
On 12/16/18 4:06 PM, Alvaro Herrera wrote:
> Hello
>
> Found this on Postgres 9.6, but I think it affects back to 9.4.
>
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction makes few or
> no changes and many subtransactions each execute changes just below the
> threshold to spill to disk.
>
> The particular case we've seen is the main transaction does one UPDATE,
> then a subtransaction makes somewhere between 300 and 4000 changes.
> Since all of these are below max_changes_in_memory, nothing gets
> spilled to disk.  (To make matters worse: even if some subxacts do more
> than max_changes_in_memory changes, only those subxacts are spilled,
> not the whole transaction.)  This was causing a 16 GB machine to die,
> unable to process the long transaction; we had to add another 16 GB of
> physical RAM for the machine to be able to process the transaction.
>

Yeah. We do the check for each xact separately, so it's vulnerable to
this scenario (subxacts each just below max_changes_in_memory).

> I think there's a one-line fix, attached: just add the number of
> changes in a subxact to nentries_mem when the transaction is assigned
> to the parent.  Since a WAL ASSIGNMENT record is emitted once every 32
> subxacts, this accumulates only that many subxacts' changes in memory
> before spilling, which is much more reasonable.  (Hmm, I wonder why
> this happens every 32 subxacts, when the code seems to be using
> PGPROC_MAX_CACHED_SUBXIDS, which is 64.)
>

Not sure, for a couple of reasons ...

Doesn't that essentially mean we'd always evict the toplevel xact,
including all its subxacts, no matter how tiny those subxacts are? That
seems like it might easily cause regressions for the common case with
many small subxacts and one huge subxact (which is the only one we'd
currently spill, I think). That seems annoying.

But even if we decide it's the right approach, isn't the proposed patch
a couple of bricks shy? It redefines the nentries_mem field from "per
(sub)xact" to "total" for the toplevel xact, but then updates it only
when assigning the child. The subxacts may receive changes after that,
so ReorderBufferQueueChange() probably needs to update the value for
the toplevel xact too, I guess. And ReorderBufferCheckSerializeTXN()
should probably check the toplevel xact too ...

Maybe a simpler solution would be to track the total number of changes
in memory (from all xacts) and then evict the largest one. But I doubt
that would be backpatchable - it's pretty much what the
logical_work_mem patch does.

And of course, addressing this on pg11 is a bit more complicated due to
the behavior of the Generation context (see the logical_work_mem thread
for details).

> Hmm, while writing this I wonder whether this affects cases with many
> levels of subtransactions.  I'm not sure how nested subxacts are
> handled by reorderbuffer.c, but reading the code I think it is okay.
>

That should be OK. The assignments don't care about the intermediate
subxacts, they only care about the toplevel xact and the current
subxact.

> Of course, there's Tomas' logical_work_mem patch too, but that's too
> invasive to backpatch.
>

Yep.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
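[A rough illustration of that last idea, with entirely hypothetical
names: neither a buffer-wide change counter nor ReorderBufferLargestTXN()
exists in the released branches at this point, and this is approximately
what the logical_work_mem patch does:]

	/*
	 * Hypothetical sketch of "track the total, evict the largest".
	 * rb->total_nentries_mem would have to be maintained wherever
	 * changes are queued, spilled, restored, or freed.
	 */
	static void
	ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
	{
		ReorderBufferTXN *largest;

		if (rb->total_nentries_mem <= max_changes_in_memory)
			return;

		/* spill the (sub)xact with the most changes still in memory */
		largest = ReorderBufferLargestTXN(rb);
		ReorderBufferSerializeTXN(rb, largest);
	}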
Hi,

On 2018-12-16 12:06:16 -0300, Alvaro Herrera wrote:
> Found this on Postgres 9.6, but I think it affects back to 9.4.
>
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction makes few or
> no changes and many subtransactions each execute changes just below the
> threshold to spill to disk.
>
> The particular case we've seen is the main transaction does one UPDATE,
> then a subtransaction makes somewhere between 300 and 4000 changes.
> Since all of these are below max_changes_in_memory, nothing gets
> spilled to disk.  (To make matters worse: even if some subxacts do more
> than max_changes_in_memory changes, only those subxacts are spilled,
> not the whole transaction.)  This was causing a 16 GB machine to die,
> unable to process the long transaction; we had to add another 16 GB of
> physical RAM for the machine to be able to process the transaction.
>
> I think there's a one-line fix, attached: just add the number of
> changes in a subxact to nentries_mem when the transaction is assigned
> to the parent.

Isn't this going to cause significant breakage, because we rely on
nentries_mem to be accurate?

	/* try to load changes from disk */
	if (entry->txn->nentries != entry->txn->nentries_mem)

Greetings,

Andres Freund
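[For context, that check lives in ReorderBufferIterTXNNext();
paraphrased from the 9.6 sources, the surrounding logic is roughly:]

	/*
	 * Paraphrased from ReorderBufferIterTXNNext() (9.6).  The iterator
	 * treats nentries != nentries_mem as "some of this (sub)xact's
	 * changes were spilled to disk", so inflating the parent's
	 * nentries_mem without touching nentries would make it try to
	 * restore changes from spill files that were never written.
	 */
	if (entry->txn->nentries != entry->txn->nentries_mem)
	{
		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->fd,
										&state->entries[off].segno))
		{
			/* successfully restored changes from disk */
		}
	}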
On 2018-Dec-16, Andres Freund wrote:

> > I think there's a one-line fix, attached: just add the number of
> > changes in a subxact to nentries_mem when the transaction is
> > assigned to the parent.
>
> Isn't this going to cause significant breakage, because we rely on
> nentries_mem to be accurate?
>
> 	/* try to load changes from disk */
> 	if (entry->txn->nentries != entry->txn->nentries_mem)

Bahh.

Are you suggesting I should try a more thorough rewrite of
reorderbuffer.c?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2018-12-16 17:30:30 -0300, Alvaro Herrera wrote:
> On 2018-Dec-16, Andres Freund wrote:
>
> > > I think there's a one-line fix, attached: just add the number of
> > > changes in a subxact to nentries_mem when the transaction is
> > > assigned to the parent.
> >
> > Isn't this going to cause significant breakage, because we rely on
> > nentries_mem to be accurate?
> >
> > 	/* try to load changes from disk */
> > 	if (entry->txn->nentries != entry->txn->nentries_mem)
>
> Bahh.
>
> Are you suggesting I should try a more thorough rewrite of
> reorderbuffer.c?

I'm saying you can't just randomly poke at one variable without
actually checking how it's used. I kinda doubt you're going to find
something that's immediately backpatchable here. Perhaps something for
master that's then backpatched after it has matured for a while.

Greetings,

Andres Freund