Mailing list subscription's mail delivery delays?
From | Matthias van de Meent |
---|---|
Subject | Mailing list subscription's mail delivery delays? |
Date | |
Msg-id | CAEze2Wi08Zw4BFfWaVMR1ufe9jhsbqYZtnBhOCyDsZLp-accXg@mail.gmail.com |
Responses | Re: Mailing list subscription's mail delivery delays? |
List | pgsql-www |
Hi,

For lack of a better place to ask: I've recently noticed that in several of the email threads that I follow over on -hackers@, some of the email messages have a very high time-to-delivery, and thus mails from the same thread arrive out of order. I've seen several occurrences of this with very long delays of over 10 hours, with at least one larger than 19 hours, assuming mail server clocks are accurate and receipt dates are correctly included in the mail headers.

I'm not sure if the issue is on my side (mail servers are gmail's) or on the mailing list server - all traces I've checked indicate that the delay is somewhere in the delivery from postgres' last mail server to the first gmail mail server.

I've only really noticed this sometime in the past few weeks. After sampling my mails, I found other examples of significant delays (>1h) for mails from well-respected hackers dating back to at least 2023-08-28.

Would you happen to know why this could be the case, and what I can do to fix it if it's something on my side?

I've attached three recently received mails from -hackers as .eml, to help with any debugging: one was delivered relatively quickly (91s), one for which the delivery took a long time (11h+) and one more with a very long delivery time (19h+). I haven't yet noticed any specific differences or commonalities between fast and slow mails.

Kind regards,

Matthias van de Meent.

Hi,

On 2023-09-27 17:43:04 -0700, Peter Geoghegan wrote:
> On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > > Can you define "unfreeze"? I don't know if this newly invented term
> > > refers to unsetting a page that was marked all-frozen following (say)
> > > an UPDATE, or if it refers to choosing to not freeze when the option
> > > was available (in the sense that it was possible to do it and fully
> > > mark the page all-frozen in the VM). Or something else.
> >
> > By "unfreeze", I mean unsetting a page all frozen in the visibility
> > map when modifying the page for the first time after it was last
> > frozen.
>
> I see. So I guess that Andres meant that you'd track that within all
> backends, using pgstats infrastructure (when he summarized your call
> earlier today)?

That call was just between Robert and me (and not dedicated just to this topic, fwiw).

Yes, I was thinking of tracking that in pgstat. I can imagine occasionally rolling it over into pg_class, to better deal with crashes / failovers, but am fairly agnostic on whether that's really useful / necessary.

> And that that information would be an important input for VACUUM, as opposed
> to something that it maintained itself?

Yes. If the ratio of opportunistically frozen pages (which I'd define as pages that were frozen not because they strictly needed to) vs the number of unfrozen pages increases, we need to make opportunistic freezing less aggressive and vice versa.

> ISTM that the concept of "unfreezing" a page is equivalent to
> "opening" the page that was "closed" at some point (by VACUUM). It's
> not limited to freezing per se -- it's "closed for business until
> further notice", which is a slightly broader concept (and one not
> unique to Postgres). You don't just need to be concerned about updates
> and deletes -- inserts are also a concern.
>
> I would be sure to look out for new inserts that "unfreeze" pages, too
> -- ideally you'd have instrumentation that caught that, in order to
> get a general sense of the extent of the problem in each of your
> chosen representative workloads. This is particularly likely to be a
> concern when there is enough space on a heap page to fit one more heap
> tuple, that's smaller than most other tuples. The FSM will "helpfully"
> make sure of it. This problem isn't rare at all, unfortunately.

I'm not as convinced as you are that this is a problem / that the solution won't cause more problems than it solves.
Users are concerned when free space can't be used - you don't have to look further than the discussion in the last weeks about adding the ability to disable HOT to fight bloat.

I do agree that the FSM code tries way too hard to fit things onto early pages - it e.g. can slow down concurrent copy workloads by 3-4x due to contention in the FSM - and that it has more size classes than necessary, but I don't think just closing frozen pages against further insertions of small tuples will cause its own set of issues. I think at the very least there'd need to be something causing pages to reopen once the aggregate unused space in the table reaches some threshold.

Greetings,

Andres Freund

On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > Can you define "unfreeze"? I don't know if this newly invented term
> > refers to unsetting a page that was marked all-frozen following (say)
> > an UPDATE, or if it refers to choosing to not freeze when the option
> > was available (in the sense that it was possible to do it and fully
> > mark the page all-frozen in the VM). Or something else.
>
> By "unfreeze", I mean unsetting a page all frozen in the visibility
> map when modifying the page for the first time after it was last
> frozen.

I see. So I guess that Andres meant that you'd track that within all backends, using pgstats infrastructure (when he summarized your call earlier today)? And that that information would be an important input for VACUUM, as opposed to something that it maintained itself?

> I would probably call choosing not to freeze when the option is
> available "no freeze". I have been thinking of what to call it because
> I want to add some developer stats for myself indicating why a page
> that was freezable was not frozen.

I think that having that sort of information available via custom instrumentation (just for the performance validation side) makes a lot of sense.
ISTM that the concept of "unfreezing" a page is equivalent to "opening" the page that was "closed" at some point (by VACUUM). It's not limited to freezing per se -- it's "closed for business until further notice", which is a slightly broader concept (and one not unique to Postgres). You don't just need to be concerned about updates and deletes -- inserts are also a concern.

I would be sure to look out for new inserts that "unfreeze" pages, too -- ideally you'd have instrumentation that caught that, in order to get a general sense of the extent of the problem in each of your chosen representative workloads. This is particularly likely to be a concern when there is enough space on a heap page to fit one more heap tuple, that's smaller than most other tuples. The FSM will "helpfully" make sure of it. This problem isn't rare at all, unfortunately.

> > The choice to freeze or not freeze pretty much always relies on
> > guesswork about what'll happen to the page in the future, no?
> > Obviously we wouldn't even apply the FPI trigger criteria if we could
> > somehow easily determine that it won't work out (to some degree that's
> > what conditioning it on being able to set the all-frozen VM bit
> > actually does).
>
> I suppose you are thinking of "opportunistic" as freezing whenever we
> aren't certain it is the right thing to do simply because we have the
> opportunity to do it?

I have heard the term "opportunistic freezing" used to refer to freezing that takes place outside of VACUUM before now. You know, something perfectly analogous to pruning in VACUUM versus opportunistic pruning. (I knew that you can't have meant that -- my point is that the terminology in this area has problems.)

> I want a way to express "freeze when freeze min age doesn't require it"

That makes sense when you consider where we are right now, but it'll sound odd in a world where freezing via min_freeze_age is the exception rather than the rule.
If anything, it would make more sense if the traditional min_freeze_age trigger criteria was the type of freezing that needed its own adjective.

--
Peter Geoghegan

Andres Freund <andres@anarazel.de> writes:
> On 2023-09-27 16:52:44 -0400, Tom Lane wrote:
>> I think it doesn't, as long as all the relevant build targets
>> write their dependencies with "frontend_code" before "libpq".

> Hm, that's not great. I don't think that should be required. I'll try to take
> a look at why that's needed.

Well, it's only important on platforms where we can't restrict libpq.so from exporting all symbols. I don't know how close we are to deciding that such cases are no longer interesting to worry about. Makefile.shlib seems to know how to do it everywhere except Windows, and I imagine we know how to do it over in the MSVC scripts.

>> However, it's hard to test this, because the meson build
>> seems completely broken on current macOS:

> Looks like you need 1.2 for the new clang / ld output... Apparently apple's
> linker changed the format of its version output :/.

Ah, yeah, updating MacPorts again brought in meson 1.2.1 which seems to work. I now see a bunch of

ld: warning: ignoring -e, not used for output type
ld: warning: -undefined error is deprecated

which are unrelated. There's still one duplicate warning from the backend link:

ld: warning: ignoring duplicate libraries: '-lpam'

I'm a bit baffled why that's showing up; there's no obvious double reference to pam.

regards, tom lane
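
For anyone wanting to reproduce the delivery-delay measurements described in the first message, the per-hop delay can be estimated from the `Received:` headers of a saved .eml file. The following is only a minimal sketch using Python's standard `email` package, not the method the original poster used; the function name `hop_delays` is hypothetical, and, as the first message notes, the result is only meaningful if the mail servers' clocks are accurate.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message: str):
    """Return (hop description, seconds since previous hop), oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    # Each server prepends its own Received header, so the newest comes first;
    # reverse the list to walk the path in delivery order.
    for header in reversed(msg.get_all("Received", [])):
        # The timestamp is the part after the last ';' in a Received header.
        info, sep, stamp = header.rpartition(";")
        if not sep:
            continue  # no timestamp present; skip this hop
        try:
            when = parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue  # unparseable date; skip this hop
        if when.tzinfo is None:
            # Assumption: treat a timestamp without a usable zone as UTC.
            when = when.replace(tzinfo=timezone.utc)
        hops.append((" ".join(info.split()), when))
    # Pair each hop with its predecessor and compute the elapsed time.
    return [
        (info, (when - prev_when).total_seconds())
        for (_, prev_when), (info, when) in zip(hops, hops[1:])
    ]
```

Calling `hop_delays(open("slow-mail.eml").read())` (the file name is a placeholder) yields one entry per server-to-server handoff; an unusually large value points at the hop where the message was held, for example between the list's last server and the recipient's first.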