Mailing list subscription's mail delivery delays?

From	Matthias van de Meent
Subject	Mailing list subscription's mail delivery delays?
Date	September 29, 2023, 01:46:43
Msg-id	CAEze2Wi08Zw4BFfWaVMR1ufe9jhsbqYZtnBhOCyDsZLp-accXg@mail.gmail.com
Replies	Re: Mailing list subscription's mail delivery delays?
List	pgsql-www

Hi,

For lack of a better place to ask:

I've recently noticed that in several of the email threads I
follow over on -hackers@, some of the email messages have a very
high time-to-delivery, and thus mails from the same thread arrive
out-of-order.
I've seen several occurrences of this with very long delays of over 10
hours, with at least one larger than 19 hours, assuming mail server
clocks are accurate and receipt dates are correctly included in the
mail headers.

I'm not sure if the issue is on my side (mail servers are gmail's) or
on the mailing list server - all traces I've checked indicate that the
delay is somewhere in the delivery from postgres' last mail server to
the first gmail mail server.

I've only really noticed this sometime in the past few weeks. After
sampling my mails, I found other examples of significant delays (>1h)
for mails from well-respected hackers dating back to at least
2023-08-28.

Would you happen to know why this could be the case, and what I can do
to fix it if it's something on my side?

I've attached three recently received mails from -hackers as .eml, to
help with any debugging: one was delivered relatively quickly (91s),
one for which the delivery took a long time (11h+) and one more with a
very long delivery time (19h+). I haven't yet noticed any specific
differences or commonalities between fast and slow mails.
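
For anyone who wants to repeat the measurement on their own mails, the
per-hop delay can be read off the Received headers. The following is a
minimal Python sketch, not a tool used in this thread: the function names
are hypothetical, it assumes each hop's timestamp follows the last ';' in
its Received header, and it trusts every server's clock.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_times(raw_message: str):
    """Return the timestamps of the Received headers, oldest hop first."""
    msg = message_from_string(raw_message)
    times = []
    # Each hop prepends its own Received header, so the newest comes first.
    for header in msg.get_all("Received", []):
        # The timestamp is the part after the last ';' in the header.
        _, _, stamp = header.rpartition(";")
        try:
            when = parsedate_to_datetime(" ".join(stamp.split()))
        except (TypeError, ValueError):
            continue  # skip hops whose date cannot be parsed
        if when.tzinfo is None:
            # Assumption: a timestamp without a usable offset is UTC.
            when = when.replace(tzinfo=timezone.utc)
        times.append(when)
    times.reverse()
    return times


def hop_delays(raw_message: str):
    """Seconds elapsed between consecutive hops, oldest hop first."""
    times = received_times(raw_message)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Passing the text of a saved .eml file to hop_delays() gives one number per
server-to-server handoff; a single large value points at the hop where the
mail sat waiting, while a negative value suggests clock skew between servers.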

Kind regards,

Matthias van de Meent.
Hi,

On 2023-09-27 17:43:04 -0700, Peter Geoghegan wrote:
> On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > > Can you define "unfreeze"? I don't know if this newly invented term
> > > refers to unsetting a page that was marked all-frozen following (say)
> > > an UPDATE, or if it refers to choosing to not freeze when the option
> > > was available (in the sense that it was possible to do it and fully
> > > mark the page all-frozen in the VM). Or something else.
> >
> > By "unfreeze", I mean unsetting a page all frozen in the visibility
> > map when modifying the page for the first time after it was last
> > frozen.
>
> I see. So I guess that Andres meant that you'd track that within all
> backends, using pgstats infrastructure (when he summarized your call
> earlier today)?

That call was just between Robert and me (and not dedicated just to this
topic, fwiw).

Yes, I was thinking of tracking that in pgstat. I can imagine occasionally
rolling it over into pg_class, to better deal with crashes / failovers, but am
fairly agnostic on whether that's really useful / necessary.

> And that that information would be an important input for VACUUM, as opposed
> to something that it maintained itself?

Yes. If the ratio of opportunistically frozen pages (which I'd define as pages
that were frozen not because they strictly needed to) vs the number of
unfrozen pages increases, we need to make opportunistic freezing less
aggressive and vice versa.
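
A rough Python sketch of that feedback idea, purely illustrative: none of
these names exist in PostgreSQL, the counters are assumed to come from
pgstat, and the sketch reads the ratio as the share of opportunistically
frozen pages that were later unfrozen, which is an interpretation. The
target and step values are arbitrary.

```python
def adjust_aggressiveness(current: float,
                          opportunistic_freezes: int,
                          unfreezes: int,
                          target: float = 0.1,
                          step: float = 0.05) -> float:
    """Return a new opportunistic-freezing aggressiveness in [0, 1].

    All parameters are hypothetical; this is not PostgreSQL code.
    """
    if opportunistic_freezes == 0:
        return current  # no evidence either way, keep the setting
    # Share of opportunistic freezes that turned out to be wasted work.
    wasted = unfreezes / opportunistic_freezes
    if wasted > target:
        current -= step  # too many pages got unfrozen again: back off
    elif wasted < target:
        current += step  # freezes mostly stick: freeze more eagerly
    return min(1.0, max(0.0, current))
```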

> ISTM that the concept of "unfreezing" a page is equivalent to
> "opening" the page that was "closed" at some point (by VACUUM). It's
> not limited to freezing per se -- it's "closed for business until
> further notice", which is a slightly broader concept (and one not
> unique to Postgres). You don't just need to be concerned about updates
> and deletes -- inserts are also a concern.
>
> I would be sure to look out for new inserts that "unfreeze" pages, too
> -- ideally you'd have instrumentation that caught that, in order to
> get a general sense of the extent of the problem in each of your
> chosen representative workloads. This is particularly likely to be a
> concern when there is enough space on a heap page to fit one more heap
> tuple, that's smaller than most other tuples. The FSM will "helpfully"
> make sure of it. This problem isn't rare at all, unfortunately.

I'm not as convinced as you are that this is a problem / that the solution
won't cause more problems than it solves. Users are concerned when free space
can't be used - you don't have to look further than the discussion in the last
weeks about adding the ability to disable HOT to fight bloat.

I do agree that the FSM code tries way too hard to fit things onto early pages
- it e.g. can slow down concurrent copy workloads by 3-4x due to contention in
the FSM - and that it has more size classes than necessary, but I think
just closing frozen pages against further insertions of small tuples will
cause its own set of issues.

I think at the very least there'd need to be something causing pages to reopen
once the aggregate unused space in the table reaches some threshold.

Greetings,

Andres Freund

On Wed, Sep 27, 2023 at 5:20 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> > Can you define "unfreeze"? I don't know if this newly invented term
> > refers to unsetting a page that was marked all-frozen following (say)
> > an UPDATE, or if it refers to choosing to not freeze when the option
> > was available (in the sense that it was possible to do it and fully
> > mark the page all-frozen in the VM). Or something else.
>
> By "unfreeze", I mean unsetting a page all frozen in the visibility
> map when modifying the page for the first time after it was last
> frozen.

I see. So I guess that Andres meant that you'd track that within all
backends, using pgstats infrastructure (when he summarized your call
earlier today)? And that that information would be an important input
for VACUUM, as opposed to something that it maintained itself?

> I would probably call choosing not to freeze when the option is
> available "no freeze". I have been thinking of what to call it because
> I want to add some developer stats for myself indicating why a page
> that was freezable was not frozen.

I think that having that sort of information available via custom
instrumentation (just for the performance validation side) makes a lot
of sense.

ISTM that the concept of "unfreezing" a page is equivalent to
"opening" the page that was "closed" at some point (by VACUUM). It's
not limited to freezing per se -- it's "closed for business until
further notice", which is a slightly broader concept (and one not
unique to Postgres). You don't just need to be concerned about updates
and deletes -- inserts are also a concern.

I would be sure to look out for new inserts that "unfreeze" pages, too
-- ideally you'd have instrumentation that caught that, in order to
get a general sense of the extent of the problem in each of your
chosen representative workloads. This is particularly likely to be a
concern when there is enough space on a heap page to fit one more heap
tuple, that's smaller than most other tuples. The FSM will "helpfully"
make sure of it. This problem isn't rare at all, unfortunately.

> > The choice to freeze or not freeze pretty much always relies on
> > guesswork about what'll happen to the page in the future, no?
> > Obviously we wouldn't even apply the FPI trigger criteria if we could
> > somehow easily determine that it won't work out (to some degree that's
> > what conditioning it on being able to set the all-frozen VM bit
> > actually does).
>
> I suppose you are thinking of "opportunistic" as freezing whenever we
> aren't certain it is the right thing to do simply because we have the
> opportunity to do it?

I have heard the term "opportunistic freezing" used to refer to
freezing that takes place outside of VACUUM before now. You know,
something perfectly analogous to pruning in VACUUM versus
opportunistic pruning. (I knew that you can't have meant that -- my
point is that the terminology in this area has problems.)

> I want a way to express "freeze when freeze min age doesn't require it"

That makes sense when you consider where we are right now, but it'll
sound odd in a world where freezing via min_freeze_age is the
exception rather than the rule. If anything, it would make more sense
if the traditional min_freeze_age trigger criteria was the type of
freezing that needed its own adjective.

--
Peter Geoghegan

Andres Freund <andres@anarazel.de> writes:
> On 2023-09-27 16:52:44 -0400, Tom Lane wrote:
>> I think it doesn't, as long as all the relevant build targets
>> write their dependencies with "frontend_code" before "libpq".

> Hm, that's not great. I don't think that should be required. I'll try to take
> a look at why that's needed.

Well, it's only important on platforms where we can't restrict
libpq.so from exporting all symbols.  I don't know how close we are
to deciding that such cases are no longer interesting to worry about.
Makefile.shlib seems to know how to do it everywhere except Windows,
and I imagine we know how to do it over in the MSVC scripts.

>> However, it's hard to test this, because the meson build
>> seems completely broken on current macOS:

> Looks like you need 1.2 for the new clang / ld output...  Apparently apple's
> linker changed the format of its version output :/.

Ah, yeah, updating MacPorts again brought in meson 1.2.1 which seems
to work.  I now see a bunch of

ld: warning: ignoring -e, not used for output type
ld: warning: -undefined error is deprecated

which are unrelated.  There's still one duplicate warning
from the backend link:

ld: warning: ignoring duplicate libraries: '-lpam'

I'm a bit baffled why that's showing up; there's no obvious
double reference to pam.

            regards, tom lane

