Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Statistics Import and Export
Дата
Msg-id Ze9SD2gPVwklwvEZ@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Ответы Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Список pgsql-hackers
Greetings,

* Corey Huinker (corey.huinker@gmail.com) wrote:
> > Having thought about it a bit more, I generally like the idea of being
> > able to just update one stat instead of having to update all of them at
> > once (and therefore having to go look up what the other values currently
> > are...).  That said, per below, perhaps making it strict is the better
> > plan.
>
> v8 has it as strict.

Ok.

> > > > Also, in some cases we allow the function to be called with a
> > > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the
> > > > OID ends up being NULL).
> > >
> > > Thoughts on it emitting a WARN or NOTICE before returning false?
> >
> > Eh, I don't think so?
> >
> > Where this is coming from is that we can often end up with functions
> > like these being called inside of larger queries, and having them spit
> > out WARN or NOTICE will just make them noisy.
> >
> > That leads to my general feeling of just returning NULL if called with a
> > NULL OID, as we would get with setting the function strict.
>
> In which case we're failing nearly silently, yes, there is a null returned,
> but we have no idea why there is a null returned. If I were using this
> function manually I'd want to know what I did wrong, what parameter I
> skipped, etc.

I can see it both ways and don't feel super strongly about it ... I just
know that I've had some cases where we returned an ERROR or otherwise
were a bit noisy on NULL values getting passed into a function and it
was much more on the annoying side than on the helpful side; to the
point where we've gone back and pulled out ereport(ERROR) calls from
functions before because they were causing issues in otherwise pretty
reasonable queries (consider things like functions getting pushed down
to below WHERE clauses and such...).

> > Well, that code is for pg_statistic while I was looking at pg_class (in
> > vacuum.c:1428-1443, where we track if we're actually changing anything
> > and only make the pg_class change if there's actually something
> > different):
>
> I can do that, especially since it's only 3 tuples of known types, but my
> reservations are summed up in the next comment.

> > Not sure why we don't treat both the same way though ... although it's
> > probably the case that it's much less likely to have an entire
> > pg_statistic row be identical than the few values in pg_class.
>
> That would also involve comparing ANYARRAY values, yuk. Also, a matched
> record will never be the case when used in primary purpose of the function
> (upgrades), and not a big deal in the other future cases (if we use it in
> ANALYZE on foreign tables instead of remote table samples, users
> experimenting with tuning queries under hypothetical workloads).

Sure.  Not a huge deal either way, was just pointing out the difference.
I do think it'd be good to match what ANALYZE does here, so checking if
the values in pg_class are different and only updating if they are,
while keeping the code for pg_statistic where it'll just always update.

> > Hmm, that's a valid point, so a NULL passed in would need to set that
> > value actually to NULL, presumably.  Perhaps then we should have
> > pg_set_relation_stats() be strict and have pg_set_attribute_stats()
> > handles NULLs passed in appropriately, and return NULL if the relation
> > itself or attname, or other required (not NULL'able) argument passed in
> > cause the function to return NULL.
> >
>
> That's how I have relstats done in v8, and could make it do that for attr
> stats.

That'd be my suggestion, at least, but as I mention above, it's not a
position I hold very strongly.

> > (What I'm trying to drive at here is a consistent interface for these
> > functions, but one which does a no-op instead of returning an ERROR on
> > values being passed in which aren't allowable; it can be quite
> > frustrating trying to get a query to work where one of the functions
> > decides to return ERROR instead of just ignoring things passed in which
> > aren't valid.)
>
> I like the symmetry of a consistent interface, but we've already got an
> asymmetry in that the pg_class update is done non-transactionally (like
> ANALYZE does).

Don't know that I really consider that to be the same kind of thing when
it comes to talking about the interface as the other aspects we're
discussing ...

> One persistent problem is that there is no _safe equivalent to ARRAY_IN, so
> that can always fail on us, though it should only do so if the string
> passed in wasn't a valid array input format, or the values in the array
> can't coerce to the attribute's basetype.

That would happen before we even get to being called and there's not
much to do about it anyway.

> I should also point out that we've lost the ability to check if the export
> values were of a type, and if the destination column is also of that type.
> That's a non-issue in binary upgrades, but of course if a field changed
> from integers to text the histograms would now be highly misleading.
> Thoughts on adding a typname parameter that the function uses as a cheap
> validity check?

Seems reasonable to me.

> v8 attached, incorporating these suggestions plus those of Bharath and
> Bertrand. Still no pg_dump.
>
> As for pg_dump, I'm currently leading toward the TOC entry having either a
> series of commands:
>
>     SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
> pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ...

I'm guessing the above was intended to be SELECT ..; SELECT ..;

> Or one compound command
>
>     SELECT pg_set_relation_stats(t.oid, ...)
>          pg_set_attribute_stats(t.oid, 'id'::name, ...),
>          pg_set_attribute_stats(t.oid, 'last_name'::name, ...),
>          ...
>     FROM (VALUES('foo.bar'::regclass)) AS t(oid);
>
> The second one has the feature that if any one attribute fails, then the
> whole update fails, except, of course, for the in-place update of pg_class.
> This avoids having an explicit transaction block, but we could get that
> back by having restore wrap the list of commands in a transaction block
> (and adding the explicit lock commands) when it is safe to do so.

Hm, I like this approach as it should essentially give us the
transaction block we had been talking about wanting but without needing
to explicitly do a begin/commit, which would add in some annoying
complications.  This would hopefully also reduce the locking concern
mentioned previously, since we'd get the lock needed in the first
function call and then the others would be able to just see that we've
already got the lock pretty quickly.

> Subject: [PATCH v8] Create pg_set_relation_stats, pg_set_attribute_stats.

[...]

> +Datum
> +pg_set_relation_stats(PG_FUNCTION_ARGS)

[...]

> +    ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
> +    if (!HeapTupleIsValid(ctup))
> +        elog(ERROR, "pg_class entry for relid %u vanished during statistics import",
> +             relid);

Maybe drop the 'during statistics import' part of this message?  Also
wonder if maybe we should make it a regular ereport() instead, since it
might be possible for a user to end up seeing this?

> +    pgcform = (Form_pg_class) GETSTRUCT(ctup);
> +
> +    reltuples = PG_GETARG_FLOAT4(P_RELTUPLES);
> +    relpages = PG_GETARG_INT32(P_RELPAGES);
> +    relallvisible = PG_GETARG_INT32(P_RELALLVISIBLE);
> +
> +    /* Do not update pg_class unless there is no meaningful change */

This comment doesn't seem quite right.  Maybe it would be better if it
was in the positive, eg: Only update pg_class if there is a meaningful
change.

Rest of it looks pretty good to me, at least.

Thanks!

Stephen

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Confine vacuum skip logic to lazy_scan_skip
Следующее
От: Shruthi Gowda
Дата:
Сообщение: Re: remaining sql/json patches