Обсуждение: Is stats update during COPY IN really a good idea?

Поиск
Список
Период
Сортировка

Is stats update during COPY IN really a good idea?

От
Tom Lane
Дата:
We have a TODO item* Update reltuples in COPY

I was just about to go do this when I realized that it may not be such
a hot idea after all.  The problem is that updating pg_class.reltuples
means that concurrent COPY operations will block each other, because
they want to update the same row in pg_class.  You can already see this
happen in CREATE INDEX:
create table foo(f1 int);begin;create index fooey on foo(f1);

-- in another psql do
create index fooey2 on foo(f1);

-- second backend blocks until first xact is committed or rolled back.

While this doesn't bother me for CREATE INDEX, it does bother me for
COPY, since people often use COPY to avoid per-tuple INSERT overhead.
It seems pretty likely that this will cause blocking problems for real
applications.  I think that may be a bigger problem than the benefit of
not needing a VACUUM (or, now, ANALYZE) to get the stats updated.
        regards, tom lane


Re: Is stats update during COPY IN really a good idea?

От
Bruce Momjian
Дата:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > People are using COPY into the same table at the same time? 
> 
> Yes --- we had a message from someone who was doing that (and running
> into unrelated performance issues) just last week.

OK.

> > My vote is to update pg_class.  The VACUUM takes much more time than the
> > update, and we are only updating the pg_class row, right?
> 
> What?  What does VACUUM have to do with this?

You have to VACUUM to get pg_class updated after COPY, right?

> The reason this is a significant issue is that the first COPY could be
> inside a transaction, in which case the lock will persist until that
> transaction commits, which could be awhile.

Oh, I see.  Can we disable the pg_class update if we are in a
multi-statement transaction?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Is stats update during COPY IN really a good idea?

От
Bruce Momjian
Дата:
> We have a TODO item
>     * Update reltuples in COPY
> 
> I was just about to go do this when I realized that it may not be such
> a hot idea after all.  The problem is that updating pg_class.reltuples
> means that concurrent COPY operations will block each other, because
> they want to update the same row in pg_class.  You can already see this
> happen in CREATE INDEX:

People are using COPY into the same table at the same time? 

> While this doesn't bother me for CREATE INDEX, it does bother me for
> COPY, since people often use COPY to avoid per-tuple INSERT overhead.
> It seems pretty likely that this will cause blocking problems for real
> applications.  I think that may be a bigger problem than the benefit of
> not needing a VACUUM (or, now, ANALYZE) to get the stats updated.

Oh, well we can either decide to do it or remove the TODO item.  Either
way we win!

My vote is to update pg_class.  The VACUUM takes much more time than the
update, and we are only updating the pg_class row, right?  Can't we just
start a new transaction and update the pg_class row, that way we don't
have to open it for writing during the copy.

FYI, I had a 100k deep directory that caused me problems this morning. 
Just catching up.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Is stats update during COPY IN really a good idea?

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> People are using COPY into the same table at the same time? 

Yes --- we had a message from someone who was doing that (and running
into unrelated performance issues) just last week.

> My vote is to update pg_class.  The VACUUM takes much more time than the
> update, and we are only updating the pg_class row, right?

What?  What does VACUUM have to do with this?

The reason this is a significant issue is that the first COPY could be
inside a transaction, in which case the lock will persist until that
transaction commits, which could be awhile.

> Can't we just start a new transaction and update the pg_class row,
> that way we don't have to open it for writing during the copy.

No, we cannot; requiring COPY to happen outside a transaction block is
not acceptable.
        regards, tom lane


Re: Is stats update during COPY IN really a good idea?

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> My vote is to update pg_class.  The VACUUM takes much more time than the
> update, and we are only updating the pg_class row, right?
>> 
>> What?  What does VACUUM have to do with this?

> You have to VACUUM to get pg_class updated after COPY, right?

But doing this is only interesting if you need to update reltuples in
order to get the planner to generate reasonable plans.  In reality, if
you've added enough data to cause the plans to shift, you probably ought
to do an ANALYZE anyway to update pg_statistic.  Given that ANALYZE is a
lot cheaper than it used to be, I think that the notion of making COPY
do this looks fairly obsolete anyhow.

> Oh, I see.  Can we disable the pg_class update if we are in a
> multi-statement transaction?

Ugh.  Do you really want COPY's behavior to depend on context like that?

If you did want context-dependent behavior, a saner approach would be to
only try to update reltuples if the copy has more than, say, doubled the
old value.  This would be likely to happen in bulk load and unlikely to
happen in concurrent-insertions-that-choose-to-use-COPY.  But I'm not
convinced we need it at all.
        regards, tom lane


Re: Is stats update during COPY IN really a good idea?

От
Bruce Momjian
Дата:
> > You have to VACUUM to get pg_class updated after COPY, right?
> 
> But doing this is only interesting if you need to update reltuples in
> order to get the planner to generate reasonable plans.  In reality, if
> you've added enough data to cause the plans to shift, you probably ought
> to do an ANALYZE anyway to update pg_statistic.  Given that ANALYZE is a
> lot cheaper than it used to be, I think that the notion of making COPY
> do this looks fairly obsolete anyhow.

Yes, but remember, we are trying to catch ignorant cases, not
experienced people.

> > Oh, I see.  Can we disable the pg_class update if we are in a
> > multi-statement transaction?
> 
> Ugh.  Do you really want COPY's behavior to depend on context like that?
> 
> If you did want context-dependent behavior, a saner approach would be to
> only try to update reltuples if the copy has more than, say, doubled the
> old value.  This would be likely to happen in bulk load and unlikely to
> happen in concurrent-insertions-that-choose-to-use-COPY.  But I'm not
> convinced we need it at all.

Maybe not.  The COPY/pg_class hack is just to quiet people who have done
COPY and forgotten VACUUM or ANALYZE.  Maybe the user is only performing
a few operations before deleting the table.  Updating pg_class does help
in that case.


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026