Discussion: UPDATE

UPDATE

From
c k
Date:
I am using PG on Windows for development. While testing some queries I found that updates are much slower compared to other database systems. (On average, MySQL updates 4-5 times faster than PG when updating via a single unique index on the primary key.) Why?
Another thing I noticed: when a single column is updated in a table with no other indexes, it takes about 180-200 secs. But when a single column is updated in a table that has many (in my case 7) indexes on different columns, none of them on the column being updated, it takes about 800-900 secs. Why?
What can be done to make such updates faster?
Thanks
CPKulkarni

Re: UPDATE

From
c k
Date:
1.5 GB RAM, Pentium dual core, single 80 GB disk, Windows XP sp3, PostgreSQL 8.3.6
The table is like the following:
-- Table: accgroups

-- DROP TABLE accgroups;

CREATE TABLE accgroups
(
  accgroupid serial NOT NULL,
  accgroupidname character varying(150) NOT NULL DEFAULT ''::character varying,
  accgroupname character varying,
  createdby integer DEFAULT 0,
  createdtimestamp timestamp without time zone DEFAULT ('now'::text)::timestamp without time zone,
  locked smallint,
  lastmodifiedby integer DEFAULT 0,
  lastmodifiedtimestamp timestamp without time zone,
  remark character varying(255) DEFAULT NULL::character varying,
  cobranchid integer DEFAULT 0,
.
.
.
.
  againstid integer DEFAULT 0,
)
WITH (OIDS=FALSE);
This table currently has 165,000+ rows.
The query is fairly simple:
update accgroups set cobranchid=2 where cobranchid=1;
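
(As an aside: the plan and timing for this statement can be captured with EXPLAIN ANALYZE; the sketch below is illustrative and was not actually posted in the thread. Note that EXPLAIN ANALYZE really executes the UPDATE.)

BEGIN;
-- Wrap in a transaction and roll back so the data is left unchanged.
EXPLAIN ANALYZE
UPDATE accgroups SET cobranchid = 2 WHERE cobranchid = 1;
ROLLBACK;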
Thanks
CPKulkarni


On Thu, Feb 19, 2009 at 6:54 PM, Richard Huxton <dev@archonet.com> wrote:
> What can be done for such updates to make them faster?

You're going to have to provide some sort of information before anyone
can help you.

You might want to start with: basic hardware details: ram, number of
disks etc, O.S. version, PostgreSQL version, basic configuration changes
you've made, sample queries that are slow along with explain analyse
output (if it's not just a blanket update), table definitions...

--
 Richard Huxton
 Archonet Ltd

Re: UPDATE

From
Craig Ringer
Date:
c k wrote:
> I am using PG on Windows for development. While testing some queries I
> found that updates are much slower compared to other database systems.
> (On average, MySQL updates 4-5 times faster than PG when updating via a
> single unique index on the primary key.) Why?

Is your MySQL instance using MyISAM tables, or InnoDB?

> Another thing I noticed: when a single column is updated in a table
> with no other indexes, it takes about 180-200 secs. But when a single
> column is updated in a table that has many (in my case 7) indexes on
> different columns, none of them on the column being updated, it takes
> about 800-900 secs. Why?

Perhaps you can show us some table definitions and queries with EXPLAIN
ANALYZE output? Your explanation is quite difficult to understand.

> What can be done for such updates to make them faster?

Faster disks :-P

--
Craig Ringer

Re: UPDATE

From
Craig Ringer
Date:
c k wrote:

> CREATE TABLE accgroups
> (
>   accgroupid serial NOT NULL,
>   accgroupidname character varying(150) NOT NULL DEFAULT ''::character
> varying,
>   accgroupname character varying,
>   createdby integer DEFAULT 0,
>   createdtimestamp timestamp without time zone DEFAULT
> ('now'::text)::timestamp without time zone,
>   locked smallint,
>   lastmodifiedby integer DEFAULT 0,
>   lastmodifiedtimestamp timestamp without time zone,
>   remark character varying(255) DEFAULT NULL::character varying,
>   cobranchid integer DEFAULT 0,
> .
> .
> .
> .
>   againstid integer DEFAULT 0,
> )
> WITH (OIDS=FALSE);

That "..." might be rather important. How big ARE these records?

Remember, PostgreSQL uses an MVCC approach to transactional isolation.
Unless Pg can take short-cuts (as it sometimes can when, for example,
there is only one active transaction) it must write NEW COPIES of each
record. It can't just update the existing ones.

Effectively, Pg must do an INSERT of the updated record then a DELETE of
the old one. All fields, not just the updated field, must be read and
written.
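
(A small illustration of this behaviour; the table is invented for the example and the ctid values shown are only indicative.)

-- Hypothetical demo table, not part of the original schema.
CREATE TABLE mvcc_demo (id integer PRIMARY KEY, val integer);
INSERT INTO mvcc_demo VALUES (1, 0);

-- ctid is the physical location of the current row version.
SELECT ctid, xmin, val FROM mvcc_demo;   -- e.g. (0,1)

UPDATE mvcc_demo SET val = 1 WHERE id = 1;

-- The row now has a new ctid: the old version was not overwritten in
-- place; a new copy was written and the old one remains as a dead
-- tuple until it is reclaimed.
SELECT ctid, xmin, val FROM mvcc_demo;   -- e.g. (0,2)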

Some other databases use a locking approach instead. They can just write
the new values in place. The I/O required is dramatically reduced. On
the other hand, other concurrent transactions can't do anything while
you're working with those records, you're more prone to deadlock
situations, achieving decent concurrency is much more difficult, etc. If
you're using something horrifying like MyISAM tables where there IS no
rollback, there's no need to maintain a rollback log or anything like
that. Even with a rollback log, the I/O is still much less than what Pg requires.

AFAIK if you run the UPDATE while there are no other concurrent
transactions, Pg will write the new values in-place. It still updates
the WAL first, but it won't create whole new copies of each record as
well. That's the case at least if the field you're updating isn't
involved in any indexes; I think doing this relies on HOT, and won't
work if the column being altered is involved in one or more indexes.

--
Craig Ringer

Re: UPDATE

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> AFAIK if you run the UPDATE while there are no other concurrent
> transactions, Pg will write the new values in-place. It still updates
> the WAL first, but it won't create whole new copies of each record as
> well. That's the case at least if the field you're updating isn't
> involved in any indexes; I think doing this relies on HOT, and won't
> work if the column being altered is involved in one or more indexes.

This is not correct; PG *never* overwrites an existing record (at least
not in any user-accessible code paths).  The deal with HOT is that we
avoid making a fresh set of index entries if none of the indexed columns
changed; but that only happens if there is enough room on the same
database page for the new copy of the record.  In that case the old copy
is chained to the new one and index searches have to follow the chain
link.  (The same-page restriction is made primarily to ensure that
index lookups don't get too expensive by having to read in extra pages.)
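
(One way to see how many of a table's updates were HOT, i.e. needed no new index entries, is the statistics view below; the table name is taken from earlier in the thread.)

-- n_tup_hot_upd counts updates that did not have to touch the indexes.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'accgroups';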

Yes, this is a lot slower than what MyISAM does.  As noted, there are
countervailing advantages.

            regards, tom lane

Re: UPDATE

From
Craig Ringer
Date:
Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>> AFAIK if you run the UPDATE while there are no other concurrent
>> transactions, Pg will write the new values in-place. It still updates
>> the WAL first, but it won't create whole new copies of each record as
>> well. That's the case at least if the field you're updating isn't
>> involved in any indexes; I think doing this relies on HOT, and won't
>> work if the column being altered is involved in one or more indexes.
>
> This is not correct; PG *never* overwrites an existing record (at least
> not in any user-accessible code paths).

That's what I always thought, but I encountered some odd behaviour while
trying to generate table bloat that made me think otherwise. I generated
a large table full of dummy data then repeatedly UPDATEd it. To my
surprise, though, it never grew beyond the size it had at creation time
... if the transaction running the UPDATE was the only one active.

If there were other transactions active too, the table grew as I'd expect.

Is there another explanation for this that I've missed?
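
(A hedged reconstruction of the kind of test described above; the table, row count and column are invented, and the outcome depends on how much free space each page has for the new row versions.)

-- Build a throwaway table and note its size.
CREATE TABLE bloat_test AS
SELECT g AS id, 0 AS val FROM generate_series(1, 100000) g;
SELECT pg_relation_size('bloat_test');

-- Update it repeatedly in separate statements, with no other
-- transactions open, then compare the size again.
UPDATE bloat_test SET val = val + 1;
UPDATE bloat_test SET val = val + 1;
SELECT pg_relation_size('bloat_test');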

> Yes, this is a lot slower than what MyISAM does.  As noted, there are
> countervailing advantages.

For example, Pg doesn't have to accept all sorts of invalid data,
truncate things to fit instead of complaining, and silently coerce other
invalid input to defaults just to avoid the possibility of having to
raise an error - since it can just roll the changes back.

Ugh.

--
Craig Ringer

Re: UPDATE

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> Tom Lane wrote:
>> This is not correct; PG *never* overwrites an existing record (at least
>> not in any user-accessible code paths).

> That's what I always thought, but I encountered some odd behaviour while
> trying to generate table bloat that made me think otherwise. I generated
> a large table full of dummy data then repeatedly UPDATEd it. To my
> surprise, though, it never grew beyond the size it had at creation time
> ... if the transaction running the UPDATE was the only one active.

> If there were other transactions active too, the table grew as I'd expect.

> Is there another explanation for this that I've missed?

In 8.3 that's not unexpected: once you have two entries in a HOT chain
then a later update can reclaim the dead one and re-use its space.
(HOT can do that without any intervening VACUUM because only within-page
changes are needed.)  However, that only works when the older one is in
fact dead to all observers; otherwise it has to be kept around, so the
update chain grows.

            regards, tom lane

Re: UPDATE

From
c k
Date:
Thanks for all the replies.
The main thing is that there was no index on the updated column, and there are about 40+ other columns, mostly of integer and smallint types. Also, in MySQL I am using InnoDB tables. For both databases there is a single transaction doing the work, and no other user is connected. The major difference is that when there are indexes on columns other than the one being updated, it takes more time. I think the reason behind this is MVCC: since all rows are rewritten (newly inserted with the changed column value), the indexes must be updated accordingly, and this may take more time?
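
(If it helps to see which indexes such an update has to maintain, they can be listed from the catalog; a sketch using the table named earlier in the thread.)

-- For a non-HOT update, every index listed here gets a new entry for
-- each updated row, whether or not its column changed.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'accgroups';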

CPKulkarni


Re: UPDATE

From
Dennis Brakhane
Date:
On Thu, Feb 19, 2009 at 7:08 PM, c k <shreeseva.learning@gmail.com> wrote:
> since all rows are rewritten (newly inserted with the changed column
> value), the indexes must be updated accordingly, and this may take
> more time?

I'd think so, but I'm no expert.

If you have many frequent updates, it might be worthwhile to adjust the
fillfactor of the table so that the updated row versions can be placed
in the same page.
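
(A concrete sketch of that suggestion; 70 is an arbitrary illustrative value, not a recommendation from the thread.)

-- Leave ~30% of each heap page free so updated row versions can stay
-- on the same page (a HOT update) and avoid creating new index entries.
ALTER TABLE accgroups SET (fillfactor = 70);
-- The setting affects pages written from now on; existing pages are
-- only repacked when the table is rewritten (e.g. by CLUSTER).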