Обсуждение: select is fast, update based on same where clause is slow

Поиск
Список
Период
Сортировка

select is fast, update based on same where clause is slow

От
"Jeff Barrett"
Дата:
I have a select statement that returns a set of 74,000+ results back in
under a minute as follows:

select s.sessid, s.membid, s.datetime
from sessions2 s, (select min(datetime) as datetime, membid                             from sessions2
          where membid is not null                             group by membid) as minsess
 
where s.membid = minsess.membid
and s.datetime = minsess.datetime;

The final cost from explain for the above select is 22199.15 ... 24318.40
with rows = 5 and width = 28.

Then I issue an update as follows (to update those 74,000+ rows):
update sessions2 set sinceinception = 0
from sessions2 s, (select min(datetime) as datetime, membid from sessions2
group by membid) as mindate
where s.membid = mindate.membid
and s.datetime = mindate.datetime;

The final cost from explain for the above update is 31112.11...98869.91 with
rows = 2013209 and width=87.

This update statement has been left running over night and does not
complete. The ram usage on the machine is at about 3/4 capacity (800mb)
during this time and CPU usage is near 100%. The machine has the -F option
set and memory segments of 200mb and is running 7.1.2.

What could be causing this update statement to not complete?
Why are the costs so different since it seems to me that besides the cost of
the update they are the same query?

Any help would be great!

Jeff Barrett




Re: select is fast, update based on same where clause is slow

От
Stephan Szabo
Дата:
On Fri, 21 Sep 2001, Jeff Barrett wrote:

> I have a select statement that returns a set of 74,000+ results back in
> under a minute as follows:
> 
> select s.sessid, s.membid, s.datetime
> from sessions2 s, (select min(datetime) as datetime, membid
>                               from sessions2
>                               where membid is not null
>                               group by membid) as minsess
> where s.membid = minsess.membid
> and s.datetime = minsess.datetime;
> 
> The final cost from explain for the above select is 22199.15 ... 24318.40
> with rows = 5 and width = 28.
> 
> Then I issue an update as follows (to update those 74,000+ rows):
> update sessions2 set sinceinception = 0
> from sessions2 s, (select min(datetime) as datetime, membid from sessions2
> group by membid) as mindate
> where s.membid = mindate.membid
> and s.datetime = mindate.datetime;
> 
> The final cost from explain for the above update is 31112.11...98869.91 with
> rows = 2013209 and width=87.
> 
> This update statement has been left running over night and does not
> complete. The ram usage on the machine is at about 3/4 capacity (800mb)
> during this time and CPU usage is near 100%. The machine has the -F option
> set and memory segments of 200mb and is running 7.1.2.
> 
> What could be causing this update statement to not complete?
> Why are the costs so different since it seems to me that besides the cost of
> the update they are the same query?

I thought that the updated table is always in your from list (implicitly),
so you'd want:
update sessions2 set sinceinception = 0
from (select min(datetime) as datetime, membid from sessions2 group bymembid) as mindate
where sessions2.membid=mindate.membid and 
sessions2.datetime=mindate.datetime;

I think your query would be joining the s/mindate results against
sessions2.




Re: select is fast, update based on same where clause is slow

От
"Josh Berkus"
Дата:
Jeff,

I think that you're running into a syntactical problem here:

> Then I issue an update as follows (to update those 74,000+ rows):
> update sessions2 set sinceinception = 0
> from sessions2 s, (select min(datetime) as datetime, membid from
> sessions2
> group by membid) as mindate
> where s.membid = mindate.membid
> and s.datetime = mindate.datetime;

In the query above, there is no linkage between the instance of
sessions2 (sessions2) you are updating and the instance you are limiting
(s).  As a result, you are running an update on 74,000^2 rows (about 55
trillion), and if the query ever completed you would find that
sinceinception would be set to 0 for all rows in sessions2.  Your query
should read:
update sessions2 set sinceinception = 0from (select min(datetime) as datetime, membid from      sessions2 group by
membid)as mindatewhere sessions2.membid = mindate.membid  and sessions2.datetime = mindate.datetime;
 

It's a peculiarity of Update queries that the name of the updated table
cannot be aliased anywhere in the query.  To add to the confusion, in MS
Transact SQL, you are expected to repeat the name of the updated table
in the FROM clause, while in Postgres such repetition is prohibited.

This is mainly due to the fact that UPDATE ... FROM is not well-defined
in the SQL 92 standard.

-Josh


______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: select is fast, update based on same where clause is slow

От
"Jeff Barrett"
Дата:
That did the trick. Thank you for the quick detailed answer. It runs in
about a minute now.

Jeff Barrett

"Stephan Szabo" <sszabo@megazone23.bigpanda.com> wrote in message
news:Pine.BSF.4.21.0109210830550.88512-100000@megazone23.bigpanda.com...
> On Fri, 21 Sep 2001, Jeff Barrett wrote:
>
> > I have a select statement that returns a set of 74,000+ results back in
> > under a minute as follows:
> >
> > select s.sessid, s.membid, s.datetime
> > from sessions2 s, (select min(datetime) as datetime, membid
> >                               from sessions2
> >                               where membid is not null
> >                               group by membid) as minsess
> > where s.membid = minsess.membid
> > and s.datetime = minsess.datetime;
> >
> > The final cost from explain for the above select is 22199.15 ...
24318.40
> > with rows = 5 and width = 28.
> >
> > Then I issue an update as follows (to update those 74,000+ rows):
> > update sessions2 set sinceinception = 0
> > from sessions2 s, (select min(datetime) as datetime, membid from
sessions2
> > group by membid) as mindate
> > where s.membid = mindate.membid
> > and s.datetime = mindate.datetime;
> >
> > The final cost from explain for the above update is 31112.11...98869.91
with
> > rows = 2013209 and width=87.
> >
> > This update statement has been left running over night and does not
> > complete. The ram usage on the machine is at about 3/4 capacity (800mb)
> > during this time and CPU usage is near 100%. The machine has the -F
option
> > set and memory segments of 200mb and is running 7.1.2.
> >
> > What could be causing this update statement to not complete?
> > Why are the costs so different since it seems to me that besides the
cost of
> > the update they are the same query?
>
> I thought that the updated table is always in your from list (implicitly),
> so you'd want:
> update sessions2 set sinceinception = 0
> from (select min(datetime) as datetime, membid from sessions2 group by
>  membid) as mindate
> where sessions2.membid=mindate.membid and
> sessions2.datetime=mindate.datetime;
>
> I think your query would be joining the s/mindate results against
> sessions2.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly