Discussion: Alter/update large tables - VERRRY annoying behaviour!
Hi, everybody!

I was wondering if this is a known "feature" to begin with, and, if so, whether there are any plans to fix it in the future. I had two very large tables in my database (about 30 million rows each), connected by a foreign key, and wanted to merge them together. Something like this:

create table a (id int primary key, some_data int);
create table b (id int unique references a, other_data int);

So, what I did was:

alter table a add other_data int;
update a set other_data = b.other_data from b where b.id = a.id;

This took awfully long, but worked (I guess). I say 'I guess' because I haven't been able to verify that so far - when I tried to do

select * from a limit 1;

it just hung on me... at least, it looks like it did. Lucky me, I had compiled the backend from source with full debug info, because if I hadn't done that (as most users haven't), I would certainly have thought that my database was hopelessly corrupted, and would have had to recreate it from scratch :-( Instead, I loaded the whole thing into a debugger, because that seemed to be the only way to figure out what the hell it was thinking about...

What I found out was that it seems to have recreated my entire table when I updated it, and left all the old tuples in it as well; so my 'select * ... limit 1' query was cycling through 30 million deleted tuples, trying to find the first one that was still valid, and that's what was taking so long.

First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert? Or is my problem specific to the fact that I altered the table and added a new column?

Now, I understand that if I had vacuumed it, the problem would have been resolved. The problems I have with that, though, are:

- As I said, the behaviour was so unexpected that I would have trashed the whole database if I hadn't been able to debug it.
If there is no other possible solution, I think, at the very least, it should give the user some indication that it's not hopelessly hung when doing that query.

- Vacuum isn't the speediest thing in the world either (it's been running for an hour now and still has not finished). I was hoping to finish modifying my schema first, and then just vacuum everything once.

So it would be REALLY, REALLY helpful for situations like this if PG were smart enough to keep track somehow of those deleted tuples, to avoid having to scan through them all every time. In my particular situation, the solution would be trivial (just remembering the address of the first valid tuple would suffice, because the entire table was updated). I am not familiar enough with the internals to suggest anything more general than this, but EVEN fixing only this particular scenario would, I believe, be extremely useful. Do you agree?

Thanks a lot!

Dima.
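[Editor's note: for readers hitting the same problem, one way to avoid leaving 30 million dead tuples behind is to build the merged table in a single pass rather than rewriting every row with ALTER + UPDATE. A sketch, reusing the table and column names from the message above; the constraint-recreation step is only outlined:]

```sql
-- Build the merged table fresh, so the table you then query contains
-- no dead row versions at all:
CREATE TABLE a_new AS
    SELECT a.id, a.some_data, b.other_data
    FROM a
    LEFT JOIN b ON b.id = a.id;

-- Then swap the tables:
DROP TABLE a;
ALTER TABLE a_new RENAME TO a;
-- (re-add the primary key and any foreign key constraints as needed)
```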
On Mon, 15 Apr 2002 13:07:20 -0400
"Dmitry Tkach" <dmitry@openratings.com> wrote:

> Hi, everybody!

Hi Dmitry! Don't cross-post! It's annoying!

> This took me awfully long, but worked (I guess).
> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>
> select * from a limit 1;
>
> It just hung on me ... at least, it looks like it does.

This didn't hang, it just requires a sequential scan of the whole table. As you observe below, it will also need to scan through dead tuples, but that is just a product of MVCC and there's no real way around it. Once you VACUUM, the dead tuples will be removed and sequential scans should be fast once more.

And before assuming that something has hung, it's a good idea to look at the output of EXPLAIN for that query, as well as monitoring system performance (through top, vmstat, etc.) to see what the system is doing.

> Lucky me, I have compiled the backend from sources with full debug info, because if I hadn't done that,
> (as most users), I would certainly have thought that my database was hopelessly corrupted, and would have to
> recreate it from scratch :-(

That's a ludicrous conclusion.

> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?

Yes, AFAIK -- MVCC requires this.

> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).

Is this 7.2? If not, VACUUM should be substantially faster in 7.2. In any case, you'll always want to VACUUM or VACUUM FULL (and ANALYZE) when you change your tables in such a dramatic fashion.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
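[Editor's note: Neil's two suggestions, written out as concrete commands one might run in psql - a sketch, with the table name taken from the thread:]

```sql
-- Check the plan before assuming a hang:
EXPLAIN SELECT * FROM a LIMIT 1;

-- After a full-table UPDATE, reclaim the dead tuples and refresh the
-- planner statistics in one pass:
VACUUM ANALYZE a;
```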
Neil Conway wrote:

> On Mon, 15 Apr 2002 13:07:20 -0400
> "Dmitry Tkach" <dmitry@openratings.com> wrote:
>
>> Hi, everybody!
>
> Hi Dmitry! Don't cross-post! It's annoying!

What do you mean by 'cross-post'? Are you saying that posting to several lists at a time is annoying? I just thought that this problem might be interesting to people who read those lists (and not necessarily ALL of them)... What's annoying about it?

>> This took me awfully long, but worked (I guess).
>> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>>
>> select * from a limit 1;
>>
>> It just hung on me ... at least, it looks like it does.
>
> This didn't hang, it just requires a sequential scan of the whole table.

I know it does (as I said below). The point is that it SHOULD NOT, and especially that I can't imagine anyone not familiar with postgres internals expecting that it would - all it needs to do is grab the first row and return immediately. That's what it would do if you just created a new table and populated it with data.

> As you observe below, it will also need to scan through dead tuples,

Not 'also' - JUST the dead ones! That's what's especially annoying about it - as soon as it finds the first tuple that's not dead, it returns.

> but that is just a product of MVCC and there's no real way around
> it.

My whole point is that I don't believe it (that there is no way around) :-) For one thing, I have never seen ANY database engine (Oracle, Informix, DB2) that would take more than a second to get the first row from a table, regardless of what has been done to that table before. That (and my common sense too) tells me that there MUST be a 'way around it'. I can see that it's not currently implemented in postgres, but I do believe (and that's the whole point of me posting this message in the first place) that it is a huge usability issue and really needs to be fixed.

> Once you VACUUM the dead tuples will be removed and sequential
> scans should be fast once more.

Yeah... I hope so. I am running vacuum on that table. It's been running for 6 hours now and still has not finished. Doesn't it look to you like a little too much trouble to go through just to take a look at the first row of a table? :-)

And, once again, I am not done modifying that schema - this is just an intermediate step, which means I will have to do the vacuum all over again when I am finished...

This seems like WAY too much trouble to me :-(

> And before assuming that something has hung, it's a good idea to
> look at the output of EXPLAIN for that query, as well as monitoring
> system performance (through top, vmstat, etc) to see what the
> system is doing.

Yeah, right...

explain select * from a limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.01 rows=1 width=46)
  ->  Seq Scan on a  (cost=0.00..32529003.00 rows=32243660 width=46)

EXPLAIN

There is absolutely nothing in this plan that would suggest it will go on executing for ages... Look at the 'cost' value, for example... In any event, there is nothing different in this plan from what I was getting before I modified the table (when the query would take just a few milliseconds to execute).

As for monitoring system performance... Well, I could see it maxing out on CPU usage and disk IO at times... How exactly does that help me realize it had not hung?

(Let me clarify that - by 'hung' I mean 'not going to return the results in any reasonable time', not necessarily 'not doing anything at all')

>> Lucky me, I have compiled the backend from sources with full debug info, because if I hadn't done that,
>> (as most users), I would certainly have thought that my database was hopelessly corrupted, and would have to
>> recreate it from scratch :-(
>
> That's a ludicrous conclusion.

Why is it?

>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
>
> Yes, AFAIK -- MVCC requires this.

What's MVCC?

>> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).
>
> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.

Yes, it is 7.2.

> In any case, you'll always want to VACUUM or VACUUM FULL (and
> ANALYZE) when you change your tables in such a dramatic fashion.

I know... Once again, I was hoping to be able to complete my changes before doing the vacuum :-(

Thanks for your reply!

Dima
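[Editor's note: to make the "update = delete + insert" point concrete, here is a small sketch one can run in psql. The system columns ctid and xmin expose the new row version that an UPDATE creates; the table name t is invented for the example:]

```sql
CREATE TABLE t (id int PRIMARY KEY, val int);
INSERT INTO t VALUES (1, 10);

SELECT ctid, xmin, * FROM t;      -- one row version, e.g. at ctid (0,1)

UPDATE t SET val = 20 WHERE id = 1;

-- The visible row now lives at a new ctid; the old version remains in
-- the heap as a dead tuple until VACUUM reclaims it:
SELECT ctid, xmin, * FROM t;
```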
> Hi Dmitry! Don't cross-post! It's annoying!

Is overly cross-posting or HTML mail more annoying? I can't decide. :-)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:

>> Hi Dmitry! Don't cross-post! It's annoying!
>
> Is overly cross posting or HTML mail more annoying? I can't decide. :-)

Do you mean I am posting in HTML???? I am sorry if that's the case - I was sure that I had turned it off :-( Somehow, when I CC myself, I get a text-only message... Do you see it as HTML?

Dima
On Mon, 15 Apr 2002, Dmitry Tkach wrote:

> Neil Conway wrote:
>
>> On Mon, 15 Apr 2002 13:07:20 -0400
>> "Dmitry Tkach" <dmitry@openratings.com> wrote:
>
>>> This took me awfully long, but worked (I guess).
>>> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>>> select * from a limit 1;
>>> It just hung on me ... at least, it looks like it does.
>
>> This didn't hang, it just requires a sequential scan of the whole table.
>
> I know it does (as I said below). The point is that it SHOULD NOT, and
> especially that I can't imagine anyone not familiar with postgres
> internals expecting that it would - all it needs to do is grab the
> first row and return immediately.
> That's what it would do if you just created a new table and populated it
> with data.
>
>> As you observe below, it will also need to scan through dead tuples,
>
> Not 'also' - JUST the dead ones! That's what's especially annoying about
> it - as soon as it finds the first tuple that's not dead, it returns.
>
>> but that is just a product of MVCC and there's no real way around
>> it.
>
> My whole point is that I don't believe it (that there is no way around)
> :-)
> For one thing, I have never seen ANY database engine (Oracle, Informix,
> DB2) that would take more than a second to get the first row from a
> table, regardless of what has been done to that table before.
> That (and my common sense too) tells me that there MUST be a 'way around
> it'.
> I can see that it's not currently implemented in postgres, but I do
> believe (and that's the whole point of me posting this message in the
> first place) that it is a huge usability issue and really needs to be
> fixed.

Ok, so I realise that addressing this issue of using a limit of 1 doesn't address anything associated with using this table for something useful; however, I find it an interesting point. All that has been asked for is that a single row be returned. This doesn't have to be the first valid row at all. Indeed, isn't it one of the axioms of SQL that data is returned in some arbitrary order unless specifically forced into an order? Therefore, shouldn't doing SELECT * FROM x LIMIT 1 use an index, any index, to just fetch a single row?

Or is this an effect of a row needing to be visited in order to determine its 'visibility', as proposed as a reason why a sequential scan was being performed for one of my queries I posted about recently? I am still a little confused about that, though. If there is an index and deletes have been committed, shouldn't the index have been updated and forced to forget about the deleted rows, unless there is a transaction open that can still access those deleted items?

As for the rest of the argument, is it constructive? I for one thought the original poster had done more than a normal user, me included, would have done before posting.

> [stuff deleted]
>
>>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
>
>> Yes, AFAIK -- MVCC requires this.
>
> What's MVCC?

Funny, I was about to ask that question. Something about variable size of fields in the physical storage?

>>> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).
>
>> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.
>
> Yes, it is 7.2

Pleased I haven't tried what I'm doing in a version older than 7.2 in that case, and my table's only 1 million rows.

> [more deletions]

However, a useful thread, since it explains a 'feature' I ran into earlier today after managing about 4/5 of an update of every row of my table. I had been thinking the problem was large storage usage by the txtidx type; I realise now it is the updated = deleted+inserted storage requirement. I'm thinking of changing what I'm doing to a way I considered earlier [without thinking about this thread] so I can stay within the limits of my resources.

-- 
Nigel J. Andrews
Director
Logictree Systems Limited
Computer Consultants
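[Editor's note on Nigel's index question: adding an ORDER BY on an indexed column can make the planner walk the index instead of seq-scanning, but in this era of PostgreSQL the index still points at dead row versions too - each index entry must be checked against the heap for visibility, so dead tuples are skipped but not free. A sketch, using the a table from the original post, where id is the primary key:]

```sql
-- Can use an index scan on the primary key rather than a sequential
-- scan of the heap:
EXPLAIN SELECT * FROM a ORDER BY id LIMIT 1;
-- Expected shape of the plan (details vary):
-- Limit
--   ->  Index Scan using a_pkey on a
```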
It's sad replying to my own question [of sorts] but...

On Mon, 15 Apr 2002, Nigel J. Andrews wrote:

> On Mon, 15 Apr 2002, Dmitry Tkach wrote:
>
>>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
>
>>> Yes, AFAIK -- MVCC requires this.
>
>> What's MVCC?
>
> Funny, I was about to ask that question. Something about variable size of
> fields in the physical storage?

I've remembered, almost: Multi Version ConCurrency?

[plenty deleted from those quoted messages, but you all realised that or don't care]

-- 
Nigel J. Andrews
Director
Logictree Systems Limited
Computer Consultants
Nigel J. Andrews wrote:

> On Mon, 15 Apr 2002, Nigel J. Andrews wrote:
>
>> On Mon, 15 Apr 2002, Dmitry Tkach wrote:
>>
>>> What's MVCC?
>>
>> Funny, I was about to ask that question. Something about variable size of
>> fields in the physical storage?
>
> I've remembered, almost: Multi Version ConCurrency?

Almost: Multi-Version Concurrency Control

http://www.postgresql.org/idocs/index.php?mvcc.html
http://www.onlamp.com/pub/a/onlamp/2001/05/25/postgresql_mvcc.html

If you need more info, Google is your friend.

Jochem
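[Editor's note: for anyone else wondering what MVCC buys - the old row versions are kept precisely so that readers never block writers, and vice versa. A two-session sketch (S1 and S2 are separate psql connections; table t is invented for the example):]

```sql
-- S1:
BEGIN;
SELECT val FROM t WHERE id = 1;      -- reads the current row version

-- S2, while S1's transaction is still open:
UPDATE t SET val = val + 1 WHERE id = 1;   -- does not block on S1's read
COMMIT;

-- The UPDATE created a new row version instead of overwriting the one
-- S1 was reading, which is why neither session had to wait.

-- S1:
COMMIT;
```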
Revisiting this thread, at least I think it was this thread, I don't remember anyone pointing out contrib/pgstattuple for monitoring the number of dead tuples. Just thought I'd mention it. -- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
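[Editor's note: a sketch of how contrib/pgstattuple is used, assuming the module is installed; column names per its documentation, table name from the thread:]

```sql
SELECT * FROM pgstattuple('a');
-- Reports, among other things, tuple_count, dead_tuple_count and
-- dead_tuple_percent - enough to see whether a VACUUM is overdue.
```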