Обсуждение: Deleting vs foreign keys
I have an application that makes heavy use of foreign keys all over the tables. This is very nice since the data is very consistent. There also this "central" table which holds "sites" in it. A site pretty much is the crux of it all. Deleting a site will very precisely eliminate all data regarding it, since there's CASCADE on delete's everywhere. The only trouble I'm having is that the original developers apparently didn't account for large amounts of data. I'm starting to get a LOT of data in some tables, and nowadays deleting a site will take a disgusting amount of time (in the range of tens of minutes). It's impossible to do it via Web, so I have to issue the central delete from the shell and leave it running until it's done. Is there any way I can make things better? I could queue site drops and have a cronjob pick them up instead of deleting "live" via Web, but that's just silly patchwork IMHO.
On Tue, Oct 25, 2005 at 12:59:27AM +0300, WireSpot wrote: > I have an application that makes heavy use of foreign keys all over > the tables. This is very nice since the data is very consistent. There > also this "central" table which holds "sites" in it. A site pretty > much is the crux of it all. Deleting a site will very precisely > eliminate all data regarding it, since there's CASCADE on delete's > everywhere. > > The only trouble I'm having is that the original developers apparently > didn't account for large amounts of data. I'm starting to get a LOT of > data in some tables, and nowadays deleting a site will take a > disgusting amount of time (in the range of tens of minutes). Are there indexes on the foreign key columns? That is, given the following example, CREATE TABLE foo (id integer PRIMARY KEY); CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE); do you have an index on bar.fooid? Also, do you regularly vacuum and analyze the database? -- Michael Fuhr
On 10/25/05, Michael Fuhr <mike@fuhr.org> wrote: > Are there indexes on the foreign key columns? That is, given > the following example, > > CREATE TABLE foo (id integer PRIMARY KEY); > CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE); > > do you have an index on bar.fooid? There are no indexes on the referring fields in any tables. Would this make a huge difference? > Also, do you regularly vacuum and analyze the database? Yes.
On Tue, Oct 25, 2005 at 12:18:34PM +0300, WireSpot wrote: > On 10/25/05, Michael Fuhr <mike@fuhr.org> wrote: > > Are there indexes on the foreign key columns? That is, given > > the following example, > > > > CREATE TABLE foo (id integer PRIMARY KEY); > > CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE); > > > > do you have an index on bar.fooid? > > There are no indexes on the referring fields in any tables. Would this > make a huge difference? If bar is at all large, yes. Consider if you delete something from foo. The system has to then scan bar to find all matching fooids. If there is no index on that column it will do a seq scan. If you delete everything from foo, it will (sequentially) scan bar once for every row in foo... BTW, have you considered TRUNCATE? (although it may not work with foreign keys). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Вложения
On Tue, 2005-10-25 at 11:18, WireSpot wrote: > On 10/25/05, Michael Fuhr <mike@fuhr.org> wrote: > > Are there indexes on the foreign key columns? That is, given > > the following example, > > > > CREATE TABLE foo (id integer PRIMARY KEY); > > CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON DELETE CASCADE); > > > > do you have an index on bar.fooid? > > There are no indexes on the referring fields in any tables. Would this > make a huge difference? Yes it will. Think about it: when the delete cascades, the rows which have to be deleted in the referring table have to be found first... if there's no index, a full table scan will be done for that. An index will allow the delete process to quickly find the referring rows which have to be deleted too. So it's a good practice to create indexes on fields referring to parent tables. Cheers, Csaba.
On 10/25/05, Martijn van Oosterhout <kleptog@svana.org> wrote: > BTW, have you considered TRUNCATE? (although it may not work with > foreign keys). It doesn't :) "TRUNCATE cannot be used if there are foreign-key references to the table from other tables. Checking validity in such cases would require table scans, and the whole point is not to do one." Besides, emptying entire tables never happens, there's just selective deletes from the "master" table. Thanks for the tip, guys. I'll go punch in some indexes and I'll be back to report how much of a difference it made.
On 10/25/05, WireSpot <wirespot@gmail.com> wrote: > Thanks for the tip, guys. I'll go punch in some indexes and I'll be > back to report how much of a difference it made. Adding indexes made the dropping of entries for the "master" table roughly 6 (six) times faster. It's definitely an improvement and I suspect it will benefit the application in many other ways. It's still not perfect though. Based on the figures I've got, some of the biggest entries in the "master" table would still take about 45 minutes to delete along with all their foreign key referrals. So as far as practical purposes are concerned, I'm back to square one. Fortunately, in this case, all the tables hold a redundant site ID. The only practical alternative I see is to drop all the foreign keys, delete from all the tables based on that redundant ID, then add the keys back and hope I don't hit any conflicts. This is still not perfect, because while I do this there cannot be any regular access to the database (can't afford to while foreign keys are down). But if the drop becomes blazing fast in this manner, it would be worth it... I guess. Any advice about what went wrong when this database was designed? Was there any way that the designers could've kept the consistency offered by foreign keys and at the same time allow for fast deletes?
I guess your master table has a considerable tree of child records for each deleted record, that's why the deletion takes so long. We have this situation too on top level object deletion. Our solution is to make it asynchronous, i.e. the user requests it and then a background job does it. There's no way to make such deletes blazing fast. Other solutions would be to not delete at all, but mark the top level record as deleted (using a deleted flag on it). That has the disadvantage that it's children can still be accessed, unless all access code checks the master active flag, even for child access. Other disadvantage is that all the data still is in the data base, taking up space, possibly slowing down operation. Now that could also be an advantage if you discover you shouldn't have been deleted that record just after pushing the "delete" button and receiving the confirmation that everything was deleted... Cheers, Csaba. On Tue, 2005-10-25 at 17:01, WireSpot wrote: > On 10/25/05, WireSpot <wirespot@gmail.com> wrote: > > Thanks for the tip, guys. I'll go punch in some indexes and I'll be > > back to report how much of a difference it made. > > Adding indexes made the dropping of entries for the "master" table > roughly 6 (six) times faster. It's definitely an improvement and I > suspect it will benefit the application in many other ways. > > It's still not perfect though. Based on the figures I've got, some of > the biggest entries in the "master" table would still take about 45 > minutes to delete along with all their foreign key referrals. So as > far as practical purposes are concerned, I'm back to square one. > > Fortunately, in this case, all the tables hold a redundant site ID. > The only practical alternative I see is to drop all the foreign keys, > delete from all the tables based on that redundant ID, then add the > keys back and hope I don't hit any conflicts. > > This is still not perfect, because while I do this there cannot be any > regular access to the database (can't afford to while foreign keys are > down). But if the drop becomes blazing fast in this manner, it would > be worth it... I guess. > > Any advice about what went wrong when this database was designed? Was > there any way that the designers could've kept the consistency offered > by foreign keys and at the same time allow for fast deletes? > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
On 10/25/05, Csaba Nagy <nagy@ecircle-ag.com> wrote: > I guess your master table has a considerable tree of child records for > each deleted record, that's why the deletion takes so long. We have this > situation too on top level object deletion. Our solution is to make it > asynchronous, i.e. the user requests it and then a background job does > it. There's no way to make such deletes blazing fast. It's a bit puzzling, frankly, to see a nice example of engineering (which is what I consider a consistent DB design) foiled by a performance and practical issue. :( That's it then. I'll ask the designers of the application to consider implementing the background delete, since it's obvious I can't solve it satisfactorily solely on the DB side.
WireSpot wrote: > On 10/25/05, Csaba Nagy <nagy@ecircle-ag.com> wrote: > >>I guess your master table has a considerable tree of child records for >>each deleted record, that's why the deletion takes so long. We have this >>situation too on top level object deletion. Our solution is to make it >>asynchronous, i.e. the user requests it and then a background job does >>it. There's no way to make such deletes blazing fast. > > > It's a bit puzzling, frankly, to see a nice example of engineering > (which is what I consider a consistent DB design) foiled by a > performance and practical issue. :( Don't kid yourself that there's a solution to this. If you've got a site with a lot of data in a lot of dependent tables, then have to expend a certain amount of effort to delete them. Anything PG did to make your delete faster would almost certainly slow down updates/inserts/selects. Managing a lot of data takes a lot of effort. -- Richard Huxton Archonet Ltd