Discussion: Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.
Kevin Grittner <kgrittn@postgresql.org> writes:
> Fix an O(N^2) problem in foreign key references.

Judging from the buildfarm, this patch is broken under
CLOBBER_CACHE_ALWAYS.  See friarbird's results in particular.
I might be too quick to finger this patch, but nothing else
lately has touched foreign-key behavior, and the foreign_key
test is where things are going south.

			regards, tom lane
On 09/14/2015 09:56 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@postgresql.org> writes:
>> Fix an O(N^2) problem in foreign key references.
>
> Judging from the buildfarm, this patch is broken under
> CLOBBER_CACHE_ALWAYS.  See friarbird's results in particular.
> I might be too quick to finger this patch, but nothing else
> lately has touched foreign-key behavior, and the foreign_key
> test is where things are going south.

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit.  Do you want to back it out for the
time being?  Kevin is on vacation right now.

Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
Jan Wieck <jan@wi3ck.info> writes:
> On 09/14/2015 09:56 AM, Tom Lane wrote:
>> Kevin Grittner <kgrittn@postgresql.org> writes:
>>> Fix an O(N^2) problem in foreign key references.

>> Judging from the buildfarm, this patch is broken under
>> CLOBBER_CACHE_ALWAYS.  See friarbird's results in particular.
>> I might be too quick to finger this patch, but nothing else
>> lately has touched foreign-key behavior, and the foreign_key
>> test is where things are going south.

> I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
> definitely is caused by this commit.  Do you want to back it out for the
> time being?  Kevin is on vacation right now.

I'll take a look today, and either fix it if I can find the problem
quickly or back it out.

(If anyone else is already looking at it, happy to defer to you...)

			regards, tom lane
On 09/15/2015 09:51 AM, Tom Lane wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> On 09/14/2015 09:56 AM, Tom Lane wrote:
>>> Kevin Grittner <kgrittn@postgresql.org> writes:
>>>> Fix an O(N^2) problem in foreign key references.
>
>>> Judging from the buildfarm, this patch is broken under
>>> CLOBBER_CACHE_ALWAYS.  See friarbird's results in particular.
>>> I might be too quick to finger this patch, but nothing else
>>> lately has touched foreign-key behavior, and the foreign_key
>>> test is where things are going south.
>
>> I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
>> definitely is caused by this commit.  Do you want to back it out for the
>> time being?  Kevin is on vacation right now.
>
> I'll take a look today, and either fix it if I can find the problem
> quickly or back it out.
>
> (If anyone else is already looking at it, happy to defer to you...)

I am looking at it.  Unfortunately I'm not getting any usable backtrace yet.

Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
I wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
>> definitely is caused by this commit.  Do you want to back it out for the
>> time being?  Kevin is on vacation right now.

> I'll take a look today, and either fix it if I can find the problem
> quickly or back it out.

The answer is "back it out", because this patch is fundamentally and
irretrievably broken.  You can't just clobber the hashtable like that,
because there are very possibly active uses of hashtable entries in
outer function call levels.

In the particular case exhibited in
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-09-15%2004%3A20%3A00
namely

TRAP: FailedAssertion("!(riinfo->constraint_id == constraintOid)", File: "/pgbuild/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/ri_triggers.c", Line: 2838)

what's evidently happened is that a cache flush occurred during the
syscache fetch at line 2828, and the patch chose that moment to destroy
the hashtable, in particular the entry already created at line 2817.
Hashtable resets occurring later in a particular RI trigger's execution
would cause other symptoms.

The pre-existing coding is safe, despite the lack of any reference
counting on hashtable entries, because the cache flush callback never
actually destroys any entries.  It just cautiously marks them invalid,
which won't actually have any effect until the next
ri_LoadConstraintInfo() call on that constraint OID.

AFAICS, there's no way to make this work reliably without changing the
cache API somehow.  Reference counts would make it possible to tell
whether it's safe to delete a particular entry, but they would be fairly
complicated to maintain (because of the need to make sure they get
decremented after an elog(ERROR)).  Given that the entries are just flat
chunks of memory, another answer is to have cache lookups copy the entry
data into caller-provided storage rather than rely on the cache to stay
valid.  Not sure offhand whether that would be unreasonably expensive;
sizeof(RI_ConstraintInfo) is about 600 bytes on my machine, which seems
like kind of a lot, but I guess it's not enormous.  For that matter, if
we're gonna do it like that, it's not very clear that we need the
ri_constraint_cache at all, rather than just having ri_LoadConstraintInfo
fill the data from the pg_constraint syscache row every time.

In any case, this patch seems like a pretty ugly and brute-force way to
limit the cache size, since it destroys not only old and uninteresting
entries but also valid, up-to-date, recently used entries.  So I'm not
very excited about thinking of some band-aid way to let this work; I
don't like the approach of destroying the entire hash table in the first
place.

So I'm going to revert it and send it back for rework.

			regards, tom lane
On 09/15/2015 10:58 AM, Tom Lane wrote:
> I wrote:
>> Jan Wieck <jan@wi3ck.info> writes:
>>> I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
>>> definitely is caused by this commit.  Do you want to back it out for the
>>> time being?  Kevin is on vacation right now.
>
>> I'll take a look today, and either fix it if I can find the problem
>> quickly or back it out.
>
> The answer is "back it out", because this patch is fundamentally and
> irretrievably broken.  You can't just clobber the hashtable like that,
> because there are very possibly active uses of hashtable entries in
> outer function call levels.

Ok.  Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?

Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
Jan Wieck <jan@wi3ck.info> writes:
> Would it be appropriate to use (Un)RegisterXactCallback() in case of
> detecting excessive sequential scanning of that cache and remove all
> invalid entries from it at that time?

Don't see how unregistering the callback would help?

In any case, I'm -1 on this entire concept of "let's zero out the cache
at randomly chosen moments".  That's far too hard to test the effects
of, as we've just seen.  It needs to be more predictable, and I think it
ought to be more selective too.

One idea is to limit the cache to say 1000 entries, and get rid of the
least-used one when the threshold is exceeded.  We could do that fairly
cheaply if we chain the cache entries together in a doubly-linked list
(see ilist.h) and move an entry to the front when referenced.  For
reasonably large cache sizes, that would pretty much guarantee that you
weren't destroying an actively-used entry.  However, it would still be a
bit hard to test, because you could not safely reduce the cache size to
just 1 as a test mechanism.

Another idea is that it's not the size of the cache that's the problem,
it's the cost of finding the entries that need to be invalidated.  So we
could also consider adding list links that chain only the entries that
are currently marked valid.  If the list gets too long, we could mark
them all invalid and thereby reset the list to nil.  This doesn't do
anything for the cache's space consumption, but that wasn't what you
were worried about anyway, was it?

			regards, tom lane
On 09/15/2015 11:54 AM, Tom Lane wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> Would it be appropriate to use (Un)RegisterXactCallback() in case of
>> detecting excessive sequential scanning of that cache and remove all
>> invalid entries from it at that time?

> Another idea is that it's not the size of the cache that's the problem,
> it's the cost of finding the entries that need to be invalidated.
> So we could also consider adding list links that chain only the entries
> that are currently marked valid.  If the list gets too long, we could mark
> them all invalid and thereby reset the list to nil.  This doesn't do
> anything for the cache's space consumption, but that wasn't what you were
> worried about anyway was it?

That sounds like a workable solution to this edge case.  I'll see how
that works.

Thanks, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
On 09/15/2015 12:02 PM, Jan Wieck wrote:
> On 09/15/2015 11:54 AM, Tom Lane wrote:
>> Jan Wieck <jan@wi3ck.info> writes:
>>> Would it be appropriate to use (Un)RegisterXactCallback() in case of
>>> detecting excessive sequential scanning of that cache and remove all
>>> invalid entries from it at that time?
>
>> Another idea is that it's not the size of the cache that's the problem,
>> it's the cost of finding the entries that need to be invalidated.
>> So we could also consider adding list links that chain only the entries
>> that are currently marked valid.  If the list gets too long, we could mark
>> them all invalid and thereby reset the list to nil.  This doesn't do
>> anything for the cache's space consumption, but that wasn't what you were
>> worried about anyway was it?
>
> That sounds like a workable solution to this edge case.  I'll see how
> that works.

Attached is a complete rework of the fix from scratch, based on Tom's
suggestion.

The code now maintains a double linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated it performs a hash lookup for maximum
efficiency in that case.

This code does pass the regression tests with CLOBBER_CACHE_ALWAYS, does
have the desired performance impact on restore of hundreds of thousands
of foreign key constraints and does not show any negative impact on bulk
loading of data with foreign keys.

Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
Attachments
Jan Wieck <jan@wi3ck.info> writes:
> Attached is a complete rework of the fix from scratch, based on Tom's
> suggestion.

> The code now maintains a double linked list as suggested, but only uses
> it to mark all currently valid entries as invalid when hashvalue == 0.
> If a specific entry is invalidated it performs a hash lookup for maximum
> efficiency in that case.

That does not work; you're ignoring the possibility of hashvalue
collisions, and for that matter not considering that the hash value
is not equal to the hash key.  I don't think your code is invalidating
the right entry at all during a foreign key constraint update, and
it certainly cannot do the right thing if there's a hash value collision.

Attached is something closer to what I was envisioning; can you do
performance testing on it?

			regards, tom lane

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..db1787a 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
***************
*** 40,45 ****
--- 40,46 ----
  #include "commands/trigger.h"
  #include "executor/executor.h"
  #include "executor/spi.h"
+ #include "lib/ilist.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_relation.h"
  #include "miscadmin.h"
*************** typedef struct RI_ConstraintInfo
*** 125,130 ****
--- 126,132 ----
  								 * PK) */
  	Oid			ff_eq_oprs[RI_MAX_NUMKEYS];		/* equality operators (FK =
  								 * FK) */
+ 	dlist_node	valid_link;		/* Link in list of valid entries */
  } RI_ConstraintInfo;
*************** typedef struct RI_CompareHashEntry
*** 185,190 ****
--- 187,194 ----
  static HTAB *ri_constraint_cache = NULL;
  static HTAB *ri_query_cache = NULL;
  static HTAB *ri_compare_cache = NULL;
+ static dlist_head ri_constraint_cache_valid_list;
+ static int	ri_constraint_cache_valid_count = 0;

  /* ----------
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2924,2929 ****
--- 2928,2940 ----
  	ReleaseSysCache(tup);

+ 	/*
+ 	 * For efficient processing of invalidation messages below, we keep a
+ 	 * doubly-linked list, and a count, of all currently valid entries.
+ 	 */
+ 	dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
+ 	ri_constraint_cache_valid_count++;
+
  	riinfo->valid = true;

  	return riinfo;
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2936,2956 ****
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
! 	HASH_SEQ_STATUS status;
! 	RI_ConstraintInfo *hentry;

  	Assert(ri_constraint_cache != NULL);

! 	hash_seq_init(&status, ri_constraint_cache);
! 	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
  	{
! 		if (hentry->valid &&
! 			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
! 			hentry->valid = false;
  	}
  }
--- 2947,2987 ----
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
+  *
+  * Note: at the time a cache invalidation message is processed there may be
+  * active references to the cache.  Because of this we never remove entries
+  * from the cache, but only mark them invalid, which is harmless to active
+  * uses.  (Any query using an entry should hold a lock sufficient to keep that
+  * data from changing under it --- but we may get cache flushes anyway.)
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
! 	dlist_mutable_iter iter;

  	Assert(ri_constraint_cache != NULL);

! 	/*
! 	 * If the list of currently valid entries gets excessively large, we mark
! 	 * them all invalid so we can empty the list.  This arrangement avoids
! 	 * O(N^2) behavior in situations where a session touches many foreign keys
! 	 * and also does many ALTER TABLEs, such as a restore from pg_dump.
! 	 */
! 	if (ri_constraint_cache_valid_count > 1000)
! 		hashvalue = 0;		/* pretend it's a cache reset */
!
! 	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
! 		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
! 													valid_link, iter.cur);
!
! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue)
! 		{
! 			riinfo->valid = false;
! 			/* Remove invalidated entries from the list, too */
! 			dlist_delete(iter.cur);
! 			ri_constraint_cache_valid_count--;
! 		}
  	}
  }
On 09/18/2015 10:47 AM, Tom Lane wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> Attached is a complete rework of the fix from scratch, based on Tom's
>> suggestion.
>
>> The code now maintains a double linked list as suggested, but only uses
>> it to mark all currently valid entries as invalid when hashvalue == 0.
>> If a specific entry is invalidated it performs a hash lookup for maximum
>> efficiency in that case.
>
> That does not work; you're ignoring the possibility of hashvalue
> collisions, and for that matter not considering that the hash value
> is not equal to the hash key.  I don't think your code is invalidating
> the right entry at all during a foreign key constraint update, and
> it certainly cannot do the right thing if there's a hash value collision.
>
> Attached is something closer to what I was envisioning; can you do
> performance testing on it?

Sorry for the delay.  Yes, that patch also has the desired performance
for restoring a schema with hundreds of thousands of foreign key
constraints.

Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
Jan Wieck <jan@wi3ck.info> writes:
> On 09/18/2015 10:47 AM, Tom Lane wrote:
>> Attached is something closer to what I was envisioning; can you do
>> performance testing on it?

> Yes, that patch also has the desired performance for restoring a schema
> with hundreds of thousands of foreign key constraints.

Great, thanks for checking.  I'll push it in a moment.

			regards, tom lane
Tom Lane wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> Attached is a complete rework of the fix from scratch, based on Tom's
>> suggestion.
>
>> The code now maintains a double linked list as suggested, but only uses
>> it to mark all currently valid entries as invalid when hashvalue == 0.
>> If a specific entry is invalidated it performs a hash lookup for maximum
>> efficiency in that case.
>
> That does not work; you're ignoring the possibility of hashvalue
> collisions, and for that matter not considering that the hash value
> is not equal to the hash key.  I don't think your code is invalidating
> the right entry at all during a foreign key constraint update, and
> it certainly cannot do the right thing if there's a hash value collision.

Would it make sense to remove only the few oldest entries, instead of
all of them?  As is, I think this causes a storm of reloads every once
in a while, if the number of FKs in the system is large enough.  Maybe
on a cache hit we could push the entry to the head of the list, and then
remove N entries from the back of the list when the threshold is
reached.

I think the assumption here is that most sessions will not reach the
threshold at all, except for those that access all tables such as
pg_dump -- that is, most sessions are short-lived.  But in cases
involving connection poolers, eventually all sessions would access all
tables, and thus be subject to the storm issue.  (Of course, this is all
hypothetical.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Would it make sense to remove only the few oldest entries, instead
> of all of them?  As is, I think this causes a storm of reloads every
> once in a while, if the number of FKs in the system is large enough.
> Maybe on a cache hit we could push the entry to the head of the list,
> and then remove N entries from the back of the list when the threshold
> is reached.

Sure, there's room for optimization of that sort, although I think we
could do with some evidence that it's actually helpful.  I believe that
under "production" workloads InvalidateConstraintCacheCallBack won't get
called much at all, so the problem's moot.

(FWIW, it might take less code to put the recently-used entries at the
back of the list.  Then the loop in InvalidateConstraintCacheCallBack
could just invalidate/delete entries if either they're targets, or
the current list length exceeds the threshold.)

			regards, tom lane
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Would it make sense to remove only the few oldest entries, instead
>> of all of them?  As is, I think this causes a storm of reloads every
>> once in a while, if the number of FKs in the system is large enough.
>> Maybe on a cache hit we could push the entry to the head of the list,
>> and then remove N entries from the back of the list when the threshold
>> is reached.

> (FWIW, it might take less code to put the recently-used entries at the
> back of the list.  Then the loop in InvalidateConstraintCacheCallBack
> could just invalidate/delete entries if either they're targets, or
> the current list length exceeds the threshold.)

Specifically, we could change it as per attached.  But this adds some
cycles to the mainline usage of fetching a valid cache entry, so I'd
want to see some evidence that there are use-cases it helps before
applying it.

			regards, tom lane

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 018cb99..54376ed 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2814,2820 ****
  	ri_InitHashTables();

  	/*
! 	 * Find or create a hash entry.  If we find a valid one, just return it.
  	 */
  	riinfo = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
  											   (void *) &constraintOid,
--- 2814,2821 ----
  	ri_InitHashTables();

  	/*
! 	 * Find or create a hash entry.  If we find a valid one, just return it,
! 	 * after pushing it to the end of the list to mark it recently used.
  	 */
  	riinfo = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
  											   (void *) &constraintOid,
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2822,2828 ****
--- 2823,2833 ----
  	if (!found)
  		riinfo->valid = false;
  	else if (riinfo->valid)
+ 	{
+ 		dlist_delete(&riinfo->valid_link);
+ 		dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
  		return riinfo;
+ 	}

  	/*
  	 * Fetch the pg_constraint row so we can fill in the entry.
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2931,2936 ****
--- 2936,2942 ----
  	/*
  	 * For efficient processing of invalidation messages below, we keep a
  	 * doubly-linked list, and a count, of all currently valid entries.
+ 	 * The most recently used entries are at the *end* of the list.
  	 */
  	dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
  	ri_constraint_cache_valid_count++;
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2948,2953 ****
--- 2954,2965 ----
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
   *
+  * We also invalidate entries if there are too many valid entries.  This rule
+  * avoids O(N^2) behavior in situations where a session touches many foreign
+  * keys and also does many ALTER TABLEs, such as a restore from pg_dump.  When
+  * we do kill entries for that reason, they'll be the least recently used
+  * ones, since they're at the front of the list.
+  *
   * Note: at the time a cache invalidation message is processed there may be
   * active references to the cache.  Because of this we never remove entries
   * from the cache, but only mark them invalid, which is harmless to active
*************** InvalidateConstraintCacheCallBack(Datum
*** 2961,2981 ****
  	Assert(ri_constraint_cache != NULL);

- 	/*
- 	 * If the list of currently valid entries gets excessively large, we mark
- 	 * them all invalid so we can empty the list.  This arrangement avoids
- 	 * O(N^2) behavior in situations where a session touches many foreign keys
- 	 * and also does many ALTER TABLEs, such as a restore from pg_dump.
- 	 */
- 	if (ri_constraint_cache_valid_count > 1000)
- 		hashvalue = 0;		/* pretend it's a cache reset */
-
  	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
  		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
  													valid_link, iter.cur);

! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue)
  		{
  			riinfo->valid = false;
  			/* Remove invalidated entries from the list, too */
--- 2973,2985 ----
  	Assert(ri_constraint_cache != NULL);

  	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
  		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
  													valid_link, iter.cur);

! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue ||
! 			ri_constraint_cache_valid_count > 1000)
  		{
  			riinfo->valid = false;
  			/* Remove invalidated entries from the list, too */