Discussion: Re: [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>
> For normal commits and aborts we already reset PgXact->xmin
> Avoiding touching highly contented shmem improves concurrent
> performance.
>
> Simon Riggs

I'm getting occasional crashes with backtraces that look like this:

#0 0x00007fff9679c286 in __pthread_kill ()
#1 0x00007fff94e1a9f9 in pthread_kill ()
#2 0x00007fff9253a9a3 in abort ()
#3 0x0000000107e0659e in ExceptionalCondition (conditionName=<value temporarily unavailable, due to optimizations>, errorType=0x6 <Address 0x6 out of bounds>, fileName=<value temporarily unavailable, due to optimizations>, lineNumber=<value temporarily unavailable, due to optimizations>) at assert.c:54
#4 0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at snapmgr.c:1154
#5 0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
#6 0x0000000107a76267 in CommitTransactionCommand () at xact.c:2818
#7 0x0000000107cecfc2 in exec_simple_query (query_string=0x7f975481e640 "ABORT TRANSACTION") at postgres.c:2461
#8 0x0000000107ceabb7 in PostgresMain (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>, dbname=<value temporarily unavailable, due to optimizations>, username=<value temporarily unavailable, due to optimizations>) at postgres.c:4071
#9 0x0000000107c6bb58 in PostmasterMain (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at postmaster.c:4317
#10 0x0000000107be5cdd in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at main.c:228

I suspect that is the fault of this patch. Please fix or revert.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>
>> For normal commits and aborts we already reset PgXact->xmin
>> Avoiding touching highly contented shmem improves concurrent
>> performance.
>>
>> Simon Riggs
>
> I'm getting occasional crashes with backtraces that look like this:
>
> #4 0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> snapmgr.c:1154
> #5 0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
>
> I suspect that is the fault of this patch. Please fix or revert.

Also, the entire buildfarm is turning red.

longfin, spurfowl, and magpie all show this assertion failure in the
log. I haven't checked the others.

TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>>
>>> For normal commits and aborts we already reset PgXact->xmin
>>> Avoiding touching highly contented shmem improves concurrent
>>> performance.
>>>
>>> Simon Riggs
>>
>> I'm getting occasional crashes with backtraces that look like this:
>>
>> #4 0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
>> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
>> snapmgr.c:1154
>> #5 0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
>>
>> I suspect that is the fault of this patch. Please fix or revert.
>
> Also, the entire buildfarm is turning red.
>
> longfin, spurfowl, and magpie all show this assertion failure in the
> log. I haven't checked the others.
>
> TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

Another thing that is interesting is that when I run make -j8
check-world, the overall tests appear to succeed even though there are
failures mid-way through:

test tablefunc ... FAILED (test process exited with exit code 2)

...but then later we end with:

ok
All tests successful.
Files=11, Tests=80, 251 wallclock secs ( 0.07 usr 0.02 sys + 19.77
cusr 14.45 csys = 34.31 CPU)
Result: PASS
real 4m27.421s
user 3m50.047s
sys 1m31.937s

That's unrelated to the current problem of course, but it seems to
suggest that make's -j option doesn't entirely do what you'd expect
when used with make check-world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On 24 March 2017 at 16:14, Robert Haas <robertmhaas@gmail.com> wrote:
> I suspect that is the fault of this patch. Please fix or revert.

Will revert then fix.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On 2017-03-24 13:50:54 -0400, Robert Haas wrote:
> On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
> >>>
> >>> For normal commits and aborts we already reset PgXact->xmin
> >>> Avoiding touching highly contented shmem improves concurrent
> >>> performance.
> >>>
> >>> Simon Riggs
> >>
> >> I'm getting occasional crashes with backtraces that look like this:
> >>
> >> #4 0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
> >> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> >> snapmgr.c:1154
> >> #5 0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
> >>
> >> I suspect that is the fault of this patch. Please fix or revert.
> >
> > Also, the entire buildfarm is turning red.
> >
> > longfin, spurfowl, and magpie all show this assertion failure in the
> > log. I haven't checked the others.
> >
> > TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)
>
> Another thing that is interesting is that when I run make -j8
> check-world, the overall tests appear to succeed even though there are
> failures mid-way through:
>
> test tablefunc ... FAILED (test process exited with exit code 2)
>
> ...but then later we end with:
>
> ok
> All tests successful.
> Files=11, Tests=80, 251 wallclock secs ( 0.07 usr 0.02 sys + 19.77
> cusr 14.45 csys = 34.31 CPU)
> Result: PASS
> real 4m27.421s
> user 3m50.047s
> sys 1m31.937s
> That's unrelated to the current problem of course, but it seems to
> suggest that make's -j option doesn't entirely do what you'd expect
> when used with make check-world.
>

That's likely the output of a different test from the one that failed.
It's a lot easier to see the result if you're doing && echo success || echo failure

- Andres

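For illustration, the "&& echo success || echo failure" idiom Andres mentions can be wrapped in a small helper. This is only a sketch: the function name run_and_report is hypothetical and does not come from the thread, and the check-world invocation is left commented out because it needs a PostgreSQL source tree. The point is that the verdict depends on the command's exit status, not on whichever test output happens to scroll past last in a parallel run.

```shell
# Hypothetical helper (name is an assumption, not from the thread):
# run a command, then print a one-word verdict based on its exit status.
run_and_report() {
    "$@" && echo success || echo failure
}

# Stand-in commands with known exit statuses:
run_and_report true     # prints "success"
run_and_report false    # prints "failure"

# With the command from this thread, inside a PostgreSQL source tree:
# run_and_report make -j8 check-world
```

Because the verdict is printed after the command exits, it is the last line of output regardless of how the parallel jobs interleave their own "PASS" and "FAILED" lines.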