Discussion: Re: postgres processes spending most of their time in the kernel


Re: postgres processes spending most of their time in the kernel

From:
"Jeffrey W. Baker"
Date:

On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> >> It sounds like you have a build that's using SysV semaphores in place of
> >> test-and-set instructions.  That should not happen on x86 hardware,
> >> since we have assembly TAS code for x86.  Please look at your port
> >> header file (src/include/pg_config_os.h symlink) and
> >> src/include/storage/s_lock.h to figure out why it's misbuilt.
>
> > Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time.
> > I'm using GCC on i386 so I don't see how that is possible.
>
> I don't either.
>
> > It should be
> > safe for me to simply define these two things in pg_config.h, I suspect.
>
> That is not a solution.  If it's broken for you then it's likely to be
> broken for other people.  We need to figure out what went wrong and
> provide a permanent fix.
>
> What gcc version are you running, exactly, and what symbols does it
> predefine?  (I seem to recall that there's a way to find that out,
> though I'm not recalling how at the moment.  Anyone?)

GCC 2.95.3 on Slackware 8 i386.  I just tried this program:

#if defined __i386__ && defined __GNUC__
int
main(void)
{
    exit();
}
#endif

which builds and runs, so the problem must lie elsewhere.

-jwb
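
As a side note, GCC can answer the "what symbols does it predefine" question directly: running gcc -dM -E on an empty input (for example, gcc -dM -E - < /dev/null) prints every predefined macro. A slightly more targeted check is the small program below, which reports the two macros the x86 fast path is conditional on. This is purely illustrative code written for this discussion, not anything from the PostgreSQL sources:

#include <stdio.h>

int
main(void)
{
#ifdef __i386__
    printf("__i386__ is defined\n");
#else
    printf("__i386__ is NOT defined\n");
#endif
#ifdef __GNUC__
    printf("__GNUC__ = %d\n", __GNUC__);
#else
    printf("__GNUC__ is NOT defined\n");
#endif
    return 0;
}

If both macros are reported as defined, the conclusion above holds: the TAS code should be getting compiled, and the semop traffic must have some other cause.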


Re: postgres processes spending most of their time in the kernel

From:
Tom Lane
Date:
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> GCC 2.95.3 on Slackware 8 i386.

Seems pretty vanilla-flavored.

> I just tried this program:

> #if defined __i386__ && defined __GNUC__
> int
> main(void)
> {
>     exit();
> }
> #endif

> which builds and runs, so the problem must lie elsewhere.

Hmm.  Perhaps the TAS support does get compiled and the problem is
something else.  It'd be worth trying to determine directly what
the code is doing --- viz, run a test case that causes all the
semop thrashing, attach to one of the backends with gdb, set a
breakpoint at semop, get a backtrace to see where it's called from.
(Repeat a few times to see if the results are consistent or not.)

You might need to configure and build with --enable-debug to get
a useful (symbolic) backtrace.

            regards, tom lane
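
In concrete terms, the procedure Tom describes is a short gdb session along these lines. This is only a sketch: the binary path and process ID are placeholders, and the server must have been built with --enable-debug for the backtrace to be symbolic:

$ gdb /usr/local/pgsql/bin/postgres 12345    # 12345 = PID of a busy backend
(gdb) break semop
(gdb) continue
        ... wait for the breakpoint to fire ...
(gdb) bt                                     # who called semop?
(gdb) continue                               # repeat a few times and compare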

Re: postgres processes spending most of their time in the kernel

From:
"Jeffrey W. Baker"
Date:

On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm.  Perhaps the TAS support does get compiled and the problem is
> something else.  It'd be worth trying to determine directly what
> the code is doing --- viz, run a test case that causes all the
> semop thrashing, attach to one of the backends with gdb, set a
> breakpoint at semop, get a backtrace to see where it's called from.
> (Repeat a few times to see if the results are consistent or not.)
>
> You might need to configure and build with --enable-debug to get
> a useful (symbolic) backtrace.

#0  semop (semid=1277952, sops=0xbfffe934, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:35
#1  0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=0) at ipc.c:456
#2  0x8157fa6 in LWLockRelease (lockid=LockMgrLock) at lwlock.c:455
#3  0x8154a5f in LockAcquire (lockmethod=1, locktag=0xbfffea50, xid=6753879, lockmode=1, dontWait=0 '\000') at lock.c:723
#4  0x81537fb in LockRelation (relation=0x4054a070, lockmode=1) at lmgr.c:153
#5  0x807a58b in relation_openr (relationName=0x821c38d "pg_statistic", lockmode=1) at heapam.c:524
#6  0x807a6de in heap_openr (relationName=0x821c38d "pg_statistic", lockmode=1) at heapam.c:595
#7  0x81b2f49 in SearchCatCache (cache=0x82a0fe0, v1=7522815, v2=65535, v3=0, v4=0) at catcache.c:943
#8  0x81b8d4d in SearchSysCache (cacheId=24, key1=7522815, key2=65535, key3=0, key4=0) at syscache.c:461
#9  0x81b9f35 in get_attavgwidth (relid=7522815, attnum=-1) at lsyscache.c:906
#10 0x81223c2 in set_rel_width (root=0x82a72d0, rel=0x82a99b0) at costsize.c:1286
#11 0x81221e6 in set_baserel_size_estimates (root=0x82a72d0, rel=0x82a99b0) at costsize.c:1161
#12 0x8120262 in set_plain_rel_pathlist (root=0x82a72d0, rel=0x82a99b0, rte=0x82a7358) at allpaths.c:132
#13 0x8120237 in set_base_rel_pathlists (root=0x82a72d0) at allpaths.c:115
#14 0x8120142 in make_one_rel (root=0x82a72d0) at allpaths.c:62
#15 0x812b09d in subplanner (root=0x82a72d0, flat_tlist=0x82a9718, tuple_fraction=0) at planmain.c:238
#16 0x812af2e in query_planner (root=0x82a72d0, tlist=0x82a9308, tuple_fraction=0) at planmain.c:126
#17 0x812c4ab in grouping_planner (parse=0x82a72d0, tuple_fraction=0) at planner.c:1094
#18 0x812b4f5 in subquery_planner (parse=0x82a72d0, tuple_fraction=-1) at planner.c:228
#19 0x812b2d2 in planner (parse=0x82a72d0) at planner.c:94
#20 0x815c0cc in pg_plan_query (querytree=0x82a72d0) at postgres.c:511
#21 0x815c362 in pg_exec_query_string (query_string=0x82a6fa0 "UPDATE job SET state = 'retired' WHERE req = 745458", dest=Remote, parse_context=0x827d510) at postgres.c:758
#22 0x815d581 in PostgresMain (argc=4, argv=0xbffff10c, username=0x8278da9 "pg") at postgres.c:1849


and:

#0  semop (semid=1277952, sops=0xbfffe9b0, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:35
#1  0x8150424 in IpcSemaphoreLock (semId=1277952, sem=2, interruptOK=0 '\000') at ipc.c:422
#2  0x8157bfb in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE) at lwlock.c:271
#3  0x8154fb2 in LockRelease (lockmethod=1, locktag=0xbfffea80, xid=6753879, lockmode=1) at lock.c:1018
#4  0x81539f9 in UnlockRelation (relation=0x4054b8f8, lockmode=1) at lmgr.c:217
#5  0x8081823 in index_endscan (scan=0x82aa620) at indexam.c:288
#6  0x81b3146 in SearchCatCache (cache=0x82a0fe0, v1=7522815, v2=65535, v3=0, v4=0) at catcache.c:994

-jwb
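
Both traces bottom out in the SysV semaphore calls that back the lightweight locks: a backend that finds LockMgrLock busy goes to sleep on a semaphore via IpcSemaphoreLock, and the lock holder issues a semop on release (IpcSemaphoreUnlock) to wake the waiters, which is why contended LWLocks show up as time spent in semop. The toy program below demonstrates just that kernel primitive, one process blocking in semop until another posts the semaphore. It is illustrative only and not PostgreSQL code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

/* Toy demonstration: the child blocks inside semop() until the parent
 * posts the semaphore.  This is the primitive that a backend sleeping
 * on a contended lock ends up waiting in. */

union semun { int val; struct semid_ds *buf; unsigned short *array; };

int
main(void)
{
    struct sembuf wait_op = { 0, -1, 0 };   /* block until value > 0, then decrement */
    struct sembuf post_op = { 0, +1, 0 };   /* increment: wakes one waiter */
    union semun arg;
    int semid;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) { perror("semget"); return 1; }

    arg.val = 0;
    semctl(semid, 0, SETVAL, arg);          /* start in the "locked" state */

    if (fork() == 0) {
        semop(semid, &wait_op, 1);          /* child sleeps in the kernel here */
        printf("child: woken up\n");
        _exit(0);
    }

    sleep(1);                               /* pretend to hold the lock for a while */
    semop(semid, &post_op, 1);              /* release: wakes the child */
    wait(NULL);
    semctl(semid, 0, IPC_RMID);             /* remove the semaphore set */
    return 0;
}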


Re: postgres processes spending most of their time in the kernel

From:
Tom Lane
Date:
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> #0  semop (semid=1277952, sops=0xbfffe934, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:35
> #1  0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=0) at ipc.c:456
> #2  0x8157fa6 in LWLockRelease (lockid=LockMgrLock) at lwlock.c:455
> #3  0x8154a5f in LockAcquire (lockmethod=1, locktag=0xbfffea50, xid=6753879, lockmode=1, dontWait=0 '\000') at lock.c:723
> #4  0x81537fb in LockRelation (relation=0x4054a070, lockmode=1) at lmgr.c:153

> and:

> #0  semop (semid=1277952, sops=0xbfffe9b0, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:35
> #1  0x8150424 in IpcSemaphoreLock (semId=1277952, sem=2, interruptOK=0 '\000') at ipc.c:422
> #2  0x8157bfb in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE) at lwlock.c:271
> #3  0x8154fb2 in LockRelease (lockmethod=1, locktag=0xbfffea80, xid=6753879, lockmode=1) at lock.c:1018
> #4  0x81539f9 in UnlockRelation (relation=0x4054b8f8, lockmode=1) at lmgr.c:217

Hmm, those look perfectly normal, except one would like to think it's
the exception not the rule to be blocking on LWLocks.  If you're seeing
a lot of semops coming from these code paths then something unhappy is
going on (and I would imagine that 7.1 would've been worse, since it
would busy-wait under similar conditions).

Could we get some details about your application?  How many active
backends do you have, and what are they doing?  A query trace
(postmaster log with debug_print_query on) might be helpful.

            regards, tom lane
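
For completeness, the query trace Tom asks for is enabled by the parameter he names; a minimal sketch of the relevant postgresql.conf line is below. How the change gets picked up depends on the installation: restarting the postmaster always works, and sending it a SIGHUP should make it re-read the file, as far as I recall, on this version:

# postgresql.conf
debug_print_query = true        # write every query string to the postmaster log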