Discussion: 7.2.1 backend crash (convert_string_datum, locale)
Hi,

When testing postgres 7.2.1 on a sparc/solaris8 box with
--enable-locale --enable-multibyte I get a crash in
convert_string_datum. The backend just dies when doing a select.

With casserts and debug configured in I got the following in the log:

NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7818

Gdb on the crashing backend says:

Program received signal SIGSEGV, Segmentation fault.
0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446             AssertArg(MemoryContextIsValid(header->context));
(gdb) where
#0  0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
#1  0x21844c in convert_string_datum (value=5251848, typid=1043)
    at selfuncs.c:2059
#2  0x217978 in convert_to_scalar (value=4947304, valuetypid=1043,
    scaledvalue=0xffbee0b8, lobound=5251848, hibound=4946632,
    boundstypid=1043, scaledlobound=0xffbee0a8, scaledhibound=0xffbee0b0)
    at selfuncs.c:1763
#3  0x214f8c in scalarineqsel (root=0x4aebe8, operator=1066, isgt=0 '\000',
    var=0x4b6218, other=0x4b76d8) at selfuncs.c:584
#4  0x21541c in scalarltsel (fcinfo=0xffbee258) at selfuncs.c:733
#5  0x25aa90 in DirectFunctionCall4 (func=0x215304 <scalarltsel>,
    arg1=4910056, arg2=1066, arg3=4947368, arg4=0) at fmgr.c:725
#6  0x2199f0 in prefix_selectivity (root=0x4aebe8, var=0x4b6218,
    prefix=0x4b7ce8 "SY") at selfuncs.c:2667
#7  0x215854 in patternsel (fcinfo=0xffbee518, ptype=Pattern_Type_Like)
    at selfuncs.c:872
#8  0x215a18 in likesel (fcinfo=0xffbee518) at selfuncs.c:913
#9  0x25c5e4 in OidFunctionCall4 (functionId=1819, arg1=4910056, arg2=1213,
    arg3=4941064, arg4=1) at fmgr.c:1218
#10 0x185128 in restriction_selectivity (root=0x4aebe8, operator=1213,
    args=0x4b6508, varRelid=1) at plancat.c:232
#11 0x167530 in clauselist_selectivity (root=0x4aebe8, clauses=0x4b7678,
    varRelid=1) at clausesel.c:156
#12 0x167394 in restrictlist_selectivity (root=0x4aebe8,
    restrictinfo_list=0x4b6958, varRelid=1) at clausesel.c:74
#13 0x16a044 in set_baserel_size_estimates (root=0x4aebe8, rel=0x4b6af8)
    at costsize.c:1146
#14 0x166ae0 in set_plain_rel_pathlist (root=0x4aebe8, rel=0x4b6af8,
    rte=0x4aec78) at allpaths.c:132
#15 0x166aa4 in set_base_rel_pathlists (root=0x4aebe8) at allpaths.c:115
#16 0x1667ec in make_one_rel (root=0x4aebe8) at allpaths.c:62
#17 0x177708 in subplanner (root=0x4aebe8, flat_tlist=0x4b6a18,
    tuple_fraction=0) at planmain.c:238
#18 0x177544 in query_planner (root=0x4aebe8, tlist=0x4b5ed8,
    tuple_fraction=0) at planmain.c:126
#19 0x17939c in grouping_planner (parse=0x4aebe8, tuple_fraction=0)
    at planner.c:1094
#20 0x177d70 in subquery_planner (parse=0x4aebe8, tuple_fraction=-1)
    at planner.c:228
#21 0x177a2c in planner (parse=0x4aebe8) at planner.c:94
#22 0x1c821c in pg_plan_query (querytree=0x4aebe8) at postgres.c:513
#23 0x1c871c in pg_exec_query_string (
    query_string=0x4ae278 "SELECT find0.userId AS userId, find0.longValue AS findLongValue0 FROM userData find0 WHERE find0.groupName='user'AND find0.attributeName LIKE 'login%' AND find0.value LIKE 'SY%'",
    dest=Remote, parse_context=0x464598) at postgres.c:784
#24 0x1ca63c in PostgresMain (argc=4, argv=0xffbef018,
    username=0x4607e1 "mats") at postgres.c:1926
#25 0x18bab0 in DoBackend (port=0x4606b0) at postmaster.c:2243
#26 0x18af48 in BackendStartup (port=0x4606b0) at postmaster.c:1874
#27 0x189548 in ServerLoop () at postmaster.c:995
#28 0x188d18 in PostmasterMain (argc=1, argv=0x447db0) at postmaster.c:771
#29 0x143ebc in main (argc=1, argv=0xffbefacc) at main.c:206
(gdb) up
#1  0x21844c in convert_string_datum (value=5251848, typid=1043)
    at selfuncs.c:2059
2059            pfree(val);
(gdb) print val
$1 = 0x4b7878 "D1BFD67F71192ECE"
(gdb) print xfrmstr
$2 = 0x4b78d8 "\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001\001S\001Q\001S\0015\001<\0014\0014\001:\001T\001:\0019\001R\001T\001P\0014\001R\001\001\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001"
(gdb) print xfrmsize
$3 = 48
(gdb) print xfrmlen
$4 = 102
(gdb) print *(varattrib *)(value)
$5 = {va_header = 20, va_content = {va_compressed = {va_rawsize = 1144078918,
      va_data = "D"}, va_external = {va_rawsize = 1144078918,
      va_extsize = 1144403782, va_valueid = 925970745,
      va_toastrelid = 843400005}, va_data = "D"}}
(gdb) print (char *)((varattrib *)(value))->va_content.va_data
$6 = 0x50230c "D1BFD67F71192ECE~", '\177' <repeats 183 times>...
(gdb) list
2054                    /* Oops, didn't make it */
2055                    pfree(xfrmstr);
2056                    xfrmstr = (char *) palloc(xfrmlen + 1);
2057                    xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
2058            }
2059            pfree(val);
2060            val = xfrmstr;
2061    #endif
2062
2063            return (unsigned char *) val;
(gdb) down
#0  0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446             AssertArg(MemoryContextIsValid(header->context));
(gdb) print header
$7 = (StandardChunkHeader *) 0x4b7868
(gdb) print *header
$8 = {context = 0x15246b8, size = 32, requested_size = 17}
(gdb)

Please let me know if there is more info I can get out of gdb
to track this down.

_
Mats Lofkvist
mal@algonet.se
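For readers following the gdb listing: lines 2054-2057 are the retry path taken when the first strxfrm() call reports that its buffer was too small, and xfrmlen = 102 against xfrmsize = 48 is consistent with that path having run. The following is only a minimal sketch, assuming a standard C library, of a more conservative pattern that asks strxfrm() for the required length before allocating anything. The helper name xfrm_dup is hypothetical, malloc stands in for palloc, and this is not the code in selfuncs.c, nor a confirmed fix for the crash reported here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): return a malloc'd
 * copy of the strxfrm() image of val, or NULL on failure.
 *
 * The required length is queried first with a zero-size call, so the
 * second call never receives a buffer smaller than its result.
 */
char *
xfrm_dup(const char *val)
{
    /* With n == 0 the destination may be NULL; returns length without NUL. */
    size_t need = strxfrm(NULL, val, 0);
    char  *buf = malloc(need + 1);
    size_t got;

    if (buf == NULL)
        return NULL;

    got = strxfrm(buf, val, need + 1);
    if (got > need)
    {
        /* The library reported a different length the second time. */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The caller owns the returned buffer and releases it with free(). Comparing two such buffers with strcmp() gives the same ordering as strcoll() on the original strings, which is what a selectivity estimate needs from the transformed value.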
Mats Lofkvist <mal@algonet.se> writes:
> When testing postgres 7.2.1 on a sparc/solaris8 box with
> --enable-locale --enable-multibyte I get a crash in
> convert_string_datum.

This smells like a problem that we chased down awhile back, that
snprintf on Solaris is broken (it will write past the end of the
specified buffer length, thus corrupting adjacent data).

Andrew, I think that was your test case we found it on. Do you
recall if a fix is available from Sun?

regards, tom lane
Tom Lane wrote:
> Mats Lofkvist <mal@algonet.se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).
>
> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Yes, I remember this too. It was specifically multibyte-related.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> Mats Lofkvist <mal@algonet.se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).

It does indeed. This was only the 64-bit library, though, or at
least as far as we were able to tell. And I wasn't able to turn up
any evidence that it happened on Solaris 8. But it might. We don't
use 8, at least not yet.

> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Not as far as I know, at least for 7. Come to think of it, I now
_do_ recall seeing something in my various Google wanderings which
suggested that there is a fix in one of the patch packages for
Solaris 8 (which suggests the buggy library is in the basic Solaris 8
install). I dimly recall some mention of incompatibility between it
and some other patchlevel, as well, so it might require some digging.
(Given that it's really a bounds mistake in a system library, you'd
think that it'd be easier to find more information about it; I
actually learned almost everything I know about the problem from,
IIRC, the autoconf web pages, so I'd not expect a cursory search of
Sun's site to turn anything up.)

In the FAQ_Solaris, there is a suggestion to use the substitute
function included in the Postgres tree (which is what you suggested,
Tom, and what I did), as well as instructions on how to do it. It
definitely works for me on Solaris 7. Might be worth trying on 8 as
well. If so, the FAQ should be updated so as not to limit the
discussion to Solaris 7 and earlier.

Sorry I can't be more help than this.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
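The snprintf overrun described in this message can be probed with a guard-byte check. This is a minimal sketch, assuming a C99 compiler; the function name snprintf_respects_limit and the buffer sizes are made up for illustration and are not taken from FAQ_Solaris or the Postgres tree. Since the problem was reported only against the 64-bit library, a clean result from a 32-bit build would not rule it out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical probe: returns 1 if snprintf() stayed within the first
 * `limit` bytes of the buffer and NUL-terminated its output, 0 if it
 * wrote past the limit or left the output unterminated.
 */
int
snprintf_respects_limit(void)
{
    char         buf[32];
    const size_t limit = 8;
    size_t       i;

    /* Fill the whole buffer with a guard pattern. */
    memset(buf, 0x7f, sizeof(buf));

    /* Ask for more output than fits in `limit` bytes. */
    snprintf(buf, limit, "%s", "0123456789abcdef");

    /* Every byte at or beyond `limit` must still hold the guard pattern. */
    for (i = limit; i < sizeof(buf); i++)
    {
        if (buf[i] != 0x7f)
            return 0;
    }

    /* A conforming snprintf writes limit - 1 characters plus a NUL. */
    return buf[limit - 1] == '\0';
}
```

Building the same file once as a 32-bit and once as a 64-bit binary and calling the function from each would show whether the two libraries behave differently on a given box.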
andrew@libertyrms.info (Andrew Sullivan) writes:
> On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> > Mats Lofkvist <mal@algonet.se> writes:
> > > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > > --enable-locale --enable-multibyte I get a crash in
> > > convert_string_datum.
> >
> > This smells like a problem that we chased down awhile back, that
> > snprintf on Solaris is broken (it will write past the end of the
> > specified buffer length, thus corrupting adjacent data).
>
> It does indeed. This was only the 64-bit library, though, or at
> least as far as we were able to tell. And I wasn't able to turn up
> any evidence that it happened on Solaris 8. But it might. We don't
> use 8, at least not yet.
>
> > Andrew, I think that was your test case we found it on. Do you
> > recall if a fix is available from Sun?
>
> Not as far as I know, at least for 7. Come to think of it, I now
> _do_ recall seeing something in my various Google wanderings which
> suggested that there is a fix in one of the patch packages for
> Solaris 8 (which suggests the buggy library is in the basic Solaris 8
> install). I dimly recall some mention of incompatibility between it
> and some other patchlevel, as well, so it might require some digging.
> (Given that it's really a bounds mistake in a system library, you'd
> think that it'd be easier to find more information about it; I
> actually learned almost everything I know about the problem from,
> IIRC, the autoconf web pages, so I'd not expect a cursory search of
> Sun's site to turn anything up.)
>
> In the FAQ_Solaris, there is a suggestion to use the substitute
> function included in the Postgres tree (which is what you suggested,
> Tom, and what I did), as well as instructions on how to do it. It
> definitely works for me on Solaris 7. Might be worth trying on 8 as
> well. If so, the FAQ should be updated so as not to limit the
> discussion to Solaris 7 and earlier.
I didn't get it to work with the stuff in FAQ_Solaris
(can't guarantee I really got snprintf substituted though,
I just followed the instructions and recompiled).
Removing --enable-multibyte didn't help either.

With neither --enable-locale nor --enable-multibyte it
seems to work, but as I had to create a new database when
removing locale, any problems local to the first database
are not seen anymore.

Is postgres 8-bit clean without locale support enabled?
(I don't care about sort orders and such, I only need to
read/write 8-bit chars via jdbc).

_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> With neither --enable-locale nor --enable-multibyte it
> seems to work, but as I had to create a new database when
> removing locale, any problems local to the first database
> are not seen anymore.

Hm. If the database is already corrupt then simply recompiling
a corrected binary isn't going to magically make things perfect.
Maybe you should retry the snprintf patch and/or --enable-multibyte
using fresh databases.

> Is postgres 8-bit clean without locale support enabled?
> (I don't care about sort orders and such, I only need to
> read/write 8-bit chars via jdbc).

In that case you don't really need locale, no. Not sure about
whether you need multibyte; does JDBC expect Unicode support?

regards, tom lane