Discussion: Lyris looking to help fix PostgreSQL crashing problems
Hello -- I'm the lead programmer of Lyris ListManager, an email list server that runs on PostgreSQL, Oracle, and MS/SQL. About 20% of our client base of 4000 runs on PostgreSQL -- it's very popular with our clients, much more than Oracle is (about 3%).

Unfortunately, we have about a dozen clients who have stability problems with PostgreSQL. This week a major television network cancelled their order with us due to their PostgreSQL stability issues, which is what prompted me to write this email and get involved with the PostgreSQL community.

It seems that with larger database sizes (500,000 rows and larger) and high stress, the server daemon has a tendency to core. We've also had cases where a single connection doing a million inserts into a table will cause the daemon to core. We've seen problems with both 7.1 and 7.2.x, both with builds compiled on the machine and with RPMs. We've also had big stability problems with Solaris 8/SPARC, and don't ship on that platform because of that.

What I'd like to do is help solve these problems in the core distribution, so that PostgreSQL can handle the large databases and high transaction loads that Microsoft SQL Server can.

My company has hired open source people before to help fix bugs or add features to open source projects, most notably from the Tcl community, as we use Tcl quite a bit (we have two programmers from the Tcl Core Team working here). This works out well for the Tcl community, as we fund the development of the project, as well as pay someone to work on something they want to work on anyhow.

So... what I'm looking for are recommendations for a PostgreSQL guru who could help nail the stability/load issues, and make sure that the fixes make their way back into the PostgreSQL core. What I'd prefer is to get a regular contributor to this list, so that this person could investigate our problems, and then get the community's help in solving them.

Thanks!

-john
John Buckman <john@lyris.com> writes:
> It seems that with larger database sizes (500,000 rows and larger) and
> high stress, the server daemon has a tendency to core.

We'd love to see some stack traces ...

			regards, tom lane
> John Buckman <john@lyris.com> writes:
> > It seems that with larger database sizes (500,000 rows and larger) and
> > high stress, the server daemon has a tendency to core.
>
> We'd love to see some stack traces ...

Yeah, I just didn't know what form this list prefers to work from, which is why I'd prefer to hire a regular participant of this list. If gdb 'where' stack traces are what you want, we can do that.

I suspect that the problems may be platform- or build-related, because we've often had trouble replicating customer problems on our own systems. For example, we had many reports of problems with 7.2.x, and saw it crash often on a customer's Red Hat machine that we had SSH access to, but couldn't make it crash in our own lab. :( That's why we need help. If we could make a simple C test case that crashed pgsql, I'm sure you guys could fix the problem in a jiffy.

-john
John Buckman wrote:
> > John Buckman <john@lyris.com> writes:
> > > It seems that with larger database sizes (500,000 rows and larger) and
> > > high stress, the server daemon has a tendency to core.
> >
> > We'd love to see some stack traces ...
>
> Yeah, I just didn't know what form this list prefers to work
> from, which is why I'd prefer to hire a regular participant
> of this list. If gdb 'where' stack traces are what you want,
> we can do that.

Yep, in most cases, the crash creates a core file in the database directory. A backtrace of that core file is usually a good start. You should make sure there are debugging symbols in the binary (gcc -g). The server log files also often contain valuable information.

> I suspect that the problems may be platform- or build-related,
> because we've often had trouble replicating customer problems
> on our own systems. For example, we had many reports of problems
> with 7.2.x, and saw it crash often on a customer's Red Hat machine
> that we had SSH access to, but couldn't make it crash in our
> own lab. :( That's why we need help. If we could make a simple
> C test case that crashed pgsql, I'm sure you guys could fix the
> problem in a jiffy.

Yes, that does make it harder, but a backtrace usually gets us started. It may also be tickling some OS bug, a hardware failure, or simple exhaustion of some resource.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> John Buckman <john@lyris.com> writes:
> > It seems that with larger database sizes (500,000 rows and larger) and
> > high stress, the server daemon has a tendency to core.
>
> We'd love to see some stack traces ...

Yeah, I just didn't know what form of information this list prefers to work from, which is why I'd prefer to hire a regular participant of this list. If gdb 'where' stack traces from core files are what you want, we can do that.

I suspect that the problems may be platform- or build-related, because we've often had trouble replicating customer problems on our own systems. For example, we had many reports of problems with 7.2.x, and saw it crash often on a customer's Red Hat machine that we had SSH access to, but couldn't make it crash in our own lab. :( That's why we need help. If we could make a simple C test case that crashed pgsql, I'm sure you guys could fix the problem in a jiffy, but localizing and recreating a problem is always 80% of it.

-john