Discussion: Proof-of-concept for initdb-time shared_buffers selection
The attached patch shows how initdb can dynamically determine reasonable shared_buffers and max_connections settings that will work on the current machine. It consists of two trivial adjustments: one rips out the "PrivateMemory" code, so that a standalone backend will allocate a shared memory segment the same way as a postmaster would do, and the second adds a simple test loop in initdb that sees how large a setting will still allow the backend to start.

The patch isn't quite complete since I didn't bother adding the few lines of sed hacking needed to actually insert the selected values into the installed postgresql.conf file, but that's just another few minutes' work. Adjusting the documentation to match would take a bit longer. We might also want to tweak initdb to print a warning message if it's forced to select very small values, but I didn't do that yet.

Questions for the list:

1. Does this approach seem like a reasonable solution to our problem of some machines having unrealistically small kernel limits on shared memory?

2. If so, can I get away with applying this post-feature-freeze? I can argue that it's a bug fix, but perhaps some will disagree.

3. What should be the set of tested values? I have it as

       buffers: first to work of 1000 900 800 700 600 500 400 300 200 100 50
       connections: first to work of 100 50 40 30 20 10

   but we could certainly argue for different rules.
			regards, tom lane

*** src/backend/port/sysv_shmem.c.orig	Thu May  8 15:17:07 2003
--- src/backend/port/sysv_shmem.c	Fri Jul  4 14:47:51 2003
***************
*** 45,52 ****
  static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, uint32 size);
  static void IpcMemoryDetach(int status, Datum shmaddr);
  static void IpcMemoryDelete(int status, Datum shmId);
- static void *PrivateMemoryCreate(uint32 size);
- static void PrivateMemoryDelete(int status, Datum memaddr);
  
  static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
  					IpcMemoryId *shmid, void *addr);
--- 45,50 ----
***************
*** 243,283 ****
  }
  
- /* ----------------------------------------------------------------
-  *					private memory support
-  *
-  * Rather than allocating shmem segments with IPC_PRIVATE key, we
-  * just malloc() the requested amount of space.  This code emulates
-  * the needed shmem functions.
-  * ----------------------------------------------------------------
-  */
- 
- static void *
- PrivateMemoryCreate(uint32 size)
- {
- 	void	   *memAddress;
- 
- 	memAddress = malloc(size);
- 	if (!memAddress)
- 	{
- 		fprintf(stderr, "PrivateMemoryCreate: malloc(%u) failed\n", size);
- 		proc_exit(1);
- 	}
- 	MemSet(memAddress, 0, size);	/* keep Purify quiet */
- 
- 	/* Register on-exit routine to release storage */
- 	on_shmem_exit(PrivateMemoryDelete, PointerGetDatum(memAddress));
- 
- 	return memAddress;
- }
- 
- static void
- PrivateMemoryDelete(int status, Datum memaddr)
- {
- 	free(DatumGetPointer(memaddr));
- }
- 
  /*
   * PGSharedMemoryCreate
   *
--- 241,246 ----
***************
*** 289,294 ****
--- 252,260 ----
  * collision with non-Postgres shmem segments.  The idea here is to detect and
  * re-use keys that may have been assigned by a crashed postmaster or backend.
  *
+ * makePrivate means to always create a new segment, rather than attach to
+ * or recycle any existing segment.
+ *
  * The port number is passed for possible use as a key (for SysV, we use
  * it to generate the starting shmem key).  In a standalone backend,
  * zero will be passed.
***************
*** 323,342 ****
  	for (;;NextShmemSegID++)
  	{
- 		/* Special case if creating a private segment --- just malloc() it */
- 		if (makePrivate)
- 		{
- 			memAddress = PrivateMemoryCreate(size);
- 			break;
- 		}
- 
  		/* Try to create new segment */
  		memAddress = InternalIpcMemoryCreate(NextShmemSegID, size);
  		if (memAddress)
  			break;				/* successful create and attach */
  
  		/* Check shared memory and possibly remove and recreate */
! 
  		if ((hdr = (PGShmemHeader *) memAddress = PGSharedMemoryAttach(
  						NextShmemSegID, &shmid, UsedShmemSegAddr)) == NULL)
  			continue;			/* can't attach, not one of mine */
--- 289,304 ----
  	for (;;NextShmemSegID++)
  	{
  		/* Try to create new segment */
  		memAddress = InternalIpcMemoryCreate(NextShmemSegID, size);
  		if (memAddress)
  			break;				/* successful create and attach */
  
  		/* Check shared memory and possibly remove and recreate */
! 
! 		if (makePrivate)		/* a standalone backend shouldn't do this */
! 			continue;
! 
  		if ((hdr = (PGShmemHeader *) memAddress = PGSharedMemoryAttach(
  						NextShmemSegID, &shmid, UsedShmemSegAddr)) == NULL)
  			continue;			/* can't attach, not one of mine */
*** src/backend/utils/init/postinit.c.orig	Fri Jun 27 10:45:30 2003
--- src/backend/utils/init/postinit.c	Fri Jul  4 14:47:43 2003
***************
*** 176,187 ****
  	{
  		/*
  		 * We're running a postgres bootstrap process or a standalone backend.
! 		 * Create private "shmem" and semaphores.  Force MaxBackends to 1 so
! 		 * that we don't allocate more resources than necessary.
  		 */
- 		SetConfigOption("max_connections", "1",
- 						PGC_POSTMASTER, PGC_S_OVERRIDE);
- 
  		CreateSharedMemoryAndSemaphores(true, MaxBackends, 0);
  	}
  }
--- 176,183 ----
  	{
  		/*
  		 * We're running a postgres bootstrap process or a standalone backend.
! 		 * Create private "shmem" and semaphores.
  		 */
  		CreateSharedMemoryAndSemaphores(true, MaxBackends, 0);
  	}
  }
*** src/bin/initdb/initdb.sh.orig	Fri Jul  4 12:41:21 2003
--- src/bin/initdb/initdb.sh	Fri Jul  4 15:19:11 2003
***************
*** 579,584 ****
--- 579,618 ----
  ##########################################################################
  #
+ # DETERMINE PLATFORM-SPECIFIC CONFIG SETTINGS
+ #
+ # Use reasonable values if kernel will let us, else scale back
+ 
+ cp /dev/null "$PGDATA"/postgresql.conf
+ 
+ $ECHO_N "selecting default shared_buffers... "$ECHO_C
+ 
+ for nbuffers in 1000 900 800 700 600 500 400 300 200 100 50
+ do
+ 	TEST_OPT="$PGSQL_OPT -c shared_buffers=$nbuffers -c max_connections=5"
+ 	if "$PGPATH"/postgres $TEST_OPT template1 </dev/null >/dev/null 2>&1
+ 	then
+ 		break
+ 	fi
+ done
+ 
+ echo "$nbuffers"
+ 
+ $ECHO_N "selecting default max_connections... "$ECHO_C
+ 
+ for nconns in 100 50 40 30 20 10
+ do
+ 	TEST_OPT="$PGSQL_OPT -c shared_buffers=$nbuffers -c max_connections=$nconns"
+ 	if "$PGPATH"/postgres $TEST_OPT template1 </dev/null >/dev/null 2>&1
+ 	then
+ 		break
+ 	fi
+ done
+ 
+ echo "$nconns"
+ 
+ ##########################################################################
+ #
  # CREATE CONFIG FILES
  
  $ECHO_N "creating configuration files... "$ECHO_C
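The "few lines of sed hacking" Tom says he left out might look roughly like the sketch below. It is illustrative only: the commented-out default lines stand in for postgresql.conf.sample (an assumption about its exact format), and nbuffers/nconns would come from the probe loops in the patch above.

```shell
# Sketch of the omitted step: splice the probed values into the
# installed config file.  The sample lines below are fabricated for
# illustration; nbuffers/nconns mimic the probe loop's results.
nbuffers=1000
nconns=100
conf=./postgresql.conf.test

cat > "$conf" <<'EOF'
#shared_buffers = 64
#max_connections = 32
EOF

# Uncomment and overwrite the two settings in place
sed -e "s/^#shared_buffers = .*/shared_buffers = $nbuffers/" \
    -e "s/^#max_connections = .*/max_connections = $nconns/" \
    "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

cat "$conf"
```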
On Fri, Jul 04, 2003 at 03:29:37PM -0400, Tom Lane wrote:
> 2. If so, can I get away with applying this post-feature-freeze? I can
> argue that it's a bug fix, but perhaps some will disagree.

I'd say it is a bug fix.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De
ICQ: 179140304, AIM: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On Friday 04 July 2003 13:31, Michael Meskes wrote:
> On Fri, Jul 04, 2003 at 03:29:37PM -0400, Tom Lane wrote:
> > 2. If so, can I get away with applying this post-feature-freeze? I can
> > argue that it's a bug fix, but perhaps some will disagree.
>
> I'd say it is a bug fix.
>
> Michael

I'm with you, Michael/Tom, on this one as well. Let's at least get this
framework in place; we can always experiment with what values we settle on.

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com
Tom Lane wrote:
> 1. Does this approach seem like a reasonable solution to our problem
> of some machines having unrealistically small kernel limits on shared
> memory?

Yes, it does to me.

> 2. If so, can I get away with applying this post-feature-freeze? I can
> argue that it's a bug fix, but perhaps some will disagree.

I'd go with calling it a bug fix, or rather plugging a known deficiency.

> 3. What should be the set of tested values? I have it as
> buffers: first to work of 1000 900 800 700 600 500 400 300 200 100 50
> connections: first to work of 100 50 40 30 20 10
> but we could certainly argue for different rules.

These seem reasonable. We might want to output a message, even if the
highest values fly, that tuning is recommended for best performance.

Joe
On Fri, Jul 04, 2003 at 15:29:37 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> 3. What should be the set of tested values? I have it as
> buffers: first to work of 1000 900 800 700 600 500 400 300 200 100 50
> connections: first to work of 100 50 40 30 20 10
> but we could certainly argue for different rules.

Should the default max number of connections first try something greater
than what Apache sets by default (256 for prefork, 400 for worker)?
Bruno Wolff III <bruno@wolff.to> writes:
> Should the default max number of connections first try something greater
> than what Apache sets by default (256 for prefork, 400 for worker)?

We could do that. I'm a little worried about setting default values that
are likely to cause problems with exhausting the kernel's fd table
(nfiles limit). If anyone actually tries to run 256 or 400 backends
without having increased nfiles and/or twiddled our max_files_per_process
setting, they're likely to have serious problems. (There could be some
objection even to max_connections 100 on this ground.)

We could imagine having initdb reduce max_files_per_process to prevent
such problems, but then you'd be talking about giving up performance to
accommodate a limit that the user might not ever approach in practice.
You really don't want the thing selecting parameters on the basis of
unrealistic estimates of what max_connections needs to be.

Ultimately there's no substitute for some user input about what they're
planning to do with the database, and possibly adjustment of kernel
settings along with PG settings, if you're planning to run serious
applications. initdb can't be expected to do this unless you want to
make it interactive, which would certainly make the RPM guys really
unhappy. I'd rather see such considerations pushed off to a separate
tool, some kind of "configuration wizard" perhaps.

			regards, tom lane
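Tom's nfiles worry can be made concrete with back-of-the-envelope arithmetic: in the worst case every backend opens its full max_files_per_process quota. The sketch below assumes a per-backend cap of 1000 (an illustrative figure, not taken from this thread) and uses the Linux-specific /proc/sys/fs/file-max as the kernel table size where it exists.

```shell
# Worst-case fd demand: every backend opens its full per-process quota.
max_connections=400          # Apache "worker" default mentioned upthread
max_files_per_process=1000   # assumed per-backend cap, for illustration

need=$((max_connections * max_files_per_process))
echo "worst-case file descriptors: $need"

# Compare against the kernel's fd table where we can read it (Linux only)
if [ -r /proc/sys/fs/file-max ]; then
    limit=$(cat /proc/sys/fs/file-max)
    if [ "$need" -gt "$limit" ]; then
        echo "this would exhaust the kernel fd table ($limit)"
    fi
fi
```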
On Fri, 04 Jul 2003 15:29:37 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>The attached patch shows how initdb can dynamically determine reasonable
>shared_buffers and max_connections settings that will work on the
>current machine.

Can't this be done on postmaster startup? I think of two GUC variables
where there is only one today: min_shared_buffers and max_shared_buffers.
If allocation for the max_ values fails, the numbers are decreased in a
loop of, say, 10 steps until allocation succeeds, or even fails at the
min_ values. The actual values chosen are reported as a NOTICE and can
be inspected as read-only GUC variables.

This would make life easier for the folks trying to come up with default
.conf files, e.g.

	min_shared_buffers = 64
	max_shared_buffers = 2000

could cover a fairly large range of low level to mid level machines.

A paranoid DBA, who doesn't want the postmaster to do unpredictable
things on startup, can always set min_xxx == max_xxx to get the current
behaviour.

Servus
 Manfred
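Manfred's stepped retry can be sketched numerically like this. Only the step schedule is shown; the real logic would be C inside the postmaster, and the 64/2000 bounds are simply his example values.

```shell
# Walk shared_buffers down from max_ to min_ in roughly 10 even steps,
# as Manfred suggests; the postmaster would attempt the shared memory
# allocation at each value until one succeeds.
min_shared_buffers=64
max_shared_buffers=2000
steps=10

step=$(( (max_shared_buffers - min_shared_buffers) / steps ))
try=$max_shared_buffers
while [ "$try" -ge "$min_shared_buffers" ]; do
    echo "would try shared_buffers=$try"
    try=$((try - step))
done
```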
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Fri, 04 Jul 2003 15:29:37 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The attached patch shows how initdb can dynamically determine reasonable
>> shared_buffers and max_connections settings that will work on the
>> current machine.

> Can't this be done on postmaster startup?

Why would that be a good idea? Seems to me it just offers a fresh
opportunity to do the wrong thing at every startup. We've had troubles
enough with problems that appear only when the postmaster is started by
hand rather than by boot script, or vice versa; this would just add
another unknown to the equation.

> This would make the lives easier for the folks trying to come up with
> default .conf files, e.g.
> min_shared_buffers = 64
> max_shared_buffers = 2000
> could cover a fairly large range of low level to mid level machines.

Not unless their notion of a default .conf file includes a preinstalled
$PGDATA directory. Under ordinary circumstances, initdb will get run
locally on the target machine, and should come up with a valid value.

			regards, tom lane
Manfred,

> Can't this be done on postmaster startup? I think of two GUC
> variables where there is only one today: min_shared_buffers and
> max_shared_buffers. If allocation for the max_ values fails, the
> numbers are decreased in a loop of, say, 10 steps until allocation
> succeeds, or even fails at the min_ values.

I think the archives are back up. Take a look at this thread; we already
had this discussion at some length, and decided that a max of 1000 was
reasonable in advance of user tuning. And, I believe, Tom has already
written the code.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco