Discussion: No control over max. num. WAL files
Hello

I am trying to move a postgres cluster with 90 databases and around 140GB of data between two servers (8.3.12 -> 8.3.15). I am using 'pg_dumpall | psql' in the process and everything works OK until our pg_xlog partition gets full.

According to the documentation [1] we can expect a maximum of (3 * checkpoint_segments + 1) segment files in pg_xlog. In our system this will be (3 * 128 + 1) = 385 WAL files (~6GB).

We have taken this into account, plus some extra space. Our pg_xlog partition is ~8GB, and during the restore process 486 WAL files were created in this partition. The partition got full and everything crashed.

Our question is: how can 486 WAL files get generated in our pg_xlog partition if the documentation says that in the worst case we will get 385?

These are the relevant parameters we have changed in postgresql.conf:

archive_mode              | off
checkpoint_segments       | 128
default_statistics_target | 100
maintenance_work_mem      | 512MB
max_fsm_pages             | 800000
max_fsm_relations         | 8000
shared_buffers            | 10GB
wal_buffers               | 512kB
wal_sync_method           | fdatasync
work_mem                  | 16MB

And these are the relevant error messages:

PANIC:  could not write to file "pg_xlog/xlogtemp.25133": No space left on device
LOG:  WAL writer process (PID 25133) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
STATEMENT:  CREATE INDEX attachments_textsearch ON attachments USING gin (textsearchable);
FATAL:  the database system is in recovery mode
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2011-05-20 17:46:18 CEST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 12/6FD38F70
FATAL:  the database system is in recovery mode
LOG:  could not open file "pg_xlog/0000000100000013000000B0" (log file 19, segment 176): No such file or directory
LOG:  redo done at 13/AFFFFCE8
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

As you can see, the last SQL statement before the crash is:

CREATE INDEX attachments_textsearch ON attachments USING gin (textsearchable);

Maybe the problem is related to this? Any ideas? Thanks in advance.

[1] http://www.postgresql.org/docs/8.3/interactive/wal-configuration.html

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
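[For reference, the arithmetic behind the expected worst case above can be sketched as follows; the 16MB segment size is the standard default in 8.3, and this just reproduces the calculation from the question:]

```shell
# Sketch of the documented "worst case" pg_xlog footprint for
# checkpoint_segments = 128, assuming the default 16MB WAL segment size.
checkpoint_segments=128
segment_mb=16
max_segments=$(( 3 * checkpoint_segments + 1 ))
max_mb=$(( max_segments * segment_mb ))
echo "worst case: ${max_segments} segments, ~${max_mb} MB"
```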
On Wed, May 25, 2011 at 01:30:35PM +0200, Rafael Martinez wrote:
> I am using 'pg_dumpall | psql' in the process and everything works ok
> until our pg_xlog partition gets full.
>
> According to the documentation [1] we can expect a maximum of
> (3 * checkpoint_segments + 1 segment files) WAL files in pg_xlog.

That isn't what the documentation says. It says this:

    If, due to a short-term peak of log output rate, there are more than
    3 * checkpoint_segments + 1 segment files, the unneeded segment files
    will be deleted instead of recycled until the system gets back under
    this limit.

Note that "unneeded". Obviously, you need more than that, probably because you're restoring the database in one transaction (so none of the files can be flushed).

> In our system this will be (3 * 128 + 1) = 385 WAL files (~6GB)
>
> We have taken this into account + some extra space.
>
> Our pg_xlog partition is ~8GB and under the restore process 486 WAL
> files were created in this partition. The partition got full and
> everything crashed.

Disk is cheap. 8G is hardly anything any more; I'd buy some more disk for WAL.

A
--
Andrew Sullivan
ajs@crankycanuck.ca
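[One way to see how far past the documented limit a restore actually goes is to count segment-named files in pg_xlog while the load runs. A minimal helper along these lines (the example path is an assumption; adjust to your data directory):]

```shell
# Count files in a directory whose names look like WAL segments
# (24 uppercase hex characters). Run it periodically against pg_xlog
# during the restore, e.g. under watch(1), to see the peak.
count_wal_segments() {
    ls "$1" 2>/dev/null | grep -Ec '^[0-9A-F]{24}$'
}
# Example: count_wal_segments /var/lib/pgsql/data/pg_xlog
```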
On 05/25/2011 02:08 PM, Andrew Sullivan wrote:
> On Wed, May 25, 2011 at 01:30:35PM +0200, Rafael Martinez wrote:

Thanks for your answer.

>> According to the documentation [1] we can expect a maximum of
>> (3 * checkpoint_segments + 1 segment files) WAL files in pg_xlog.
>
> That isn't what the documentation says. It says this:
>
>     If, due to a short-term peak of log output rate, there are more
>     than 3 * checkpoint_segments + 1 segment files, the unneeded
>     segment files will be deleted instead of recycled until the system
>     gets back under this limit.
>
> Note that "unneeded". Obviously, you need more than that, probably
> because you're restoring the database in one transaction (so none of
> the files can be flushed).

We are not restoring the database in one transaction, so it is not clear to me why all these extra WAL files are needed. If this is the case, I think the documentation is not clear enough and can lead to misunderstandings.

If we cannot calculate roughly how much disk we need for WAL files, how can we be sure we won't get into another situation that needs even more space than the space we have allocated?

[......]

> Disk is cheap. 8G is hardly anything any more; I'd buy some more disk for WAL.

That is not the problem; we have our data on a SAN. Our problem is: if what you are saying is true, how much disk space is enough?

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
On Wed, May 25, 2011 at 1:08 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> Note that "unneeded". Obviously, you need more than that, probably
> because you're restoring the database in one transaction (so none of
> the files can be flushed).

That's the way SQL Server and Oracle work, but not PostgreSQL. We can clear down WAL files even during a long-running transaction.

For us, "unneeded" means prior to the second-to-last checkpoint record.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 25, 2011 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, May 25, 2011 at 1:08 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
>
>> Note that "unneeded". Obviously, you need more than that, probably
>> because you're restoring the database in one transaction (so none of
>> the files can be flushed).
>
> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
> clear down WAL files even during a long running transaction.
>
> For us, "unneeded" means prior to the second-to-last checkpoint record.

I wonder if OP is outrunning his checkpoint writing?
On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
> clear down WAL files even during a long running transaction.
>
> For us, "unneeded" means prior to the second-to-last checkpoint record.

Well, they're obviously not getting cleared down, so they must be needed. I know how Postgres is supposed to work in these cases, but in my experience you cannot rely on the OP's calculation to provide you with a true maximum. Pathological conditions result in a lot of WAL segments hanging around.

What I really suspect is that this has to do with the way I/O scheduling works, particularly in the presence of the bgwriter. But I don't feel comfortable suggesting particular reasons for what I've experienced in production. What I _can_ tell you is that, when I've had to do large restores like this, I wanted plenty of overhead for WAL. ISTR dedicating 40G to WAL one time for a case like this.

A
--
Andrew Sullivan
ajs@crankycanuck.ca
On Wed, May 25, 2011 at 6:47 AM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
>>
>> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
>> clear down WAL files even during a long running transaction.
>>
>> For us, "unneeded" means prior to the second-to-last checkpoint record.
>
> Well, they're obviously not getting cleared down, so they must be
> needed. I know how Postgres is supposed to work in these cases, but
> in my experience you cannot rely on the OP's calculation to provide
> you with a true maximum. Pathological conditions result in a lot of
> WAL segments hanging around.
>
> What I really suspect is that this has to do with the way I/O
> scheduling works, particularly in the presence of the bgwriter. But I
> don't feel comfortable suggesting particular reasons for what I've
> experienced in production. What I _can_ tell you is that, when I've
> had to do large restores like this, I wanted plenty of overhead for
> WAL. ISTR dedicating 40G to WAL one time for a case like this.

I have one db server on a SAN right now and it's got 20G allocated for WAL and 500G for the data/base dir, and have no problems with my WAL ever coming close to filling up. But if I did, I'd just shut down the db, move pg_xlog back to the data/base dir on the 500G drive set, restart, restore, shut down, move pg_xlog back, restart and go.
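[A minimal sketch of that relocation step, under the assumption that the server has been stopped first (e.g. with `pg_ctl -D "$datadir" stop -m fast`) and is started again afterwards; the paths are placeholders:]

```shell
# Move pg_xlog to a larger volume and leave a symlink behind so the
# server finds it in the usual place. This does only the move itself;
# PostgreSQL must not be running while the directory is relocated.
relocate_pg_xlog() {
    local datadir="$1" target="$2"
    mv "$datadir/pg_xlog" "$target/pg_xlog"
    ln -s "$target/pg_xlog" "$datadir/pg_xlog"
}
# Example: relocate_pg_xlog /var/lib/pgsql/data /bigdisk
```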
> These are the relevant parameters we have changed in postgresql.conf:
>
> archive_mode              | off
> checkpoint_segments       | 128
> default_statistics_target | 100
> maintenance_work_mem      | 512MB
> max_fsm_pages             | 800000
> max_fsm_relations         | 8000
> shared_buffers            | 10GB
> wal_buffers               | 512kB
> wal_sync_method           | fdatasync
> work_mem                  | 16MB

Are you sure you don't also have WAL archiving enabled? That can easily lead to too many WAL segments being kept if the archiving process can't keep up or fails.

It'd really help if you could show part of your postgresql logs from during the restore.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
On 05/25/2011 02:55 PM, Craig Ringer wrote:
>> These are the relevant parameters we have changed in postgresql.conf:
>> [......]
>
> Are you sure you don't also have WAL archiving enabled? That can easily
> lead to too many WAL segments being kept if the archiving process can't
> keep up or fails.

I am sure. I have double checked.

# grep archive_mode postgresql.conf
archive_mode = off

# SHOW archive_mode;
 archive_mode
--------------
 off
(1 row)

> It'd really help if you could show part of your postgresql logs from the
> during restore.

This happens when restoring a 30GB database. It looks like it stops when creating a GIN index on a 4217 MB table. See my first post for the relevant information from the log file (no relevant information/errors before this):

STATEMENT: CREATE INDEX attachments_textsearch ON attachments USING gin (textsearchable);

Where 'textsearchable' is a tsvector column. In the source database, this index is 7989 MB.

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
On 05/25/2011 02:47 PM, Andrew Sullivan wrote:
> On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
>>
>> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
>> clear down WAL files even during a long running transaction.
>>
>> For us, "unneeded" means prior to the second-to-last checkpoint record.
>
> Well, they're obviously not getting cleared down, so they must be
> needed ......

I wonder if full_page_writes has something to do with this. From the documentation:

    ".... If full_page_writes is set (as is the default), there is another
    factor to consider. To ensure data page consistency, the first
    modification of a data page after each checkpoint results in logging
    the entire page content ...."

It looks like we get this 'problem' when creating a GIN index on a tsvector column on a 4217 MB table (this takes longer than checkpoint_timeout).

--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
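[A rough back-of-the-envelope estimate of what that could mean here, assuming the default 8kB block size; this is an upper bound on the full-page-image volume, not a measured figure:]

```shell
# If the first post-checkpoint modification of every page of a 4217MB
# table is logged as a full 8kB page image, the full-page images alone
# could approach the table's size in extra WAL between two checkpoints.
table_mb=4217
block_kb=8
pages=$(( table_mb * 1024 / block_kb ))
echo "~${pages} pages -> up to ~${table_mb} MB of full-page images"
```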