Discussion: Space for pg_dump
Hi,

How much space does a pg_dump usually take? One of my databases is 600GB.... How much space do I need to dump this?

Thanks
SHARMILA JOTHIRAJAH wrote:
> Hi,
> How much space does a pg_dump usually take?
> One of my databases is 600GB.... How much space do I need to dump this?

That will depend on how many indexes etc. make up that 600GB. Also how compressible your data is if you are using -Fc. Certainly less than a "live" database, but by how much it's difficult to say.

--
Richard Huxton
Archonet Ltd
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>
> Hi,
> How much space does a pg_dump usually take?
> One of my databases is 600GB.... How much space do I need to dump this?

Note you can find out by doing:

pg_dump dbname | wc
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> From: Scott Marlowe <scott.marlowe@gmail.com>
> Subject: Re: [GENERAL] Space for pg_dump
> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
> Date: Tuesday, March 31, 2009, 11:49 AM
> On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> >
> > Hi,
> > How much space does a pg_dump usually take?
> > One of my databases is 600GB.... How much space do I need to dump this?
>
> Note you can find out by doing:
>
> pg_dump dbname | wc

Yes...I could find the space used after creating the dump. But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate.... Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now.

Thanks
Sharmila
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
> Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now

Sadly, there is no exact maths for such things. If your database has tons of indexes and such, it might be 20 or 100 times bigger on disk than it will be during backup. If it's all compressible text with few indexes, it might be a 1:1 or so size. You can't really tell without running pg_dump. The advantage of doing pg_dump|wc -l is that the db doesn't have to be stored somewhere.
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> --- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> From: Scott Marlowe <scott.marlowe@gmail.com>
>> Subject: Re: [GENERAL] Space for pg_dump
>> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
>> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
>> Date: Tuesday, March 31, 2009, 11:49 AM
>> On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>> >
>> > Hi,
>> > How much space does a pg_dump usually take?
>> > One of my databases is 600GB.... How much space do I need to dump this?
>>
>> Note you can find out by doing:
>>
>> pg_dump dbname | wc
>>
> Yes...I could find the space used after creating the dump.
>
> But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
> Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now

It's hard to say. Why can't you RUN the example command on each db and see for yourself? It doesn't store a backup, per se; it just generates one and hands it to wc to see how big it is.
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> From: Scott Marlowe <scott.marlowe@gmail.com>
> Subject: Re: [GENERAL] Space for pg_dump
> To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
> Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
> Date: Tuesday, March 31, 2009, 12:07 PM
> On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> > But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
> > Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now
>
> Sadly, there is no exact maths for such things. If your database has tons of indexes and such, it might be 20 or 100 times bigger on disk than it will be during backup. If it's all compressible text with few indexes, it might be a 1:1 or so size. You can't really tell without running pg_dump. The advantage of doing pg_dump|wc -l is that the db doesn't have to be stored somewhere.

Thanks...I started pg_dump|wc -l and it's running now.
Another question: wc -l gives you the number of lines... right? What is the size of each line... or how do you get the size from that?
On Tue, Mar 31, 2009 at 10:31 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> --- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> From: Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> > But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
>> > Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now
>>
>> Sadly, there is no exact maths for such things. If your database has tons of indexes and such, it might be 20 or 100 times bigger on disk than it will be during backup. If it's all compressible text with few indexes, it might be a 1:1 or so size. You can't really tell without running pg_dump. The advantage of doing pg_dump|wc -l is that the db doesn't have to be stored somewhere.
>>
> Thanks...I started pg_dump|wc -l and it's running now.
> Another question is that wc -l gives you the no of lines...right...
> What is the size of each line...or how do you get the size from that?

Whoops, pretty sure my first reply was just wc. The -l is a habit from using it to count lines. Do it again without the -l... sorry.
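A side note for anyone reading along in the archives: with no flags, wc prints three columns (lines, words, bytes), so the last column already is the size; wc -c prints just the byte count. A quick illustration that needs no database (the pg_dump pipeline in the comment is the real usage):

```shell
# "wc" with no flags prints lines, words, and bytes; the last column is
# the size in bytes.  "wc -c" prints only the byte count.
printf 'line one\nline two\n' | wc -c
# -> 18  (two 9-byte lines, newlines included)

# Applied to the real thing -- nothing is written to disk:
#   pg_dump dbname | wc -c
```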
On Tue, Mar 31, 2009 at 08:57:28AM -0700, SHARMILA JOTHIRAJAH wrote:
> > Note you can find out by doing:
> > pg_dump dbname | wc
> Yes...I could find the space used after creating the dump.
> But I need to pre-allocate some space for storing these dumps

I'm not sure if you realize that you don't need any space for the command that Scott showed (pg_dump dbname | wc). It will not write anything to disk.

Best regards,
depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Tue, 31 Mar 2009, Scott Marlowe wrote:
> Sadly, there is no exact maths for such things. If your database has
> tons of indexes and such, it might be 20 or 100 times bigger on disk
> than it will be during backup. If it's all compressible text with few
> indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I usually do here is start by running the disk usage query at http://wiki.postgresql.org/wiki/Disk_Usage

That lets you better see index vs. table usage. Then, for the bigger tables, I do something like this:

psql -c "COPY (select * from bigtable limit 100000) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz

That lets you sample a decent-sized chunk of the table to figure out what compression ratio you're likely to get on the data in there. Given all the table sizes and a compression ratio estimate, from there you can make a fairly accurate guess of what the whole dump is going to take up, presuming your data is fairly evenly distributed such that the first records that come back are typical of the whole thing.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
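To make that arithmetic concrete, here is a minimal sketch with made-up numbers: suppose gzip -l on the sample reported 209715200 bytes uncompressed and 52428800 bytes compressed (a 4:1 ratio), and the disk-usage query said the table data totals 600 GB. None of these figures come from the thread; they only illustrate the calculation:

```shell
# Hypothetical figures, purely illustrative:
sample_uncompressed=209715200               # "uncompressed" column of gzip -l
sample_compressed=52428800                  # "compressed" column of gzip -l
table_bytes=$((600 * 1024 * 1024 * 1024))   # table data size from the wiki query

# Divide first so the intermediate value stays within 64-bit shell
# arithmetic (fine when the table is much larger than the sample).
estimate=$((table_bytes / sample_uncompressed * sample_compressed))
echo "estimated dump size: $((estimate / 1024 / 1024 / 1024)) GB"
# prints "estimated dump size: 150 GB" for these numbers
```

Summing one such estimate per large table gives a rough pre-allocation figure for the whole dump.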
Greg Smith wrote:
> On Tue, 31 Mar 2009, Scott Marlowe wrote:
>
>> Sadly, there is no exact maths for such things. If your database has
>> tons of indexes and such, it might be 20 or 100 times bigger on disk
>> than it will be during backup. If it's all compressible text with few
>> indexes, it might be a 1:1 or so size.
>
> Since running an entire pg_dump can take forever on a big database, what I
> usually do here is start by running the disk usage query at
> http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name contains upper case characters, like in my case "tblConnections":

ERROR: relation "public.tblconnections" does not exist.

Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the error.

Rainer
Rainer Bauer <usenet@munnin.com> writes:
> Greg Smith wrote:
>> Since running an entire pg_dump can take forever on a big database, what I
>> usually do here is start by running the disk usage query at
>> http://wiki.postgresql.org/wiki/Disk_Usage

> Interesting. However, the query gives an error if the table name contains
> upper case characters, like in my case "tblConnections":

> ERROR: relation "public.tblconnections" does not exist.

> Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
> error.

That still fails if the table name contains double quotes. A proper solution is to use the table OID --- I've corrected the example.

regards, tom lane
Hi,

> Rainer Bauer <usenet@munnin.com> writes:
>> Greg Smith wrote:
>>> Since running an entire pg_dump can take forever on a big database, what I
>>> usually do here is start by running the disk usage query at
>>> http://wiki.postgresql.org/wiki/Disk_Usage
>
>> Interesting. However, the query gives an error if the table name contains
>> upper case characters, like in my case "tblConnections":
>
>> ERROR: relation "public.tblconnections" does not exist.
>
>> Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
>> error.
>
> That still fails if the table name contains double quotes. A proper
> solution is to use the table OID --- I've corrected the example.

If you have big toast tables you get wrong results with the query suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the toasted values into account.

Simple example (take a look at the first row -> public.media):

SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;

         relation          |    size
---------------------------+------------
 public.media              | 727 MB
 public.identifier_idx     | 342 MB
 public.media_pk           | 190 MB
 public.mediateypes_pk     | 16 kB
 public.mediaformats_uk    | 16 kB
 public.contentsizes_pk    | 16 kB
 public.contenttype_pk     | 16 kB
 public.mediaformats_pk    | 16 kB
 public.contenttypes       | 8192 bytes
 public.media_media_id_seq | 8192 bytes
 public.contentsizes       | 8192 bytes
 public.mediaformats       | 8192 bytes
 public.mediatypes         | 8192 bytes
 public.vmedia2            | 0 bytes
 public.vmedia              | 0 bytes
(15 rows)

Now a fixed query which gets the sizes of the related pg_toast_oid and pg_toast_oid_index too:

SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)
         + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                     WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
         + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                     WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
       ) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid)
         + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                     WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
         + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                     WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint) DESC
LIMIT 20;

         relation          |    size
---------------------------+------------
 public.media              | 164 GB
 public.identifier_idx     | 342 MB
 public.media_pk           | 190 MB
 public.contenttype_pk     | 16 kB
 public.contenttypes       | 16 kB
 public.contentsizes       | 16 kB
 public.contentsizes_pk    | 16 kB
 public.mediateypes_pk     | 16 kB
 public.mediaformats       | 16 kB
 public.mediatypes         | 16 kB
 public.mediaformats_pk    | 16 kB
 public.mediaformats_uk    | 16 kB
 public.media_media_id_seq | 8192 bytes
 public.vmedia              | 0 bytes
 public.vmedia2             | 0 bytes
(15 rows)

There is a difference of about 163 GB (which is from the toast of public.media):

            relation             |  size
---------------------------------+---------
 pg_toast.pg_toast_6366088       | 162 GB
 pg_toast.pg_toast_6366088_index | 1832 MB
 public.media                    | 727 MB

If you have only small or no toast tables the query from the wiki will work for you.

regards, jan
Jan Otto wrote:
> If you have big toast tables you get wrong results with the query suggested
> at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
> toasted values into account.

> Now a fixed query which gets the sizes of the related pg_toast_oid and
> pg_toast_oid_index too:

Note that there is also the pg_total_relation_size function which will report the total table size, including toast table and indexes.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
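For reference, a sketch of what that simplification looks like dropped into the wiki-style query from earlier in the thread (the relkind filter is an addition here, to avoid listing indexes separately since pg_total_relation_size already counts them with their table):

```sql
-- pg_total_relation_size(oid) = heap + all indexes + toast table + toast index
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
  AND C.relkind <> 'i'   -- indexes are already counted with their table
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```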
On Apr 20, 2009, at 7:35 PM, Alvaro Herrera wrote:
> Jan Otto wrote:
>
>> If you have big toast tables you get wrong results with the query suggested
>> at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
>> toasted values into account.
>
>> Now a fixed query which gets the sizes of the related pg_toast_oid and
>> pg_toast_oid_index too:
>
> Note that there is also the pg_total_relation_size function which will
> report the total table size, including toast table and indexes.

Ahhh, it was a long day... sometimes I think too complicated ;-)

regards, jan
On Mon, 20 Apr 2009, Jan Otto wrote:
> If you have big toast tables you get wrong results with the query suggested
> at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
> toasted values into account.

I can't recall why I wrote that to filter out things in the pg_toast namespace in the first place. I just took the easy way out here--the query has been updated to not exclude relations in that namespace anymore, and I link to the TOAST docs for more details.

I consider combining all the values together, as you did in your example code and as pg_total_relation_size does, as a different type of report. Accordingly, I just updated with examples of both types, as well as something to work against pre-8.1 databases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD