Discussion: Survey on backing up unlogged tables: help us with PostgreSQL development!
Folks,

Please help us resolve a discussion on -hackers. PostgreSQL 9.1 is likely to have, as a feature, the ability to create tables which are "unlogged", meaning that they are not added to the transaction log, and will be truncated (emptied) on database restart. Such tables are intended for highly volatile, but not very valuable, data, such as session statuses, application logs, etc.

The question is, how would you, as a DBA, expect pg_dump backups to treat unlogged tables? Backing them up by default has the potential to both cause performance drag on the unlogged table and make your backups take longer unless you remember to omit them. Not backing them up by default has the drawback that if you forget the --include-unlogged switch, and shut the database down, any unlogged data is gone.

How would you *expect* unlogged tables to behave? Survey is here:

https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
"Karsten Hilbert"
Date:
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

ALTER DATABASE ... SET PG_DUMP_INCLUDE_UNLOGGED TO ON/OFF with default OFF. That way I can think about it once per database *before* I am in the situation when I regret forgetting. (pg_dump would still support --include-unlogged, defaulting to the database default.)

Karsten
--
New: GMX De-Mail - Easy as e-mail, secure as a letter! Reserve your De-Mail address now: http://portal.gmx.net/de/go/demail
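Karsten's suggestion, sketched as SQL. Note that pg_dump_include_unlogged is a hypothetical setting that exists only in this proposal, and the database name appdb is likewise made up:

```sql
-- Hypothetical: decide the dump policy once, per database, ahead of time.
ALTER DATABASE appdb SET pg_dump_include_unlogged TO off;

-- pg_dump would then read this setting as its default, still
-- overridable per run with the proposed --include-unlogged switch.
```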
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Derrick Rice
Date:
On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
Survey is here:
https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD
This is a link to a read-only spreadsheet for me.
Derrick
On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
> ...and will be truncated (emptied) on database restart.

I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.

I have been following loosely this discussion on HACKERS, but seem to have missed the part about truncating such tables on server restart.

I have an immediate use for unlogged tables (application logs), but having them truncate after even a clean server restart would be a show stopper. I keep log data for 2 months, and never back it up. Having it disappear after a system meltdown is acceptable, but not after a clean restart. That would be utterly ridiculous!

As to the topic of the thread, I think pg_dump needs to dump unlogged tables by default.

-Glen
> This is a link to a read-only spreadsheet for me.

You're correct. Darn those Google unreadable links!

https://spreadsheets.google.com/viewform?formkey=dDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE6MQ

That should work.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Nov 16, 2010 at 7:21 PM, Glen Parker <glenebob@nwlink.com> wrote:
> On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.
>
> I have an immediate use for unlogged tables (application logs), but having them truncate after even a clean server restart would be a show stopper. I keep log data for 2 months, and never back it up. Having it disappear after a system meltdown is acceptable, but not after a clean restart. That would be utterly ridiculous!

+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel better if I could just have unlogged tables that survive unless something like a power outage etc... I'm in the exact same boat here, lots of big logging tables that need to survive reboot, but are frustrating when it comes to WAL generation.

> As to the topic of the thread, I think pg_dump needs to dump unlogged tables by default.

-1 I disagree. I'm fine with having the loaded weapon pointed at my foot.

--Scott
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Scott Mead <scott@scottrmead.com> writes:
> +1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
> better if I could just have unlogged tables that survive unless something
> like a power-outage etc... I'm in the exact same boat here, lots of big
> logging tables that need to survive reboot, but are frustrating when it
> comes to WAL generation.

Keep in mind that these tables are *not* going to survive any type of backend crash. Maybe my perceptions are colored because I deal with Postgres bugs all the time, but I think of backend crashes as pretty common, certainly much more common than an OS-level crash. I'm afraid you may be expecting unlogged tables to be significantly more robust than they really will be.

regards, tom lane
On 11/16/2010 04:46 PM, Josh Berkus wrote:
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.

With the current patches, the data survives a restart just fine. I'd like to vote for:

safe restart = save data
bad crashy restart = drop data

-Andy
On 11/16/2010 05:15 PM, Tom Lane wrote:
> Keep in mind that these tables are *not* going to survive any type of
> backend crash. Maybe my perceptions are colored because I deal with
> Postgres bugs all the time, but I think of backend crashes as pretty
> common, certainly much more common than an OS-level crash. I'm afraid
> you may be expecting unlogged tables to be significantly more robust
> than they really will be.

But an individual backend crash != server restart, unless that's changed since 8.1 (yes, I'm still stuck on 8.1 :( )... So if I, for example, kill -9 a backend that's busy updating a nonlogged table, the table could be corrupted, but it wouldn't be truncated (and could cause trouble) for possibly weeks until the postmaster is restarted. Conversely, even if no backend crash occurs whatsoever, all the nonlogged tables would be truncated after an orderly postmaster restart. Just doesn't make sense to me.

-Glen
> With the current patches, the data survives a restart just fine.

Per -hackers, that's not guaranteed.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 11/16/2010 07:33 PM, Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

Ah, I just read the thread on -hackers. And yeah, my system had 24 hours to write/flush/etc before I'd restarted it moments ago as a test. I have NOT tested a bunch of writes and then quickly restarting PG. I CAN report that given 24 hours, your data will survive a restart :-)

-Andy
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Scott Marlowe
Date:
On Tue, Nov 16, 2010 at 5:23 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
>
> I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

I'd vote for backing up the schema of an unlogged table so it's there on a restore.
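As a sketch of what "schema survives, data may not" would mean in practice, assuming the CREATE UNLOGGED TABLE syntax under discussion (the table definition is a made-up example):

```sql
-- Proposed syntax: the definition is WAL-logged, the contents are not.
CREATE UNLOGGED TABLE session_status (
    session_id text PRIMARY KEY,
    last_seen  timestamptz NOT NULL DEFAULT now()
);

-- A dump that includes only the schema would recreate the empty table
-- on restore, exactly as it looks after a crash-induced truncate:
--   SELECT count(*) FROM session_status;  -- 0
```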
On Tue, Nov 16, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Mead <scott@scottrmead.com> writes:
>> +1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
>> better if I could just have unlogged tables that survive unless something
>> like a power-outage etc... I'm in the exact same boat here, lots of big
>> logging tables that need to survive reboot, but are frustrating when it
>> comes to WAL generation.
>
> Keep in mind that these tables are *not* going to survive any type of
> backend crash. Maybe my perceptions are colored because I deal with
> Postgres bugs all the time, but I think of backend crashes as pretty
> common, certainly much more common than an OS-level crash. I'm afraid
> you may be expecting unlogged tables to be significantly more robust
> than they really will be.

Not surviving a crash is fine. IMHO, if we'd lose data in MyISAM files, I'm happy to lose them on pg nologging tables. I just want it to survive a stop / start operation. The benefits (think of multi-host syslog consolidation with FTS <drools>) on these tables FAR outweigh the off-chance that a crash will cause me some heartache.

Bugs? What bugs :)

Honestly, I've only had a couple of *Prod* crashes (knocks on wood), but the need to restart occurs every now and then.

--Scott

> regards, tom lane
On 11/16/2010 05:33 PM, Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

Which is fine. If you choose to set a table to nonlogged, that implies that you accept the risk of corrupted data (or that you don't "get it"). It should not, however, imply that you want it all thrown out every so often for no good reason.

If you do wish to have the data tossed out for no good reason every so often, then there ought to be a separate attribute to control that. I'm really having trouble seeing how such behavior would be desirable enough to ever have the server do it for you, on its terms rather than yours.

-Glen
> If you do wish to have the data tossed out for no good reason every so
> often, then there ought to be a separate attribute to control that. I'm
> really having trouble seeing how such behavior would be desirable enough
> to ever have the server do it for you, on its terms rather than yours.

I don't quite follow you. The purpose of unlogged tables is for data which is disposable in the event of downtime; the classic example is a user_session_status table. In the event of a restart, all user sessions are going to be invalid anyway.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
> I'd vote for backing up the schema of an unlogged table so it's there > on a restore. The schema is always there. What may or may not be there is the data. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 11/16/2010 07:55 PM, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

Why? If you don't blow away the sessions table, everything should be fine.

-Andy
On 11/16/2010 05:55 PM, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

As was already mentioned, application logs. Unlogged tables would be perfect for that, provided they don't go *poof* every now and then for no good reason. Nobody's going to be too heartbroken if a handful of log records go missing, or get garbled, after a server crash or power outage. Delete 'em all after every restart, though, and that's a problem.

-Glen
Man, the number of misunderstandings in this thread is staggering. Let me try to explain what the proposed feature will and will not do.

1. The system catalog entries for all tables will be wal-logged. So schema (DDL) will survive a crash. There wouldn't be any way to make it not do that, because we can't wal-log only some updates to a particular table, and that includes the catalogs in particular.

2. What's proposed as the new feature is that specific non-system tables can be marked as unlogged, meaning that WAL entries won't be made for changes in those tables' contents (nor their indexes' contents). So we can't guarantee that the contents of such tables will be correct or consistent after a crash. The proposed feature deals with this by forcibly truncating all such tables after a crash, thus ensuring that they're consistent though not populated. So the possible use-cases for such tables are limited to where (a) you can repopulate the tables on demand, or (b) you don't really care about losing data on a crash.

3. There's a lot of wishful thinking here about what constitutes a crash. A backend crash *is* a crash, even if the postmaster keeps going. Data that had been in shared buffers doesn't get written out in such a scenario (and if we tried, it might be corrupt anyway). So unlogged tables would be corrupt and in need of truncation after such an event. Obviously, the same goes for an OS-level crash or power failure.

4. The last bit of discussion on -hackers concerned what to do in the case where the server got shut down cleanly. If it was shut down cleanly, then any data for unlogged tables would have been written out from shared buffers ... but did the data make it to disk? There's no easy way to know that. In the event of an OS crash or power failure shortly after server shutdown, it's possible that the unlogged tables would be corrupt. So Robert's initial proposal includes truncating unlogged tables at any database startup, even if the previous shutdown was clean. Some (including me) are arguing that that is unnecessarily strict; but you do have to realize that you're taking some risk with data validity if it doesn't do that.

The bottom line here is that you really can only use the feature for data that you're willing to accept losing on no notice. Allowing the data to persist across clean shutdowns would probably improve usability a bit, but it's not changing that fundamental fact.

regards, tom lane
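For concreteness, Tom's points 1 and 2 in terms of the syntax in the proposed patch (the table and data are made-up examples):

```sql
-- Point 1: this DDL goes through the system catalogs, which are
-- WAL-logged, so the table definition itself survives any crash.
CREATE UNLOGGED TABLE app_log (
    logged_at timestamptz NOT NULL DEFAULT now(),
    message   text
);

-- Point 2: no WAL is written for this insert, so after a backend
-- crash, OS crash, or power failure the table is truncated during
-- recovery and the row is gone.
INSERT INTO app_log (message) VALUES ('server started');
```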
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Karsten Hilbert
Date:
On Tue, Nov 16, 2010 at 10:25:13PM -0500, Tom Lane wrote:
> 4. The last bit of discussion on -hackers concerned what to do in
> the case where the server got shut down cleanly. If it was shut
> down cleanly, then any data for unlogged tables would have been
> written out from shared buffers ... but did the data make it to disk?
> There's no easy way to know that. In the event of an OS crash or
> power failure shortly after server shutdown, it's possible that
> the unlogged tables would be corrupt.

Aaah, indeed.

> So Robert's initial proposal
> includes truncating unlogged tables at any database startup, even
> if the previous shutdown was clean.

Sounds reasonable.

> Some (including me) are arguing
> that that is unnecessarily strict; but you do have to realize that
> you're taking some risk with data validity

Don't. We've always liked PostgreSQL for that. Or at least let us point the gun at our feet ourselves (such as with fsync).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 2010-11-17 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table.

That sounds an awful lot like temporary tables.

Perhaps the biggest problem of "unlogged tables" is that the name doesn't connote "truncate at restart". With the truncate, an unlogged table is more like a 'cluster temporary table'. While this is a very ugly name, I wonder if a DBA would expect a cluster temporary table to be backed up by default.

I just filled in the questionnaire, and to my surprise I agreed more with the 'don't backup by default' answer. The reason is that the question also said: "because it contains disposable data". Maybe a better question would have been: would you expect pg_dump to back up unlogged tables, at the point where you didn't know more about them than that they are not written to the WAL? In that case I'd say: yes.

regards,
Yeb Havinga
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Derrick Rice
Date:
On Wed, Nov 17, 2010 at 8:20 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
That sounds an awful lot like temporary tables.
A lot like a GLOBAL temporary table, which isn't currently supported.
Is there a difference between a global temporary table (if such a thing existed in PostgreSQL) and an unlogged table?
Derrick
On Tue, Nov 16, 2010 at 10:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Man, the number of misunderstandings in this thread is staggering.
> Let me try to explain what the proposed feature will and will not do.
>
> 1. The system catalog entries for all tables will be wal-logged.
> So schema (DDL) will survive a crash. There wouldn't be any way
> to make it not do that, because we can't wal-log only some updates
> to a particular table, and that includes the catalogs in particular.

Gotcha.

> 2. What's proposed as the new feature is that specific non-system
> tables can be marked as unlogged, meaning that WAL entries won't
> be made for changes in those tables' contents (nor their indexes'
> contents). So we can't guarantee that the contents of such tables
> will be correct or consistent after a crash. The proposed feature
> deals with this by forcibly truncating all such tables after a crash,
> thus ensuring that they're consistent though not populated. So the
> possible use-cases for such tables are limited to where (a) you can
> repopulate the tables on demand, or (b) you don't really care about
> losing data on a crash.

I would rather be allowed to decide that for myself.

> 3. There's a lot of wishful thinking here about what constitutes a
> crash. A backend crash *is* a crash, even if the postmaster keeps
> going. Data that had been in shared buffers doesn't get written out
> in such a scenario (and if we tried, it might be corrupt anyway). So
> unlogged tables would be corrupt and in need of truncation after such an
> event. Obviously, the same goes for an OS-level crash or power failure.

Right, just let *me* decide, that's all.

> 4. The last bit of discussion on -hackers concerned what to do in
> the case where the server got shut down cleanly. If it was shut
> down cleanly, then any data for unlogged tables would have been
> written out from shared buffers ... but did the data make it to disk?
> There's no easy way to know that. In the event of an OS crash or
> power failure shortly after server shutdown, it's possible that
> the unlogged tables would be corrupt. So Robert's initial proposal
> includes truncating unlogged tables at any database startup, even
> if the previous shutdown was clean. Some (including me) are arguing
> that that is unnecessarily strict; but you do have to realize that
> you're taking some risk with data validity if it doesn't do that.

It is too strict; it makes the feature barely more usable than a temp table. As a DBA, I realize the implications of the feature:

*) b0rked indexes
*) b0rked data
*) Not knowing what's good and what's bad
*) Bad reports
*) Bad BI
etc..., etc..., etc...

Still, I'd rather be allowed to make the decision here. I think that having the database try to enforce integrity on something I've marked as 'corruptable' (via the 'unlogged' flag) will be a constant fight between me and the system. In the end, I'd just not use the feature.

> The bottom line here is that you really can only use the feature
> for data that you're willing to accept losing on no notice.
> Allowing the data to persist across clean shutdowns would probably
> improve usability a bit, but it's not changing that fundamental fact.

Agreed, and that's fine. IMHO, it improves the usability tenfold. Having it truncated on server restart is useful for only a fraction of the use-cases for this feature.

--Scott

> regards, tom lane
On Wed, Nov 17, 2010 at 10:38 AM, Scott Mead <scott@scottrmead.com> wrote:
Now that I've just sent that last piece, what about a 'truncate on restart' option that is defaulted to on? That way, the community feels good knowing that we're trying to protect people from themselves, but like the 'fsync' feature, I can load the gun and pull the trigger if I really want to. I'd like to see that so even if there is a server crash, it doesn't truncate. That way, I can rename the garbage table if I want, create a new one for all new data and then be allowed to glean what I can from the last one.
--Scott
Derrick Rice <derrick.rice@gmail.com> writes:
> Is there a difference between a global temporary table (if such a thing
> existed in PostgreSQL) and an unlogged table?

Yes --- IIRC, a "global temp table" per spec has session-local contents. An unlogged table acts just like any other table except with respect to crash safety.

regards, tom lane
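The distinction Tom draws, sketched as SQL; UNLOGGED is the form proposed for 9.1, and the table names are made up:

```sql
-- Temp table: contents are session-local and vanish at session end;
-- other backends never see the rows at all.
CREATE TEMPORARY TABLE scratch (v int);

-- Unlogged table: shared and visible to all sessions like any
-- ordinary table; only its crash behavior differs.
CREATE UNLOGGED TABLE scratch_shared (v int);
```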
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Steve Crawford
Date:
On 11/16/2010 07:25 PM, Tom Lane wrote:
> Man, the number of misunderstandings in this thread is staggering....

First, I have plenty of processes that I would immediately convert to using this (and, FWIW, none of them would benefit from preserving data across restarts). But I have some questions that may expose my misunderstandings:

1. Would there be restrictions preventing a standard table from having a FK or other constraint that depends on an unlogged table? If not, it seems like there could be an unwanted ripple effect from loss of the unlogged table.

2. Would it be possible to accidentally mix logged and unlogged tables in an inheritance chain? What would be the impact?

3. If unlogged data is included in a dump (my vote is no), would this lead to inconsistent behavior between dumps taken from a master and dumps taken from a hot standby?

4. Would it be reasonable for temporary tables to be unlogged by default?

Cheers,
Steve
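Steve's first question, sketched with made-up tables; whether this is rejected is up to the final patch:

```sql
CREATE UNLOGGED TABLE sessions (id int PRIMARY KEY);

-- Question 1: should this be allowed? If sessions is truncated after
-- a crash, every row in the logged table below would reference a
-- parent that no longer exists.
CREATE TABLE orders (
    id         int PRIMARY KEY,
    session_id int REFERENCES sessions (id)
);
```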
On 11/17/10 01:23, Scott Ribe wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
>
> I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

FWIW, I agree with this reasoning. Iff the automatic truncate on clean restart could be turned off, I'd also expect the data to be in the backup - in that case only.
On 11/17/10 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

Depends on what you mean by "session".

Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems, and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back again, and the users can continue to work within their sessions. *That* is useful session behaviour, and it is also useful for logs.

The definition of unlogged tables which are deliberately being emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as "good enough". Even better: maybe not all pages need to be scanned, only the last few, if there is a chance for any kind of mechanism which can act as checkpoints for data validity.
On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
> Typical web application session data, e.g. for PHP applications which are
> deployed in *huge* numbers, resides directly on file systems, and is not
> guarded by anything (not even fsyncs). ... *That* is useful session
> behaviour, and it is also useful for logs.
>
> The definition of unlogged tables which are deliberately being emptied for
> no good reason does not seem very useful to me. I'd rather support an
> (optional) mode (if it can be implemented) in which PostgreSQL scans
> through these unlogged tables on startup and discards any pages whose
> checksums don't match, but accepts all others as "good enough".

This is not really a fair feature comparison. With the file-based sessions, the webserver will continue to deal with potentially corrupted sessions, which is worse than dealing with no sessions. This new PostgreSQL feature will ensure that such a thing cannot happen while also offering the performance of the file-based session storage and the ability to use queries against the session data. In my backups (using whatever flag or dump default), I will be ensuring that the sessions are *not* in the backup.

I also plan on using this feature for materialized views to replace memcached. Considering that I have been waiting on this feature for years, I, for one, welcome our unlogged table overlords.

Cheers,
M
Steve Crawford <scrawford@pinpointresearch.com> writes:
> 1. Would there be restrictions preventing a standard table from having a
> FK or other constraint that depends on an unlogged table? If not, it
> seems like there could be an unwanted ripple-effect from loss of the
> unlogged table.

I would assume that we should disallow an FK referencing an unlogged table from a regular table. I don't know whether the current patch covers that point, but if not it's an oversight.

> 2. Would it be possible to accidentally mix logged and unlogged tables
> in an inheritance chain? What would be the impact?

I don't see any logical problem there: some of the data in the inheritance tree would be subject to loss on crash, other data not. But the schema is all logged, so no inconsistency arises.

> 3. If unlogged data is included in a dump (my vote is no), would this
> lead to inconsistent behavior between dumps taken from a master and
> dumps taken from a hot standby?

Hmm, that is a really interesting point. You're right that a dump taken from a standby slave could not possibly include such data, since for lack of WAL it would never be propagated to the slave. I am not sure whether that inconsistency is a sufficiently good reason not to dump the data from the master, though. I think that in any case we are going to want a pg_dump option to dump/not dump unlogged data --- the argument is only about which behavior will be default.

I'm not sure that Robert's completely thought through the implications of this patch for behavior on a slave, anyway. Given the sequence

* take base backup from running system (which will surely include inconsistent data for unlogged tables)
* install base backup on slave
* run recovery, transitioning to hot standby
* go live

it's unclear to me where along the line the slave has an opportunity to clean out its bogus images of the unlogged tables. But it had better do so before opening up for hot standby queries, let alone going live.

> 4. Would it be reasonable for temporary tables to be unlogged by default?

Temp tables already are, always have been, always will be, unlogged. This patch is about attempting to bring that performance benefit of a temp table to regular tables.

regards, tom lane
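Tom's point about foreign keys can be sketched with the CREATE UNLOGGED TABLE syntax proposed in the patch (a sketch only; the table names are made up):

```sql
-- An unlogged table: its contents are truncated after a crash.
CREATE UNLOGGED TABLE user_sessions (
    session_id integer PRIMARY KEY,
    user_name  text
);

-- This is the case Tom argues should be disallowed: if user_sessions
-- is truncated after a crash, every row here references a session
-- that no longer exists, silently violating the constraint.
CREATE TABLE page_views (
    view_id    serial PRIMARY KEY,
    session_id integer REFERENCES user_sessions (session_id)
);
```

An FK between two unlogged tables would presumably stay consistent, since both are emptied together.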
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

That's a nice thought, but it's not how data corruption works in the event of a crash. If a table is corrupted, *we don't know* how it's corrupted, and it's not just "the last few records" which are corrupted. So for unlogged tables, there is never going to be any other option for crashes than to truncate them.

Robert Haas did discuss the ability to synch unlogged tables on a planned shutdown, though. However, that's liable to wait until 9.2, given the multiple steps required to make it work.

Note that you would have the option of periodically synching an unlogged table to pg_dump or to a logged table, via script, if you cared about retaining the data. That would probably give you the behavior you want, above.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
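The periodic-sync workaround Josh describes could look something like this (a sketch; the names are made up, and it assumes app_log is the unlogged table and app_log_snapshot an ordinary WAL-logged table with the same columns):

```sql
-- Run periodically from cron or an application job: snapshot the
-- volatile table into a crash-safe, WAL-logged copy.
BEGIN;
TRUNCATE app_log_snapshot;
INSERT INTO app_log_snapshot
SELECT * FROM app_log;
COMMIT;
```

After a restart that wipes app_log, at most one sync interval of log data is lost.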
On 11/16/2010 07:25 PM, Tom Lane wrote:
> 2. What's proposed as the new feature is that specific non-system
> tables can be marked as unlogged, meaning that WAL entries won't
> be made for changes in those tables' contents (nor their indexes'
> contents). So we can't guarantee that the contents of such tables
> will be correct or consistent after a crash. The proposed feature
> deals with this by forcibly truncating all such tables after a crash,
> thus ensuring that they're consistent though not populated. So the
> possible use-cases for such tables are limited to where (a) you can
> repopulate the tables on demand, or (b) you don't really care about
> losing data on a crash.

Well, if you guys would just hurry up and implement a way to mark indexes as inconsistent and continue to run without using them, you'd at least have the index problem solved :D

I was one of the guys drooling over WAL way back when it was a new feature. I understand the risks in not logging updates, and most of the time won't accept the risk if I don't absolutely have to. But, OTOH, before WAL, ALL tables were unlogged, and we still never lost any data that I'm aware of. I don't remember ever having anything worse than errors from corrupt indexes.

Maybe there's some old code somewhere to repair PG tables that could be dusted off and updated...? Heck, even if the postmaster refused to do anything with tables it thought might be FUBAR, and we had to repair them to even issue selects against them, in some cases that would still be better than having all the data go *poof*.

-Glen
On Wed, Nov 17, 2010 at 12:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
In an airplane, a pilot can kill the engine mid-flight if [s]he wants to. They can deploy the flaps/slats at cruise speed and altitude, and if they're so minded, they can land with a full tank of gas. Now, none of these things are particularly wise, but that's why the pilots are given *slightly* more learning than your average bus driver.
If you want to have a widely usable 'unlogged' table feature, I highly recommend that 'truncate on server crash/restart' be an option that is defaulted to true. That way, I can go in and push the buttons I want and give corrupted data to whomever, whenever I like. (Land with a full tank of Jet-A).
Whatever the decision is about backup, doesn't really matter IMO, but I honestly think that the benefit of an unlogged table is there for both session data (I run my session db's in fsync mode anyway and re-initdb them on boot) AND for logging data where I can't take WAL anymore, but would like to be able to have them in the same cluster as other stuff. If they just disappear then this feature won't be useful [to me] and I'll have to either wait for the patch or give up on it and do a flat-file / lucene project just to deal with it (I really don't want to do that :-).
--Scott
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
I would like to choose the table behaviour on restart (restore/forget it).

Currently, I'm looking for a way to split large transactions across different threads (with dblink). An issue is to efficiently share temp data across the threads. Unlogged tables would be fine here, something like global temp tables with shared data.

Here is an example to illustrate the current situation:

select cic_connect_me('c');
select dblink_exec ('c', 'drop table if exists my_share');
select dblink_exec ('c', 'create table my_share( a int)');
select dblink_disconnect ('c');

SELECT cic_multithread(ARRAY[
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)']
  , max_threads=4);

create temp table my_result as select * from my_share;
drop table my_share;
select * from my_result;

For pg_dump, I guess that having an optional flag is fine, but: unlogged tables could also be useful to store very large 'raw' data to be processed, whereas the client would only query the processed results. In such a case, restoring the logged table has a higher priority. The best solution in my opinion would allow dumping/restoring these two table types in separate processes (or threads..).

(And by the way: would it be possible to choose the compression tool as an option for pg_dump?)

pg_dump -F.. -Compress pigz -f out.dmp -f_unlogged out_unlogged.dmp

regards, Marc Mamin
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
Hello,

Another question. I haven't read the complete threads, so I apologize if this was already discussed.

Will it be possible to switch from unlogged to logged?

To improve COPY performance, I currently:
- make a copy of heavily indexed tables
- load new data in the shadow table
- add the indexes
- drop the live table
- rename the shadow table to the visible one.

Is it imaginable to use unlogged tables for the shadow one and then enable logging after the switch?

regards, Marc Mamin
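Marc's shadow-table workflow, sketched in SQL (names are illustrative; the final ALTER is the hypothetical "enable logging" step he is asking about, which did not exist):

```sql
-- Build the shadow copy without WAL overhead.
CREATE UNLOGGED TABLE items_shadow (LIKE items INCLUDING DEFAULTS);
COPY items_shadow FROM '/tmp/new_items.csv' WITH CSV;
CREATE INDEX items_shadow_name_idx ON items_shadow (name);

-- Swap it in.
BEGIN;
DROP TABLE items;
ALTER TABLE items_shadow RENAME TO items;
ALTER TABLE items SET LOGGED;  -- hypothetical: start WAL-logging from here on
COMMIT;
```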
"Marc Mamin" <M.Mamin@intershop.de> writes: > Will it be possible to switch from unlogged to logged ? Probably not, because it would completely confuse hot standby slaves (or anything else looking at the WAL replay stream). You can't just start issuing WAL records against an already-built table or index, because the WAL entries are deltas. regards, tom lane
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
yep, but I'll miss this as I only use WAL for crash recovery...

regards, Marc Mamin
On 11/17/10 17:43, A.M. wrote:
>
> On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
>
>> On 11/17/10 02:55, Josh Berkus wrote:
>>>
>>>> If you do wish to have the data tossed out for no good reason every so
>>>> often, then there ought to be a separate attribute to control that. I'm
>>>> really having trouble seeing how such behavior would be desirable enough
>>>> to ever have the server do it for you, on its terms rather than yours.
>>>
>>> I don't quite follow you. The purpose of unlogged tables is for data
>>> which is disposable in the event of downtime; the classic example is a
>>> user_session_status table. In the event of a restart, all user
>>> sessions are going to be invalid anyway.
>>
>> Depends on what you mean by "session".
>>
>> Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back up, and the users can continue to work within their sessions. *That* is useful session behaviour and it is also useful for logs.
>>
>> The definition of unlogged tables which are deliberately emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as "good enough". Even better: maybe not all pages need to be scanned, only the last few, if there is some mechanism which can act as a checkpoint for data validity.
>
> This is not really a fair feature comparison. With the file-based sessions, the webserver will continue to deal with potentially corrupted sessions, which is worse than dealing with no sessions.

I guess it depends on the specific use case, but in the common case (i.e. non-mission-critical massive deployments) I'd say it's definitely *not* worse than no sessions. "Dealing with potential corruption" in this case usually means the web application will attempt to deserialize the session data and fail if it's corrupted, leading to a new session being created.

> I also plan on using this feature for materialized views to replace memcached.

Just how large a performance gain is expected from this thing? :)

I don't see a mention that fsync will be disabled on unlogged tables (though it makes sense, so it probably will be). Having materialized views this way will mean that something - either an application or an external script triggered by database startup - will have to calculate and create this materialized view, which will probably involve massive table scanning all around - I suspect that performance gains from unlogged tables could be hidden by this scanning.

Anyway, I'm not arguing against them, I'm arguing for making them more powerful.
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Merlin Moncure
On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Folks,
>
> Please help us resolve a discussion on -hackers.
>
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

For the 'as regular table' argument: We are *assuming* the data is not very valuable. I'd rather assume it is valuable. pg_dump has a very specific purpose: to back up the database in its current state into a file that can replay that state. Not backing up large objects by default is a huge gotcha imnsho. I understand the reasoning, but disagree with it. Certainly a switch to tune them out would be nice, but not the default.

Against argument: hm, I guess the counter argument is that because they are not WAL-logged, they can't possibly be replayed to a standby, and it makes sense to have pg_dump and archive-log-based backup behave in similar fashion. It's weird that different backup strategies produce different results.

I think the 'against argument' is stronger by about .5, so I'm voting .5 for pg_dump not to dump them. In fact, if that side of it wins, maybe pg_dump shouldn't even deal with them at all. I guess that would be my position.

merlin
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Alban Hertroys
On 16 Nov 2010, at 23:46, Josh Berkus wrote:
> Folks,
>
> Please help us resolve a discussion on -hackers.
>
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

From the discussion so far it appears to me that "unlogged" should probably be split into various gradations of unlogged. There appear to be a number of popular use-cases for such tables, with different requirements, namely:

1. Session tables

These tables contain data about a user session in some application. It is temporary data at best; it's no problem to lose it at all. Dumping it makes no sense.

2. Staging tables

These tables contain data that's being processed and prepared to be entered into other tables in the database. If the process fails it can usually be restarted, so losing the data is no problem. Here as well, dumping makes little sense.

3. Logging tables

This is data from application logs. It's not usually mission-critical, so losing it isn't a very big deal, but it is useful data of itself. It should in most cases survive a normal backend restart, but if it doesn't survive a backend crash that's acceptable. This data should in most cases be included in dumps (or dumped separately?).

4. Materialized views

Stored results of a query that's likely to have a big footprint on system resources. Losing the data after a backend crash is acceptable, but it should survive a normal system restart. Since the data can be generated from the contents of the database, it's not necessary to include it in dumps (but maybe it is convenient in some cases?)

I think this is the gist of it.

Which leads me to think that people want three knobs to play with: should the table be logged or not? Can it be truncated at normal server restart or not? Should it be included in dumps or not? And possibly, should it be fsynced or not?

Of course, without WAL logs, PITR and WAL-based replication are out of the question for these tables. Also, since the data can be lost, they can't be referenced by foreign keys.

Does that sum it up adequately?

There's one thing that I didn't see mentioned and that I'm not sure fits into the picture here, namely read-only tables (materialized views would qualify for those in most cases). These tables are written every once in a while by a system user, but the data doesn't change in between at all. There's not much point in giving every user their own session, and it should be possible to assume all index entries point to a live record (which has consequences for COUNT(), for example). Changing that has quite some implications though, I wager...

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
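Alban's knobs might look something like this as DDL (pure sketch; none of this syntax existed at the time, and it only illustrates separating the three decisions):

```sql
-- Knob 1: WAL-logged or not.
CREATE UNLOGGED TABLE staging_rows (payload text);

-- Knob 2: when may the table be truncated (imagined here as a
-- storage parameter: only after a crash, not on a clean restart).
CREATE UNLOGGED TABLE app_log (msg text) WITH (truncate_on = 'crash');

-- Knob 3: whether pg_dump includes the data by default
-- (imagined as a per-table flag).
ALTER TABLE app_log SET (dump_data = true);
```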
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Karsten Hilbert
On Thu, Nov 18, 2010 at 08:49:12AM +0100, Alban Hertroys wrote:
> From the discussion so far it appears to me that
> "unlogged" should probably be split into various gradations
> of unlogged. There appear to be a number of popular
> use-cases for such tables, with different requirements,

That's precisely the point why this discussion doesn't lead to a *solution*. It can only lead to a *decision*.

It seems that it needs to be decided first whether, in the case of unWALed tables, we want PostgreSQL to provide *means* or *policies*. The former are decidable and robustly implementable in a piece of infrastructure software like PostgreSQL. The latter are up to the whims of each deployment site.

> Which leads me to think that people want three knobs to
> play with: should the table be logged or not? Can it be
> truncated at normal server restart or not? Should it be
> included in dumps or not? And possibly, should it be fsynced
> or not?

Yep, your analysis breaks down the policy stage (the grading of "logged") into "modes" or "means" which people can apply to achieve certain policies.

That is why I argued for these options:

- alter database dump_unlogged_tables to on/off
  default on: better safe than sorry, point the gun but don't pull the trigger

- pg_dump --include-unlogged-tables
  default: whatever alter database says

- psqlrc: \set include_unlogged_tables to on/off
  default: doesn't exist, falls back to what "alter database" or --include-unlogged says

That way I can use certain means to work out the policy I want, namely setting "alter database" to what it should be on this database waaay before the time comes when it is crucial not to forget --include-unlogged.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
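Karsten's layered defaults, as a hypothetical session (the setting and flag are his proposals, not existing features):

```sql
-- Decide once, per database, long before a backup is taken:
ALTER DATABASE sessions_db  SET dump_unlogged_tables TO off;
ALTER DATABASE reporting_db SET dump_unlogged_tables TO on;

-- A later "pg_dump --include-unlogged-tables reporting_db" would
-- then override the per-database default only when given explicitly.
```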
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Karsten Hilbert
On Thu, Nov 18, 2010 at 10:30:46AM +0100, Karsten Hilbert wrote:
> That is why I argued for options:
>
> - alter database dump_unlogged_tables to on/off
>
> default on: better safe than sorry, point the gun but don't pull the trigger

(I agree, however, that the database metadata isn't really the appropriate place to store application-specific configuration items. So, maybe a pgdumprc is a better place for that.)

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hello,

>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.

One doubt - if the tables are 'unlogged', will the DMLs against these still be replicated to a slave?

Regards, Jayadevan

DISCLAIMER: "The information in this e-mail and any attachment is intended only for the person to whom it is addressed and may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contact the sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guarantees the accuracy, adequacy or completeness of the information contained in this email or any attachment and is not liable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect."
On 11/18/2010 3:46 AM, Jayadevan M wrote:
> Hello,
>
>>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>>> tables which are "unlogged", meaning that they are not added to the
>>> transaction log, and will be truncated (emptied) on database restart.
>>> Such tables are intended for highly volatile, but not very valuable,
>>> data, such as session statuses, application logs, etc.
>
> One doubt - if the tables are 'unlogged', will the DMLs against these
> still be replicated to a slave?
>
> Regards,
> Jayadevan

Yes, because the system tables, which store the layout of all tables, are written to WAL.

-Andy
> Does that sum it up adequately?

One more thing: you might not get all of these options in 9.1. Currently the discussion is talking about *maybe* offering checkpointing of unlogged tables, which would allow such tables to survive a normal restart, and including unlogged tables in pg_dump by default. But it's also possible that we'll only have one type of unlogged table in 9.1, with other options waiting for 9.2.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Hi,

>>> One doubt - if the tables are 'unlogged', will the DMLs against these
>>> still be replicated to a slave?
>>
>> Yes, because the system tables, which store the layout of all tables, is
>> written to WAL.

Thank you for the reply. But my doubt was not about layout, rather the DMLs. If I do an insert into an 'unlogged' table, what happens to that? Will that be replicated in the slave (using PostgreSQL's inbuilt replication)?

Regards, Jayadevan
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Alban Hertroys
On 19 Nov 2010, at 4:23, Jayadevan M wrote:
> Hi,
>
>>> One doubt - if the tables are 'unlogged', will the DMLs against these
>>> still be replicated to a slave?
>>
>> Yes, because the system tables, which store the layout of all tables, is
>> written to WAL.
>
> Thank you for the reply. But my doubt was not about layout, rather the
> DMLs. If I do an insert into an 'unlogged' table, what happens to that?
> Will that be replicated in the slave (using PostgreSQL's inbuilt
> replication)?

What are the use-cases for replicating unlogged tables?

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
> What are the use-cases for replicating unlogged tables?

Hello,

I guess they could be useful in cloud infrastructures. (see http://archives.postgresql.org/pgsql-general/2010-11/msg00865.php)

regards, Marc Mamin
Hello,

>> Thank you for the reply. But my doubt was not about layout, rather the
>> DMLs. If I do an insert into an 'unlogged' table, what happens to that?
>> Will that be replicated in the slave (using PostgreSQL's inbuilt
>> replication)?
>
> What are the use-cases for replicating unlogged tables?

I do not have a use case for replicating unlogged tables. But I may use temp/unlogged tables in my master to populate actual tables. Say, I have a db archival/purge process. I populate temp tables with PKs of my permanent tables and use that to drive my insertion into history tables, deletion from live tables etc. Pseudocode:

insert into mytemptable (id)
select mypk from livetable where lastuseddate < archivedate;

insert into myhist
select a.* from livetable a join mytemptable on a.mypk = mytemptable.id;

delete from livetable where mypk in (select id from mytemptable);

Reading about what goes into WAL tells me that the permanent table data will be replicated all right even if the temp tables are not logged. Is that right?

Regards, Jayadevan
Glen Parker wrote:
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

How often are you doing unintentional restarts? I'd guess for many people it's "whenever I had so many backend crashes that I get motivated to check if I'm running the latest minor release". And if it's an intentional restart - surely you could archive your application logs before doing the restart, no?
Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

"Not guaranteed" is fine. What people are asking for is "often survives". AFAIK we don't truncate the log file created by the log_filename GUC on every unclean crash and every clean shutdown. Should we? :-) Why not? For people who intend to use these tables to log application data, they'd have the exact same reasons for not wanting them truncated when they don't need to be.
> AFAIK we don't truncate the log file created by the log_filename GUC
> on every unclean crash and every clean shutdown.

That's not a remotely relevant analogy. A log file is not a database table. If we allow a database table to become corrupted due to being unsynched at the time of shutdown, it's not a matter of "missing a few rows". The table is *unreadable*, and may cause the backend or even the whole server to crash when you try to read it.

Anyway, per discussion on hackers, unlogged tables (or "volatile tables" as they're now being called) include two modes in the spec; one which checkpoints (and thus can survive a planned restart) and one which doesn't (and will truncate on every restart, but doesn't cause physical I/O). We may or may not have both modes for 9.1.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com