Discussion: Survey on backing up unlogged tables: help us with PostgreSQL development!
Folks,

Please help us resolve a discussion on -hackers. PostgreSQL 9.1 is likely to have, as a feature, the ability to create tables which are "unlogged", meaning that they are not added to the transaction log, and will be truncated (emptied) on database restart. Such tables are intended for highly volatile, but not very valuable, data, such as session statuses, application logs, etc.

The question is, how would you, as a DBA, expect pg_dump backups to treat unlogged tables? Backing them up by default has the potential to both cause performance drag on the unlogged table and make your backups take longer unless you remember to omit them. Not backing them up by default has the drawback that if you forget the --include-unlogged switch, and shut the database down, any unlogged data is gone.

How would you *expect* unlogged tables to behave? Survey is here:

https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
"Karsten Hilbert"
Date:
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

ALTER DATABASE ... SET PG_DUMP_INCLUDE_UNLOGGED TO ON/OFF with default OFF. That way I can think about it once per database *before* I am in the situation when I regret forgetting. (pg_dump would still support --include-unlogged, defaulting to the database default.)

Karsten
--
New: GMX De-Mail - Easy as e-mail, secure as a letter! Reserve your De-Mail address now: http://portal.gmx.net/de/go/demail
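Karsten's suggestion, sketched as SQL. Note that pg_dump_include_unlogged is a hypothetical setting that exists only in this proposal, and the database name appdb is likewise made up:

```sql
-- Hypothetical: decide the dump policy once, per database, ahead of time.
ALTER DATABASE appdb SET pg_dump_include_unlogged TO off;

-- pg_dump would then read this setting as its default, still
-- overridable per run with the proposed --include-unlogged switch.
```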
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Derrick Rice
Date:
On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
Survey is here:
https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD
This is a link to a read-only spreadsheet for me.
Derrick
On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
> ...and will be truncated (emptied) on database restart.

I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.

I have been following loosely this discussion on HACKERS, but seem to have missed the part about truncating such tables on server restart.

I have an immediate use for unlogged tables (application logs), but having them truncate after even a clean server restart would be a show stopper. I keep log data for 2 months, and never back it up. Having it disappear after a system meltdown is acceptable, but not after a clean restart. That would be utterly ridiculous!

As to the topic of the thread, I think pg_dump needs to dump unlogged tables by default.

-Glen
> This is a link to a read-only spreadsheet for me.

You're correct. Darn those Google unreadable links!

https://spreadsheets.google.com/viewform?formkey=dDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE6MQ

That should work.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Nov 16, 2010 at 7:21 PM, Glen Parker <glenebob@nwlink.com> wrote:
> On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.
>
> I have an immediate use for unlogged tables (application logs), but having them truncate after even a clean server restart would be a show stopper. I keep log data for 2 months, and never back it up. Having it disappear after a system meltdown is acceptable, but not after a clean restart. That would be utterly ridiculous!

+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel better if I could just have unlogged tables that survive unless something like a power outage etc... I'm in the exact same boat here, lots of big logging tables that need to survive reboot, but are frustrating when it comes to WAL generation.

> As to the topic of the thread, I think pg_dump needs to dump unlogged tables by default.

-1 I disagree. I'm fine with having the loaded weapon pointed at my foot.

--Scott
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Scott Mead <scott@scottrmead.com> writes:
> +1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
> better if I could just have unlogged tables that survive unless something
> like a power-outage etc... I'm in the exact same boat here, lots of big
> logging tables that need to survive reboot, but are frustrating when it
> comes to WAL generation.

Keep in mind that these tables are *not* going to survive any type of backend crash. Maybe my perceptions are colored because I deal with Postgres bugs all the time, but I think of backend crashes as pretty common, certainly much more common than an OS-level crash. I'm afraid you may be expecting unlogged tables to be significantly more robust than they really will be.

regards, tom lane
On 11/16/2010 04:46 PM, Josh Berkus wrote:
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.

With the current patches, the data survives a restart just fine. I'd like to vote for:

safe restart = save data
bad crashy restart = drop data

-Andy
On 11/16/2010 05:15 PM, Tom Lane wrote:
> Keep in mind that these tables are *not* going to survive any type of
> backend crash. Maybe my perceptions are colored because I deal with
> Postgres bugs all the time, but I think of backend crashes as pretty
> common, certainly much more common than an OS-level crash. I'm afraid
> you may be expecting unlogged tables to be significantly more robust
> than they really will be.

But an individual backend crash != server restart, unless that's changed since 8.1 (yes, I'm still stuck on 8.1 :( )... So if I, for example, kill -9 a backend that's busy updating a nonlogged table, the table could be corrupted, but it wouldn't be truncated (and could cause trouble) for possibly weeks until the postmaster is restarted. Conversely, even if no backend crash occurs whatsoever, all the nonlogged tables would be truncated after an orderly postmaster restart. Just doesn't make sense to me.

-Glen
> With the current patches, the data survives a restart just fine.

Per -hackers, that's not guaranteed.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 11/16/2010 07:33 PM, Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

Ah, I just read the thread on -hackers. And yeah, my system had 24 hours to write/flush/etc before I'd restarted it moments ago as a test. I have NOT tested a bunch of writes and then quickly restarting PG. I CAN report that given 24 hours, your data will survive a restart :-)

-Andy
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Scott Marlowe
Date:
On Tue, Nov 16, 2010 at 5:23 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
>
> I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

I'd vote for backing up the schema of an unlogged table so it's there on a restore.
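As a sketch of what "schema survives, data may not" would mean in practice, assuming the CREATE UNLOGGED TABLE syntax under discussion (the table definition is a made-up example):

```sql
-- Proposed syntax: the definition is WAL-logged, the contents are not.
CREATE UNLOGGED TABLE session_status (
    session_id text PRIMARY KEY,
    last_seen  timestamptz NOT NULL DEFAULT now()
);

-- A dump that includes only the schema would recreate the empty table
-- on restore, exactly as it looks after a crash-induced truncate:
--   SELECT count(*) FROM session_status;  -- 0
```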
On Tue, Nov 16, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Mead <scott@scottrmead.com> writes:
>> +1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
>> better if I could just have unlogged tables that survive unless something
>> like a power-outage etc... I'm in the exact same boat here, lots of big
>> logging tables that need to survive reboot, but are frustrating when it
>> comes to WAL generation.
>
> Keep in mind that these tables are *not* going to survive any type of
> backend crash. Maybe my perceptions are colored because I deal with
> Postgres bugs all the time, but I think of backend crashes as pretty
> common, certainly much more common than an OS-level crash. I'm afraid
> you may be expecting unlogged tables to be significantly more robust
> than they really will be.

Not surviving a crash is fine. IMHO, if we'd lose data in MyISAM files, I'm happy to lose them on pg nologging tables. I just want it to survive a stop / start operation. The benefits (think of multi-host syslog consolidation with FTS <drools>) on these tables FAR outweigh the off-chance that a crash will cause me some heartache.

Bugs? What bugs :)

Honestly, I've only had a couple of *Prod* crashes (knocks on wood), but the need to restart occurs every now and then.

--Scott

> regards, tom lane
On 11/16/2010 05:33 PM, Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

Which is fine. If you choose to set a table to nonlogged, that implies that you accept the risk of corrupted data (or that you don't "get it"). It should not, however, imply that you want it all thrown out every so often for no good reason.

If you do wish to have the data tossed out for no good reason every so often, then there ought to be a separate attribute to control that. I'm really having trouble seeing how such behavior would be desirable enough to ever have the server do it for you, on its terms rather than yours.

-Glen
> If you do wish to have the data tossed out for no good reason every so
> often, then there ought to be a separate attribute to control that. I'm
> really having trouble seeing how such behavior would be desirable enough
> to ever have the server do it for you, on its terms rather than yours.

I don't quite follow you. The purpose of unlogged tables is for data which is disposable in the event of downtime; the classic example is a user_session_status table. In the event of a restart, all user sessions are going to be invalid anyway.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
> I'd vote for backing up the schema of an unlogged table so it's there > on a restore. The schema is always there. What may or may not be there is the data. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 11/16/2010 07:55 PM, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

Why? If you don't blow away the sessions table, everything should be fine.

-Andy
On 11/16/2010 05:55 PM, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

As was already mentioned, application logs. Unlogged tables would be perfect for that, provided they don't go *poof* every now and then for no good reason. Nobody's going to be too heartbroken if a handful of log records go missing, or get garbled, after a server crash or power outage. Delete 'em all after every restart, though, and that's a problem.

-Glen
Man, the number of misunderstandings in this thread is staggering. Let me try to explain what the proposed feature will and will not do.

1. The system catalog entries for all tables will be wal-logged. So schema (DDL) will survive a crash. There wouldn't be any way to make it not do that, because we can't wal-log only some updates to a particular table, and that includes the catalogs in particular.

2. What's proposed as the new feature is that specific non-system tables can be marked as unlogged, meaning that WAL entries won't be made for changes in those tables' contents (nor their indexes' contents). So we can't guarantee that the contents of such tables will be correct or consistent after a crash. The proposed feature deals with this by forcibly truncating all such tables after a crash, thus ensuring that they're consistent though not populated. So the possible use-cases for such tables are limited to where (a) you can repopulate the tables on demand, or (b) you don't really care about losing data on a crash.

3. There's a lot of wishful thinking here about what constitutes a crash. A backend crash *is* a crash, even if the postmaster keeps going. Data that had been in shared buffers doesn't get written out in such a scenario (and if we tried, it might be corrupt anyway). So unlogged tables would be corrupt and in need of truncation after such an event. Obviously, the same goes for an OS-level crash or power failure.

4. The last bit of discussion on -hackers concerned what to do in the case where the server got shut down cleanly. If it was shut down cleanly, then any data for unlogged tables would have been written out from shared buffers ... but did the data make it to disk? There's no easy way to know that. In the event of an OS crash or power failure shortly after server shutdown, it's possible that the unlogged tables would be corrupt. So Robert's initial proposal includes truncating unlogged tables at any database startup, even if the previous shutdown was clean. Some (including me) are arguing that that is unnecessarily strict; but you do have to realize that you're taking some risk with data validity if it doesn't do that.

The bottom line here is that you really can only use the feature for data that you're willing to accept losing on no notice. Allowing the data to persist across clean shutdowns would probably improve usability a bit, but it's not changing that fundamental fact.

regards, tom lane
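For concreteness, Tom's points 1 and 2 in terms of the syntax in the proposed patch (the table and data are made-up examples):

```sql
-- Point 1: this DDL goes through the system catalogs, which are
-- WAL-logged, so the table definition itself survives any crash.
CREATE UNLOGGED TABLE app_log (
    logged_at timestamptz NOT NULL DEFAULT now(),
    message   text
);

-- Point 2: no WAL is written for this insert, so after a backend
-- crash, OS crash, or power failure the table is truncated during
-- recovery and the row is gone.
INSERT INTO app_log (message) VALUES ('server started');
```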
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Karsten Hilbert
Date:
On Tue, Nov 16, 2010 at 10:25:13PM -0500, Tom Lane wrote:
> 4. The last bit of discussion on -hackers concerned what to do in
> the case where the server got shut down cleanly. If it was shut
> down cleanly, then any data for unlogged tables would have been
> written out from shared buffers ... but did the data make it to disk?
> There's no easy way to know that. In the event of an OS crash or
> power failure shortly after server shutdown, it's possible that
> the unlogged tables would be corrupt.

Aaah, indeed.

> So Robert's initial proposal
> includes truncating unlogged tables at any database startup, even
> if the previous shutdown was clean.

Sounds reasonable.

> Some (including me) are arguing
> that that is unnecessarily strict; but you do have to realize that
> you're taking some risk with data validity

Don't. We've always liked PostgreSQL for that. Or at least let us point the gun at our feet ourselves (such as with fsync).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 2010-11-17 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table.

That sounds an awful lot like temporary tables.

Perhaps the biggest problem of "unlogged tables" is that the name doesn't connote "truncate at restart". With the truncate, an unlogged table is more like a 'cluster temporary table'. While this is a very ugly name, I wonder if a DBA would expect a cluster temporary table to be backed up by default.

I just filled in the questionnaire, and to my surprise I agreed more with the 'don't backup by default' answer. The reason is that the question also said: "because it contains disposable data". Maybe a better question would have been: would you expect pg_dump to back up unlogged tables, at the point where you didn't know more about them than that they are not written to the WAL? In that case I'd say: yes.

regards,
Yeb Havinga
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Derrick Rice
Date:
On Wed, Nov 17, 2010 at 8:20 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
That sounds an awful lot like temporary tables.
A lot like a GLOBAL temporary table, which isn't currently supported.
Is there a difference between a global temporary table (if such a thing existed in PostgreSQL) and an unlogged table?
Derrick
On Tue, Nov 16, 2010 at 10:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Man, the number of misunderstandings in this thread is staggering.
> Let me try to explain what the proposed feature will and will not do.
>
> 1. The system catalog entries for all tables will be wal-logged.
> So schema (DDL) will survive a crash. There wouldn't be any way
> to make it not do that, because we can't wal-log only some updates
> to a particular table, and that includes the catalogs in particular.

Gotcha.

> 2. What's proposed as the new feature is that specific non-system
> tables can be marked as unlogged, meaning that WAL entries won't
> be made for changes in those tables' contents (nor their indexes'
> contents). So we can't guarantee that the contents of such tables
> will be correct or consistent after a crash. The proposed feature
> deals with this by forcibly truncating all such tables after a crash,
> thus ensuring that they're consistent though not populated. So the
> possible use-cases for such tables are limited to where (a) you can
> repopulate the tables on demand, or (b) you don't really care about
> losing data on a crash.

I would rather be allowed to decide that for myself.

> 3. There's a lot of wishful thinking here about what constitutes a
> crash. A backend crash *is* a crash, even if the postmaster keeps
> going. Data that had been in shared buffers doesn't get written out
> in such a scenario (and if we tried, it might be corrupt anyway). So
> unlogged tables would be corrupt and in need of truncation after such an
> event. Obviously, the same goes for an OS-level crash or power failure.

Right, just let *me* decide, that's all.

> 4. The last bit of discussion on -hackers concerned what to do in
> the case where the server got shut down cleanly. If it was shut
> down cleanly, then any data for unlogged tables would have been
> written out from shared buffers ... but did the data make it to disk?
> There's no easy way to know that. In the event of an OS crash or
> power failure shortly after server shutdown, it's possible that
> the unlogged tables would be corrupt. So Robert's initial proposal
> includes truncating unlogged tables at any database startup, even
> if the previous shutdown was clean. Some (including me) are arguing
> that that is unnecessarily strict; but you do have to realize that
> you're taking some risk with data validity if it doesn't do that.

It is too strict; it makes the feature barely more usable than a temp table. As a DBA, I realize the implications of the feature:

*) b0rked indexes
*) b0rked data
*) Not knowing what's good and what's bad
*) Bad reports
*) Bad BI
etc..., etc..., etc...

Still, I'd rather be allowed to make the decision here. I think that having the database try to enforce integrity on something I've marked as 'corruptable' (via the 'unlogged' flag) will be a constant fight between me and the system. In the end, I'd just not use the feature.

> The bottom line here is that you really can only use the feature
> for data that you're willing to accept losing on no notice.
> Allowing the data to persist across clean shutdowns would probably
> improve usability a bit, but it's not changing that fundamental fact.

Agreed, and that's fine. IMHO, it improves the usability tenfold. Having it truncated on server restart is useful for only a fraction of the use-cases for this feature.

--Scott

> regards, tom lane
On Wed, Nov 17, 2010 at 10:38 AM, Scott Mead <scott@scottrmead.com> wrote:
Now that I've just sent that last piece, what about a 'truncate on restart' option that is defaulted to on? That way, the community feels good knowing that we're trying to protect people from themselves, but like the 'fsync' feature, I can load the gun and pull the trigger if I really want to. I'd like to see that so even if there is a server crash, it doesn't truncate. That way, I can rename the garbage table if I want, create a new one for all new data and then be allowed to glean what I can from the last one.
--Scott
Derrick Rice <derrick.rice@gmail.com> writes:
> Is there a difference between a global temporary table (if such a thing
> existed in PostgreSQL) and an unlogged table?

Yes --- IIRC, a "global temp table" per spec has session-local contents. An unlogged table acts just like any other table except with respect to crash safety.

regards, tom lane
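The distinction Tom draws, sketched as SQL; UNLOGGED is the form proposed for 9.1, and the table names are made up:

```sql
-- Temp table: contents are session-local and vanish at session end;
-- other backends never see the rows at all.
CREATE TEMPORARY TABLE scratch (v int);

-- Unlogged table: shared and visible to all sessions like any
-- ordinary table; only its crash behavior differs.
CREATE UNLOGGED TABLE scratch_shared (v int);
```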
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From
Steve Crawford
Date:
On 11/16/2010 07:25 PM, Tom Lane wrote:
> Man, the number of misunderstandings in this thread is staggering....

First, I have plenty of processes that I would immediately convert to using this (and, FWIW, none of them would benefit from preserving data across restarts). But I have some questions that may expose my misunderstandings:

1. Would there be restrictions preventing a standard table from having a FK or other constraint that depends on an unlogged table? If not, it seems like there could be an unwanted ripple effect from loss of the unlogged table.

2. Would it be possible to accidentally mix logged and unlogged tables in an inheritance chain? What would be the impact?

3. If unlogged data is included in a dump (my vote is no), would this lead to inconsistent behavior between dumps taken from a master and dumps taken from a hot standby?

4. Would it be reasonable for temporary tables to be unlogged by default?

Cheers,
Steve
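Steve's first question, sketched with made-up tables; whether this is rejected is up to the final patch:

```sql
CREATE UNLOGGED TABLE sessions (id int PRIMARY KEY);

-- Question 1: should this be allowed? If sessions is truncated after
-- a crash, every row in the logged table below would reference a
-- parent that no longer exists.
CREATE TABLE orders (
    id         int PRIMARY KEY,
    session_id int REFERENCES sessions (id)
);
```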
On 11/17/10 01:23, Scott Ribe wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
>
> I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

FWIW, I agree with this reasoning. Iff the automatic truncate on clean restart could be turned off, I'd also expect the data to be in the backup - in that case only.
On 11/17/10 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user
> sessions are going to be invalid anyway.

Depends on what you mean by "session".

Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems, and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back again, and the users can continue to work within their sessions. *That* is useful session behaviour, and it is also useful for logs.

The definition of unlogged tables which are deliberately being emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as "good enough". Even better: maybe not all pages need to be scanned, only the last few, if there is a chance for any kind of mechanism which can act as checkpoints for data validity.
On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
> Typical web application session data, e.g. for PHP applications which are
> deployed in *huge* numbers, resides directly on file systems, and is not
> guarded by anything (not even fsyncs). ... *That* is useful session
> behaviour, and it is also useful for logs.
>
> The definition of unlogged tables which are deliberately being emptied for
> no good reason does not seem very useful to me. I'd rather support an
> (optional) mode (if it can be implemented) in which PostgreSQL scans
> through these unlogged tables on startup and discards any pages whose
> checksums don't match, but accepts all others as "good enough".

This is not really a fair feature comparison. With the file-based sessions, the webserver will continue to deal with potentially corrupted sessions, which is worse than dealing with no sessions. This new PostgreSQL feature will ensure that such a thing cannot happen while also offering the performance of the file-based session storage and the ability to use queries against the session data. In my backups (using whatever flag or dump default), I will be ensuring that the sessions are *not* in the backup.

I also plan on using this feature for materialized views to replace memcached. Considering that I have been waiting on this feature for years, I, for one, welcome our unlogged table overlords.

Cheers,
M
Steve Crawford <scrawford@pinpointresearch.com> writes:
> 1. Would there be restrictions preventing a standard table from having a
> FK or other constraint that depends on an unlogged table? If not, it
> seems like there could be an unwanted ripple-effect from loss of the
> unlogged table.

I would assume that we should disallow an FK referencing an unlogged table from a regular table. I don't know whether the current patch covers that point, but if not it's an oversight.

> 2. Would it be possible to accidentally mix logged and unlogged tables
> in an inheritance chain? What would be the impact?

I don't see any logical problem there: some of the data in the inheritance tree would be subject to loss on crash, other data not. But the schema is all logged, so no inconsistency arises.

> 3. If unlogged data is included in a dump (my vote is no), would this
> lead to inconsistent behavior between dumps taken from a master and
> dumps taken from a hot standby?

Hmm, that is a really interesting point. You're right that a dump taken from a standby slave could not possibly include such data, since for lack of WAL it would never be propagated to the slave. I am not sure whether that inconsistency is a sufficiently good reason not to dump the data from the master, though. I think that in any case we are going to want a pg_dump option to dump/not dump unlogged data --- the argument is only about which behavior will be default.

I'm not sure that Robert's completely thought through the implications of this patch for behavior on a slave, anyway. Given the sequence

* take base backup from running system (which will surely include inconsistent data for unlogged tables)
* install base backup on slave
* run recovery, transitioning to hot standby
* go live

it's unclear to me where along the line the slave has an opportunity to clean out its bogus images of the unlogged tables. But it had better do so before opening up for hot standby queries, let alone going live.

> 4. Would it be reasonable for temporary tables to be unlogged by default?

Temp tables already are, always have been, always will be, unlogged. This patch is about attempting to bring that performance benefit of a temp table to regular tables.

regards, tom lane
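Tom's point about foreign keys can be sketched with the CREATE UNLOGGED TABLE syntax proposed in the patch (a sketch only; the table names are made up):

```sql
-- An unlogged table: its contents are truncated after a crash.
CREATE UNLOGGED TABLE user_sessions (
    session_id integer PRIMARY KEY,
    user_name  text
);

-- This is the case Tom argues should be disallowed: if user_sessions
-- is truncated after a crash, every row here references a session
-- that no longer exists, silently violating the constraint.
CREATE TABLE page_views (
    view_id    serial PRIMARY KEY,
    session_id integer REFERENCES user_sessions (session_id)
);
```

An FK between two unlogged tables would presumably stay consistent, since both are emptied together.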
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

That's a nice thought, but it's not how data corruption works in the event of a crash. If a table is corrupted, *we don't know* how it's corrupted, and it's not just "the last few records" which are corrupted. So for unlogged tables, there is never going to be any other option for crashes than to truncate them.

Robert Haas did discuss the ability to synch unlogged tables on a planned shutdown, though. However, that's liable to wait until 9.2, given the multiple steps required to make it work.

Note that you would have the option of periodically synching an unlogged table to pg_dump or to a logged table, via script, if you cared about retaining the data. That would probably give you the behavior you want, above.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
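The periodic-sync workaround Josh describes could look something like this (a sketch; the names are made up, and it assumes app_log is the unlogged table and app_log_snapshot an ordinary WAL-logged table with the same columns):

```sql
-- Run periodically from cron or an application job: snapshot the
-- volatile table into a crash-safe, WAL-logged copy.
BEGIN;
TRUNCATE app_log_snapshot;
INSERT INTO app_log_snapshot
SELECT * FROM app_log;
COMMIT;
```

After a restart that wipes app_log, at most one sync interval of log data is lost.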
On 11/16/2010 07:25 PM, Tom Lane wrote:
> 2. What's proposed as the new feature is that specific non-system
> tables can be marked as unlogged, meaning that WAL entries won't
> be made for changes in those tables' contents (nor their indexes'
> contents). So we can't guarantee that the contents of such tables
> will be correct or consistent after a crash. The proposed feature
> deals with this by forcibly truncating all such tables after a crash,
> thus ensuring that they're consistent though not populated. So the
> possible use-cases for such tables are limited to where (a) you can
> repopulate the tables on demand, or (b) you don't really care about
> losing data on a crash.

Well, if you guys would just hurry up and implement a way to mark indexes as inconsistent and continue to run without using them, you'd at least have the index problem solved :D

I was one of the guys drooling over WAL way back when it was a new feature. I understand the risks in not logging updates, and most of the time won't accept the risk if I don't absolutely have to. But, OTOH, before WAL, ALL tables were unlogged, and we still never lost any data that I'm aware of. I don't remember ever having anything worse than errors from corrupt indexes.

Maybe there's some old code somewhere to repair PG tables that could be dusted off and updated...? Heck, even if the postmaster refused to do anything with tables it thought might be FUBAR, and we had to repair them to even issue selects against them, in some cases that would still be better than having all the data go *poof*.

-Glen
On Wed, Nov 17, 2010 at 12:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
In an airplane, a pilot can kill the engine mid-flight if [s]he wants to. They can deploy the flaps/slats at cruise speed and altitude, and if they're so minded, they can land with a full tank of gas. Now, none of these things are particularly wise, but that's why the pilots are given *slightly* more learning than your average bus driver.
If you want to have a widely usable 'unlogged' table feature, I highly recommend that 'truncate on server crash/restart' be an option that is defaulted to true. That way, I can go in and push the buttons I want and give corrupted data to whomever, whenever I like. (Land with a full tank of Jet-A).
Whatever the decision is about backup, doesn't really matter IMO, but I honestly think that the benefit of an unlogged table is there for both session data (I run my session db's in fsync mode anyway and re-initdb them on boot) AND for logging data where I can't take WAL anymore, but would like to be able to have them in the same cluster as other stuff. If they just disappear then this feature won't be useful [to me] and I'll have to either wait for the patch or give up on it and do a flat-file / lucene project just to deal with it (I really don't want to do that :-).
--Scott
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
I would like to choose the table behaviour on restart (restore/forget it).

Currently, I'm looking for a way to split large transactions across different threads (with dblink). An issue is to efficiently share temp data across the threads. Unlogged tables would be fine here, something like global temp tables with shared data.

Here is an example to illustrate the current situation:

select cic_connect_me('c');
select dblink_exec ('c', 'drop table if exists my_share');
select dblink_exec ('c', 'create table my_share( a int)');
select dblink_disconnect ('c');

SELECT cic_multithread(ARRAY[
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)',
  'insert into my_share select * from generate_series(1,10000)']
  , max_threads=4);

create temp table my_result as select * from my_share;
drop table my_share;
select * from my_result;

For pg_dump, I guess that having an optional flag is fine, but: unlogged tables could also be useful to store very large 'raw' data to be processed, whereas the client would only query the processed results. In such a case, restoring the logged table has a higher priority. The best solution in my opinion would allow dumping/restoring these two table types in separate processes (or threads..).

(And by the way: would it be possible to choose the compression tool as an option for pg_dump?)

pg_dump -F.. -Compress pigz -f out.dmp -f_unlogged out_unlogged.dmp

regards, Marc Mamin
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
Hello,

Another question. I haven't read the complete threads, so I apologize if this was already discussed.

Will it be possible to switch from unlogged to logged?

To improve COPY performance, I currently:
- make a copy of heavily indexed tables
- load new data in the shadow table
- add the indexes
- drop the live table
- rename the shadow table to the visible one.

Is it imaginable to use unlogged tables for the shadow one and then enable logging after the switch?

regards, Marc Mamin
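Marc's shadow-table workflow, sketched in SQL (names are illustrative; the final ALTER is the hypothetical "enable logging" step he is asking about, which did not exist):

```sql
-- Build the shadow copy without WAL overhead.
CREATE UNLOGGED TABLE items_shadow (LIKE items INCLUDING DEFAULTS);
COPY items_shadow FROM '/tmp/new_items.csv' WITH CSV;
CREATE INDEX items_shadow_name_idx ON items_shadow (name);

-- Swap it in.
BEGIN;
DROP TABLE items;
ALTER TABLE items_shadow RENAME TO items;
ALTER TABLE items SET LOGGED;  -- hypothetical: start WAL-logging from here on
COMMIT;
```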
"Marc Mamin" <M.Mamin@intershop.de> writes: > Will it be possible to switch from unlogged to logged ? Probably not, because it would completely confuse hot standby slaves (or anything else looking at the WAL replay stream). You can't just start issuing WAL records against an already-built table or index, because the WAL entries are deltas. regards, tom lane
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
yep, but I'll miss this as I only use WAL for crash recovery...

regards, Marc Mamin
On 11/17/10 17:43, A.M. wrote:
>
> On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
>
>> On 11/17/10 02:55, Josh Berkus wrote:
>>>
>>>> If you do wish to have the data tossed out for no good reason every so
>>>> often, then there ought to be a separate attribute to control that. I'm
>>>> really having trouble seeing how such behavior would be desirable enough
>>>> to ever have the server do it for you, on its terms rather than yours.
>>>
>>> I don't quite follow you. The purpose of unlogged tables is for data
>>> which is disposable in the event of downtime; the classic example is a
>>> user_session_status table. In the event of a restart, all user
>>> sessions are going to be invalid anyway.
>>
>> Depends on what you mean by "session".
>>
>> Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back up, and the users can continue to work within their sessions. *That* is useful session behaviour and it is also useful for logs.
>>
>> The definition of unlogged tables which are deliberately emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as "good enough". Even better: maybe not all pages need to be scanned, only the last few, if there is some mechanism which can act as a checkpoint for data validity.
>
> This is not really a fair feature comparison. With the file-based sessions, the webserver will continue to deal with potentially corrupted sessions, which is worse than dealing with no sessions.

I guess it depends on the specific use case, but in the common case (i.e. non-mission-critical massive deployments) I'd say it's definitely *not* worse than no sessions. "Dealing with potential corruption" in this case usually means the web application will attempt to deserialize the session data and fail if it's corrupted, leading to a new session being created.

> I also plan on using this feature for materialized views to replace memcached.

Just how large a performance gain is expected from this thing? :)

I don't see a mention that fsync will be disabled on unlogged tables (though it makes sense, so it probably will be). Having materialized views this way will mean that something - either an application or an external script triggered by database startup - will have to calculate and create this materialized view, which will probably involve massive table scanning all around - I suspect that performance gains from unlogged tables could be hidden by this scanning.

Anyway, I'm not arguing against them, I'm arguing for making them more powerful.
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Merlin Moncure
On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Folks,
>
> Please help us resolve a discussion on -hackers.
>
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

For the 'as regular table' argument: We are *assuming* the data is not very valuable. I'd rather assume it is valuable. pg_dump has a very specific purpose: to back up the database in its current state into a file that can replay that state. Not backing up large objects by default is a huge gotcha imnsho. I understand the reasoning, but disagree with it. Certainly a switch to tune them out would be nice, but not the default.

Against argument: hm, I guess the counter argument is that because they are not WAL-logged, they can't possibly be replayed to a standby, and it makes sense to have pg_dump and archive-log-based backup behave in similar fashion. It's weird that different backup strategies produce different results.

I think the 'against argument' is stronger by about .5, so I'm voting .5 for pg_dump not to dump them. In fact, if that side of it wins, maybe pg_dump shouldn't even deal with them at all. I guess that would be my position.

merlin
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Alban Hertroys
On 16 Nov 2010, at 23:46, Josh Berkus wrote:
> Folks,
>
> Please help us resolve a discussion on -hackers.
>
> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
> tables which are "unlogged", meaning that they are not added to the
> transaction log, and will be truncated (emptied) on database restart.
> Such tables are intended for highly volatile, but not very valuable,
> data, such as session statuses, application logs, etc.
>
> The question is, how would you, as a DBA, expect pg_dump backups to
> treat unlogged tables? Backing them up by default has the potential to
> both cause performance drag on the unlogged table and make your backups
> take longer unless you remember to omit them. Not backing them up by
> default has the drawback that if you forget the --include-unlogged switch,
> and shut the database down, any unlogged data is gone. How would you
> *expect* unlogged tables to behave?

From the discussion so far it appears to me that "unlogged" should probably be split into various gradations of unlogged. There appear to be a number of popular use-cases for such tables, with different requirements, namely:

1. Session tables

These tables contain data about a user session in some application. It is temporary data at best; it's no problem to lose it at all. Dumping it makes no sense.

2. Staging tables

These tables contain data that's being processed and prepared to be entered into other tables in the database. If the process fails it can usually be restarted, so losing the data is no problem. Here as well, dumping makes little sense.

3. Logging tables

This is data from application logs. It's not usually mission-critical, so losing it isn't a very big deal, but it is useful data of itself. It should in most cases survive a normal backend restart, but if it doesn't survive a backend crash that's acceptable. This data should in most cases be included in dumps (or dumped separately?).

4. Materialized views

Stored results of a query that's likely to have a big footprint on system resources. Losing the data after a backend crash is acceptable, but it should survive a normal system restart. Since the data can be generated from the contents of the database, it's not necessary to include it in dumps (but maybe it is convenient in some cases?)

I think this is the gist of it.

Which leads me to think that people want three knobs to play with: should the table be logged or not? Can it be truncated at normal server restart or not? Should it be included in dumps or not? And possibly, should it be fsynced or not?

Of course, without WAL logs, PITR and WAL-based replication are out of the question for these tables. Also, since the data can be lost, they can't be referenced by foreign keys.

Does that sum it up adequately?

There's one thing that I didn't see mentioned and that I'm not sure fits into the picture here, namely read-only tables (materialized views would qualify for those in most cases). These tables are written every once in a while by a system user, but the data doesn't change in between at all. There's not much point in giving every user their own session, and it should be possible to assume all index entries point to a live record (which has consequences for COUNT(), for example). Changing that has quite some implications though, I wager...

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
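Alban's knobs might look something like this as DDL (pure sketch; none of this syntax existed at the time, and it only illustrates separating the three decisions):

```sql
-- Knob 1: WAL-logged or not.
CREATE UNLOGGED TABLE staging_rows (payload text);

-- Knob 2: when may the table be truncated (imagined here as a
-- storage parameter: only after a crash, not on a clean restart).
CREATE UNLOGGED TABLE app_log (msg text) WITH (truncate_on = 'crash');

-- Knob 3: whether pg_dump includes the data by default
-- (imagined as a per-table flag).
ALTER TABLE app_log SET (dump_data = true);
```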
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Karsten Hilbert
On Thu, Nov 18, 2010 at 08:49:12AM +0100, Alban Hertroys wrote:
> From the discussion so far it appears to me that
> "unlogged" should probably be split into various gradations
> of unlogged. There appear to be a number of popular
> use-cases for such tables, with different requirements,

That's precisely the point why this discussion doesn't lead to a *solution*. It can only lead to a *decision*.

It seems that it needs to be decided first whether, in the case of unWALed tables, we want PostgreSQL to provide *means* or *policies*. The former are decidable and robustly implementable in a piece of infrastructure software like PostgreSQL. The latter are up to the whims of each deployment site.

> Which leads me to think that people want three knobs to
> play with: should the table be logged or not? Can it be
> truncated at normal server restart or not? Should it be
> included in dumps or not? And possibly, should it be fsynced
> or not?

Yep, your analysis breaks down the policy stage (the grading of "logged") into "modes" or "means" which people can apply to achieve certain policies.

That is why I argued for these options:

- alter database dump_unlogged_tables to on/off
  default on: better safe than sorry, point the gun but don't pull the trigger

- pg_dump --include-unlogged-tables
  default: whatever alter database says

- psqlrc: \set include_unlogged_tables to on/off
  default: doesn't exist, falls back to what "alter database" or --include-unlogged says

That way I can use certain means to work out the policy I want, namely setting "alter database" to what it should be on this database waaay before the time comes when it is crucial not to forget --include-unlogged.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
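Karsten's layered defaults, as a hypothetical session (the setting and flag are his proposals, not existing features):

```sql
-- Decide once, per database, long before a backup is taken:
ALTER DATABASE sessions_db  SET dump_unlogged_tables TO off;
ALTER DATABASE reporting_db SET dump_unlogged_tables TO on;

-- A later "pg_dump --include-unlogged-tables reporting_db" would
-- then override the per-database default only when given explicitly.
```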
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Karsten Hilbert
On Thu, Nov 18, 2010 at 10:30:46AM +0100, Karsten Hilbert wrote:
> That is why I argued for options:
>
> - alter database dump_unlogged_tables to on/off
>
> default on: better safe than sorry, point the gun but don't pull the trigger

(I agree, however, that the database metadata isn't really the appropriate place to store application-specific configuration items. So, maybe a pgdumprc is a better place for that.)

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hello,

>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>> tables which are "unlogged", meaning that they are not added to the
>> transaction log, and will be truncated (emptied) on database restart.
>> Such tables are intended for highly volatile, but not very valuable,
>> data, such as session statuses, application logs, etc.

One doubt - if the tables are 'unlogged', will the DMLs against these still be replicated to a slave?

Regards, Jayadevan

DISCLAIMER: "The information in this e-mail and any attachment is intended only for the person to whom it is addressed and may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contact the sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guarantees the accuracy, adequacy or completeness of the information contained in this email or any attachment and is not liable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect."
On 11/18/2010 3:46 AM, Jayadevan M wrote:
> Hello,
>
>>> PostgreSQL 9.1 is likely to have, as a feature, the ability to create
>>> tables which are "unlogged", meaning that they are not added to the
>>> transaction log, and will be truncated (emptied) on database restart.
>>> Such tables are intended for highly volatile, but not very valuable,
>>> data, such as session statuses, application logs, etc.
>
> One doubt - if the tables are 'unlogged', will the DMLs against these
> still be replicated to a slave?
>
> Regards,
> Jayadevan

Yes, because the system tables, which store the layout of all tables, are written to WAL.

-Andy
> Does that sum it up adequately?

One more thing: you might not get all of these options in 9.1. Currently the discussion is talking about *maybe* offering checkpointing of unlogged tables, which would allow such tables to survive a normal restart, and including unlogged tables in pg_dump by default. But it's also possible that we'll only have one type of unlogged table in 9.1, with other options waiting for 9.2.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Hi,

>>> One doubt - if the tables are 'unlogged', will the DMLs against these
>>> still be replicated to a slave?
>>
>> Yes, because the system tables, which store the layout of all tables, is
>> written to WAL.

Thank you for the reply. But my doubt was not about layout, rather the DMLs. If I do an insert into an 'unlogged' table, what happens to that? Will that be replicated in the slave (using PostgreSQL's inbuilt replication)?

Regards, Jayadevan
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: Alban Hertroys
On 19 Nov 2010, at 4:23, Jayadevan M wrote:
> Hi,
>
>>> One doubt - if the tables are 'unlogged', will the DMLs against these
>>> still be replicated to a slave?
>>
>> Yes, because the system tables, which store the layout of all tables, is
>> written to WAL.
>
> Thank you for the reply. But my doubt was not about layout, rather the
> DMLs. If I do an insert into an 'unlogged' table, what happens to that?
> Will that be replicated in the slave (using PostgreSQL's inbuilt
> replication)?

What are the use-cases for replicating unlogged tables?

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
From: "Marc Mamin"
> What are the use-cases for replicating unlogged tables?

Hello,

I guess they could be useful in cloud infrastructures. (see http://archives.postgresql.org/pgsql-general/2010-11/msg00865.php)

regards, Marc Mamin
Hello,

>> Thank you for the reply. But my doubt was not about layout, rather the
>> DMLs. If I do an insert into an 'unlogged' table, what happens to that?
>> Will that be replicated in the slave (using PostgreSQL's inbuilt
>> replication)?
>
> What are the use-cases for replicating unlogged tables?

I do not have a use case for replicating unlogged tables. But I may use temp/unlogged tables in my master to populate actual tables. Say, I have a db archival/purge process. I populate temp tables with PKs of my permanent tables and use that to drive my insertion into history tables, deletion from live tables etc. Pseudocode:

insert into mytemptable (id)
select mypk from livetable where lastuseddate < archivedate;

insert into myhist
select a.* from livetable a join mytemptable on a.mypk = mytemptable.id;

delete from livetable where mypk in (select id from mytemptable);

Reading about what goes into WAL tells me that the permanent table data will be replicated all right even if the temp tables are not logged. Is that right?

Regards, Jayadevan
Glen Parker wrote:
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

How often are you doing unintentional restarts? I'd guess for many people it's "whenever I had so many backend crashes that I get motivated to check if I'm running the latest minor release". And if it's an intentional restart - surely you could archive your application logs before doing the restart, no?
Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guaranteed.

"Not guaranteed" is fine. What people are asking for is "often survives". AFAIK we don't truncate the log file created by the log_filename GUC on every unclean crash and every clean shutdown. Should we? :-) Why not? For people who intend to use these tables to log application data, they'd have the exact same reasons for not wanting them truncated when they don't need to be.
> AFAIK we don't truncate the log file created by the log_filename GUC
> on every unclean crash and every clean shutdown.

That's not a remotely relevant analogy. A log file is not a database table. If we allow a database table to become corrupted due to being unsynched at the time of shutdown, it's not a matter of "missing a few rows". The table is *unreadable*, and may cause the backend or even the whole server to crash when you try to read it.

Anyway, per discussion on hackers, unlogged tables (or "volatile tables" as they're now being called) include two modes in the spec; one which checkpoints (and thus can survive a planned restart) and one which doesn't (and will truncate on every restart, but doesn't cause physical I/O). We may or may not have both modes for 9.1.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com