I like it.
-----Original Message-----
From: Jackson, DeJuan [SMTP:djackson@cpsgroup.com]
Sent: Friday, March 05, 1999 12:02 PM
To: jwieck@debis.com; Michael Davis
Cc: pgsql-general@postgreSQL.org; pgsql-hackers@postgreSQL.org
Subject: RE: [HACKERS] RE: [GENERAL] Transaction logging
Wouldn't it be possible to have the commit take care of the logging.
I
don't know enough about the backend, but all those functions have to
get
evaluated at some point before the commit in order to be written to
the
db. You could actually have the commit log the needed sql-commands
without any function interference. Also doesn't commit know if any
rows
have been updated by any of the transaction's statements. This
could/would require filtering rules and triggers from the output, or
turning them off on the restore.
As far as the shared-tables/catalogues are concerned. We could have
each full-database dump take care of it's entries into the shared
tables. Having the transaction logs be incremental from the point
of
the dump allows the catalogues to be self updating (assuming we are
logging vacuums).
All this depends on getting a database commit to output it's
information
easily. It also ignores MVCC's effect on transactions, but I think
it's
a workable model.
You can tell me I'm blowing smoke now.
-DEJ
> -----Original Message-----
> From: jwieck@debis.com [mailto:jwieck@debis.com]
> Sent: Friday, March 05, 1999 12:24 PM
> To: michael.davis@prevuenet.com
> Cc: pgsql-general@postgreSQL.org; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] RE: [GENERAL] Transaction logging
>
>
> Michael Davis wrote:
>
> >
> > What would it take to have transaction logging added to
> Postgres. I am a
> > c/c++ programmer and will consider contributing to the
> Postgres development
> > effort. I really like everything I see and read about
> Postgres. As a
>
> I spent some time on transaction logging since it's a feature
> I'm missing too. There are mainly two different transaction
> log mechanisms out.
>
> 1. Log queries sent to the backend.
>
> 2. Log images of inserted/updated rows and row ID's of
> deleted ones.
>
> The query level logging will write less information if
> queries usually affect a large number of rows. Unfortunately
> the extensibility of Postgres work's against this approach.
> There could be any number of user written functions who's
> results aren't reproduceable during recovery. And core
> features of Postgres itself would introduce the same problem.
> Have a sequence which is used to create default values for
> multiple tables, so that one ID is unique across them. Now
> two backends insert (with INSERT ... SELECT) concurrently
> into different tables using the same sequence. It's a
> classic race condition and it depends on context switching
> and page faults which backend will get which sequence
> numbers. You cannot foresee and you cannot reproduce, except
> you hook into the sequence generator and log this too. Later
> when recovering, another hook into the sequence generator
> must reproduce the logged results on the per
> backend/transaction/command base, and the same must be done
> for each function that usually returns unreproduceable
> results (anything dealing with time, pid's, etc.).
>
> As said, this must also cover user functions. So at least
> there must be a general log API that provides such a
> functionality for user written functions.
>
> The image logging approach also has problems. First, the only
> thing given to the heap access methods to outdate a tuple on
> update/delete is the current tuple ID (information that tells
> which tuple in which block is meant). So you need to save
> the database files in binary format, because during the
> actually existing dump/restore this could change and the
> logged CTID's would hit the wrong tuples.
>
> Second, you must remember in the log which transaction ID
> these informations came from and later if the transaction
> committed or not, so the recovery can set this commit/abort
> information in pg_log too. pg_log is a shared system file and
> the transaction ID's are unique only for one server. Using
> this information for online replication of a single database
> to another Postgres installation will not work.
>
> Third, there are still some shared system catalogs across all
> databases (pg_database, pg_group, pg_log!!!, pg_shadow and
> pg_variable). Due to that it would be impossible (or at least
> very, very tricky) to restore/recover (maybe point in time)
> one single database. If you destroy one database and restore
> it from the binary backup, these shared catalogs cannot be
> restored too, so they're out of sync with the backup time.
> How should the recovery now hit the right things (which
> probably must not be there at all)?.
>
> All this is really a mess. I think the architecture of
> Postgres will only allow something on query level with some
> general API for things that must reproduce the same result
> during recovery. For example time(). Inside the backend,
> time() should never be called directly. Instead another
> function is to be called that log's during normal operation
> which time get's returned by this particular function call
> and if the backend is in recovery mode, returns the value
> from the log.
>
> And again, this all means trouble. Usually, most queries sent
> to the database don't change any data because they are
> SELECT's. It would dramatically blow up the log amount if you
> log ALL queries instead of only those that modify things. But
> when the query begins, you don't know this, because a SELECT
> might call a function that uses SPI to UPDATE something else.
> So the decision if the query must be logged or not can only
> be made when the query is done (by having some global
> variable where the heap access methods set a flag that
> something got written). Now you have to log function call's
> like time() even if the query will not modify any single row
> in the database because the query is a
>
> SELECT 'now'::datetime - updtime FROM ...
>
> Doing this on a table with thousands of rows will definitely
> waste much logging space and slowdown the whole thing by
> unnecessary logging.
>
> Maybe it's a compromise if at each query start the actual
> time and other such information is remembered by the backend,
> all time() calls return this remembered value instead of the
> real one (wouldn't be bad anyway IMHO), and this information
> is logged only if the query is to be logged.
>
> Finally I think I must have missed some more problems, but
> aren't these enough already to frustrate you :-?
>
>
> Jan
>
> --
>
> #=============================================================
> =========#
> # It's easier to get forgiveness for being wrong than for
> being right. #
> # Let's break this rule - forgive me.
> #
> #======================================== jwieck@debis.com
> (Jan Wieck) #
>
>
>