Discussion: MVCC snapshot timing
I received a private email report that our introductory MVCC documentation is unclear about when a snapshot is taken. I have adjusted the wording in the attached patch to be less precise about snapshot timing. Snapshot timing is controlled by the session isolation level, which I don't think we want to cover in this introductory paragraph.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes:
> I received a private email report that our introductory MVCC
> documentation is unclear about when a snapshot is taken. I have
> adjusted the wording in the attached patch to be less precise about
> snapshot timing. Snapshot timing is controlled by the session isolation
> level, which I don't think we want to cover in this introductory
> paragraph.

I'm not really seeing the point of s/transaction/session/ here. The phrasing is a bit awkward and maybe could be improved, but I think you should keep it referring to transactions.

regards, tom lane
On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I received a private email report that our introductory MVCC
> > documentation is unclear about when a snapshot is taken. I have
> > adjusted the wording in the attached patch to be less precise about
> > snapshot timing. Snapshot timing is controlled by the session isolation
> > level, which I don't think we want to cover in this introductory
> > paragraph.
>
> I'm not really seeing the point of s/transaction/session/ here.
> The phrasing is a bit awkward and maybe could be improved, but I think you
> should keep it referring to transactions.

Well, the problem with the original wording is that we don't take a new snapshot for every transaction in the default read-committed mode. Would you prefer I refer to statements, e.g.:

    This means that while querying a database each statement sees

This is our default behavior.

--
Bruce Momjian <bruce@momjian.us>
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
>> I'm not really seeing the point of s/transaction/session/ here.

> Well, the problem with the original wording is that we don't take a new
> snapshot for every transaction in the default read-committed mode.

We take at least one snapshot per transaction, in any mode. Referring to sessions makes it even further away from being a useful concept.

> Would you prefer I refer to statements, e.g.:

'Statement' might work.

regards, tom lane
On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Nov 11, 2013 at 03:39:45PM -0500, Tom Lane wrote:
> >> I'm not really seeing the point of s/transaction/session/ here.
>
> > Well, the problem with the original wording is that we don't take a new
> > snapshot for every transaction in the default read-committed mode.
>
> We take at least one snapshot per transaction, in any mode. Referring
> to sessions makes it even further away from being a useful concept.
>
> > Would you prefer I refer to statements, e.g.:
>
> 'Statement' might work.

OK, updated patch attached. Is "statement" too vague here? SQL statement? query?

--
Bruce Momjian <bruce@momjian.us>
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
>> 'Statement' might work.

> OK, updated patch attached. Is "statement" too vague here? SQL
> statement? query?

"SQL statement" might be a good idea in the first sentence, but I don't think you need to repeat it in the second.

What's bothering me about this wording is that you're talking about statements and then suddenly reference transactions (as being "those other things messing with your data"). This seems weirdly asymmetric, since after all you could equally well be the one messing with their data.

regards, tom lane
On Mon, Nov 11, 2013 at 09:27:15PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> >> 'Statement' might work.
>
> > OK, updated patch attached. Is "statement" too vague here? SQL
> > statement? query?
>
> "SQL statement" might be a good idea in the first sentence, but
> I don't think you need to repeat it in the second.
>
> What's bothering me about this wording is that you're talking about
> statements and then suddenly reference transactions (as being "those
> other things messing with your data"). This seems weirdly asymmetric,
> since after all you could equally well be the one messing with their
> data.

Yes, that bugged me too, but then I realized that you only see the changes from a transaction when it completes, not from each statement, e.g. you can never see changes between statements of a multi-statement transaction.

I used "SQL statement" in the updated, attached patch.

--
Bruce Momjian <bruce@momjian.us>
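The point made above, that other sessions see a transaction's changes only once it completes while the transaction itself sees its own earlier statements, can be sketched with a toy Python model. This is only an illustration under simplifying assumptions, not how PostgreSQL stores data: `ToyDatabase` and all of its methods are hypothetical names, rollback is not modelled, and every read looks at the latest committed state (roughly the per-statement behaviour discussed in this thread).

```python
class ToyDatabase:
    """Toy model: committed data plus per-transaction pending writes."""

    def __init__(self):
        self.committed = {}  # key -> value, visible to every transaction
        self.pending = {}    # txid -> {key: value}, visible only to that txid

    def begin(self, txid):
        self.pending[txid] = {}

    def write(self, txid, key, value):
        # Writes stay private to the writing transaction until commit.
        self.pending[txid][key] = value

    def read(self, txid, key):
        # A transaction sees its own earlier writes first, then committed data.
        own = self.pending.get(txid, {})
        if key in own:
            return own[key]
        return self.committed.get(key)

    def commit(self, txid):
        # All of the transaction's changes become visible at once.
        self.committed.update(self.pending.pop(txid))


db = ToyDatabase()
db.begin("writer")
db.begin("reader")

db.write("writer", "balance", 100)  # first statement of the writer
db.write("writer", "balance", 150)  # second statement of the writer

writer_view = db.read("writer", "balance")         # writer sees its own change
reader_before = db.read("reader", "balance")       # reader sees nothing yet
db.commit("writer")
reader_after = db.read("reader", "balance")        # reader sees the final value
```

In this sketch the reader never observes the intermediate value 100; it goes straight from seeing nothing to seeing 150 after the commit.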
This reads badly to my ears:

> This means that while querying a database each SQL statement sees a
> snapshot of data (a database version) as it was some time ago, regardless
> of the current state of the underlying data.

How about something closer to:

> This means for each SQL statement the user can specify a relative
> point-in-time snapshot (database version) of the database against which to
> query. These snapshot options are 1) the most recent committed data
> currently available database-wide - including implicit commits (see note),
> or 2) the committed data as-of the beginning of the current transaction -
> including any changes made in the same.
>
> Note: an implicit commit occurs only within a multi-statement transaction.
> For the purpose of determining if data has been committed any prior
> statements in the same transaction are deemed to have been committed when
> viewed by later statements.

I know this is an introduction paragraph so the broad concept is being focused on rather than how such a user would in fact make this choice.

I don't know that the term "implicit commit" is used elsewhere, likely not, but in effect that is what a statement in a transaction is seeing with respect to prior statements in the same transaction. Naming this behavior in the introduction would allow for somewhat less verbose descriptions to be used in detail sections.

The above could be better integrated into the intro but I wanted to get opinions on the approach first.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/MVCC-snapshot-timing-tp5777759p5777852.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.
David Johnston wrote
> This reads badly to my ears:
>> This means that while querying a database each SQL statement sees a
>> snapshot of data (a database version) as it was some time ago, regardless
>> of the current state of the underlying data.
> How about something closer to:
>> This means for each SQL statement the user can specify a relative
>> point-in-time snapshot (database version) of the database against which
>> to query. These snapshot options are 1) the most recent committed data
>> currently available database-wide - including implicit commits (see
>> note), or 2) the committed data as-of the beginning of the current
>> transaction - including any changes made in the same.
>>
>> Note: an implicit commit occurs only within a multi-statement
>> transaction. For the purpose of determining if data has been committed
>> any prior statements in the same transaction are deemed to have been
>> committed when viewed by later statements.
> I know this is an introduction paragraph so the broad concept is being
> focused on rather than how such a user would in fact make this choice.
>
> I don't know that the term "implicit commit" is used elsewhere, likely
> not, but in effect that is what a statement in a transaction is seeing
> with respect to prior statements in the same transaction. Naming this
> behavior in the introduction would allow for somewhat less verbose
> descriptions to be used in detail sections.
>
> The above could be better integrated into the intro but I wanted to get
> opinions on the approach first.
>
> David J.

So with the comment about implicit commits, the phrase "including any changes made in the same." can be dropped, since that is what I was trying to imply before I devised the new term.

David J.
On Mon, Nov 11, 2013 at 07:25:59PM -0800, David Johnston wrote:
> This reads badly to my ears:
>
> > This means that while querying a database each SQL statement sees a
> > snapshot of data (a database version) as it was some time ago, regardless
> > of the current state of the underlying data.
>
> How about something closer to:
>
> > This means for each SQL statement the user can specify a relative
> > point-in-time snapshot (database version) of the database against which to
> > query. These snapshot options are 1) the most recent committed data
> > currently available database-wide - including implicit commits (see note),
> > or 2) the committed data as-of the beginning of the current transaction -
> > including any changes made in the same.
> >
> > Note: an implicit commit occurs only within a multi-statement transaction.
> > For the purpose of determining if data has been committed any prior
> > statements in the same transaction are deemed to have been committed when
> > viewed by later statements.
>
> I know this is an introduction paragraph so the broad concept is being
> focused on rather than how such a user would in fact make this choice.
>
> I don't know that the term "implicit commit" is used elsewhere, likely not,
> but in effect that is what a statement in a transaction is seeing with
> respect to prior statements in the same transaction. Naming this behavior
> in the introduction would allow for somewhat less verbose descriptions to be
> used in detail sections.
>
> The above could be better integrated into the intro but I wanted to get
> opinions on the approach first.

We just want to get across the MVCC concept in the intro --- we cover the snapshots later in the document.

--
Bruce Momjian <bruce@momjian.us>
Bruce Momjian wrote
> We just want to get across the MVCC concept in the intro --- we cover
> the snapshots later in the document.

I just think we're being too vague here; and we are covering them in the intro with the use of "some point in the past".

IMO, the main point regarding MVCC is that every change in the system creates a new record and causes a prior record to be invalidated at a point-in-time. The combination of these two things increases concurrency since you can create new records while people are still using the old ones. One consequence, though, is that it is necessary for the user to decide at what point in the timeline they want to view the database.

Does this sound right?

The current (and modified) intro indeed covers these two points so it really comes down to style. My current gut feel is the documentation (generally speaking) does a good job of describing the mechanics of the system but, in some areas, could use more detail as to why and also the various implications of those mechanics [1]. Bringing those up in the intro gives the reader additional context so that when they get into the "how" detail sections they can more quickly link the mechanics with the problem they are meant to solve. Thus introducing the more specific "snapshot" concept in the intro provides the context for when they are reading why isolation levels exist.

[1] Not that I'm proactively looking; but when questions arise regarding the docs I try and put myself in the person's shoes and find not that the docs are incorrect but that they could be improved - which is just a part of our reality.

My $0.02

David J.
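The summary above, that each change creates a new record and invalidates a prior one so that readers can keep using the old record, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not PostgreSQL's storage format: `RowVersion`, `update` and `visible_value` are hypothetical names, and the sketch assumes every transaction has committed, in transaction-id order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    created_by: int                   # id of the transaction that created this version
    expired_by: Optional[int] = None  # id of the transaction that superseded it, if any


def update(versions: List[RowVersion], new_value: str, txid: int) -> None:
    """Invalidate the current live version and append a new one."""
    for version in versions:
        if version.expired_by is None:
            version.expired_by = txid
    versions.append(RowVersion(new_value, created_by=txid))


def visible_value(versions: List[RowVersion], as_of_txid: int) -> Optional[str]:
    """Return the value seen by a reader whose view covers transactions <= as_of_txid."""
    for version in versions:
        created = version.created_by <= as_of_txid
        expired = version.expired_by is not None and version.expired_by <= as_of_txid
        if created and not expired:
            return version.value
    return None


versions = [RowVersion("draft", created_by=1)]
update(versions, "final", txid=5)

old_view = visible_value(versions, as_of_txid=3)  # reader positioned before the update
new_view = visible_value(versions, as_of_txid=5)  # reader positioned after the update
```

Both versions exist side by side after the update, which is why the older reader is never blocked by the writer in this model.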
On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
> Bruce Momjian wrote
> > We just want to get across the MVCC concept in the intro --- we cover
> > the snapshots later in the document.
>
> I just think we're being too vague here; and we are covering them in the
> intro with the use of "some point in the past".
>
> IMO, the main point regarding MVCC is that every change in the system
> creates a new record and causes a prior record to be invalidated at a
> point-in-time. The combination of these two things increases concurrency
> since you can create new records while people are still using the old ones.
> One consequence, though, is that it is necessary for the user to decide at
> what point in the timeline they want to view the database.
>
> Does this sound right?

I still do not see how this fits appropriately in the introduction.

--
Bruce Momjian <bruce@momjian.us>
Bruce Momjian wrote
> On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
>> Bruce Momjian wrote
>> > We just want to get across the MVCC concept in the intro --- we cover
>> > the snapshots later in the document.
>>
>> I just think we're being too vague here; and we are covering them in the
>> intro with the use of "some point in the past".
>>
>> IMO, the main point regarding MVCC is that every change in the system
>> creates a new record and causes a prior record to be invalidated at a
>> point-in-time. The combination of these two things increases concurrency
>> since you can create new records while people are still using the old
>> ones. One consequence, though, is that it is necessary for the user to
>> decide at what point in the timeline they want to view the database.
>>
>> Does this sound right?
>
> I still do not see how this fits appropriately in the introduction.

The concept or the actual wording?

The intended question was whether my understanding (and simplification) of the concept is correct.

My specific wording is incoherent mostly because it really belongs to a larger corpus that currently exists only in my head.

David J.
On Tue, Nov 12, 2013 at 05:35:23PM -0800, David Johnston wrote:
> Bruce Momjian wrote
> > On Tue, Nov 12, 2013 at 03:36:01PM -0800, David Johnston wrote:
> >> Bruce Momjian wrote
> >> > We just want to get across the MVCC concept in the intro --- we cover
> >> > the snapshots later in the document.
> >>
> >> I just think we're being too vague here; and we are covering them in the
> >> intro with the use of "some point in the past".
> >>
> >> IMO, the main point regarding MVCC is that every change in the system
> >> creates a new record and causes a prior record to be invalidated at a
> >> point-in-time. The combination of these two things increases concurrency
> >> since you can create new records while people are still using the old
> >> ones. One consequence, though, is that it is necessary for the user to
> >> decide at what point in the timeline they want to view the database.
> >>
> >> Does this sound right?
> >
> > I still do not see how this fits appropriately in the introduction.
>
> The concept or the actual wording?
>
> The intended question was whether my understanding (and simplification) of
> the concept is correct.
>
> My specific wording is incoherent mostly because it really belongs to a
> larger corpus that currently exists only in my head.

Oh, OK, it sounds fine. The user really doesn't choose what timeline to see --- rather, it is the current xid at the time they take their snapshot and other running xids that control that. You can control your transaction isolation level, but that only controls how often you take snapshots.

--
Bruce Momjian <bruce@momjian.us>
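The description above (a snapshot is determined by the current xid and the other running xids, and the isolation level only controls how often a snapshot is taken) can be sketched in Python. This is a simplified toy, not PostgreSQL's implementation: `Snapshot`, `ToySystem` and `Reader` are hypothetical names, aborted transactions are not modelled, and the `per_statement` flag merely stands in for the choice between a read-committed style (new snapshot for each statement) and a style that keeps one snapshot for the whole transaction, as PostgreSQL's REPEATABLE READ does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    next_xid: int       # transactions with ids >= this had not started yet
    running: frozenset  # transactions still in progress when the snapshot was taken

    def sees(self, xid):
        # Visible only if the transaction started earlier and was no longer running.
        return xid < self.next_xid and xid not in self.running


class ToySystem:
    def __init__(self):
        self.next_xid = 1
        self.running = set()
        self.rows = []  # list of (creating xid, value)

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def insert(self, xid, value):
        self.rows.append((xid, value))

    def commit(self, xid):
        self.running.discard(xid)

    def take_snapshot(self):
        return Snapshot(self.next_xid, frozenset(self.running))


class Reader:
    def __init__(self, system, per_statement):
        self.system = system
        self.per_statement = per_statement
        self.snapshot = None

    def select(self):
        # Take a new snapshot for every statement, or only for the first one.
        if self.per_statement or self.snapshot is None:
            self.snapshot = self.system.take_snapshot()
        return [value for xid, value in self.system.rows if self.snapshot.sees(xid)]


system = ToySystem()
read_committed = Reader(system, per_statement=True)
single_snapshot = Reader(system, per_statement=False)

writer = system.begin()
system.insert(writer, "new row")

first_rc = read_committed.select()    # writer still running: row not visible
first_ss = single_snapshot.select()   # same here

system.commit(writer)

second_rc = read_committed.select()   # fresh snapshot: row now visible
second_ss = single_snapshot.select()  # old snapshot reused: row still not visible
```

Neither reader picks a point in time directly; each only decides how often to refresh its snapshot, which matches the distinction drawn in the message above.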
Bruce Momjian wrote
> Oh, OK, it sounds fine. The user really doesn't choose what timeline to
> see --- rather, it is the current xid at the time they take their
> snapshot and other running xids that control that. You can control
> your transaction isolation level, but that only controls how often you
> take snapshots.

^ This kind of makes my point. You've described perfectly the mechanics of the system, but from a mostly black-box perspective a user decision (choosing the isolation level) directly impacts which point-in-time (xid) is chosen for the snapshot behind a particular SQL statement.

The fact that they can only choose between two pre-defined and relative points-in-time is a detail to explain later. But the fact that they are required to make such a choice as an outcome of MVCC (one is provided by default, but one still chooses, even if through ignorance, to use the default) should be included, in some form, in the introduction.

In the current, proposed, and my revisions it is indeed covered but to various degrees of detail and low/high level focus.

David J.
On Mon, Nov 11, 2013 at 09:46:09PM -0500, Bruce Momjian wrote:
> On Mon, Nov 11, 2013 at 09:27:15PM -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Mon, Nov 11, 2013 at 08:59:35PM -0500, Tom Lane wrote:
> > >> 'Statement' might work.
> >
> > > OK, updated patch attached. Is "statement" too vague here? SQL
> > > statement? query?
> >
> > "SQL statement" might be a good idea in the first sentence, but
> > I don't think you need to repeat it in the second.
> >
> > What's bothering me about this wording is that you're talking about
> > statements and then suddenly reference transactions (as being "those
> > other things messing with your data"). This seems weirdly asymmetric,
> > since after all you could equally well be the one messing with their
> > data.
>
> Yes, that bugged me too, but then I realized that you only see the
> changes from a transaction when it completes, not from each statement,
> e.g. you can never see changes between statements of a multi-statement
> transaction.
>
> I used "SQL statement" in the updated, attached patch.

Applied.

--
Bruce Momjian <bruce@momjian.us>