Discussion: Potential G2-item cycles under serializable isolation
Hello everyone!

First off, I'm sorry for *gestures vaguely* all of this. Second, I think I may have found a serializability violation in Postgres 12.3, involving anti-dependency cycles around row insertion.

For background, my name is Kyle Kingsbury, and I test distributed database safety properties (https://jepsen.io). I started looking at Stolon + PostgreSQL this week, encountered this behavior, and managed to narrow it down to a single Postgres node without Stolon at all. Normally I test with a variety of faults (network partitions, crashes, etc.), but this behavior occurs in healthy processes without any faults. This test uses the Jepsen testing library (https://github.com/jepsen-io/jepsen) and the Elle isolation checker (https://github.com/jepsen-io/elle). If you're wondering "why would you ever do transactions like this", the Elle paper might provide some helpful background: https://arxiv.org/abs/2003.10554

We install Postgres 12.3-1.pgdg100+1 on a Debian 10 node, using the official Postgres repository at http://apt.postgresql.org/pub/repos/apt/. Each client uses its own JDBC connection, on a single thread, in a single JVM process. We use the JDBC postgres driver (org.postgresql/postgresql 42.2.12). The JDK is 1.8.0_40-b25.

Logically, the test performs randomly generated transactions over a set of lists identified by integer keys. Each operation is either a read, which returns the current value of the list for a given key, or an append, which adds a unique element to the end of the list for a given key. In Postgres, we store these objects in tables like so:

    create table if not exists txn0 (id int not null primary key, sk int not null, val text)

id is the key, and val stores comma-separated elements. sk is a secondary key, which is unused here. We create three tables like this (txn0, txn1, txn2). Records are striped across tables by hashing their key.
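For illustration, striping might look like the following sketch. The real workload computes this in Clojure; the modular hash below is an assumption for illustration, not Jepsen's exact function:

```python
# Hypothetical sketch of striping keys across txn0..txn2. The real
# Jepsen workload hashes keys in Clojure; this modular hash is only
# an illustration of the idea, not the exact function used.
N_TABLES = 3

def table_for_key(k: int) -> str:
    return "txn%d" % (k % N_TABLES)

# All operations on a given key always land in the same table:
assert table_for_key(7) == table_for_key(7)
assert {table_for_key(k) for k in range(9)} == {"txn0", "txn1", "txn2"}
```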
We set the session transaction isolation level immediately after opening every connection:

    SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE

... and request SERIALIZABLE for each JDBC transaction as well.

Our reads are of the form:

    select (val) from txn0 where id = $1

And our writes are of the form:

    insert into txn1 as t (id, sk, val) values ($1, $2, $3)
    on conflict (id) do update set val = CONCAT(t.val, ',', $4) where t.id = $5

where $1 and $5 are the key, and $2, $3, and $4 are the element we'd like to append to the list.

You can try these transactions for yourself using Jepsen f47eb25. You'll need a Jepsen environment; see https://github.com/jepsen-io/jepsen#setting-up-a-jepsen-environment for details.

    cd jepsen/stolon
    lein run test-all -w append --nemesis none --max-writes-per-key 8 --node n1 --just-postgres --concurrency 50 -r 1000

Which typically produces, after about a minute, anomalies like the following:

    G2-item #1
    Let:
      T1 = {:type :ok, :f :txn, :value [[:r 7 [1]] [:append 12 1]],
            :time 95024280, :process 5, :index 50}
      T2 = {:type :ok, :f :txn, :value [[:append 7 2] [:r 14 nil] [:append 14 1] [:r 12 nil]],
            :time 98700211, :process 6, :index 70}
    Then:
      - T1 < T2, because T1 did not observe T2's append of 2 to 7.
      - However, T2 < T1, because T2 observed the initial (nil) state of 12,
        which T1 created by appending 1: a contradiction!

A dependency graph of this anomaly is attached to this email: lines marked `rw` indicate read-write anti-dependencies between specific operations in each transaction. Because there are multiple rw edges in this graph, it suggests the presence of G2-item. It is also possible, of course, that worse anomalies happened (e.g. aborted read) which caused us to incorrectly infer this causal graph, but I suspect this is not the case.
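To spell out how the contradiction falls out of those two transactions: each rw edge says one transaction read a version of a key that another committed transaction's append superseded. A drastically simplified sketch (this is not Elle's actual algorithm, just the reasoning applied to T1 and T2 above):

```python
# Toy reconstruction of the anomaly: derive rw (anti-dependency)
# edges between the two transactions and check for a cycle made
# entirely of rw edges, i.e. G2-item. Not Elle's real algorithm.
T1 = [("r", 7, [1]), ("append", 12, 1)]
T2 = [("append", 7, 2), ("r", 14, None), ("append", 14, 1), ("r", 12, None)]

def rw_edges(a, b, a_name, b_name):
    """Yield a_name -> b_name whenever a read in `a` observed a state
    of some key that did not yet include an append made by `b`,
    implying `a` must precede `b` in any serial order."""
    for (fa, ka, va) in a:
        if fa != "r":
            continue
        for (fb, kb, vb) in b:
            if fb == "append" and kb == ka and (va is None or vb not in va):
                yield (a_name, b_name)

edges = set(rw_edges(T1, T2, "T1", "T2")) | set(rw_edges(T2, T1, "T2", "T1"))
# T1 -rw-> T2 (read of 7) and T2 -rw-> T1 (nil read of 12): a cycle.
assert ("T1", "T2") in edges and ("T2", "T1") in edges
```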
You can find a full copy of this particular test run, including a history of every transaction, Postgres logs, and a pcap file containing all client-server interactions, at http://jepsen.io.s3.amazonaws.com/analyses/postgresql-12.3/20200531T215019.000-0400.zip

If you'd like to look at the test code, see https://github.com/jepsen-io/jepsen/tree/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen. Specifically, setup code is here:

https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/db.clj#L200-L233

... and the workload responsible for constructing and submitting transactions is here:

https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L107

These anomalies appear limited to G2-item: I haven't seen G-single (read skew), cyclic information flow, aborted reads, dirty writes, etc. It also looks as if every anomaly involves a *nil* read, which suggests (and I know the bug report guidelines say not to speculate, but experience suggests this might be helpful) that there is something special about row insertion. In TiDB, for instance, we found that G2-item anomalies with `select ... for update` were linked to the fact that TiDB's lock manager couldn't lock keys which hadn't been created yet (https://jepsen.io/analyses/tidb-2.1.7#select-for-update). I don't know anything about Postgres' internals, but I hope this is of some use!

Is this... known behavior? Unexpected? Are there configuration flags or client settings I should double-check? I know this is all a bit much, so I'm happy to answer any questions you might have. :-)

Sincerely,

--Kyle
Attachments
Hi Kyle,

On Sun, May 31, 2020 at 7:25 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Our reads are of the form:
>
>     select (val) from txn0 where id = $1
>
> And our writes are of the form:
>
>     insert into txn1 as t (id, sk, val) values ($1, $2, $3) on conflict (id) do
>     update set val = CONCAT(t.val, ',', $4) where t.id = $5

ON CONFLICT DO UPDATE has some novel behaviors in READ COMMITTED mode, but they're not supposed to affect SERIALIZABLE mode.

> where $1 and $5 are the key, and $2, $3, and $4 are the element we'd like to
> append to the list.
>
> You can try these transactions for yourself using Jepsen f47eb25. You'll need a
> Jepsen environment; see
> https://github.com/jepsen-io/jepsen#setting-up-a-jepsen-environment for details.
>
>     cd jepsen/stolon
>
>     lein run test-all -w append --nemesis none --max-writes-per-key 8 --node n1
>     --just-postgres --concurrency 50 -r 1000

We generally like to produce tests for SSI, ON CONFLICT DO UPDATE, and anything else involving concurrent behavior using something called isolation tester: https://github.com/postgres/postgres/tree/master/src/test/isolation

We may end up writing an isolation test for the issue you reported as part of an eventual fix. You might find it helpful to review some of the existing tests.

> Which typically produces, after about a minute, anomalies like the following:
>
>     G2-item #1
>     Let:
>       T1 = {:type :ok, :f :txn, :value [[:r 7 [1]] [:append 12 1]], :time 95024280,
>             :process 5, :index 50}
>       T2 = {:type :ok, :f :txn, :value [[:append 7 2] [:r 14 nil] [:append 14 1]
>             [:r 12 nil]], :time 98700211, :process 6, :index 70}
>
>     Then:
>       - T1 < T2, because T1 did not observe T2's append of 2 to 7.
>       - However, T2 < T1, because T2 observed the initial (nil) state of 12, which
>         T1 created by appending 1: a contradiction!

Could you test Postgres 9.5? It would be nice to determine if this is a new issue, or a regression.

Thanks

--
Peter Geoghegan
On 5/31/20 11:04 PM, Peter Geoghegan wrote:
> We generally like to produce tests for SSI, ON CONFLICT DO UPDATE, and
> anything else involving concurrent behavior using something called
> isolation tester: https://github.com/postgres/postgres/tree/master/src/test/isolation
> We may end up writing an isolation test for the issue you reported as
> part of an eventual fix. You might find it helpful to review some of
> the existing tests.

Ah, wonderful! I don't exactly know how to plug Elle's history analysis into this, but I think it... should be possible to write down some special cases based on the histories I've seen.

> Could you test Postgres 9.5? It would be nice to determine if this is
> a new issue, or a regression.

I'll look into that tomorrow morning! :)

I, uh, backed off to running these tests at read committed (which, uh... should be SI, right?) and found what appear to be scads of SI violations, including read skew and even *internal* consistency anomalies. Read-only transactions can... apparently... see changing values of a record? Here's a single transaction which read key 21, got [1], then read key 21 again, and saw [1 2 3]:

    [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]

See http://jepsen.io.s3.amazonaws.com/analyses/postgresql-12.3/20200531T223558.000-0400.zip -- jepsen.log from 22:36:09,907 to 22:36:09,909:

    2020-05-31 22:36:09,907{GMT} INFO [jepsen worker 6] jepsen.util: 6 :invoke :txn [[:r 21 nil] [:r 20 nil] [:r 20 nil] [:r 21 nil]]
    ...
    2020-05-31 22:36:09,909{GMT} INFO [jepsen worker 6] jepsen.util: 6 :ok :txn [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]

You can fire up wireshark and point it at the pcap file in n1/ to double-check--try `tcp.stream eq 4`. The BEGIN statement for this transaction is at 22:36:09.908115. There are a bunch more anomalies called out in analysis.edn, if it's helpful.

This looks so weird that I assume I've *got* to be doing it wrong, but trawling through the source code and pcap trace, I can't see where the mistake is. Maybe I'll have fresher eyes in the morning. :)

Sincerely,

--Kyle
On Sun, May 31, 2020 at 8:37 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> This looks so weird that I assume I've *got* to be doing it wrong, but trawling
> through the source code and pcap trace, I can't see where the mistake is. Maybe
> I'll have fresher eyes in the morning. :)

READ COMMITTED starts each command within a transaction with its own snapshot, much like Oracle: https://www.postgresql.org/docs/devel/transaction-iso.html

There cannot be serialization errors with READ COMMITTED mode, and in general it is a lot more permissive. Probably to the point where it isn't sensible to test with Jepsen at all.

It would make sense for you to test REPEATABLE READ isolation level separately, though. It implements snapshot isolation without the added overhead of the mechanisms that prevent (or are supposed to prevent) serialization anomalies.

--
Peter Geoghegan
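The per-statement vs. per-transaction snapshot distinction Peter describes can be illustrated with a toy multi-version store. This is purely a sketch under assumed names (a hypothetical `read` helper over (commit_timestamp, value) version lists), not how Postgres represents snapshots internally:

```python
# Toy MVCC model: each key maps to a list of (commit_ts, value)
# versions. A snapshot taken at time t sees the newest version with
# commit_ts <= t. Illustrative only; not Postgres internals.
versions = {21: [(0, [1]), (5, [1, 2, 3])]}

def read(key, snapshot_ts):
    return max((v for v in versions[key] if v[0] <= snapshot_ts),
               key=lambda v: v[0])[1]

# REPEATABLE READ: one snapshot for the whole transaction, so both
# reads of key 21 return the same value even though a concurrent
# transaction commits [1, 2, 3] at t=5 in between.
rr_snapshot = 1
assert read(21, rr_snapshot) == read(21, rr_snapshot) == [1]

# READ COMMITTED: each statement takes a fresh snapshot, so the second
# read can observe the concurrent commit -- the "changing values"
# within a single read-only transaction seen in the jepsen.log above.
assert read(21, 1) == [1]
assert read(21, 6) == [1, 2, 3]
```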
On Mon, Jun 1, 2020 at 4:05 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> I'll also see about getting a version of these tests that doesn't
> involve ON CONFLICT, in case that's relevant.

That should be interesting. I'm wondering if the read of the old value in the UPDATE case is not done with appropriate predicate locks, so we miss a graph edge?
Hi Kyle,

On Sun, May 31, 2020 at 9:05 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Oh! Gosh, yes, that DOES explain it. Somehow I had it in my head that
> both RU and RC mapped to SI, and RR & Serializable mapped to SSI.
> That's the case in YugabyteDB, but not here!

It's confusing because the standard only requires that the isolation levels avoid certain read phenomena, but implementations are free to go above and beyond. For example, you can ask Postgres for READ UNCOMMITTED, but you'll get READ COMMITTED. (So RC, RR, and SI each provide distinct behavior.)

> I'll also see about getting a version of these tests that doesn't
> involve ON CONFLICT, in case that's relevant.

That would be great. It could easily be relevant.

Thanks

--
Peter Geoghegan
> It's confusing because the standard only requires that the isolation
> levels avoid certain read phenomena, but implementations are free to
> go above and beyond. For example, you can ask Postgres for READ
> UNCOMMITTED, but you'll get READ COMMITTED. (So RC, RR, and SI each
> provide distinct behavior.)
Right, right. I was thinking "Oh, repeatable read is incomparable with snapshot, so it must be that read committed is snapshot, and repeatable is serializable." This way around, Postgres "repeatable read" actually gives you behavior that violates repeatable read! But I understand the pragmatic rationale of "we need 3 levels, and this is the closest mapping we could get to the ANSI SQL names". :)
--Kyle
On Sun, May 31, 2020 at 9:29 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Right, right. I was thinking "Oh, repeatable read is incomparable with
> snapshot, so it must be that read committed is snapshot, and repeatable
> is serializable."

Right. We used to call snapshot isolation (i.e., the behavior we now provide under RR mode) SERIALIZABLE, which was wrong (still is in Oracle). This was how Postgres worked before the SSI feature was added back in 2011. SSI became the new SERIALIZABLE at that time. Ordinary snapshot isolation was "demoted" to being called RR mode.

--
Peter Geoghegan
OK! So I've designed a variant of this test which doesn't use ON CONFLICT. Instead, we do a homebrew sort of upsert: we try to update the row in place by primary key; if we see zero records updated, we insert a new row, and if *that* fails due to the primary key conflict, we try the update again, under the theory that since we now know a copy of the row exists, we should be able to update it.

https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L108

This isn't bulletproof; even under SERIALIZABLE, Postgres will allow a transaction to fail to update any rows, then fail to insert due to a primary key conflict--but because the primary key conflict forces a transaction abort (or rolling back to a savepoint, which negates the conflict), I think it's still serializable. The isolation docs explain this, I think.

Unfortunately, I'm still seeing tons of G2-item cycles. Whatever this is, it's not related to ON CONFLICT.

As a side note, I can confirm that Postgres repeatable read is definitely weaker than repeatable read--at least as formalized by Adya et al. This makes sense if you understand repeatable read to mean SI (after all, the SI paper says they're incomparable!), but the Postgres docs seem to imply Postgres is strictly *stronger* than the ANSI SQL spec, and I'm not convinced that's entirely true: https://www.postgresql.org/docs/12/transaction-iso.html

> The table also shows that PostgreSQL's Repeatable Read implementation
> does not allow phantom reads. Stricter behavior is permitted by the
> SQL standard...

> This is a stronger guarantee than is required by the SQL standard for
> this isolation level...

I get the sense that the Postgres docs have already diverged from the ANSI SQL standard a bit, since SQL 92 only defines three anomalies (P1, P2, P3), and Postgres defines a fourth: "serialization anomaly".
This results in a sort of weird situation: on the one hand, it's true: we don't (I think) observe P1 or P2 under Postgres Repeatable Read. On the other hand, SQL 92 says that the difference between repeatable read and serializable is *exactly* the prohibition of P3 ("phantom"). Even though all our operations are performed by primary key, we can observe a distinct difference between Postgres repeatable read and Postgres serializable.

I can see two ways to reconcile this--one being that Postgres chose the anomaly interpretation of the SQL spec, and the result is... maybe internally inconsistent? Or perhaps one of the operations in this workload actually *is* a predicate operation--maybe by dint of relying on a uniqueness constraint?

I'm surprised the transaction isolation docs don't say something like "Postgres repeatable read level means snapshot isolation, which is incomparable with SQL repeatable read." Then it'd be obvious that Postgres repeatable read exhibits G2-item! Weirdly, the term "snapshot isolation" doesn't show up on the page at all!

--Kyle
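The homebrew upsert retry Kyle describes can be sketched in Python against SQLite as a stand-in for Postgres (the real test issues the analogous statements over JDBC; the table name and wiring here are illustrative, and a single-connection SQLite demo can't actually exercise the concurrent race the retry guards against):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table txn0 (id int primary key, sk int, val text)")

def append(conn, k, elem):
    """Homebrew upsert: try an in-place update by primary key; if no
    row matched, insert; if the insert loses a race on the primary
    key, the row must exist now, so update again."""
    cur = conn.execute(
        "update txn0 set val = val || ',' || ? where id = ?", (str(elem), k))
    if cur.rowcount == 0:
        try:
            conn.execute("insert into txn0 (id, sk, val) values (?, ?, ?)",
                         (k, k, str(elem)))
        except sqlite3.IntegrityError:
            # Primary key conflict: another writer created the row.
            conn.execute(
                "update txn0 set val = val || ',' || ? where id = ?",
                (str(elem), k))

append(conn, 12, 1)   # no row yet: falls through to the insert
append(conn, 12, 2)   # row exists: the in-place update succeeds
val = conn.execute("select val from txn0 where id = 12").fetchone()[0]
assert val == "1,2"
```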
On Tue, Jun 2, 2020 at 9:19 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> OK! So I've designed a variant of this test which doesn't use ON CONFLICT.
> Instead, we do a homebrew sort of upsert: we try to update the row in place by
> primary key; if we see zero records updated, we insert a new row, and if *that*
> fails due to the primary key conflict, we try the update again, under the theory
> that since we now know a copy of the row exists, we should be able to update it.
>
> https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L108

Thanks, but I think that this link is wrong, since you're still using ON CONFLICT. Correct me if I'm wrong, but I believe that you intended to link to this:

https://github.com/jepsen-io/jepsen/commit/ac4956871c8227d57d11a665e43c3d68bb7d7ec1#diff-0f5b390b5cdbd8650cf39e3c3f6f365fR31-R65

> Unfortunately, I'm still seeing tons of G2-item cycles. Whatever this is, it's
> not related to ON CONFLICT.

Good to have that confirmed. Obviously we'll need to do more analysis of the exact circumstances of the anomaly. That might take a while.

> I get the sense that the Postgres docs have already diverged from the ANSI SQL
> standard a bit, since SQL 92 only defines three anomalies (P1, P2, P3), and
> Postgres defines a fourth: "serialization anomaly".
> I can see two ways to reconcile this--one being that Postgres chose the anomaly
> interpretation of the SQL spec, and the result is... maybe internally
> inconsistent? Or perhaps one of the operations in this workload actually *is* a
> predicate operation--maybe by dint of relying on a uniqueness constraint?

You might find that "A Critique of ANSI SQL Isolation Levels" provides useful background information:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

One section in particular may be of interest:

"ANSI SQL intended to define REPEATABLE READ isolation to exclude all anomalies except Phantom. The anomaly definition of Table 1 does not achieve this goal, but the locking definition of Table 2 does. ANSI's choice of the term Repeatable Read is doubly unfortunate: (1) repeatable reads do not give repeatable results, and (2) the industry had already used the term to mean exactly that: repeatable reads mean serializable in several products. We recommend that another term be found for this."

--
Peter Geoghegan
On 6/2/20 12:50 PM, Peter Geoghegan wrote:
> On Tue, Jun 2, 2020 at 9:19 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
>> OK! So I've designed a variant of this test which doesn't use ON CONFLICT.
>> Instead, we do a homebrew sort of upsert: we try to update the row in place by
>> primary key; if we see zero records updated, we insert a new row, and if *that*
>> fails due to the primary key conflict, we try the update again, under the theory
>> that since we now know a copy of the row exists, we should be able to update it.
>>
>> https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L108
>
> Thanks, but I think that this link is wrong, since you're still using
> ON CONFLICT. Correct me if I'm wrong, I believe that you intended to
> link to this:

Whoops, yes, that's correct. :-)

> Good to have that confirmed. Obviously we'll need to do more analysis
> of the exact circumstances of the anomaly. That might take a while.

No worries! Is it still important that I check this behavior with 9.x as well?

> You might find that "A Critique of ANSI SQL Isolation Levels" provides
> useful background information:
>
> https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

This is one of my favorite papers, and precisely why I asked!

> One section in particular may be of interest:
>
> "ANSI SQL intended to define REPEATABLE READ isolation to exclude all
> anomalies except Phantom. The anomaly definition of Table 1 does not
> achieve this goal, but the locking definition of Table 2 does. ANSI's
> choice of the term Repeatable Read is doubly unfortunate: (1)
> repeatable reads do not give repeatable results, and (2) the industry
> had already used the term to mean exactly that: repeatable reads mean
> serializable in several products. We recommend that another term be
> found for this."

So... just to confirm, Postgres *did* go along with the anomaly interpretation, rather than the strict interpretation? It's just weird cuz, like... the Postgres docs act like SI is stronger than RR, but Berenson et al. are pretty clear that's not how they see it!

--Kyle
On Tue, Jun 2, 2020 at 9:58 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> No worries! Is it still important that I check this behavior with 9.x as well?

I asked about 9.5 because I think that it's possible (though not particularly likely) that some of the B-Tree indexing work that went into Postgres 12 is a factor (predicate locks can be taken against individual leaf pages, and the way that that works changed slightly). SSI was verified using extensive stress tests during its initial development (by Dan Ports), so it's not inconceivable that there was some kind of subtle regression since that time. That's just a guess, but it would be nice to eliminate it as a theory.

I'd be surprised if your existing test cases needed any adjustment. My guess is that this won't take long.

> So... just to confirm, Postgres *did* go along with the anomaly interpretation,
> rather than the strict interpretation? It's just weird cuz, like... the Postgres
> docs act like SI is stronger than RR, but Berenson et al. are pretty clear that's
> not how they see it!

I wasn't involved in the decision making process that led to that, and it's possible that those that were weren't even aware of the paper. It was necessary to shoehorn SSI/true serializability into the existing isolation levels for compatibility reasons, and those were always based on the anomaly interpretation.

--
Peter Geoghegan
On Wed, Jun 3, 2020 at 4:19 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> https://github.com/jepsen-io/jepsen/blob/f47eb25ab32529a7b66f1dfdd3b5ef2fc84ed778/stolon/src/jepsen/stolon/append.clj#L31-L108

I'm looking into this, but just by the way, you said:

    ; OK, so first worrying thing: why can this throw duplicate key errors if
    ; it's executed with "if not exists"?
    (try (j/execute! conn [(str "create table if not exists " (table-name i) ....

That's (unfortunately) a known problem under concurrency. It'd be very nice to fix that, but it's an independent problem relating to DDL (not just tables, and not just IF EXISTS DDL, but anything modifying catalogues can race in this way and miss out on "nice" error messages or the IF EXISTS no-op). Here's a good short summary:

https://www.postgresql.org/message-id/CA%2BTgmoZAdYVtwBfp1FL2sMZbiHCWT4UPrzRLNnX1Nb30Ku3-gg%40mail.gmail.com
On Tue, Jun 2, 2020 at 10:24 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I'd be surprised if your existing test cases needed any adjustment. My
> guess is that this won't take long.

You said it takes about a minute in your opening e-mail; how consistent is this? I note from the Postgres logs you provided that Postgres starts accepting connections at 2020-05-31 18:50:27.580, and shows its last log message at 2020-05-31 18:51:29.781 PDT. So it's suspiciously close to *exactly* one minute. Note that autovacuum_naptime has as its default '1min'. Your workload probably generates a lot of index bloat, which may tend to cause autovacuum to want to delete whole B-Tree leaf pages, which impacts predicate locking.

Could you check what happens when you reduce autovacuum_naptime to (say) '5s' in postgresql.conf? Does that change make the G2-item cycle issue manifest itself earlier? And can you discern any pattern like that yourself?

It seems kind of inconvenient to run Jepsen -- I suppose I could use Docker or something like that, but I don't have experience with it. What do you think is the simplest workflow for somebody that just wants to recreate your result on a Debian system?

--
Peter Geoghegan
Kyle Kingsbury wrote:
> SQL 92 says that the difference between repeatable read and serializable is
> *exactly* the prohibition of P3 ("phantom").

You're probably referring to Table 9 in SQL-92, showing that P3 can happen under Repeatable Read and cannot happen under Serializable. But it doesn't say that the *only* difference between RR and Serializable is avoiding P3. When defining P1, P2, P3, it doesn't explicitly say that these are the only anomalies that can arise from concurrency. The PG doc refers to the other cases as "serialization anomalies".

Compared to the manual, https://wiki.postgresql.org/wiki/Serializable has more in-depth explanations on the specifics of PG serializable implementation.

Best regards,

--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
> But it doesn't say that the *only* difference between RR and
> Serializable is avoiding P3. When defining P1, P2, P3, it doesn't
> explicitly say that these are the only anomalies that can arise from
> concurrency.
Ah, yes, now I understand. I've been working with Berenson/Adya too long--I've taken their conclusions for granted. You're right: the spec comes awfully close, but doesn't say these are the *only* phenomena possible:
> The following phenomena are possible: [description of P1, P2, and P3]
>
> The isolation levels are different with respect to phenomena P1, P2,
> and P3.
Berenson et al., in "A Critique of ANSI SQL Isolation Levels", identified this as a key challenge in interpreting the spec:
> The isolation levels are defined by the phenomena they are forbidden
> to experience.
But also:
> The prominence of the table compared to this extra proviso leads to
> a common misconception that disallowing the three phenomena implies
> serializability.
Which led Berenson et al. to claim that the SQL spec's definitions are a.) ambiguous and b.) incomplete. They argue that the "strict" (or, in Adya, the "anomaly") interpretation is incorrect, and construct a "broad" (Adya: "preventative") interpretation in terms of a new phenomenon (P0: dirty writes), and more general forms of P1, P2, and P3. The bulk of section 3 demonstrates that the anomaly interpretations lead to weird results, and that what the ANSI spec *intended* was the preventative interpretations.
> Strict interpretations A1, A2, and A3 have unintended weaknesses.
> The correct interpretations are the Broad ones. We assume in what
> follows that ANSI meant to define P1, P2, and P3.
For a more concise overview of this, see Adya's thesis, section 2.3. Adya goes on to show that the preventative interpretation forbids some serializable histories, and redefines the ANSI levels again in terms of generalized, MVCC-friendly phenomena G0, G1, and G2. When I say Postgres violates repeatable read, I mean in the sense that it allows G2-item, which is prevented by Adya PL-2.99: repeatable read. See Adya 3.2.4 for his formulation of repeatable read, which *does* differ from serializability only in terms of predicate-related phenomena.
http://pmg.csail.mit.edu/papers/adya-phd.pdf
For a compact version of this argument and formalism, see Adya, Liskov, & O'Neil's "Generalized Isolation Level Definitions": http://bnrg.cs.berkeley.edu/~adj/cs262/papers/icde00.pdf.
That's why I was asking a few days ago whether Postgres had adopted the anomaly interpretation--since Postgres implements SI, it made sense that y'all would have followed Berenson et al. in using the preventative interpretation, or Adya in using generalized definitions. But... in the Postgres docs, and your comments in this thread, it seems like y'all are going with the anomaly interpretation instead. That'd explain why there's this extra "serialization anomaly" difference between RR and serializable. That'd also explain why the Postgres docs imply RR << SI, even though Berenson et al. and Adya both say RR >><< SI.
Is... that the right way to understand things?
--Kyle
On 6/2/20 7:17 PM, Peter Geoghegan wrote:
> On Tue, Jun 2, 2020 at 10:24 AM Peter Geoghegan <pg@bowt.ie> wrote:
>> I'd be surprised if your existing test cases needed any adjustment. My
>> guess is that this won't take long.
>
> You said it takes about a minute in your opening e-mail; how
> consistent is this? I note from the Postgres logs you provided that
> Postgres starts accepting connections at 2020-05-31 18:50:27.580, and
> shows its last log message at 2020-05-31 18:51:29.781 PDT. So it's
> suspiciously close to *exactly* one minute.

I set the test duration to 60 seconds for those runs, but it'll break in as little as 10. That's less of a sure thing though. :)

> Note that autovacuum_naptime has as its default '1min'. Your workload
> probably generates a lot of index bloat, which may tend to cause
> autovacuum to want to delete whole B-Tree leaf pages, which impacts
> predicate locking.

With the default (debian) postgresql.conf, which has autovacuum_naptime commented out (default 1min), I see anomalies at (just picking a random recent test) 8.16 seconds, 9.76 seconds, and 19.6 seconds. Another run: 28.0 seconds, 32.3 seconds.

> Could you check what happens when you reduce autovacuum_naptime to
> (say) '5s' in postgresql.conf? Does that change make the G2-item cycle
> issue manifest itself earlier? And can you discern any pattern like
> that yourself?

It doesn't look like setting autovacuum_naptime makes a difference.

> It seems kind of inconvenient to run Jepsen -- I suppose I could use
> Docker or something like that, but I don't have experience with it.
> What do you think is the simplest workflow for somebody that just
> wants to recreate your result on a Debian system?

I am really sorry about that--I know it's not convenient. Jepsen's built for testing whole distributed systems, and is probably a bit overkill for testing a single Postgres process. I don't have any experience with Docker, but I think Docker Compose might be a good option for a single-node system? I apologize--I *just* started writing this test against Debian Buster a few days ago, and the existing AWS Marketplace and Docker Compose environments are still on Stretch, so on top of setting up a Jepsen environment you also gotta do a Debian upgrade. :'-O

I'll see about writing a version of the test that doesn't use any of the automation, so you can point it at a local postgres instance. Then all you'll need is lein and a jdk.

--Kyle
On 6/2/20 7:13 PM, Thomas Munro wrote:
> That's (unfortunately) a known problem under concurrency. It'd be
> very nice to fix that, but it's an independent problem relating to DDL
> (not just tables, and not just IF EXISTS DDL, but anything modifying
> catalogues can race in this way and miss out on "nice" error messages
> or the IF EXISTS no-op). Here's a good short summary:

Ah, yes, this does explain it, thank you! I was a bit concerned, because I know Postgres has a reputation for having transactional DDL. I guess this part of the API isn't. :)

--Kyle
> On 6/2/20 7:17 PM, Peter Geoghegan wrote:
>> It seems kind of inconvenient to run Jepsen -- I suppose I could
>> use Docker or something like that, but I don't have experience
>> with it. What do you think is the simplest workflow for somebody
>> that just wants to recreate your result on a Debian system?
>
> I'll see about writing a version of the test that doesn't use any of
> the automation, so you can point it at a local postgres instance.
> Then all you'll need is lein and a jdk.
OK, I think we're all set. With Jepsen 0ec25ec3, you should be able to run:
cd stolon;
lein run test-all -w append --max-writes-per-key 4 --concurrency 50 -r 500 --isolation serializable --time-limit 60 --nemesis none --existing-postgres --node localhost --no-ssh --postgres-user jepsen --postgres-password pw
... and it'll connect to an already-running instance of Postgres on localhost (or whatever you want to connect to) using the provided username and password. It expects a DB with the same name as the user. You'll need a JDK (1.8 or higher), and Leiningen (https://leiningen.org/), which should be pretty easy to install. :)
> Could you test Postgres 9.5? It would be nice to determine if this is a new issue, or a regression.
I can also confirm that Postgres 9.5.22, 10.13, and 11.8 all exhibit the same G2-item anomalies as 12.3. It looks like this issue has been here a while! (Either that, or... my transactions are malformed?).
Is there additional debugging data I can get you that'd be helpful? Or if you'd like, I can hop in an IRC or whatever kind of chat/video session you'd like to help you get these Jepsen tests running.
--Kyle
Hi Kyle,

On Wed, Jun 3, 2020 at 9:44 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> OK, I think we're all set. With Jepsen 0ec25ec3, you should be able to run:
>
> cd stolon;
>
> lein run test-all -w append --max-writes-per-key 4 --concurrency 50 -r 500 --isolation serializable --time-limit 60 --nemesis none --existing-postgres --node localhost --no-ssh --postgres-user jepsen --postgres-password pw

I can almost get this to work now. Thank you for accommodating me here. I still see this on Postgres 13:

*** SNIP ***
INFO [2020-06-03 13:00:07,120] jepsen worker 2 - jepsen.util 2 :invoke :txn [[:r 8338 nil]]
INFO [2020-06-03 13:00:07,120] jepsen worker 2 - jepsen.util 2 :ok :txn [[:r 8338 [1 2]]]
INFO [2020-06-03 13:00:07,123] jepsen worker 44 - jepsen.util 44 :invoke :txn [[:r 8337 nil] [:r 8336 nil]]
INFO [2020-06-03 13:00:07,125] jepsen worker 44 - jepsen.util 44 :ok :txn [[:r 8337 [1 2 3]] [:r 8336 [1]]]
INFO [2020-06-03 13:00:07,150] jepsen test runner - jepsen.core Run complete, writing
INFO [2020-06-03 13:00:08,056] jepsen test runner - jepsen.core Analyzing...
INFO [2020-06-03 13:00:13,670] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G2-item in SCC of 346 transactions
INFO [2020-06-03 13:00:17,626] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G1c in SCC of 3123 transactions
INFO [2020-06-03 13:00:18,841] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-single in SCC of 685 transactions
INFO [2020-06-03 13:00:20,614] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-single-process in SCC of 1505 transactions
INFO [2020-06-03 13:00:24,151] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-single-process in SCC of 756 transactions
INFO [2020-06-03 13:00:26,004] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-single-process in SCC of 289 transactions
INFO [2020-06-03 13:00:27,113] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-single in SCC of 767 transactions
INFO [2020-06-03 13:00:29,679] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G1c-realtime in SCC of 383 transactions
INFO [2020-06-03 13:00:30,954] clojure-agent-send-off-pool-46 - elle.txn Timing out search for :G-nonadjacent in SCC of 152 transactions
INFO [2020-06-03 13:00:34,478] clojure-agent-send-off-pool-22 - elle.viz Skipping plot of 78703 bytes
INFO [2020-06-03 13:00:34,484] clojure-agent-send-off-pool-27 - elle.viz Skipping plot of 96697 bytes
INFO [2020-06-03 13:00:34,650] clojure-agent-send-off-pool-45 - elle.viz Skipping plot of 97250 bytes
INFO [2020-06-03 13:00:35,847] clojure-agent-send-off-pool-42 - elle.viz Skipping plot of 145419 bytes
INFO [2020-06-03 13:00:35,882] clojure-agent-send-off-pool-22 - elle.viz Skipping plot of 197214 bytes
INFO [2020-06-03 13:00:35,996] clojure-agent-send-off-pool-39 - elle.viz Skipping plot of 409386 bytes
INFO [2020-06-03 13:00:36,166] clojure-agent-send-off-pool-52 - elle.viz Skipping plot of 894916 bytes
INFO [2020-06-03 13:00:36,428] clojure-agent-send-off-pool-26 - elle.viz Skipping plot of 427866 bytes
INFO [2020-06-03 13:00:36,735] clojure-agent-send-off-pool-68 - elle.viz Skipping plot of 137646 bytes
INFO [2020-06-03 13:00:37,698] clojure-agent-send-off-pool-31 - elle.viz Skipping plot of 205089 bytes
INFO [2020-06-03 13:00:38,251] clojure-agent-send-off-pool-31 - elle.viz Skipping plot of 81743 bytes
INFO [2020-06-03 13:00:38,341] clojure-agent-send-off-pool-60 - elle.viz Skipping plot of 222988 bytes
INFO [2020-06-03 13:00:38,998] clojure-agent-send-off-pool-83 - elle.viz Skipping plot of 104619 bytes
INFO [2020-06-03 13:00:39,048] clojure-agent-send-off-pool-56 - elle.viz Skipping plot of 191453 bytes
INFO [2020-06-03 13:00:40,142] clojure-agent-send-off-pool-20 - elle.viz Skipping plot of 85355 bytes
INFO [2020-06-03 13:00:44,027] jepsen test runner - jepsen.core Analysis complete
INFO [2020-06-03 13:03:04,832] jepsen results - jepsen.store Wrote /home/pg/code/jepsen/stolon/store/stolon append S (S) /20200603T125906.000-0700/results.edn
WARN [2020-06-03 13:03:04,836] main - jepsen.core Test crashed!
java.util.concurrent.ExecutionException: java.lang.AssertionError: Assert failed: No transaction wrote 8119 1 t2
  at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
  at clojure.core$deref_future.invokeStatic(core.clj:2300)
  at clojure.core$future_call$reify__8439.deref(core.clj:6974)
  at clojure.core$deref.invokeStatic(core.clj:2320)
  at clojure.core$deref.invoke(core.clj:2306)
*** SNIP ***

> Or if you'd like, I can hop in an IRC or whatever kind of chat/video session you'd like to help you get these Jepsen tests running.

It would probably be easier to discuss this over chat or something. I'll provide details off-list.

Thanks
--
Peter Geoghegan
On Wed, Jun 3, 2020 at 1:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I can almost get this to work now. Thank you for accommodating me here.
Actually, I think that this was a simple error on my part. The
Jepsen/Elle test recipe that Kyle provided now works locally for me.
The tests actually pass, too -- at least against Postgres 13.
I don't know why Kyle doesn't see the same result. I acknowledge that
there is likely a bug here that I have yet to reproduce. My guess is
that it has something to do with the configuration -- though I need to
test earlier Postgres versions, too (Postgres 12 is what Kyle worked
against). I'm using a tuned postgresql.conf, whereas Kyle probably
didn't change many of the defaults.
Here is what I got just now:
pg@bat:~/code/jepsen/stolon$ lein run test-all -w append
--max-writes-per-key 4 --concurrency 50 -r 500 --isolation
serializable --time-limit 60 --nemesis none --existing-postgres --node
localhost --no-ssh --postgres-user jepsen --postgres-password pw
*** SNIP ***
INFO [2020-06-03 14:09:47,978] jepsen worker 33 - jepsen.stolon.append
:insert [#:next.jdbc{:update-count 1}]
INFO [2020-06-03 14:09:47,979] jepsen worker 33 - jepsen.util 133 :ok
:txn [[:r 8408 nil] [:append 8369 1] [:r 8402 [1 2 3]]]
INFO [2020-06-03 14:09:47,980] jepsen worker 15 - jepsen.util 115
:invoke :txn [[:append 8412 3]]
INFO [2020-06-03 14:09:47,981] jepsen worker 15 - jepsen.stolon.append
:update #:next.jdbc{:update-count 1}
INFO [2020-06-03 14:09:47,981] jepsen worker 15 - jepsen.util 115 :ok
:txn [[:append 8412 3]]
INFO [2020-06-03 14:09:48,012] jepsen test runner - jepsen.core Run
complete, writing
INFO [2020-06-03 14:09:48,912] jepsen test runner - jepsen.core Analyzing...
INFO [2020-06-03 14:09:51,415] jepsen test runner - jepsen.core
Analysis complete
INFO [2020-06-03 14:09:51,439] jepsen results - jepsen.store Wrote
/home/pg/code/jepsen/stolon/store/stolon append S (S)
/20200603T140847.000-0700/results.edn
INFO [2020-06-03 14:09:52,245] jepsen test runner - jepsen.core {:perf
{:latency-graph {:valid? true},
:rate-graph {:valid? true},
:valid? true},
:clock {:valid? true},
:stats
{:valid? true,
:count 30096,
:ok-count 27050,
:fail-count 2992,
:info-count 54,
:by-f
{:txn
{:valid? true,
:count 30096,
:ok-count 27050,
:fail-count 2992,
:info-count 54}}},
:exceptions {:valid? true},
:workload {:valid? true},
:valid? true}
Everything looks good! ヽ(‘ー`)ノ
# Successful tests
store/stolon append S (S) /20200603T140847.000-0700
1 successes
0 unknown
0 crashed
0 failures
--
Peter Geoghegan
On Wed, Jun 3, 2020 at 2:35 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> It looks like you're seeing a much higher txn success rate than I am--possibly due to your tuning? Might be worth adjusting --rate and/or --concurrency upwards
I can see what I assume is the same problem (a failure/table flip and
a huge graph) with "--concurrency 150 -r 10000", and with autovacuum
disabled on the Postgres side (this is the same relatively tuned
Postgres configuration that I used when Jepsen passed for me). It's
difficult to run the tests, so it's hard to isolate without it taking
a long time.
BTW, the tests are kind of flappy. The Linux OOM killer just killed
Java after 20 minutes or so, for example. I assume that this is to be
expected with the settings cranked up like this -- the analysis will
take longer and use more memory, too. Any tips on limiting that? Is
there any reason to think that running the same test twice will affect
the outcome of the second test?
I also see this sometimes, even though I thought I fixed it earlier --
it seems to happen at random:
Caused by: java.lang.AssertionError: Assert failed: No transaction wrote 8363 2
t2
The fact that Kyle saw such a high number of failed transactions,
which are difficult to reproduce here, seems to suggest that the issue
is related to running out of shared memory for predicate locks and/or
bloat (which tends to have the side effect of increasing the need for
predicate locks). I continue to suspect that this is related to an
edge case with predicate locks. It could be related to running out of
predicate locks -- maybe an issue with the lock escalation? That would
tend to increase the number of failures by quite a lot.
--
Peter Geoghegan
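[Editor's note: Peter's predicate-lock hypothesis above can be observed directly. SIRead (predicate) locks appear in the pg_locks view, and their granularity shows whether escalation from tuple- to page- to relation-level locks is occurring. A sketch, run from a second psql session while the test is active:]

```sql
-- Count SIRead (predicate) locks by granularity. If lock escalation is
-- happening, tuple-level rows shrink while page/relation rows grow.
SELECT locktype, count(*)
FROM pg_locks
WHERE mode = 'SIReadLock'
GROUP BY locktype
ORDER BY locktype;

-- The shared-memory budget for predicate locks is governed by:
SHOW max_pred_locks_per_transaction;
```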
On Wed, Jun 3, 2020 at 4:26 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Oh this is interesting! I can say that I'm running on a 24-way Xeon with 128gb of ram, so running out of system memory doesn't immediately seem like a bottleneck--I'd suspect my config runs slower by dint of disks (older SSD), fs settings, or maybe postgres tuning (this is with the stock Debian config files).

I can now get it to run fairly consistently, now that I know to consistently truncate all three tables between runs. It doesn't always fail, but it fails often enough. And it doesn't seem to matter that my local Postgres is so much faster, or has fewer failures. For example, I now see the following failure on Postgres 13:

INFO [2020-06-03 18:26:50,706] jepsen test runner - jepsen.core
{:perf {:latency-graph {:valid? true},
        :rate-graph {:valid? true},
        :valid? true},
 :clock {:valid? true},
 :stats {:valid? true,
         :count 30049,
         :ok-count 26792,
         :fail-count 3200,
         :info-count 57,
         :by-f {:txn {:valid? true,
                      :count 30049,
                      :ok-count 26792,
                      :fail-count 3200,
                      :info-count 57}}},
 :exceptions {:valid? true},
 :workload
*** SNIP ***

Kyle: Could you figure out a way of setting "prepareThreshold=0" in JDBC (i.e. disable prepared statements), please? That would make it a bit easier to debug.

--
Peter Geoghegan
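[Editor's note: pgjdbc also accepts prepareThreshold directly as a connection-URL parameter, so the experiment Peter requests here can be tried without a code change, e.g.:]

```text
jdbc:postgresql://localhost/jepsen?prepareThreshold=0
```

[Per Peter's parenthetical, a threshold of 0 disables server-side prepared statements.]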
On Sun, May 31, 2020 at 7:25 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Which typically produces, after about a minute, anomalies like the following:
>
> G2-item #1
> Let:
>   T1 = {:type :ok, :f :txn, :value [[:r 7 [1]] [:append 12 1]], :time 95024280, :process 5, :index 50}
>   T2 = {:type :ok, :f :txn, :value [[:append 7 2] [:r 14 nil] [:append 14 1] [:r 12 nil]], :time 98700211, :process 6, :index 70}
>
> Then:
> - T1 < T2, because T1 did not observe T2's append of 2 to 7.
> - However, T2 < T1, because T2 observed the initial (nil) state of 12, which T1 created by appending 1: a contradiction!

Is the format of these anomalies documented somewhere? How can I determine what SQL each transaction generates from these values? It's not obvious to me which of the three tables (txn0, txn1, or txn2) is affected in each case.

--
Peter Geoghegan
On 6/3/20 10:15 PM, Peter Geoghegan wrote:
> On Sun, May 31, 2020 at 7:25 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
>> Which typically produces, after about a minute, anomalies like the following:
>>
>> G2-item #1
>> Let:
>>   T1 = {:type :ok, :f :txn, :value [[:r 7 [1]] [:append 12 1]], :time 95024280, :process 5, :index 50}
>>   T2 = {:type :ok, :f :txn, :value [[:append 7 2] [:r 14 nil] [:append 14 1] [:r 12 nil]], :time 98700211, :process 6, :index 70}
>>
>> Then:
>> - T1 < T2, because T1 did not observe T2's append of 2 to 7.
>> - However, T2 < T1, because T2 observed the initial (nil) state of 12, which T1 created by appending 1: a contradiction!
>
> Is the format of these anomalies documented somewhere?

Unfortunately, no. This is a plain-text representation emitted by Elle. You'll also find a corresponding diagram of the cycle in `elle/g2-item/1.svg`, which might be a bit easier to understand. The transactions themselves ({:type :ok ...}) are EDN (Clojure) data structures representing the completion operations from Jepsen; you'll also see this format in history.edn.

> How can I determine what SQL each transaction generates from these values? It's not obvious to me which of the three tables (txn0, txn1, or txn2) is affected in each case.

This is a good and obvious question which I don't yet have a good answer for. Reading the source gives you *some* idea of what SQL is being generated, but there's some stuff being done by next.jdbc and JDBC itself, so I don't know how to show you *exactly* what goes over the wire. A terrible way to do this is to look at the pcap traces in Wireshark--you can correlate from the timestamps in jepsen.log, or search for the transactions which interacted with specific keys.

One option would be to add some sort of tracing thing to the test so that it records the SQL statements it generates as extra metadata on operations. I can look into doing that for you later on. :)

--Kyle
On 6/3/20 9:33 PM, Peter Geoghegan wrote:
> On Wed, Jun 3, 2020 at 4:26 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
>> Oh this is interesting! I can say that I'm running on a 24-way Xeon with 128gb of ram, so running out of system memory doesn't immediately seem like a bottleneck--I'd suspect my config runs slower by dint of disks (older SSD), fs settings, or maybe postgres tuning (this is with the stock Debian config files).
>
> I can now get it to run fairly consistently, now that I know to consistently truncate all three tables between runs.

The test will do that automatically for you now. :)

> It doesn't always fail, but it fails often enough.

Hey, that's good!

> And it doesn't seem to matter that my local Postgres is so much faster, or has fewer failures. For example, I now see the following failure on Postgres 13:
>
> INFO [2020-06-03 18:26:50,706] jepsen test runner - jepsen.core
> {:perf {:latency-graph {:valid? true},
>         :rate-graph {:valid? true},
>         :valid? true},
>  :clock {:valid? true},
>  :stats {:valid? true,
>          :count 30049,
>          :ok-count 26792,
>          :fail-count 3200,
>          :info-count 57,
>          :by-f {:txn {:valid? true,
>                       :count 30049,
>                       :ok-count 26792,
>                       :fail-count 3200,
>                       :info-count 57}}},
>  :exceptions {:valid? true},
>  :workload
> *** SNIP ***

Aw shucks, you left out the good part! I'm guessing the workload returned {:valid? false} here?

> Kyle: Could you figure out a way of setting "prepareThreshold=0" in JDBC (i.e. disable prepared statements), please? That would make it a bit easier to debug.

Try out 21ae84ed: I added a --prepare-threshold option for you. :)

--Kyle
On Thu, Jun 4, 2020 at 1:35 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
>> How can I determine what SQL each transaction generates from these values? It's not obvious to me which of the three tables (txn0, txn1, or txn2) is affected in each case.
>
> This is a good and obvious question which I don't yet have a good answer for. Reading the source gives you *some* idea of what SQL's being generated, but there's some stuff being done by next.jdbc and JDBC itself, so I don't know how to show you *exactly* what goes over the wire. A terrible way to do this is to look at the pcap traces in wireshark--you can correlate from the timestamps in jepsen.log, or search for the transactions which interacted with specific keys.

I'd appreciate it if you could provide this information, so I can be confident I didn't get something wrong. I don't really understand how Elle detects this G2-item anomaly, nor how it works in general. PostgreSQL doesn't really use 2PL, even to a limited degree (unlike Oracle), so a lot of the definitions from the "Generalized Isolation Level Definitions"/Adya paper are not particularly intuitive to me.

That said, I find it easy to understand why the "G2-item: Item Anti-dependency Cycles" example from the paper exhibits behavior that would be wrong for Postgres -- even in repeatable read mode. If Postgres exhibits this anomaly (in repeatable read or serializable mode), that would be a case of a transaction reading data that isn't visible to its original transaction snapshot. The paper supposes that this could happen when another transaction (the one that updated the sum-of-salaries from the example) committed. Just being able to see the SQL executed by each of the two transactions would be compelling evidence of a serious problem, provided the details are equivalent to the sum-of-salaries example from the paper.

> One option would be to add some sort of tracing thing to the test so that it records the SQL statements it generates as extra metadata on operations. I can look into doing that for you later on. :)

If each Jepsen worker has its own connection for the duration of the test (which I guess must happen already), and each connection specified an informative and unique "application_name", it would be possible to see Jepsen's string in the Postgres logs, next to the SQL text. With prepared statements you'd see the constants used, though not in all log messages (IIRC they don't appear in error messages, but do appear in regular log messages). See:

https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-LINE-PREFIX

If you can't figure out how to get JDBC to accept the application_name you want to provide, then you can execute a "set application_name = 'my application name';" SQL statement within each Jepsen worker instead. Do this at the start, I suppose. Whichever approach is easiest and makes sense.

You might have something like this in postgresql.conf to see the "application_name" string next to each statement in the log (this will increase the log volume considerably, but it should still be manageable):

log_line_prefix='%p %a %l'
log_statement=all

(I also suggest further customizing log_line_prefix in whatever way seems most useful to you.)

--
Peter Geoghegan
On 6/4/20 5:29 PM, Peter Geoghegan wrote:
> I'd appreciate it if you could provide this information, so I can be confident I didn't get something wrong. I don't really understand how Elle detects this G2-item anomaly, nor how it works in general. PostgreSQL doesn't really use 2PL, even to a limited degree (unlike Oracle), so a lot of the definitions from the "Generalized Isolation Level Definitions"/Adya paper are not particularly intuitive to me.

Hopefully you shouldn't have to think about 2PL, because the generalized phenomena are defined independently of locking--though I think the paper talks about 2PL in order to show equivalency to the locking approaches. The gist of the generalized definitions is about information flow--the anomalies (well, most of them) correspond to cycles in the dependency graph between transactions. Elle works (very loosely) by inferring this dependency graph.

I think you've probably read this already, and I know it's a *lot* to throw out there all at once, but the Elle readme and paper might be helpful here. In particular, section 2 of the paper ("The Adya Formalism") gives a brief overview of what these dependencies are, and section 3 gives an intuition for how we can infer the dependency graph.

https://github.com/jepsen-io/elle
https://github.com/jepsen-io/elle/blob/master/paper/elle.pdf

> That said, I find it easy to understand why the "G2-item: Item Anti-dependency Cycles" example from the paper exhibits behavior that would be wrong for Postgres -- even in repeatable read mode.

Yeah! My understanding is that this behavior would be incorrect under either snapshot isolation or repeatable read, at least using the generalized definitions. It might be OK to do this under repeatable read given the anomaly interpretation of the ANSI SQL spec; I'm not entirely sure.

> If Postgres exhibits this anomaly (in repeatable read or serializable mode), that would be a case of a transaction reading data that isn't visible to its original transaction snapshot. The paper supposes that this could happen when another transaction (the one that updated the sum-of-salaries from the example) committed.

Yes! It's not always this obvious--G2-item encompasses any dependency cycle between transactions such that at least one dependency involves a transaction writing state which was not observed by some (ostensibly prior) transaction's read. We call these "rw dependencies" in the paper, because they involve a read which must have occurred before a write. Another way to think of G2-item is "a transaction failed to see something that happened in its logical past".

A special case of G2-item, G-single, is commonly known as read skew. In Elle, we tag G-single separately, so all the G2-item anomalies reported actually involve 2+ rw dependencies, not just 1+. I haven't seen G-single yet, which is good--that means Postgres isn't violating SI, just SSI. Or, of course, the test itself could be broken--maybe the SQL statements themselves are subtly wrong, or our inference is incorrect.

> If each Jepsen worker has its own connection for the duration of the test (which I guess must happen already), and each connection specified an informative and unique "application_name", it would be possible to see Jepsen's string from the Postgres logs, next to the SQL text.

Give Jepsen a138843d a shot!

1553 jepsen process 27 16 LOG: execute <unnamed>: select (val) from txn0 where sk = $1
1553 jepsen process 27 17 DETAIL: parameters: $1 = '9'

Here, "process 27" is the same as the :process field you'll see in transactions. I'd like to be able to get a mini log of SQL statements embedded in the operation itself, so it'd be *right there* in the anomaly explanation, but... I haven't figured out how to scrape those side effects out of the guts of JDBC yet.

--Kyle
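[Editor's note: to make the rw-dependency discussion above concrete, here is a toy sketch in Python--emphatically not Elle's actual implementation--that recovers the two anti-dependency edges from the T1/T2 example at the top of this thread, given a hand-derived per-key version order:]

```python
# Toy inference of anti-dependency (rw) edges from a list-append history.
# (An editorial sketch, NOT Elle's implementation.)

# Each transaction is a list of ops: ("r", key, observed-list) or
# ("append", key, element). These are the two transactions from the
# G2-item #1 anomaly reported at the top of this thread.
txns = {
    "T1": [("r", 7, [1]), ("append", 12, 1)],
    "T2": [("append", 7, 2), ("r", 14, None), ("append", 14, 1), ("r", 12, None)],
}

# Per-key version order: since appends only grow the list, each key's
# element order is recoverable from its longest read plus trailing appends.
# Hand-derived here for brevity.
version_order = {7: [1, 2], 12: [1], 14: [1]}

def writer_of(key, element):
    """Return the transaction that appended `element` to `key`, if known."""
    for name, ops in txns.items():
        if ("append", key, element) in ops:
            return name
    return None  # written by some transaction outside this pair

def rw_edges():
    """T_a -rw-> T_b when T_a read a prefix of a key and T_b appended the
    next element: T_a's read must have preceded T_b's (unobserved) write."""
    edges = set()
    for name, ops in txns.items():
        for f, key, val in ops:
            if f != "r":
                continue
            prefix = val or []            # a nil read is the empty prefix
            order = version_order[key]
            if len(prefix) < len(order):  # someone appended past what we saw
                writer = writer_of(key, order[len(prefix)])
                if writer is not None and writer != name:
                    edges.add((name, writer))
    return edges

# T1 -rw-> T2 (T1 missed T2's append of 2 to key 7) and
# T2 -rw-> T1 (T2 missed T1's append of 1 to key 12): a cycle, i.e. G2-item.
assert rw_edges() == {("T1", "T2"), ("T2", "T1")}
```

[Because both edges are anti-dependencies, the cycle is "pure rw": neither transaction read anything the other wrote, yet each failed to observe the other's append--which matches Kyle's point that SI appears intact while SSI does not.]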
On Thu, Jun 4, 2020 at 3:11 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Give Jepsen a138843d a shot!

Giving it a go now. I'll respond to the rest of your points separately.

FYI, at commit a138843d of Jepsen I can get runs that report roughly the same G2-item anomaly at the end, but show a warning that flies by -- not sure if this is new or not:

INFO [2020-06-04 15:39:46,204] jepsen worker 5 - jepsen.stolon.append not in transaction
INFO [2020-06-04 15:39:46,204] jepsen worker 12 - jepsen.util 62 :invoke :txn [[:r 5184 nil] [:r 5184 nil] [:append 5184 4]]
INFO [2020-06-04 15:39:46,204] jepsen worker 3 - jepsen.stolon.append txn insert failed: ERROR: duplicate key value violates unique constraint "txn0_pkey"
  Detail: Key (id)=(5184) already exists.
INFO [2020-06-04 15:39:46,205] jepsen worker 5 - jepsen.stolon.append nil insert failed: ERROR: duplicate key value violates unique constraint "txn0_pkey"
  Detail: Key (id)=(5184) already exists.
INFO [2020-06-04 15:39:46,205] jepsen worker 3 - jepsen.stolon.append :update #:next.jdbc{:update-count 0}
INFO [2020-06-04 15:39:46,206] jepsen worker 5 - jepsen.stolon.append :update #:next.jdbc{:update-count 1}
INFO [2020-06-04 15:39:46,206] jepsen worker 5 - jepsen.util 205 :ok :txn [[:append 5184 3]]
WARN [2020-06-04 15:39:46,206] jepsen worker 3 - jepsen.stolon.append Caught ex-info
clojure.lang.ExceptionInfo: throw+: {:type :jepsen.stolon.append/homebrew-upsert-failed, :key 5184, :element 1}
  at slingshot.support$stack_trace.invoke(support.clj:201)
  at jepsen.stolon.append$mop_BANG_.invokeStatic(append.clj:111)
  at jepsen.stolon.append$mop_BANG_.invoke(append.clj:80)
  at clojure.core$partial$fn__5828.invoke(core.clj:2638)
  at clojure.core$mapv$fn__8430.invoke(core.clj:6912)
  at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
  at clojure.core$reduce.invokeStatic(core.clj:6827)
  at clojure.core$mapv.invokeStatic(core.clj:6903)
  at clojure.core$mapv.invoke(core.clj:6903)
  at jepsen.stolon.append.Client$fn__1987.invoke(append.clj:173)
  at next.jdbc.transaction$transact_STAR_.invokeStatic(transaction.clj:39)
  at next.jdbc.transaction$transact_STAR_.invoke(transaction.clj:18)
  at next.jdbc.transaction$eval1805$fn__1806.invoke(transaction.clj:86)
  at next.jdbc.protocols$eval1057$fn__1058$G__1048__1067.invoke(protocols.clj:57)
  at next.jdbc$transact.invokeStatic(jdbc.clj:253)
  at next.jdbc$transact.invoke(jdbc.clj:245)
  at jepsen.stolon.append.Client.invoke_BANG_(append.clj:171)
  at jepsen.client.Validate.invoke_BANG_(client.clj:66)
  at jepsen.generator.interpreter.ClientWorker.invoke_BANG_(interpreter.clj:61)
  at jepsen.generator.interpreter$spawn_worker$fn__9378$fn__9379.invoke(interpreter.clj:136)
  at jepsen.generator.interpreter$spawn_worker$fn__9378.invoke(interpreter.clj:119)
  at clojure.core$binding_conveyor_fn$fn__5739.invoke(core.clj:2030)
  at clojure.lang.AFn.call(AFn.java:18)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:834)
INFO [2020-06-04 15:39:46,207] jepsen worker 3 - jepsen.util 3 :info :txn [[:r 5182 nil] [:r 5184 nil] [:append 5184 1] [:r 5182 nil]] [:ex-info "throw+: {:type :jepsen.stolon.append/homebrew-upsert-failed, :key 5184, :element 1}"]
INFO [2020-06-04 15:39:46,207] jepsen worker 12 - jepsen.util 62 :fail :txn [[:r 5184 nil] [:r 5184 nil] [:append 5184 4]] [:could-not-serialize "ERROR: could not serialize access due to concurrent update"]
INFO [2020-06-04 15:39:46,208] jepsen worker 0 - jepsen.util 50 :invoke :txn [[:r 5185 nil]]

Is this expected? Does this matter?

--
Peter Geoghegan
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
INFO [2020-06-04 15:39:46,207] jepsen worker 3 - jepsen.util 3 :info
:txn [[:r 5182 nil] [:r 5184 nil] [:append 5184 1] [:r 5182 nil]]
[:ex-info "throw+: {:type
:jepsen.stolon.append/homebrew-upsert-failed, :key 5184, :element 1}"]
INFO [2020-06-04 15:39:46,207] jepsen worker 12 - jepsen.util 62 :fail
:txn [[:r 5184 nil] [:r 5184 nil] [:append 5184 4]]
[:could-not-serialize "ERROR: could not serialize access due to
concurrent update"]
INFO [2020-06-04 15:39:46,208] jepsen worker 0 - jepsen.util 50
:invoke :txn [[:r 5185 nil]]
Is this expected? Does this matter?
--
Peter Geoghegan
On Thu, Jun 4, 2020 at 3:52 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Naw, that's ok. :homebrew-upsert-failed is just letting you know that we
> couldn't do the update-insert-update dance. Shouldn't be any safety impact. :-)

I also notice some deadlocks when inserting -- not duplicate violations:

1591311225.473 170332 1197463 INSERT waiting 5ed97b3c.2995c jepsen process 25 LOG: process 170332 still waiting for ShareLock on transaction 1197460 after 1000.091 ms
1591311225.473 170332 1197463 INSERT waiting 5ed97b3c.2995c jepsen process 25 DETAIL: Process holding the lock: 170418. Wait queue: 170332, 170450, 170478.
1591311225.473 170332 1197463 INSERT waiting 5ed97b3c.2995c jepsen process 25 CONTEXT: while inserting index tuple (3,176) in relation "txn0_pkey"
1591311225.474 170450 1197458 INSERT waiting 5ed97b5b.299d2 jepsen process 55 LOG: process 170450 detected deadlock while waiting for ShareLock on transaction 1197460 after 1000.086 ms
1591311225.474 170450 1197458 INSERT waiting 5ed97b5b.299d2 jepsen process 55 DETAIL: Process holding the lock: 170418. Wait queue: 170332, 170478.
1591311225.474 170450 1197458 INSERT waiting 5ed97b5b.299d2 jepsen process 55 CONTEXT: while inserting index tuple (5,279) in relation "txn0_pkey"

Are they expected?
--
Peter Geoghegan
On Thu, Jun 4, 2020 at 3:11 PM Kyle Kingsbury <aphyr@jepsen.io> wrote: > Yes! It's not always this obvious--G2-item encompasses any dependency cycle > between transactions such that at least one dependency involves a transaction > writing state which was not observed by some (ostensibly prior) transaction's > read. We call these "rw dependencies" in the paper, because they involve a read > which must have occurred before a write. Another way to think of G2-item is "A > transaction failed to see something that happened in its logical past". Are you familiar with the paper "Serializable Snapshot Isolation in PostgreSQL"? You might find it helpful: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf Is there a difference between "rw dependencies" as you understand the term, and what the paper calls "rw-antidependencies"? > A special case of G2-item, G-single, is commonly known as read skew. In Elle, we > tag G-single separately, so all the G2-item anomalies reported actually involve > 2+ rw dependencies, not just 1+. I haven't seen G-single yet, which is > good--that means Postgres isn't violating SI, just SSI. Or, of course, the test > itself could be broken--maybe the SQL statements themselves are subtly wrong, or > our inference is incorrect. I'm glad that you don't suspect snapshot isolation has been violated. Frankly I'd be astonished if Postgres is found to be violating SI here. Anything is possible, but if that happened then it would almost certainly be far more obvious. The way MVCC works in Postgres is relatively simple. If an xact in repeatable read mode really did return a row that wasn't visible to its snapshot, then it's probably just as likely to return two row versions for the same logical row, or zero row versions. These are symptoms of various types of data corruption that we see from time to time. 
Note, in particular, that violating SI cannot happen because a transaction released a lock in an index when it shouldn't have -- because we simply don't have those. (Actually, we do have something called predicate locks, but those are not at all like 2PL index value locks -- see the paper I linked to for more.) > Give Jepsen a138843d a shot! > > 1553 jepsen process 27 16 LOG: execute <unnamed>: select (val) from txn0 where > sk = $1 > > 1553 jepsen process 27 17 DETAIL: parameters: $1 = '9' Attached is: * A Jepsen failure of the kind we've been talking about * Log output from Postgres that shows all log lines with the relevant-to-failure Jepsen worker numbers, as discussed. This is interleaved based on timestamp order. Can you explain the anomaly with reference to the actual SQL queries executed in the log? Is the information that I've provided sufficient? -- Peter Geoghegan
Attachments
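For readers following along: the "dangerous structure" rule from the SSI paper cited above can be sketched as a toy model. This is not Postgres code -- all names here are invented, and the real implementation also tracks commit ordering and uses predicate locks to discover the rw edges in the first place -- but it shows why the write-skew shape discussed in this thread must abort at least one transaction under SSI:

```python
# Toy model of the SSI "dangerous structure" check (Cahill et al. /
# Ports & Grittner): a transaction is in danger when it is the pivot of
# two consecutive rw-antidependencies, i.e. it has both an incoming and
# an outgoing rw edge. This is a simplification of what PostgreSQL
# actually implements; names are invented for illustration.

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_rw = set()   # transactions with an rw edge *into* us
        self.out_rw = set()  # transactions we have an rw edge *out to*

def add_rw(reader, writer):
    """reader read a version that writer superseded: reader --rw--> writer."""
    reader.out_rw.add(writer)
    writer.in_rw.add(reader)

def dangerous(t):
    """True if t is the pivot of two consecutive rw edges."""
    return bool(t.in_rw) and bool(t.out_rw)

t1, t2 = Txn("T1"), Txn("T2")
# Write skew: each transaction fails to observe the other's write,
# exactly the two-edge cycle Elle reports as G2-item.
add_rw(t1, t2)
add_rw(t2, t1)
# Both transactions are pivots, so SSI must abort at least one of them;
# letting both commit (as in this bug report) is the violation.
print(dangerous(t1), dangerous(t2))
```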
On Thu, Jun 4, 2020 at 5:20 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Jun 4, 2020 at 3:11 PM Kyle Kingsbury <aphyr@jepsen.io> wrote: > > Yes! It's not always this obvious--G2-item encompasses any dependency cycle > > between transactions such that at least one dependency involves a transaction > > writing state which was not observed by some (ostensibly prior) transaction's > > read. We call these "rw dependencies" in the paper, because they involve a read > > which must have occurred before a write. Another way to think of G2-item is "A > > transaction failed to see something that happened in its logical past". > > Are you familiar with the paper "Serializable Snapshot Isolation in > PostgreSQL"? You might find it helpful: > > http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf > > Is there a difference between "rw dependencies" as you understand the > term, and what the paper calls "rw-antidependencies"? "3.2 Serializability Theory" has some stuff about Adya et al, but also talks about the work of Fekete et al -- apparently they built on the Adya paper (both papers share one author). Are you familiar with the Fekete paper? -- Peter Geoghegan
On 6/4/20 10:03 PM, Peter Geoghegan wrote: >> Are you familiar with the paper "Serializable Snapshot Isolation in >> PostgreSQL"? You might find it helpful: >> >> http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf It's been several years since I've read it, and it's worth revisiting. Thanks for the recommendation! Yes, I think a good way to understand Elle's G2-item is that it might be (and I'm not entirely sure about this--this was only a cursory read last night) one of the "dangerous structures" in the Postgres SSI paper. I elided this earlier to try and keep things simple, but we actually separate out cycles featuring *nonadjacent* rw edges and consider them a separate sort of anomaly in Elle, so we can distinguish between snapshot isolation and serializability more precisely. >> Is there a difference between "rw dependencies" as you understand the >> term, and what the paper calls "rw-antidependencies"? Yes, my apologies. "anti-dependency" = "rw-antidependency" = "rw dependency" = "T1 read state x1, and T2 installed state x2, such that x1 immediately precedes x2". >> "3.2 Serializability Theory" has some stuff about Adya et al, but also >> talks about the work of Fekete et al -- apparently they built on the >> Adya paper (both papers share one author). Are you familiar with the >> Fekete paper? Yes! Some of Elle's inferences are based in part on Fekete et al.'s work, and we've corresponded some during Elle's development. :) ---Kyle
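The rw-dependency inference over list-append histories can be sketched roughly like this (a toy illustration in the spirit of Elle, not Elle's actual code; the history encoding here is simplified):

```python
# Sketch: infer an rw-antidependency from a list-append history.
# If T1 read key k and did not observe element e, while T2 appended
# e to k, then T1's read logically preceded T2's write: T1 --rw--> T2.
# This is a simplified illustration, not Elle's implementation.

def rw_edge(t1_reads, t2_appends):
    """Return the keys witnessing an rw edge from T1 to T2.

    t1_reads:   {key: tuple of observed elements} (empty tuple = nil read)
    t2_appends: {key: element appended}
    """
    return {k for k, e in t2_appends.items()
            if k in t1_reads and e not in t1_reads[k]}

# The cycle from the original report:
#   T1 = [[:r 7 [1]] [:append 12 1]]
#   T2 = [[:append 7 2] [:r 14 nil] [:append 14 1] [:r 12 nil]]
t1 = {"reads": {7: (1,)},        "appends": {12: 1}}
t2 = {"reads": {14: (), 12: ()}, "appends": {7: 2, 14: 1}}

print(rw_edge(t1["reads"], t2["appends"]))  # T1 < T2, via key 7
print(rw_edge(t2["reads"], t1["appends"]))  # T2 < T1, via key 12: a cycle
```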
On 6/4/20 8:20 PM, Peter Geoghegan wrote:
> Can you explain the anomaly with reference to the actual SQL queries
> executed in the log? Is the information that I've provided sufficient?

Elle automatically explains these anomalies, so it might be easier to interpret with the visualization and/or interpretation of the anomaly--you'll find those in store/latest/elle/. Looking at the data structure representation of the anomaly, here's what I see in jepsen-failure-2.txt. I'm looking at the first (and only) anomaly in :workload/:anomalies/:G2-item.

{:cycle
 [{:type :ok, :f :txn,
   :value [[:r 100 nil] [:append 101 1] [:r 101 [1]] [:r 96 [1 2]]],
   :time 750303448, :process 23, :index 727}
  {:type :ok, :f :txn,
   :value [[:append 100 1] [:r 99 nil] [:r 101 nil] [:append 4 4]],
   :time 751922048, :process 5, :index 729}
  {:type :ok, :f :txn,
   :value [[:r 100 nil] [:append 101 1] [:r 101 [1]] [:r 96 [1 2]]],
   :time 750303448, :process 23, :index 727}],
 :steps
 ({:type :rw, :key 100, :value :elle.list-append/init, :value' 1,
   :a-mop-index 0, :b-mop-index 0}
  {:type :rw, :key 101, :value :elle.list-append/init, :value' 1,
   :a-mop-index 2, :b-mop-index 1}),
 :type :G2-item}

We have a cycle involving two transactions (:cycle)--the first transaction is shown at the end as well to make it easier to see that it closes the loop. Let's call them T1 (:index 727) and T2 (:index 729). The first step (:steps) is an anti-dependency: T1 read key 100 and observed the initial state ([:r 100 nil]), but T2 appended 1 to 100, so we know T1 < T2. The second step is also an anti-dependency: T2 read key 101 and observed the initial state ([:r 101 nil]), but T1 appended 1 to 101, so T2 < T1. Both of these transactions completed successfully (:type :ok), which means we have two committed transactions, both of which failed to observe the other's writes. These can't be serializable, because if we choose T1 < T2, T2 fails to observe a write that T1 performed, and if we choose T2 < T1, T1 fails to observe a write that T2 performed.
You'll probably see something similar to this explanation in store/latest/elle/G2-item.txt, and a corresponding diagram in store/latest/elle/G2-item/0.svg. Elle infers, from the presence of G2-item, that this history cannot satisfy repeatable read (:not #{:repeatable-read}). By extension, it knows (:also-not) that it can't be serializable, strict serializable, or strong session serializable either. Now, to the SQL. T1 was performed by process 23, and T2 by process 5--you've helpfully restricted the postgres logs to just those two processes, so you're one step ahead of me. I searched for $1 = '100' to find the read of key 100 from process 23: 1591312493.044 171762 0 BEGIN 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: BEGIN 1591312493.044 171762 0 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '100' 1591312493.044 171762 0 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn2 where sk = $1 So, process 23 begins a transaction and reads key 100 from table txn2, using the secondary key `sk`. This is new, BTW--earlier versions of the test did every operation by primary key; I've been making the test harder over time. That read returns no rows: [:r 100 nil]. Process 23, at least in this log, goes on to set the session to SERIALIZABLE. This is a bit weird, because that statement should be executed by JDBC when we started the transaction, right? Makes me wonder if maybe the log lines are out of order? Now that I look... these are suspiciously alphabetically ordered. If you sorted the file, that might explain it. 
:) 1591312493.044 171762 0 SET 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE 1591312493.044 171762 0 SHOW 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL Next (in this sorted log, but not in reality), process 23 appends 1 to key 101, in table txn0: 1591312493.045 171762 0 INSERT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '101', $2 = '101', $3 = '1' 1591312493.045 171762 0 INSERT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: insert into txn0 (id, sk, val) values ($1, $2, $3) There should have been an update first, but it got sorted later in this file. Here's the update--this must have happened first, failed, and caused process 23 to back off to an insert. 1591312493.045 171762 0 SAVEPOINT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: savepoint upsert 1591312493.045 171762 0 UPDATE 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '1', $2 = '101' 1591312493.045 171762 0 UPDATE 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: update txn0 set val = CONCAT(val, ',', $1) where id = $2 Next, process 5 begins its transaction: 1591312493.046 171734 0 BEGIN 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: BEGIN 1591312493.046 171734 0 SET 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE 1591312493.046 171734 0 SHOW 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL ... 
and tries to append 1 to key 100 by primary key `id`: 1591312493.046 171734 0 UPDATE 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '1', $2 = '100' 1591312493.046 171734 0 UPDATE 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: update txn2 set val = CONCAT(val, ',', $1) where id = $2 Process 23 goes on to release its upsert savepoint, and perform reads of key 101 and 96, observing [1] and [1 2] respectively, before committing. Again, out-of-order log due to sorting. 1591312493.046 171762 1197960 COMMIT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: COMMIT 1591312493.046 171762 1197960 RELEASE 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: release savepoint upsert 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '101' 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '96' 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn0 where id = $1 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn0 where sk = $1 Process 5's update affects zero rows, so it similarly backs off to an insert statement: 1591312493.047 171734 0 INSERT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '100', $2 = '100', $3 = '1' 1591312493.047 171734 0 INSERT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: insert into txn2 (id, sk, val) values ($1, $2, $3) 1591312493.047 171734 0 SAVEPOINT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: savepoint upsert 1591312493.047 171734 1197963 RELEASE 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: release savepoint upsert ... followed by reads of 99 and 101. 
1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '101' 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '99' 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: select (val) from txn0 where sk = $1 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: select (val) from txn2 where id = $1 We know the read of 101 observed no rows, which makes sense because process 5 (T2)'s transaction began during process 23's transaction, *before* the commit. What *doesn't* make sense is... 1591312493.048 171734 1197963 COMMIT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: COMMIT Process 5 manages to *commit* despite the anti-dependency cycle! That's legal under SI, but not under serializability. Phew! I know it's a lot. Does this help? --Kyle
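For anyone trying to follow the update/savepoint/insert sequences in these logs, the "homebrew upsert" dance the test client performs can be sketched like this. This runs against SQLite purely so the example is self-contained (the real client speaks JDBC to Postgres), but the savepoint structure is the same shape as what appears in the logs above:

```python
import sqlite3

# Self-contained sketch of the test's "homebrew upsert" dance:
#   1. try an UPDATE; if it touched a row, done;
#   2. otherwise SAVEPOINT and try an INSERT;
#   3. if the INSERT hits a duplicate key (a concurrent insert won),
#      roll back to the savepoint and retry the UPDATE.
# SQLite stands in for Postgres here so the example runs anywhere.

def append_element(conn, key, element):
    cur = conn.execute(
        "update txn0 set val = val || ',' || ? where id = ?",
        (element, key))
    if cur.rowcount > 0:
        return "updated"
    conn.execute("savepoint upsert")
    try:
        conn.execute("insert into txn0 (id, sk, val) values (?, ?, ?)",
                     (key, key, element))
        conn.execute("release savepoint upsert")
        return "inserted"
    except sqlite3.IntegrityError:
        # Duplicate key: someone inserted concurrently. Back off to update.
        conn.execute("rollback to savepoint upsert")
        conn.execute("release savepoint upsert")
        cur = conn.execute(
            "update txn0 set val = val || ',' || ? where id = ?",
            (element, key))
        return "updated-after-conflict" if cur.rowcount else "failed"

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
conn.execute("create table txn0 (id int primary key, sk int, val text)")
print(append_element(conn, 5184, "1"))  # no row yet: falls back to insert
print(append_element(conn, 5184, "2"))  # row exists: plain update
print(conn.execute("select val from txn0 where id = 5184").fetchone()[0])
```

The "failed" branch corresponds to the :homebrew-upsert-failed exception in the Jepsen logs: the update found nothing, the insert hit a duplicate, and the retried update still matched zero rows.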
After staring at a few hundred of these anomalies, I'm more confident: this problem seems to involve a transaction which fails to observe a logically prior transaction's *insert*. So far we never see a cycle whose rw deps all involve updates--there's always 2+ rw deps, and at least one involves the initial state. Maybe this helps narrow down the search! --Kyle
On Fri, Jun 5, 2020 at 10:00 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> After staring at a few hundred of these anomalies, I'm more confident: this
> problem seems to involve a transaction which fails to observe a logically
> prior transaction's *insert*. So far we never see a cycle whose rw deps all
> involve updates--there's always 2+ rw deps, and at least one involves the
> initial state. Maybe this helps narrow down the search!

It wouldn't be the first time there was such an anomaly:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=585e2a3b

(You probably vaguely remember this as the bug that people were talking about at Strange Loop that same year.)
--
Peter Geoghegan
On Fri, Jun 5, 2020 at 6:03 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> Process 23, at least in this log, goes on to set the session to SERIALIZABLE.
> This is a bit weird, because that statement should be executed by JDBC when we
> started the transaction, right? Makes me wonder if maybe the log lines are out
> of order? Now that I look... these are suspiciously alphabetically ordered. If
> you sorted the file, that might explain it. :)

I messed that up massively -- sorry. Attached is a revised version of the same thing, which fixes the accidental sorting. Just the relevant area, with the proper order. Note also that this is the log_line_prefix used:

log_line_prefix='%n %p %x %i %c %a '

Why is it that process 5 updates the txn1 table at the end, having done insert/update stuff to the txn2 table earlier on in the same xact (or what appears to be the same xact)? Note that that update is the only statement from the entire sequence that affects txn1 (all other statements affect either txn0 or txn2). It's as if the homebrew upsert changed its mind about which table it's supposed to affect. I don't speak Clojure, so I have to ask.

Maybe this just demonstrates that an approach based on reading the logs like this just isn't workable. I think that this original order might still be slightly off, due to the vagaries of stdio with concurrent writes to the log file -- not sure how much that matters, though. I suspect that I should try a high fidelity way of capturing the original execution order. I'm thinking of something like test_decoding, which will show us a representation in LSN order, which is precisely what we want:

https://www.postgresql.org/docs/devel/test-decoding.html
--
Peter Geoghegan
Attachments
On Fri, Jun 5, 2020 at 12:43 PM Peter Geoghegan <pg@bowt.ie> wrote: > I suspect that I should try a high fidelity way of capturing the > original execution order. I'm thinking of something like > test_decoding I had a better idea -- use rr debugging recorder to get a recording of an entire execution: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework Attached results.edn + G2-item.txt files show the anomalies that were reported during an execution I recorded a little earlier. We can now use this as a point of reference for further debugging. One less obvious benefit of doing this is that rr makes the execution of the entire Postgres instance single threaded, so we can be sure that the log faithfully represents the order of operations during execution, within and across PostgreSQL backends -- no more uncertainty about the exact order, when each xact starts and ends, etc. (Note that I had to use rr's "chaos mode" feature to reproduce the race condition problem in the first place -- it was easy once I remembered about that option.) This useful property is pretty much self-evident from the logs, since rr includes an event number. 
For example, the log starts like this: [rr 316370 892] 1591388133.699-316370--0--5edaa7e5.4d3d2- LOG: starting PostgreSQL 13beta1-dc on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit [rr 316370 1056] 1591388133.706-316370--0--5edaa7e5.4d3d2- LOG: listening on IPv6 address "::1", port 5432 [rr 316370 1068] 1591388133.706-316370--0--5edaa7e5.4d3d2- LOG: listening on IPv4 address "127.0.0.1", port 5432 [rr 316370 1093] 1591388133.707-316370--0--5edaa7e5.4d3d2- LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" [rr 316371 1303] 1591388133.716-316371--0--5edaa7e5.4d3d3- LOG: database system was shut down at 2020-06-05 13:15:18 PDT [rr 316370 1726] 1591388133.735-316370--0--5edaa7e5.4d3d2- LOG: database system is ready to accept connections [rr 316448 4190] 1591388153.592-316448--0--5edaa7f9.4d420-[unknown] LOG: connection received: host=127.0.0.1 port=41078 The second number in each log line represents an rr event number -- notice that these are strictly monotonically increasing. I don't even need a PID to be able to start an interactive debugging session during execution of a query of interest, though -- all I need is an event number (such as 4190) to reliably get an interactive debugging session. Since the entire recorded execution is actually single threaded, and because of Postgres' multi-process architecture, that information alone unambiguously identifies a particular backend/connection and query execution. I also make available the entire log that I showed the start of just now: https://drive.google.com/file/d/1YFTVChfEtzZncdjakXKpp22HKDNFoWKo/view?usp=sharing This hasn't been filtered by me this time around, so anyone playing along at home can look through everything, including the queries pertaining to the two G2-Item issues Jepsen complains about. Hopefully this helps you, Kyle. Would you mind repeating the exercise from earlier with this new log file/execution? Thanks -- Peter Geoghegan
Attachments
On 6/5/20 6:44 PM, Peter Geoghegan wrote: > Hopefully this helps you, Kyle. Would you mind repeating the exercise > from earlier with this new log file/execution? Here's a quick runthrough of G2-item #0: Let: T1 = {:type :ok, :f :txn, :value [[:append 1468 3] [:r 1469 nil] [:append 1439 2]], :time 22408795244, :process 49, :index 10392} T2 = {:type :ok, :f :txn, :value [[:append 1469 1] [:append 1456 2] [:r 1468 nil]], :time 22565885165, :process 45, :index 10464} Then: - T1 < T2, because T1 observed the initial (nil) state of 1469, which T2 created by appending 1. - However, T2 < T1, because T2 observed the initial (nil) state of 1468, which T1 created by appending 3: a contradiction! Classic G2: two rw-edges, where each transaction failed to observe the other's write. I restricted the log to just those processes: egrep 'process (49|45)' log_rr_recording.log I jumped to the write of 1468 with /'1468'/, then backtracked to process 45's BEGIN: [rr 316503 580915] 1591388175.849-316503-5/231-0-SHOW-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL [rr 316503 581192] 1591388175.858-316503-5/232-0-SET-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE [rr 316503 581202] 1591388175.858-316503-5/233-0-BEGIN-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: BEGIN 45 tries to append 1 to 1469, but fails because it found zero rows: [rr 316503 581206] 1591388175.859-316503-5/233-0-UPDATE-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: update txn2 set val = CONCAT(val, ',', $1) where id = $2 1591388175.859-316503-5/233-0-UPDATE-5edaa7f9.4d457-jepsen process 45 DETAIL: parameters: $1 = '1', $2 = '1469' 45 backs off to an insert: [rr 316503 581222] 1591388175.859-316503-5/233-0-SAVEPOINT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: savepoint upsert 49 jumps in to start a transaction: [rr 316514 581232] 
1591388175.860-316514-13/401-0-SHOW-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL [rr 316514 581242] 1591388175.860-316514-13/402-0-SET-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE [rr 316514 581256] 1591388175.860-316514-13/403-0-BEGIN-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: BEGIN 49 tries to update 1468, which also fails, and backs off to an insert: [rr 316514 581350] 1591388175.864-316514-13/403-0-UPDATE-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: update txn1 set val = CONCAT(val, ',', $1) where id = $2 1591388175.864-316514-13/403-0-UPDATE-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '3', $2 = '1468' [rr 316514 581364] 1591388175.864-316514-13/403-0-SAVEPOINT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: savepoint upsert [rr 316514 581374] 1591388175.865-316514-13/403-0-INSERT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: insert into txn1 (id, sk, val) values ($1, $2, $3) 1591388175.865-316514-13/403-0-INSERT-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '1468', $2 = '1468', $3 = '3' 45 performs its insert of 1469: [rr 316503 581548] 1591388175.871-316503-5/233-0-INSERT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: insert into txn2 (id, sk, val) values ($1, $2, $3) 1591388175.871-316503-5/233-0-INSERT-5edaa7f9.4d457-jepsen process 45 DETAIL: parameters: $1 = '1469', $2 = '1469', $3 = '1' [rr 316503 581562] 1591388175.872-316503-5/233-1680808-RELEASE-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: release savepoint upsert And goes on to update 1456, which also fails, so it inserts there too: [rr 316503 581576] 1591388175.872-316503-5/233-1680808-UPDATE-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: update txn0 set val = CONCAT(val, ',', $1) where id = $2 1591388175.872-316503-5/233-1680808-UPDATE-5edaa7f9.4d457-jepsen process 45 
DETAIL: parameters: $1 = '2', $2 = '1456' [rr 316503 581980] 1591388175.887-316503-5/233-1680808-SAVEPOINT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: savepoint upsert [rr 316503 581994] 1591388175.887-316503-5/233-1680808-INSERT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: insert into txn0 (id, sk, val) values ($1, $2, $3) 1591388175.887-316503-5/233-1680808-INSERT-5edaa7f9.4d457-jepsen process 45 DETAIL: parameters: $1 = '1456', $2 = '1456', $3 = '2' [rr 316503 582008] 1591388175.888-316503-5/233-1680808-RELEASE-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: release savepoint upsert 45 reads 1468, observing 0 rows: [rr 316503 582018] 1591388175.888-316503-5/233-1680808-SELECT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: select (val) from txn1 where id = $1 1591388175.888-316503-5/233-1680808-SELECT-5edaa7f9.4d457-jepsen process 45 DETAIL: parameters: $1 = '1468' And 45 commits: [rr 316503 582028] 1591388175.888-316503-5/233-1680808-COMMIT-5edaa7f9.4d457-jepsen process 45 LOG: execute <unnamed>: COMMIT 49's insert succeeded, and it goes on to read 1469, finding 0 rows as well. This should cause the transaction to abort, because 1469 was written by a committed transaction performed by process 45, *and* 45 didn't observe 49's writes. 
[rr 316514 582409] 1591388175.902-316514-13/403-1680806-RELEASE-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: release savepoint upsert [rr 316514 582419] 1591388175.902-316514-13/403-1680806-SELECT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: select (val) from txn2 where sk = $1 1591388175.902-316514-13/403-1680806-SELECT-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '1469' 49 goes on to append to 1439, which, again, fails and backs off to insert: [rr 316514 583918] 1591388175.956-316514-13/403-1680806-UPDATE-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: update txn0 set val = CONCAT(val, ',', $1) where id = $2 1591388175.956-316514-13/403-1680806-UPDATE-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '2', $2 = '1439' [rr 316514 584036] 1591388175.960-316514-13/403-1680806-SAVEPOINT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: savepoint upsert [rr 316514 584283] 1591388175.970-316514-13/403-1680806-INSERT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: insert into txn0 (id, sk, val) values ($1, $2, $3) 1591388175.970-316514-13/403-1680806-INSERT-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '1439', $2 = '1439', $3 = '2'[rr 316514 584299] 1591388175.971-316514-13/403-1680806-RELEASE-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: release savepoint upsert And 49 commits, resulting in a G2-item. [rr 316514 584309] 1591388175.971-316514-13/403-1680806-COMMIT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: COMMIT --Kyle
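The interleaving above has exactly the shape of write skew: each transaction reads a key the other inserts, and plain snapshot isolation lets both commit because first-committer-wins only checks write-write conflicts. A toy model (invented names, not Postgres code) of why SI alone admits this outcome:

```python
# Toy snapshot-isolation model reproducing the shape of this anomaly.
# Each transaction runs against a snapshot taken at BEGIN and writes a
# key the other one reads. First-committer-wins only rejects
# *write-write* conflicts, so with disjoint write sets both commit --
# which is legal under SI, but not under serializability. All names
# here are invented for illustration.

db, committed_writes = {}, []

def begin():
    return dict(db)  # snapshot of the committed state at BEGIN

def commit(writes):
    # SI rule: abort only if a committed transaction wrote one of our
    # write keys since our snapshot (write-write conflict).
    for w in committed_writes:
        if writes.keys() & w:
            raise RuntimeError("first-committer-wins abort")
    committed_writes.append(set(writes))
    db.update(writes)

# Process 49 (T1): reads 1469 (sees nothing), inserts 1468.
# Process 45 (T2): reads 1468 (sees nothing), inserts 1469.
s1, s2 = begin(), begin()
t1_read, t2_read = s1.get(1469), s2.get(1468)
commit({1468: [3]})
commit({1469: [1]})  # disjoint write sets: SI happily commits this too
print(t1_read, t2_read, db)
```

SSI exists precisely to catch this case: the two rw edges form a dangerous structure that SI's write-write check never sees.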
On Sat, Jun 6, 2020 at 5:49 AM Kyle Kingsbury <aphyr@jepsen.io> wrote: > Here's a quick runthrough of G2-item #0: > > Let: > T1 = {:type :ok, :f :txn, :value [[:append 1468 3] [:r 1469 nil] [:append > 1439 2]], :time 22408795244, :process 49, :index 10392} > T2 = {:type :ok, :f :txn, :value [[:append 1469 1] [:append 1456 2] [:r 1468 > nil]], :time 22565885165, :process 45, :index 10464} > > Then: > - T1 < T2, because T1 observed the initial (nil) state of 1469, which T2 > created by appending 1. > - However, T2 < T1, because T2 observed the initial (nil) state of 1468, > which T1 created by appending 3: a contradiction! > > Classic G2: two rw-edges, where each transaction failed to observe the other's > write. Here is a restatement of the same anomaly with RR event number annotations, to make it easy for other people to match up each observation/modification to a query from the log file I made available yesterday: G2-item #0 Let: T1 = {:type :ok, :f :txn, :value [[:append 1468 3] [:r 1469 nil] [:append 1439 2]], :time 22408795244, :process 49, :index 10392} T2 = {:type :ok, :f :txn, :value [[:append 1469 1] [:append 1456 2] [:r 1468 nil]], :time 22565885165, :process 45, :index 10464} Then: - T1 < T2, because T1 observed the initial (nil) state of 1469 (at RR 582419), which T2 created by appending 1 (at RR 581548). - However, T2 < T1, because T2 observed the initial (nil) state of 1468 (at RR 582018), which T1 created by appending 3 (at RR 581374): -- a contradiction! Possibly silly question: Doesn't "observing the initial (nil) state" mean that no rows were returned by a SELECT statement? For example, doesn't that just mean that the following SELECT statement returns no rows?: [rr 316514 582419] 1591388175.902-316514-13/403-1680806-SELECT-5edaa7f9.4d462-jepsen process 49 LOG: execute <unnamed>: select (val) from txn2 where sk = $1 1591388175.902-316514-13/403-1680806-SELECT-5edaa7f9.4d462-jepsen process 49 DETAIL: parameters: $1 = '1469' -- Peter Geoghegan
On Sat, Jun 6, 2020 at 3:24 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Here is a restatement of the same anomaly with RR event number
> annotations, to make it easy for other people to match up each
> observation/modification to a query from the log file I made available
> yesterday:

Attached is a tentative bug fix. With this applied, I cannot get Jepsen/Elle to complain about a G2-Item anymore, nor any other anomaly, even after a dozen or so attempts. The regression tests pass, too.

The issue seems to be that heapam's mechanism for adding a "conflict out" can get some details subtly wrong. This can happen in the event of a concurrently updated tuple where the updating xact aborts. So for each pair of transactions that committed in each G2-Item, there seemed to be a third updating transaction that aborted, messing things up. I think that the "conflict out" stuff ought to behave as if an updated tuple was never updated when it's not visible to our xact anyway. We should use the XID that created the original tuple for "conflict out" purposes -- not the (possibly aborted) updater's XID (i.e. not the HeapTupleHeaderGetUpdateXid() XID).

In the example RR log that I posted the other day, the update + abort transaction was here (jepsen process 39/xid 1680826):

[rr 316505 584863] 1591388175.991-316505-7/407-1680826-SELECT-5edaa7f9.4d459-jepsen process 39 LOG: execute <unnamed>: select (val) from txn0 where id = $1
1591388175.991-316505-7/407-1680826-SELECT-5edaa7f9.4d459-jepsen process 39 DETAIL: parameters: $1 = '1470'
[rr 316505 584881] 1591388175.992-316505-7/0-1680826-ROLLBACK-5edaa7f9.4d459-jepsen process 39 LOG: execute <unnamed>: ROLLBACK

Kyle: Shouldn't be too hard to build Postgres from source if you want to test it yourself. Either way, thanks for the report!

-- Peter Geoghegan
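[The XID choice described above can be modelled abstractly. The following is NOT the real heapam/predicate.c code: the Tuple/Snapshot classes and the toy visibility rule are invented for illustration only. The point it shows: when the insert itself is invisible to the reader's snapshot, the conflict-out edge should point at the inserting XID, never at a (possibly aborted) updater's XID.]

```python
# Abstract model of the "conflict out" XID choice.  All names and the
# visibility rule are invented stand-ins, not the PostgreSQL API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tuple:
    xmin: int            # XID that inserted this tuple version
    xmax: Optional[int]  # XID that updated/deleted it, if any

@dataclass
class Snapshot:
    horizon: int  # toy rule: XIDs below this are visible to us

    def sees(self, xid: int) -> bool:
        return xid < self.horizon

def conflict_out_xid(tup: Tuple, snap: Snapshot) -> Optional[int]:
    """Pick the XID to record an rw-conflict against, or None."""
    if not snap.sees(tup.xmin):
        # The insert itself is invisible to our snapshot: behave as
        # if the tuple was never updated and conflict with the
        # *inserter* -- not the updater's XID (roughly what the buggy
        # path took, via HeapTupleHeaderGetUpdateXid()), since that
        # updater may have aborted and would suppress the conflict.
        return tup.xmin
    if tup.xmax is not None and not snap.sees(tup.xmax):
        # Visible insert, invisible update/delete: the updater is
        # the transaction whose write we failed to observe.
        return tup.xmax
    return None

snap = Snapshot(horizon=100)
# Tuple concurrently inserted *and* updated, neither visible to us:
print(conflict_out_xid(Tuple(xmin=150, xmax=160), snap))  # prints 150
# Old tuple, deleted/updated by a concurrent transaction:
print(conflict_out_xid(Tuple(xmin=50, xmax=170), snap))   # prints 170
```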
Attachments
On Sun, Jun 7, 2020 at 7:54 PM Peter Geoghegan <pg@bowt.ie> wrote: > In the example RR log that I posted the other day, the update + abort > transaction was here (jepsen process 39/xid 1680826): > > [rr 316505 584863] > 1591388175.991-316505-7/407-1680826-SELECT-5edaa7f9.4d459-jepsen > process 39 LOG: execute <unnamed>: select (val) from txn0 where id = > $1 > 1591388175.991-316505-7/407-1680826-SELECT-5edaa7f9.4d459-jepsen > process 39 DETAIL: parameters: $1 = '1470' > [rr 316505 584881] > 1591388175.992-316505-7/0-1680826-ROLLBACK-5edaa7f9.4d459-jepsen > process 39 LOG: execute <unnamed>: ROLLBACK I mean in the case of the G2-Item that Kyle chose to focus on yesterday specifically -- G2-item #0 from the log_rr_recording.log execution. -- Peter Geoghegan
On Sun, Jun 7, 2020 at 7:54 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The issue seems to be that heapam's mechanism for adding a "conflict
> out" can get some details subtly wrong. This can happen in the event
> of a concurrently updated tuple where the updating xact aborts. So for
> each pair of transactions that committed in each G2-Item, there seemed
> to be a third updating transaction that aborted, messing things up. I
> think that the "conflict out" stuff ought to behave as if an updated
> tuple was never updated when it's not visible to our xact anyway. We
> should use the XID that created the original tuple for "conflict out"
> purposes -- not the (possibly aborted) updater's XID (i.e. not the
> HeapTupleHeaderGetUpdateXid() XID).

The functionality in question (the code from the HeapCheckForSerializableConflictOut() case statement) was originally discussed here, with the details finalized less than a week before SSI was committed in 2011:

https://www.postgresql.org/message-id/flat/1296499247.11513.777.camel%40jdavis#9e407424df5f8794360b6e84de60200a

It hasn't really changed since that time.

Kevin, Jeff: What do you think of the fix I have proposed? Attached is a revised version, with better comments and a proper commit message. Somebody should review this patch before I proceed with commit.

Thanks
-- Peter Geoghegan
Attachments
On Tue, Jun 9, 2020 at 1:26 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The functionality in question (the code from the
> HeapCheckForSerializableConflictOut() case statement) was originally
> discussed here, with the details finalized less than a week before SSI
> was committed in 2011:
>
> https://www.postgresql.org/message-id/flat/1296499247.11513.777.camel%40jdavis#9e407424df5f8794360b6e84de60200a
>
> It hasn't really changed since that time.

Right, the only change was to move things around a bit to support new table AMs. Speaking of which, it looks like the new comment atop CheckForSerializableConflictOut() could use some adjustment. It says "A table AM is reading a tuple that has been modified. After determining that it is visible to us, it should call this function..." but it seems the truth is a bit more complicated than that.
On Mon, Jun 8, 2020 at 7:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Right, the only change was to move things around a bit to suport new > table AMs. Speaking of which, it looks like the new comment atop > CheckForSerializableConflictOut() could use some adjustment. It says > "A table AM is reading a tuple that has been modified. After > determining that it is visible to us, it should call this function..." > but it seems the truth is a bit more complicated than that. Right. I think that you can go ahead and change it without further input here. -- Peter Geoghegan
On Mon, Jun 8, 2020 at 7:07 PM Peter Geoghegan <pg@bowt.ie> wrote: > Right. I think that you can go ahead and change it without further input here. Also, is it necessary to have the TransactionIdEquals() tests in both HeapCheckForSerializableConflictOut() and CheckForSerializableConflictOut()? Apart from anything else, the test in HeapCheckForSerializableConflictOut() occurs before we establish the topmost XID -- it could be a subxid, in which case the test is wrong. Though it doesn't actually matter on account of the redundancy, IIUC. -- Peter Geoghegan
Hi, On 2020-06-08 18:26:01 -0700, Peter Geoghegan wrote: > On Sun, Jun 7, 2020 at 7:54 PM Peter Geoghegan <pg@bowt.ie> wrote: > > The issue seems to be that heapam's mechanism for adding a "conflict > > out" can get some details subtly wrong. This can happen in the event > > of a concurrently updated tuple where the updating xact aborts. So for > > each pair of transactions that committed in each G2-Item, there seemed > > to be a third updating transaction that aborted, messing things up. I > > think that the "conflict out" stuff ought to behave as if an updated > > tuple was never updated when it's not visible to our xact anyway. We > > should use the XID that created the original tuple for "conflict out" > > purposes -- not the (possibly aborted) updater's XID (i.e. not the > > HeapTupleHeaderGetUpdateXid() XID). > > The functionality in question (the code from the > HeapCheckForSerializableConflictOut() case statement) was originally > discussed here, with the details finalized less than a week before SSI > was committed in 2011: > > https://www.postgresql.org/message-id/flat/1296499247.11513.777.camel%40jdavis#9e407424df5f8794360b6e84de60200a > > It hasn't really changed since that time. > > Kevin, Jeff: What do you think of the fix I have proposed? Attached is > a revised version, with better comments and a proper commit message. > Somebody should review this patch before I proceed with commit. This should be testable with isolationtester, right? Greetings, Andres Freund
On Mon, Jun 8, 2020 at 7:17 PM Andres Freund <andres@anarazel.de> wrote: > This should be testable with isolationtester, right? I'll try to write an isolationtester test tomorrow. -- Peter Geoghegan
On Tue, Jun 9, 2020 at 2:12 PM Peter Geoghegan <pg@bowt.ie> wrote: > Also, is it necessary to have the TransactionIdEquals() tests in both > HeapCheckForSerializableConflictOut() and > CheckForSerializableConflictOut()? Apart from anything else, the test > in HeapCheckForSerializableConflictOut() occurs before we establish > the topmost XID -- it could be a subxid, in which case the test is > wrong. Though it doesn't actually matter on account of the redundancy, > IIUC. The double-check was present in the original commit dafaa3efb75. It seems like a pretty straightforward optimisation: SubTransGetTopmostTransaction(xid) might cause I/O so it's worth the check if you can already bail out sooner. Admittedly the recent splitting of that function made that a bit less clear.
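[Sketched in miniature, the optimisation Thomas describes is "cheap equality tests before the expensive lookup." The function names below are invented stand-ins -- topmost_xid() plays the role of SubTransGetTopmostTransaction(), which in the real system may have to read pg_subtrans from disk -- and the logic is far simpler than the actual code.]

```python
# Toy illustration of the cheap-check-before-expensive-lookup pattern.
# Everything here is invented for illustration, not the Postgres API.

def topmost_xid(xid, parent_of):
    """Walk the subtransaction parent chain up to the top-level XID
    (the step that may cost I/O in the real system)."""
    while xid in parent_of:
        xid = parent_of[xid]
    return xid

def belongs_to_other_top_xact(tuple_xid, my_xid, my_top_xid, parent_of):
    # Cheap bail-out first: the tuple was written by us or by our own
    # top-level transaction, so no conflict check is needed.
    if tuple_xid == my_xid or tuple_xid == my_top_xid:
        return False
    # Only now pay for the potentially expensive resolution, and
    # compare against the resolved top-level XID -- tuple_xid might
    # be a subxid, which is why the cheap test alone is insufficient.
    return topmost_xid(tuple_xid, parent_of) != my_top_xid

parent_of = {5: 3, 3: 1}  # subxid 5 -> subxid 3 -> top-level xid 1
print(belongs_to_other_top_xact(5, 7, 1, parent_of))  # prints False
print(belongs_to_other_top_xact(5, 7, 2, parent_of))  # prints True
```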
On Mon, Jun 8, 2020 at 7:30 PM Peter Geoghegan <pg@bowt.ie> wrote: > I'll try to write an isolationtester test tomorrow. Attached is v3, which has an isolationtester test. It also fixes an assertion failure that could happen in rare cases, which was reported to me privately by Thomas. We now explicitly don't go ahead with an XID when it precedes TransactionXmin. I overstated the significance of the fact that the transaction aborts in the case that we looked at. That isn't an essential part of the anomaly. See the isolation test. -- Peter Geoghegan
Attachments
On 6/10/20 9:10 PM, Peter Geoghegan wrote: > On Mon, Jun 8, 2020 at 7:30 PM Peter Geoghegan <pg@bowt.ie> wrote: >> I'll try to write an isolationtester test tomorrow. > Attached is v3, which has an isolationtester test. > > It also fixes an assertion failure that could happen in rare cases, > which was reported to me privately by Thomas. We now explicitly don't > go ahead with an XID when it precedes TransactionXmin. Thanks so much for investigating this, Peter. And thanks to everyone else here who helped get this sorted out--y'all have done great work. :) I wanted to let you know that I've put together a draft of a report on these findings, and if you've got any comments you'd like to offer, I'd be happy to hear them, either on-list or privately. http://jepsen.io/analyses/postgresql-12.3?draft-token=Kets1Quayfs People reading this list: hi please don't leak this to twitter/HN/etc just yet, I'd like to give it a few days to settle before release. :) --Kyle
On Thu, Jun 11, 2020 at 1:10 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v3, which has an isolationtester test.

Nice repro. We have:

foo happened before bar, because it can't see bar_insert
bar happened before foo, because it can't see foo_insert (!)

Confirmed here that this anomaly is missed in master and detected with the patch as shown, and check-world passes.

Explaining it to myself like I'm five: The tuple foo_insert was concurrently inserted *and* deleted/updated. Our coding screwed that case up. Your fix teaches the HEAPTUPLE_DELETE_IN_PROGRESS case to check if it's also an insert we can't see, and then handles it the same as non-visible HEAPTUPLE_LIVE or HEAPTUPLE_INSERT_IN_PROGRESS. That is, by taking the xmin (inserter), not the xmax (updater/deleter) for conflict-out checking. (As mentioned on IM, I wondered for a bit why it's OK that we now lack a check for conflict out to the deleter in the insert+delete case, but I think that's OK because it certainly happened after the insert committed.) Then you realised that HEAPTUPLE_RECENTLY_DEAD has no reason to be treated differently.

Minor nits (already mentioned via IM but FTR): bar_first isn't necessary in the test, s/foo-insert/foo_insert/

Looks pretty good to me. Tricky problem, really excellent work.
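[The "each can't see the other's insert" pattern confirmed above is exactly what SSI's dangerous-structure check is meant to catch. As a toy model only -- the real predicate.c also tracks commit ordering, predicate locks, and much more state -- the check can be sketched as looking for a "pivot": a transaction with rw anti-dependencies both in and out. It also shows why correct rw edge bookkeeping matters: attribute an edge to the wrong XID, as in the bug, and the pivot goes undetected.]

```python
# Toy sketch of SSI's "dangerous structure" rule: a transaction with
# both an incoming and an outgoing rw-antidependency (a pivot) may be
# part of a non-serializable cycle, so one participant must abort.
# Vastly simplified relative to PostgreSQL's predicate.c.

def find_pivots(rw_edges):
    """rw_edges: iterable of (reader, writer) pairs, where the reader
    failed to observe the writer's write.  Returns the set of
    transactions that have rw edges both in and out."""
    readers = {r for r, _ in rw_edges}   # have an outgoing rw edge
    writers = {w for _, w in rw_edges}   # have an incoming rw edge
    return readers & writers

# foo and bar from the isolation test: each read data the other
# wrote, so both are pivots -- this history must not commit.
print(sorted(find_pivots({("foo", "bar"), ("bar", "foo")})))  # prints ['bar', 'foo']
# A single rw edge alone is allowed under SSI:
print(find_pivots({("foo", "bar")}))  # prints set()
```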
On Thu, Jun 11, 2020 at 3:14 PM Kyle Kingsbury <aphyr@jepsen.io> wrote: > I wanted to let you know that I've put together a draft of a report on these > findings, and if you've got any comments you'd like to offer, I'd be happy to > hear them, either on-list or privately. > http://jepsen.io/analyses/postgresql-12.3?draft-token=Kets1Quayfs What do you think about this documentation update? I'm not experienced with biblioentry academic references; if someone would like to show me how this is supposed to be cited, I'm all ears.
Attachments
On Tue, Jun 9, 2020 at 2:08 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Jun 8, 2020 at 7:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Right, the only change was to move things around a bit to suport new > > table AMs. Speaking of which, it looks like the new comment atop > > CheckForSerializableConflictOut() could use some adjustment. It says > > "A table AM is reading a tuple that has been modified. After > > determining that it is visible to us, it should call this function..." > > but it seems the truth is a bit more complicated than that. > > Right. I think that you can go ahead and change it without further input here. It's only comments, but it'd still be good to get some review since it's essentially describing the relevant contract. Here's what I came up with.
Attachments
On 6/11/20 1:40 AM, Thomas Munro wrote: > What do you think about this documentation update? I think this is really helpful! You can go a little further if you like. Right now, SSI vs serializable and SI vs RR are both described as "differences in behavior", which kinda leaves it unclear as to how those levels are related. If you want to follow Berenson, Adya, et al.'s broad interpretation, I'd say something like "Snapshot isolation prevents some anomalies, like phantoms, which repeatable read allows. It also allows some anomalies, like G2-item (write skew), which repeatable read prevents." ... and if you like, you could follow that with a discussion of how SI, thanks to ambiguity in the ANSI specification, could also be interpreted as stronger than RR. If you stick with the strict interpretation, I'd add a note here about how readers familiar with the SI literature might be confused, and that you're diverging from Berenson et al., but your position is defensible under the spec. "Serializable Snapshot Isolation is stronger than serializability: it allows only serializable histories--and prohibits some additional histories in addition." If you wanna talk about the performance rationale, that's valid too. :) --Kyle
On Wed, Jun 10, 2020 at 8:16 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Confirmed here that this anomaly is missed in master and detected with > the patch as shown, and check-world passes. > Looks pretty good to me. Tricky problem, really excellent work. Pushed a slightly refined version of v3 to all supported branches. Thanks for the review! -- Peter Geoghegan
On Thu, Jun 11, 2020 at 3:13 AM Thomas Munro <thomas.munro@gmail.com> wrote: > It's only comments, but it'd still be good to get some review since > it's essentially describing the relevant contract. Here's what I came > up with. LGTM. -- Peter Geoghegan
Hi, On 2020-06-11 17:40:55 +1200, Thomas Munro wrote: > + <para> > + The Repeatable Read isolation level is implemented using a technique > + known in academic database literature and in some other database products > + as <firstterm>Snapshot Isolation</firstterm>. Differences in behavior > + may be observed when compared with systems using other implementation > + techniques. For a full treatment, please see > + <xref linkend="berenson95"/>. > + </para> Could it be worthwhile to narrow the "differences in behaviour" bit to read-write transactions? IME the biggest reason people explicitly use RR over RC is to avoid phantom reads in read-only transactions. Seems nicer to not force users to read an academic paper to figure that out? > @@ -1726,6 +1744,13 @@ SELECT pg_advisory_lock(q.id) FROM > see a transient state that is inconsistent with any serial execution > of the transactions on the master. > </para> > + > + <para> > + Access to the system catalogs is not done using the isolation level > + of the current transaction. This has the effect that newly created > + database objects such as tables become visible to concurrent Repeatable > + Read and Serializable transactions, even though their contents does not. > + </para> > </sect1> Should it be pointed out that that's about *internal* accesses to catalogs? This could be understood to apply to a SELECT * FROM pg_class; where it actually doesn't apply? Greetings, Andres Freund
On 2020-06-11 12:30:23 -0700, Andres Freund wrote: > Hi, > > On 2020-06-11 17:40:55 +1200, Thomas Munro wrote: > > + <para> > > + The Repeatable Read isolation level is implemented using a technique > > + known in academic database literature and in some other database products > > + as <firstterm>Snapshot Isolation</firstterm>. Differences in behavior > > + may be observed when compared with systems using other implementation > > + techniques. For a full treatment, please see > > + <xref linkend="berenson95"/>. > > + </para> > > Could it be worthwhile to narrow the "differences in behaviour" bit to > read-write transactions? IME the biggest reason people explicitly use RR > over RC is to avoid phantom reads in read-only transactions. Seems nicer > to not force users to read an academic paper to figure that out? But, on second thought, it might be too difficult to phrase this concisely and correctly, given the annoying issue of SI allowing for read-only transactions to observe violations of serializability. I don't think that's a RR violation, but maybe it could be understood as being about serializability too easily?
Hi, On 2020-06-11 08:39:13 -0400, Kyle Kingsbury wrote: > On 6/11/20 1:40 AM, Thomas Munro wrote: > > What do you think about this documentation update? > > I think this is really helpful! You can go a little further if you like. > Right now, SSI vs serializable and SI vs RR are both described as > "differences in behavior", which kinda leaves it unclear as to how those > levels are related. If you want to follow Berenson, Adya, et al.'s broad > interpretation, I'd say something like > > "Snapshot isolation prevents some anomalies, like phantoms, which repeatable > read allows. It also allows some anomalies, like G2-item (write skew), which > repeatable read prevents." I think the percentage of readers that understand "G2-item" or even "write skew", but don't already know what snapshot isolation implies, is pretty close to zero. I like the idea of having a bit more detail, but I think it'd have to be worded differently to be beneficial. I'd also phrase it as "some definitions of repeatable read", but ... :). Intuitively, and that's not meant as an argument for anything, I find prohibiting write skew in something named "repeatable read" just plain odd. Leaving naming aside, I think there's good practical reasons for applications using read-committed, snapshot isolation, and serializable. It's not obvious to me that there are cases where the Berenson definition of RR is something that's desirable to use over RR/SI/S. I suspect what we really want, medium term, is some guidelines around what isolation levels are appropriate for which use-case. But that's a nontrivial thing to write, I think, and there's lot of room for disagreements... Greetings, Andres Freund
On Fri, Jun 12, 2020 at 7:30 AM Andres Freund <andres@anarazel.de> wrote:
> > + Access to the system catalogs is not done using the isolation level
> > + of the current transaction. This has the effect that newly created
> > + database objects such as tables become visible to concurrent Repeatable
> > + Read and Serializable transactions, even though their contents does not.
>
> Should it be pointed out that that's about *internal* accesses to
> catalogs? This could be understood to apply to a SELECT * FROM pg_class;
> where it actually doesn't apply?

Yeah. I hesitated about writing that because they don't exactly use the current isolation level in its full glory, they just use the current snapshot. Specifically, predicate locks are not taken on catalogs. That seemed a bit beyond my personal murkiness threshold for inclusion in the manual, but let me see if I can come up with the right weasel words. Attached, but also inline here for review:

+ <para>
+ Internal access to the system catalogs is not done using the isolation
+ level of the current transaction. This means that newly created
+ database objects such as tables are visible to concurrent Repeatable Read
+ and Serializable transactions, even though the rows they contain are not.
+ In contrast, queries that explicitly examine the system catalogs don't
+ see rows representing concurrently created database objects in the higher
+ isolation levels.
+ </para>

On Fri, Jun 12, 2020 at 7:48 AM Andres Freund <andres@anarazel.de> wrote:
> On 2020-06-11 12:30:23 -0700, Andres Freund wrote:
> > Could it be worthwhile to narrow the "differences in behaviour" bit to
> > read-write transactions? IME the biggest reason people explicitly use RR
> > over RC is to avoid phantom reads in read-only transactions. Seems nicer
> > to not force users to read an academic paper to figure that out?
>
> But, on second thought, it might be too difficult to phrase this
> concisely and correctly, given the annoying issue of SI allowing for
> read-only transactions to observe violations of serializability. I don't
> think that's a RR violation, but maybe it could be understood as being
> about serializability too easily?

The goal of this new paragraph is to highlight that there is a difference between traditional System R-style RR (as assumed but not required by the SQL spec, see disputes) and SI. RC vs RR etc is covered elsewhere. I agree that it's way over the top to go into the gory details in our manual, as Kyle suggested, though; my goal here is just to flag up this key point in as few words as possible.

Kyle is of course right: it might affect someone coming from another database product or database theory textbook. So, we should state the proper name of the exact technique we are using, and refer the curious to primary sources for more. Here's a new attempt at that. Attached, but I'll also just include the new paragraph here because it's short:

+ <para>
+ The Repeatable Read isolation level is implemented using a technique
+ known in academic database literature and in some other database products
+ as <firstterm>Snapshot Isolation</firstterm>. Differences in behavior
+ and performance may be observed when compared with systems using a more
+ traditional technique that permits less concurrency, and some other
+ systems may even offer Repeatable Read and Snapshot Isolation as distinct
+ isolation levels. The allowable phenomena that can be used to distinguish
+ the two techniques were not formalized by database researchers until after
+ the SQL Standard was developed, and are outside the scope of this manual.
+ For a full treatment, please see <xref linkend="berenson95"/>.
+ </para>

I also tweaked the new SSI para based on Kyle's feedback, to mention performance.
Kevin and Dan's paper, and plenty more papers referenced from there, cover S2PL vs SSI pretty well if you want to know more about that. So I think that external reference is the right way to go here too.

+ <para>
+ The Serializable isolation level is implemented using a technique
+ known in academic database literature as Serializable Snapshot Isolation,
+ which builds on Snapshot Isolation by adding checks for serialization
+ anomalies. Some differences in behavior and performance may be observed
+ when compared with systems using other techniques. Please see
+ <xref linkend="gritt2012"/> for detailed information.
+ </para>
Attachments
On Wed, Jun 10, 2020 at 8:14 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> I wanted to let you know that I've put together a draft of a report on these
> findings, and if you've got any comments you'd like to offer, I'd be happy to
> hear them, either on-list or privately.
> http://jepsen.io/analyses/postgresql-12.3?draft-token=Kets1Quayfs

You should mention that a fix was committed this morning:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5940ffb221316ab73e6fdc780dfe9a07d4221ebb

You may want to draw attention to the isolation test, since it's a perfect distillation of the bug. I'm certain that it cannot be simplified any further.

Note that the test involves three transactions/sessions -- not two. Elle always complained about a pair of transactions that had a cycle, which were similar to the "foo" and "bar" sessions from the test. But Elle never said anything about a third transaction (that played the role of the "trouble" transaction from the test). Of course, this makes perfect sense -- the third/"trouble" transaction *should* be totally irrelevant. Note that the UPDATE has to come from a third session/transaction for things to go awry. And note that the order of each statement has to match the order from the isolation test. For example, no problems occur if you flip the order of "trouble_update" and "bar_select" -- because that would mean that the physical tuple does not get physically updated before being examined by "bar_select".

You should also specifically mention that the next upcoming minor release will be on August 13th, 2020:

https://www.postgresql.org/developer/roadmap/

Users that are concerned about this bug will be able to get a new minor release with the fix by that date, at the latest.

Thanks
-- Peter Geoghegan
On 6/11/20 6:51 PM, Peter Geoghegan wrote: > On Wed, Jun 10, 2020 at 8:14 PM Kyle Kingsbury <aphyr@jepsen.io> wrote: >> I wanted to let you know that I've put together a draft of a report on these >> findings, and if you've got any comments you'd like to offer, I'd be happy to >> hear them, either on-list or privately. >> http://jepsen.io/analyses/postgresql-12.3?draft-token=Kets1Quayfs > You should mention that a fix was committed this morning: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5940ffb221316ab73e6fdc780dfe9a07d4221ebb Ah, fantastic! > You may want to draw attention to the isolation test, since it's a > perfect distillation of the bug. I'm certain that it cannot be > simplified any further. I mentioned it in the discussion, but now I can offer a direct link. Great work. :) > Note that the test involves three transactions/sessions -- not two. > Elle always complained about a pair of transactions that had a cycle, > which were similar to the "foo" and "bar" sessions from the test. But > Elle never said anything about a third transaction (that played the > role of the "trouble" transaction from the test). Of course, this > makes perfect sense -- the third/"trouble" transaction *should* be > totally irrelevant. Note that the UPDATE has to come from a third > session/transaction for things to go awry. And note that the order of > each statement has to match the order from the isolation test. For > example, no problems occur if you flip the order of "trouble_update" > and "bar_select" -- because that would mean that the physical tuple > does not get physically updated before being examined by "bar_select". Ah, thank you. I didn't quite understand this from the second patch commit message. 
> You should also specifically mention that the next upcoming minor > release will be on August 13th, 2020: > > https://www.postgresql.org/developer/roadmap/ > > Users that are concerned about this bug will be able to get a new > minor release with the fix by that date, at the latest. That's great news. Will do! :) --Kyle
Thomas Munro wrote: > + Access to the system catalogs is not done using the isolation level > + of the current transaction. This has the effect that newly created > + database objects such as tables become visible to concurrent Repeatable > + Read and Serializable transactions, even though their contents does not. "their contents does not" should be "their contents do not". cheers, raf
On Sat, Jun 13, 2020 at 10:11 AM raf <raf@raf.org> wrote: > "their contents does not" should be "their contents do not". Thanks. I'd already reworded that in the v2 patch, later in the thread.
On Fri, Jun 12, 2020 at 10:14 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a new attempt at that. Attached, but I'll also just include
> the new paragraph here because it's short:

Slightly improved version, bringing some wording into line with existing documentation: s/SQL Standard/SQL standard/, and explicitly referring to "locking" implementations of RR and Ser (as we do already a few paragraphs earlier, when discussing MVCC). My intention is to push this to all branches in a couple of days if there is no other feedback. I propose to treat it as a defect, because I agree that it's weird and surprising that we don't mention SI, especially considering the history of the standard levels. I mean, I guess it's basically implied by all the stuff that section says about MVCC vs traditional locking systems, and it's a super well known fact in our hacker community, but not using the standard term of art is a strange omission.

In a future release perhaps we could entertain ideas like accepting the name SNAPSHOT ISOLATION, writing some more user-friendly guidance, and possibly even referencing the Generalized Isolation Level Definitions work. I think it'd be a bad idea to stop accepting REPEATABLE READ and inconvenience our users, though; IMHO it's perfectly OK to stick with the current interpretation of the spec while also acknowledging flaws and newer thinking through this new paragraph.
Attachments
> On Fri, Jun 12, 2020 at 10:14 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> Here's a new attempt at that. Attached, but I'll also just include >> the new paragraph here because it's short: > > Slightly improved version, bringing some wording into line with > existing documentation. s/SQL Standard/SQL standard/, and explicitly > referring to "locking" implementations of RR and Ser (as we do already > a few paragraphs earlier, when discussing MVCC). My intention is to > push this to all branches in a couple of days if there is no other > feedback. I propose to treat it as a defect, because I agree that > it's weird and surprising that we don't mention SI, especially > considering the history of the standard levels. I mean, I guess it's > basically implied by all the stuff that section says about MVCC vs > traditional locking systems, and it's a super well known fact in our > hacker community, but not using the standard term of art is a strange > omission. > > In future release perhaps we could entertain ideas like accepting the > name SNAPSHOT ISOLATION, and writing some more use-friendly guidance, > and possibly even reference the Generalized Isolation Level > Definitions stuff. I think it'd be a bad idea to stop accepting > REPEATABLE READ and inconvenience our users, though; IMHO it's > perfectly OK to stick with the current interpretation of the spec > while also acknowledging flaws and newer thinking through this new > paragraph. +1. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, Jun 12, 2020 at 9:49 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Slightly improved version, bringing some wording into line with > existing documentation. s/SQL Standard/SQL standard/, and explicitly > referring to "locking" implementations of RR and Ser (as we do already > a few paragraphs earlier, when discussing MVCC). My intention is to > push this to all branches in a couple of days if there is no other > feedback. I propose to treat it as a defect, because I agree that > it's weird and surprising that we don't mention SI, especially > considering the history of the standard levels. That seems reasonable to me. The doc patch says "some other systems may even offer Repeatable Read and Snapshot Isolation as distinct isolation levels". This is true of SQL Server, which has an explicit snapshot isolation level, though possibly only because SI was retrofitted to SQL Server 2005 -- I guess that that's what you have in mind. Perhaps you should reword this a little to convey that you mean a separate isolation mode *and* an actually-distinct set of behaviors to go with it. > I mean, I guess it's > basically implied by all the stuff that section says about MVCC vs > traditional locking systems, and it's a super well known fact in our > hacker community, but not using the standard term of art is a strange > omission. Yes, it's definitely an omission. > In future release perhaps we could entertain ideas like accepting the > name SNAPSHOT ISOLATION, and writing some more use-friendly guidance, > and possibly even reference the Generalized Isolation Level > Definitions stuff. I think it'd be a bad idea to stop accepting > REPEATABLE READ and inconvenience our users, though; IMHO it's > perfectly OK to stick with the current interpretation of the spec > while also acknowledging flaws and newer thinking through this new > paragraph. 
Kyle made the following observation about "true" RR at one point: "If
you use the broad/generalized interpretation, you can trust that a
program which only accesses data by primary key, running under
repeatable read, is actually serializable".

However, the simple fact is that it just wouldn't be sensible for
Postgres to offer an according-to-Adya REPEATABLE READ mode that isn't
just an alias for SERIALIZABLE/SSI. What does the leeway that RR gives
us compared to SERIALIZABLE actually buy the implementation,
performance-wise? This is similar to the situation with READ
UNCOMMITTED and READ COMMITTED in Postgres.

Even though I am sympathetic to Kyle's concerns here, and am all in
favor of this documentation patch, I believe that the current naming
scheme is the least-worst option. The point of using the standard
names is to make it easy to understand the guarantees on offer. But we
need to call the current REPEATABLE READ behavior *something*, and the
closest thing that the standard has is RR. Unless we're willing to
make no distinction between SSI and snapshot isolation -- which we're
clearly not. Or unless we're willing to exclusively use some
non-standard name instead -- which is also clearly a non-starter
(though I'd be fine with an alias).

Speaking of comparing behaviors across systems, the behavior that
MySQL calls REPEATABLE READ is actually READ COMMITTED, while the
behavior that DB2 calls REPEATABLE READ is actually SERIALIZABLE. The
range of guarantees that you actually get is *enormous* in reality!
ISTM that REPEATABLE READ is by far the most confusing and ambiguous
isolation level that the standard describes. Even Elle can't tell the
difference between "true" REPEATABLE READ mode and SERIALIZABLE mode;
why should Postgres users be able to?

--
Peter Geoghegan
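For readers following along: the classic anomaly separating snapshot
isolation (what Postgres calls REPEATABLE READ) from serializability is
write skew, and it occurs even when every row is accessed by primary
key. A minimal two-session sketch against Postgres, using a
hypothetical doctors table (not part of the test earlier in this
thread):

```sql
-- Hypothetical setup:
--   create table doctors (id int primary key, on_call boolean not null);
--   insert into doctors values (1, true), (2, true);
-- Invariant each transaction preserves: at least one doctor stays on call.

-- Session A:
begin isolation level repeatable read;
select on_call from doctors where id = 2;        -- true: "2 is covering"

-- Session B:
begin isolation level repeatable read;
select on_call from doctors where id = 1;        -- true: "1 is covering"

-- Session A:
update doctors set on_call = false where id = 1;
commit;

-- Session B:
update doctors set on_call = false where id = 2;
commit;  -- under SI both commits succeed: no write-write conflict

-- Outcome: no doctor on call, though each transaction alone preserved
-- the invariant -- write skew (G2-item). Under SERIALIZABLE, Postgres'
-- SSI would abort one session with a serialization failure instead.
```

This is why a PK-only workload under Postgres' REPEATABLE READ is not
serializable, and hence why its RR is weaker than the "true" RR Kyle
describes.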
On Sun, Jun 14, 2020 at 6:37 AM Peter Geoghegan <pg@bowt.ie> wrote:
> That seems reasonable to me.

Thanks!

> The doc patch says "some other systems may even offer Repeatable Read
> and Snapshot Isolation as distinct isolation levels". This is true of
> SQL Server, which has an explicit snapshot isolation level, though
> possibly only because SI was retrofitted to SQL Server 2005 -- I guess
> that's what you have in mind. Perhaps you should reword this a
> little to convey that you mean a separate isolation mode *and* an
> actually-distinct set of behaviors to go with it.

Ok, changed it to "distinct isolation levels with different behavior",
and pushed this. I also made the biblio entries look more like the
existing examples.

> Speaking of comparing behaviors across systems, the behavior that
> MySQL calls REPEATABLE READ is actually READ COMMITTED, while the

By my reading of their manual, MySQL (assuming InnoDB) uses SI for
REPEATABLE READ just like us, and it's also their default level.

https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

> behavior that DB2 calls REPEATABLE READ is actually SERIALIZABLE. The
> range of guarantees that you actually get is *enormous* in reality!
> ISTM that REPEATABLE READ is by far the most confusing and ambiguous
> isolation level that the standard describes.

I think IBM might even have changed that between System R and DB2, so
there's probably an interesting story behind that.
Thomas Munro wrote:
> > Speaking of comparing behaviors across systems, the behavior that
> > MySQL calls REPEATABLE READ is actually READ COMMITTED, while the
>
> By my reading of their manual, MySQL (assuming InnoDB) uses SI for
> REPEATABLE READ just like us, and it's also their default level.
>
> https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

With InnoDB, a RR transaction that just reads has the same behavior as
with Postgres, in the sense that it does consistent reads across all
tables. But it should be noted that when concurrent writes are
involved, InnoDB does not confine the transaction to its snapshot as
Postgres does. In particular, rows that a simple SELECT can't see
because of RR visibility rules are found by UPDATEs or INSERTs, and
are somehow incorporated into the RR transaction.

If InnoDB's RR is based on Snapshot Isolation, what it does with it
seems to violate Snapshot Isolation as Postgres understands it, so
it's somewhat unfair/confusing to use the same term for both. From the
user's point of view, Repeatable Read in InnoDB exhibits anomalies
that are not possible with Postgres' Repeatable Read.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
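To make that concrete, here is a sketch of the behavior in question,
modeled on the "Note" example in the InnoDB consistent-read page linked
above (table name and values are illustrative):

```sql
-- MySQL/InnoDB, both sessions at REPEATABLE READ (the default).
-- Hypothetical setup: create table t (id int primary key, c int);
-- Table starts empty.

-- Session A:
START TRANSACTION;
SELECT * FROM t;            -- empty: snapshot established

-- Session B:
START TRANSACTION;
INSERT INTO t VALUES (1, 10), (2, 20);
COMMIT;

-- Session A:
SELECT * FROM t;            -- still empty: consistent read from snapshot
UPDATE t SET c = c + 1;     -- yet this matches B's rows: 2 rows changed
SELECT * FROM t;            -- now returns (1, 11), (2, 21): the rows
                            -- became visible mid-transaction
COMMIT;
```

Rows the transaction's SELECT could not see were found and modified by
its UPDATE, and were thereby pulled into the transaction's view --
exactly the departure from snapshot confinement described above.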
On 6/14/20 9:30 PM, Thomas Munro wrote:
> By my reading of their manual, MySQL (assuming InnoDB) uses SI for
> REPEATABLE READ just like us, and it's also their default level.
>
> https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

That can't be right. MySQL repeatable read allows lost update, write
skew, and at least some kinds of read skew: it's got to be weaker than
SI and also weaker than RR. I think it's actually Monotonic Atomic
View plus some read-only constraints?

https://github.com/ept/hermitage

<sigh> I really gotta do a report on MySQL too...

--Kyle
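As one concrete instance of the lost update mentioned above: under
InnoDB's REPEATABLE READ, a read-modify-write cycle done with a plain
SELECT (no FOR UPDATE) can silently lose an increment, because UPDATE
acts on the latest committed row version rather than the snapshot. A
hypothetical transcript, assuming a simple counters table:

```sql
-- Hypothetical setup: create table counters (id int primary key, n int);
--                     insert into counters values (1, 0);
-- Both sessions at InnoDB REPEATABLE READ.

-- Session A:
START TRANSACTION;
SELECT n FROM counters WHERE id = 1;     -- reads 0

-- Session B:
START TRANSACTION;
SELECT n FROM counters WHERE id = 1;     -- also reads 0

-- Session A:
UPDATE counters SET n = 1 WHERE id = 1;  -- application computed 0 + 1
COMMIT;

-- Session B:
UPDATE counters SET n = 1 WHERE id = 1;  -- blocks on A's row lock, then
                                         -- applies to the latest version
COMMIT;

-- Final n = 1, not 2: Session A's increment is lost, with no error.
-- Postgres' REPEATABLE READ would instead abort Session B with
-- "could not serialize access due to concurrent update".
```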
On Mon, Jun 15, 2020 at 5:42 AM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> On 6/14/20 9:30 PM, Thomas Munro wrote:
> > By my reading of their manual, MySQL (assuming InnoDB) uses SI for
> > REPEATABLE READ just like us, and it's also their default level.
> >
> > https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
>
> That can't be right. MySQL repeatable read allows lost update, write
> skew, and at least some kinds of read skew: it's got to be weaker than
> SI and also weaker than RR. I think it's actually Monotonic Atomic
> View plus some read-only constraints?

At first glance it seemed to me that MySQL's repeatable read must be
more or less the same as Postgres' repeatable read; there is only one
snapshot in each case. But it's very different in reality, since
updates and deletes don't use the transaction snapshot. Worst of all,
you can update rows that were not visible to the transaction snapshot,
thus rendering them visible (see the "Note" box in the documentation
for an example of this). InnoDB won't throw a serialization error at
any isolation level.

I think that means that InnoDB's RR mode could "render" *some* of the
rows concurrently inserted by another transaction (those that our xact
happened to update) visible, while leaving other rows inserted by that
same transaction invisible. That kind of behavior seems really
hazardous to me. It almost seems safer to use RC mode, since using a
new snapshot for each new statement is "more honest". Presumably RC
mode doesn't exhibit the inconsistency I described -- at least all of
the rows that were concurrently inserted by that other xact become
visible together.

--
Peter Geoghegan
On Tue, Jun 16, 2020 at 5:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> At first glance it seemed to me that MySQL's repeatable read must be
> more or less the same as Postgres' repeatable read; there is only one
> snapshot in each case. But it's very different in reality, since
> updates and deletes don't use the transaction snapshot. Worst of all,
> you can update rows that were not visible to the transaction snapshot,
> thus rendering them visible (see the "Note" box in the documentation
> for an example of this). InnoDB won't throw a serialization error at
> any isolation level.

Ugh, obviously I only read the first two paragraphs of that page,
which sound an *awful* lot like a description of SI (admittedly
without naming it). My excuse is that I arrived on that page by
following a link from
https://en.wikipedia.org/wiki/Snapshot_isolation. Wikipedia is wrong.

Thanks for clarifying.