Discussion: BUG #11732: Non-serializable outcomes under serializable isolation
The following bug has been logged on the website:

Bug reference: 11732
Logged by: Peter Bailis
Email address: pbailis@cs.berkeley.edu
PostgreSQL version: 9.3.5
Operating system: Ubuntu 14.04.1 LTS
Description:

When issuing a high-contention workload under SERIALIZABLE isolation, I am able to produce non-serializable outcomes. I am able to reproduce this behavior both via a Ruby on Rails deployment and via a standalone Python script that generates concurrent transactions (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py).

BASIC WORKLOAD:

The basic workload consists of a series of concurrent transactions performing read-modify-insert operations on a non-indexed varchar column. The workload repeatedly issues a set of K transactions in parallel, incrementing a sequence number for each wave. Each transaction within a wave checks whether the non-indexed varchar column contains the sequence number encoded as a string (via a SELECT ... WHERE query). If so, the transaction rolls back. If not, it attempts to insert a new row with the sequence number. (This is effectively a poor man's version of enforcing uniqueness constraints.)

Under SERIALIZABLE isolation, only one transaction should successfully insert an entry into the varchar column per sequence number. (Reasoning: If, within a wave, a transaction T_1 observes that another transaction T_2 has already updated the column, it will not attempt to insert. If, within a wave, a transaction T_1 does not observe another transaction T_2's successful, committed insertion for the same sequence number, T_1 should abort to preserve serializability and will not insert. In the event that all transactions within a wave abort, there will be no insertions for that sequence number.)
As a minimal working example, consider the following table, with the "key" column holding the sequence-number varchar column from above:

TABLE "stress" ("id" serial primary key, "key" character varying(255))

and the following transaction, executed in parallel and parameterized by K, a varchar:

BEGIN TRANSACTION;
result = 'SELECT 1 AS one FROM "stress" WHERE "stress"."key" = K;'
if result is NULL:
    'INSERT INTO "stress" ("key") VALUES (K) RETURNING "id";'
    COMMIT TRANSACTION;
else:
    ROLLBACK TRANSACTION;

(In both implementations I've built, the logic has been written in the application, and the single-quoted queries run in PostgreSQL.)

EXPECTED BEHAVIOR: Under serializable isolation, there should only be one record per 'key':

SELECT key, COUNT(*) from stress GROUP BY key HAVING COUNT(*) > 1;
 key | count
-----+-------
(0 rows)

ACTUAL BEHAVIOR: Under serializable isolation, there are duplicates in the "key" column.

SELECT key, COUNT(*) from stress GROUP BY key HAVING COUNT(*) > 1;
 key | count
-----+-------
 745 |     2
 274 |     2
 55  |     2
(3 rows)

I realize this is a weird workload, but it's derived from a set of queries in Rails's ORM.
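The check-then-insert logic above can be sketched as a toy in-memory model (purely illustrative -- plain Python lists, not the Rails or psycopg2 implementations; the function names here are mine) showing why any truly serial schedule of the wave admits at most one row per key:

```python
# Toy model (not PostgreSQL): serial execution of the
# check-then-insert transaction yields at most one row per key.

def check_then_insert(table, key):
    """One transaction: roll back if key present, else insert."""
    if key in table:            # SELECT 1 ... WHERE key = K
        return "rollback"
    table.append(key)           # INSERT INTO stress (key) VALUES (K)
    return "commit"

def run_wave_serially(table, key, k=128):
    """A truly serial schedule: each txn sees all prior commits."""
    return [check_then_insert(table, key) for _ in range(k)]

table = []
outcomes = run_wave_serially(table, "745", k=128)
assert table.count("745") == 1        # exactly one insert per key
assert outcomes.count("commit") == 1  # the other 127 roll back
```

A serializable database must produce an outcome equivalent to *some* such serial order, which is why the duplicates below indicate a bug rather than expected contention behavior.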
ENVIRONMENT DETAILS:

I have reproduced this behavior in two separate environments:

PostgreSQL version: PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
PostgreSQL installation method: vanilla install via apt ('postgres-9.3')
OS: Ubuntu 14.04 LTS
Hardware: EC2 m2.4xlarge instance with 8 cores, 68.4GB memory, and 840GB local ephemeral storage

PostgreSQL version: PostgreSQL 9.3.5 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
PostgreSQL installation method: vanilla install via brew ('postgres')
OS: Apple OS X 10.9.4
Hardware: MacBook Pro, 2.8GHz Intel i5, 16GB RAM, SSD

Configuration changes: I've changed the default postgresql.conf default_transaction_isolation to 'serializable' and increased the number of connections and memory to accommodate multiple parallel connections (in the script I've provided, 128):

default_transaction_isolation = 'serializable'
max_connections = 200
shared_buffers = 1GB

I originally thought this must be a bug in Rails, but I've now implemented the transaction logic in both Rails 4 and Python (using psycopg2) and reproduced this behavior in both environments. Autocommit is disabled, and the logs (see https://gist.github.com/pbailis/503430309104e3f7ab79#file-postgres-output-txt) demonstrate that serializable isolation is enabled (i.e., "Reason code: Canceled on identification as a pivot, during write"). This leads me to believe it is a consequence of the SSI implementation.

REPRODUCING:

Set up a new PostgreSQL database. The Python script assumes a database called 'stress' and a user called 'stress'.
The following commands will set up a new database in '/tmp/pgdata':

export PGDATA=/tmp/pgdata; rm -rf $PGDATA; pg_ctl init
printf "default_transaction_isolation = 'serializable'\nshared_buffers = 1GB\nmax_connections = 200\n" >> $PGDATA/postgresql.conf
postgres -D /tmp/pgdata &
sleep 2
psql postgres -c "CREATE USER stress WITH PASSWORD 'stress'"
psql postgres -c "CREATE DATABASE stress"
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE stress TO stress"
fg

Run a series of concurrent sets of transactions as described above, or just run the Python tool (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py).

Check whether the 'key' column contains duplicate entries (e.g., https://gist.github.com/pbailis/503430309104e3f7ab79#file-state-of-database-after-pg-stress-py).

This may take a few tries. I'd estimate that over half of my runs in both environments (going through the whole setup -- database setup plus Python script) reproduce this behavior. The current script is a bit of a hack, leading to high variance in timing behavior.

OTHER NOTES:

I haven't had much luck reproducing with a non-varchar key (e.g., (key integer)) and without RETURNING "id". Putting a unique index on "key" seems to work as expected. The behavior seems to appear more often when run on a new PostgreSQL installation. If the script hangs, check the contents of the table -- there may be a duplicate. However, not all duplicates freeze the script.

I'm happy to debug further, but I'm sending what I have so far.
I am also able to reproduce the behavior in PL/pgSQL given sufficiently high concurrency:

CREATE FUNCTION stress_fn(target_key varchar) RETURNS varchar AS $$
DECLARE
    found_key varchar;
BEGIN
    PERFORM 1 AS one FROM "stress" WHERE "stress"."key" = target_key;
    IF FOUND THEN
        RAISE EXCEPTION 'already inserted';
    END IF;
    INSERT INTO "stress" ("key") VALUES (target_key) RETURNING "id" INTO found_key;
    RETURN found_key;
END;
$$ LANGUAGE plpgsql;

Here's an updated script using this function (still using Python to generate concurrent transactions): https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-stored-procedure-py
"pbailis@cs.berkeley.edu" <pbailis@cs.berkeley.edu> wrote:

> I'm happy to debug further, but I'm sending what I have so far.

I'm just starting to look at this, but it would help if you could try with the same conditions where you are seeing the failure, but with max_pred_locks_per_transaction = 1280 (or higher). It appears that you are pushing things to the point where it is likely to be summarizing the predicate locks of multiple transactions; the above will help to establish whether the bug is in that area of the code or somewhere else.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> I'm just starting to look at this, but it would help if you could
> try with the same conditions where you are seeing the failure, but
> with max_pred_locks_per_transaction = 1280 (or higher).

I just re-ran the workload with max_pred_locks_per_transaction set (in postgresql.conf) to each of 1280, 12800, and 128000 and observed the duplicate behavior under each setting.
Peter Bailis <pbailis@cs.berkeley.edu> wrote:

> I just re-ran the workload with max_pred_locks_per_transaction set
> (in postgresql.conf) to each of 1280, 12800, and 128000 and
> observed the duplicate behavior under each setting.

Thanks for checking! Looking into it....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I spent some more time playing with the PL/pgSQL-based workload. I find that I'm still able to trigger the behavior if I:

a.) Drop the primary key column entirely from the table and the stored procedure (basically, just perform read-then-insert on the single "key" column).
b.) Change the "key" column from a varchar to an integer.

I started to wonder if this suggests an issue with page-level locking on a non-indexed column. So, I recompiled PostgreSQL 9.3.5 with block sizes of 32768 (./configure --with-blocksize=32) and 1024, which didn't have any discernible effect (I was able to reproduce). I also directly modified the configure script to allow a 1MB block size and was again able to reproduce the behavior.

Have you had any luck reproducing this behavior?

Thanks,
Peter
Peter Bailis <pbailis@cs.berkeley.edu> wrote:

> Have you had any luck reproducing this behavior?

Yes I have. I was able to create it using just plpgsql and psql with a bash script. Using that technique I was able to create the problem without the RETURNING clause and with a streamlining of the function you suggested. With an int column instead of a character-based column I have (so far) not been able to trigger it.

I have confirmed that this bug goes all the way back to the introduction of the Serializable Snapshot Isolation technique in 9.1.

Everything suggests a race condition. I haven't been able to pin down exactly where it happens, but that's just a matter of time. It is interesting that the data type of the column changes the timing enough to have such a large effect on seeing the failure. I've set it aside for the moment but expect to get back to it later this week.

Thanks for bringing this to our attention, complete with a reproducible test case!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner wrote:

> Peter Bailis <pbailis@cs.berkeley.edu> wrote:
>> Have you had any luck reproducing this behavior?

> Everything suggests a race condition. I haven't been able to pin
> down exactly where it happens, but that's just a matter of time.
> It is interesting that the data type of the column changes the
> timing enough to have such a large effect on seeing the failure.
> I've set it aside for the moment but expect to get back to it later
> this week.

Any luck here?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Kevin Grittner wrote:
>> Everything suggests a race condition. I haven't been able to
>> pin down exactly where it happens, but that's just a matter of
>> time. It is interesting that the data type of the column
>> changes the timing enough to have such a large effect on seeing
>> the failure. I've set it aside for the moment but expect to get
>> back to it later this week.
>
> Any luck here?

Yes and no. I have further evidence, and have narrowed the area of investigation, but I haven't been able to pin down the cause yet. I'll post what I have "for the record" and in case someone has an idea that will help me find it.

I was able to create a bash script and simple plpgsql function to demonstrate the problem. They're short enough to include in the email, but I will attach, too, in case the email formatting mangles it.

Bash file "stress":

#!/bin/bash
pg_ctl -D Debug/data -m fast -w stop
rm -fr Debug/data/*
initdb Debug/data
echo "default_transaction_isolation = 'serializable'" >> Debug/data/postgresql.conf
echo "shared_buffers = 1GB" >> Debug/data/postgresql.conf
echo "max_connections = 200" >> Debug/data/postgresql.conf
pg_ctl -D Debug/data -l Debug/data/logfile -w start || ( cat Debug/data/logfile && exit 1 )
createdb test
psql -f ~/stress.sql test
psql -c "CREATE TABLE \"stress\" (\"id\" serial primary key, \"key\" text);" test
for i in {1..1000}
do
  echo $i
  for j in {1..128}
  do
    psql -c "select stress_fn('$i');" test >/dev/null 2>&1 &
  done
  wait
  sleep 0.3s
done
psql -c "select count(*) from stress;" test
psql -c "select key, count(*) from stress group by key having count(*) > 1 order by key::int;" test

"stress.sql" file:

CREATE FUNCTION stress_fn(target_key text)
  RETURNS VOID
  LANGUAGE plpgsql
  VOLATILE
AS $$
BEGIN
  PERFORM 1 AS one FROM "stress" WHERE "stress"."key" = target_key;
  IF FOUND THEN
    RAISE EXCEPTION 'already inserted';
  END IF;
  INSERT INTO "stress" ("key") VALUES (target_key);
END;
$$;

So far this has
generated 1 to 3 duplicated values per run. I have changed the timing to 10.0 seconds and still had an occurrence. Changing the "key" column and the corresponding parameter to integer causes it to run without generating any duplicates (so far, at least).

For each non-duplicated "key" value I get about 10 instances of this:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "INSERT INTO "stress" ("key") VALUES (target_key)"
        PL/pgSQL function stress_fn(text) line 8 at SQL statement
STATEMENT:  select stress_fn('1');

... followed by enough of these to add up to 127 errors per value:

ERROR:  already inserted
STATEMENT:  select stress_fn('1');

For the "key" values that generate one or more duplicates I get different logging. I see about 30 of these:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on conflict out to pivot 9137, during read.
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "SELECT 1 AS one FROM "stress" WHERE "stress"."key" = target_key"
        PL/pgSQL function stress_fn(text) line 3 at PERFORM
STATEMENT:  select stress_fn('339');

... before it switches to:

ERROR:  already inserted
STATEMENT:  select stress_fn('339');

We used a different reason code in the detail for each spot that could generate a serialization failure based on what serializable transactions add to the checking done for repeatable read transactions. This particular reason code can only be generated if a "dangerous structure" of read-write dependencies is found where the "pivot" has committed -- a fairly rare occurrence. So far in this test, I have not seen that reason code without the duplicates, nor duplicates without that reason code.
For those not intimately familiar with our serializable implementation but who want to follow along to some degree or other, the only way that serialization anomalies can occur under snapshot isolation (what we have for repeatable read) is when a "pivot" transaction (Tpivot) has a read-write dependency out to a transaction (Tout) and a read-write dependency in from a transaction (Tin) (in other words, Tout writes something that would have been read by Tpivot if Tout had been committed before Tpivot began, and Tpivot writes something that would have been read by Tin if Tpivot had committed before Tin began) and Tout commits before Tin or Tpivot commit. As a further test to avoid false positives, an anomaly cannot occur if Tin is read-only (or if it commits without modifying data) and Tin started before Tout committed. Just for completeness, Tin and Tout can be the same transaction.

So, to hit this code, Tin must be reading data written by Tpivot, Tpivot must have committed (or at least be well into the commit process), and Tout must have committed before Tpivot committed. Tin is executing a SELECT and checks MVCC visibility information of a row and finds that it was written by a transaction which is committed (or at least past the point where it can throw a serialization failure), *and* Tpivot had either read data that had been written by Tout or Tout had written data that predicate locks should show had already been read by Tpivot. Either way, what Tin is seeing should not be possible, because what I just described should have caused a dangerous structure for which should have caused either Tpivot to be canceled. Following from that it appears that the different reason code Tin is seeing is probably a symptom of a failure which already occurred -- otherwise how could both of the other transactions have committed?

Since there is no index on the "key" column, we must do a heap scan, and that means a predicate lock on the heap relation.
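The dangerous-structure test described above can be condensed into a toy predicate (my own encoding for illustration, not PostgreSQL code; commit and start times are abstract integers):

```python
# Toy encoding of the SSI dangerous-structure rule: with
# Tin -rw-> Tpivot -rw-> Tout, one of the three must be canceled
# once Tout commits before the other two commit. The second check
# mirrors the read-only false-positive exception from the text.

def must_cancel(tout_commit, tpivot_commit, tin_commit,
                tin_read_only, tin_start):
    """True if the rw-dependency structure is dangerous."""
    # Tout must commit before either of the other two commits.
    if not (tout_commit < tpivot_commit and tout_commit < tin_commit):
        return False
    # No anomaly if Tin is read-only and took its snapshot before
    # Tout committed: Tin then sees a consistent pre-Tout state.
    if tin_read_only and tin_start < tout_commit:
        return False
    return True

# Tout commits first and Tin writes: dangerous, someone must abort.
assert must_cancel(1, 2, 3, tin_read_only=False, tin_start=0)
# A read-only Tin that started before Tout committed is safe.
assert not must_cancel(1, 2, 3, tin_read_only=True, tin_start=0)
```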
Somehow, two connections get that predicate lock, scan the relation and find that the key value doesn't yet exist, and insert without seeing the other transaction's predicate lock. But I'm not seeing how that could happen.

Another possibility is that this is somehow happening as the result of some bug in the "commit sequence number" used by the serializable implementation. In the absence of a real CSN, and to avoid additional contention on the PROCARRAY locking, we created an approximation of a CSN which plays some tricks which assume atomic memory access where perhaps that isn't safe or needs a memory barrier. If neither Tpivot nor Tout sees the other as having committed first, we could see exactly this behavior.

Those aren't the only places this race condition bug could be hiding, but they seem like the two most obvious candidates.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
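The second hypothesis -- each committer reading the other's commit-sequence slot before the other has published it -- can be illustrated with a deterministic toy interleaving (again my own sketch, not the actual implementation; the dict stands in for shared memory):

```python
# Toy illustration of the suspected CSN race: without a memory
# barrier, both of two committing transactions can read the other's
# CSN slot before either write becomes visible, so neither sees the
# other as having committed first.

UNSET = None

def commit_race():
    csn = {"Tpivot": UNSET, "Tout": UNSET}  # shared CSN slots
    # Bad interleaving: both reads happen before both writes
    # (a reordering that a barrier would forbid).
    tpivot_sees_tout = csn["Tout"]      # still UNSET
    tout_sees_tpivot = csn["Tpivot"]    # still UNSET
    csn["Tpivot"], csn["Tout"] = 100, 101   # writes land too late
    return tpivot_sees_tout is UNSET and tout_sees_tpivot is UNSET

# Each side concludes the other had not yet committed, so the
# "Tout committed first" test in dangerous-structure detection
# can silently fail on both sides.
assert commit_race()
```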
Attachments
Kevin Grittner <kgrittn@ymail.com> wrote:

> Either way, what Tin is seeing should not be possible, because
> what I just described should have caused a dangerous structure
> for which should have caused either Tpivot to be canceled.

Gah! Something shiny must have come into my field of vision while I was editing that sentence. My apologies. This stuff is hard enough to understand without such mangled editing, so I'll try again:

Either way, what Tin is seeing should not be possible, because what I just described should have caused a dangerous structure which should have caused either Tpivot or Tout to be canceled.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company