Обсуждение: Odd query execution behavior with extended protocol
<div dir="ltr"><div style="font-size:12.8px">Hi hackers, some odd behavior has been reported with Npgsql and I'm sure youcan help.</div><div style="font-size:12.8px"><br /></div><div style="font-size:12.8px">Npgsql supports sending multipleSQL statements in a single packet via the extended protocol. This works fine, but when the second query SELECTs avalue modified by the first's UPDATE, I'm getting a result as if the UPDATE hasn't yet occurred.</div><div style="font-size:12.8px"><br/></div><div style="font-size:12.8px">The exact messages send by Npgsql are:</div><div style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed</div><divstyle="font-size:12.8px">Describe (statement=unnamed)</div><div style="font-size:12.8px">Bind(statement=unnamed, portal=MQ0)</div><div style="font-size:12.8px">Parse (SELECT * FROM dataWHERE id=1), statement=unnamed</div><div style="font-size:12.8px">Describe (statement=unnamed)</div><div style="font-size:12.8px">Bind(statement=unnamed, portal=MQ1)</div><div style="font-size:12.8px">Execute (portal=MQ0)</div><divstyle="font-size:12.8px">Close (portal=MQ0)</div><div style="font-size:12.8px">Execute (portal=MQ1)</div><divstyle="font-size:12.8px">Close (portal=MQ1)</div><div style="font-size:12.8px">Sync</div><div style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Instead of returning the expected 'foo' value set in thefirst command's UPDATE, I'm getting whatever value was previously there.</div><div style="font-size:12.8px">Note thatthis happen regardless of whether a transaction is already set and of the isolation level.</div><div style="font-size:12.8px"><br/></div><div style="font-size:12.8px">Is this the expected behavior, have I misunderstood theprotocol specs?</div><div style="font-size:12.8px"><br /></div><div style="font-size:12.8px">Thanks for your help, andplease let me know if you need any more info.</div><div style="font-size:12.8px"><br /></div><div style="font-size:12.8px">Shay</div></div>
Hello > Npgsql supports sending multiple SQL statements in a single packet via the extended protocol. This works fine, but whenthe second query SELECTs a value modified by the first's UPDATE, I'm getting a result as if the > UPDATE hasn't yet occurred. Looks like the first updating statement is not committed, assuming that the two statements run in different transactions. > The exact messages send by Npgsql are: > > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ0) > Parse (SELECT * FROM data WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ1) > Execute (portal=MQ0) > Close (portal=MQ0) > Execute (portal=MQ1) > Close (portal=MQ1) > Sync I never used Npgsql so I don't know if there is something missing there. Would you need an explicit commit before closingMQ0? Also I am not in clear what "statement=unnamed" means, but it is used twice. Is it possible that the update is overwrittenwith select before it executes? Just some thoughts, as I said I know nothing of Npgsql. BTW: Do you see the change after update in your DB if you look into it with another tool (e.g. psql)? Charles
> Npgsql supports sending multiple SQL statements in a single packet via the extended protocol. This works fine, but when the second query SELECTs a value modified by the first's UPDATE, I'm getting a result as if the
> UPDATE hasn't yet occurred.
Looks like the first updating statement is not committed, assuming that the two statements run in different transactions.
I did try to prefix the message chain with an explicit transaction BEGIN (with the several different isolation levels) without a difference in behavior.
> The exact messages send by Npgsql are:
>
> Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ0)
> Parse (SELECT * FROM data WHERE id=1), statement=unnamed
> Describe (statement=unnamed)
> Bind (statement=unnamed, portal=MQ1)
> Execute (portal=MQ0)
> Close (portal=MQ0)
> Execute (portal=MQ1)
> Close (portal=MQ1)
> Sync
I never used Npgsql so I don't know if there is something missing there. Would you need an explicit commit before closing MQ0?
I guess this is exactly my question to PostgreSQL... But unless I'm misunderstanding the transaction semantics I shouldn't need to commit the first UPDATE in order to see its effect in the second SELECT...
Also I am not in clear what "statement=unnamed" means, but it is used twice. Is it possible that the update is overwritten with select before it executes?
statement=unnamed means that the destination statement is the unnamed prepared statement (as described in http://www.postgresql.org/docs/current/static/protocol-message-formats.html). Right after the Parse I bind the unnamed statement which I just parsed to cursor MQ0. In other words, Npgsql first parses the two queries and binds them to portals MQ0 and MQ1, and only then executes both portals
BTW: Do you see the change after update in your DB if you look into it with another tool (e.g. psql)?
That's a good suggestion, I'll try to check it out, thanks!
On October 4, 2015 2:50:10 PM GMT+02:00, Shay Rojansky <roji@roji.org> wrote: >> >> > Npgsql supports sending multiple SQL statements in a single packet >via >> the extended protocol. This works fine, but when the second query >SELECTs a >> value modified by the first's UPDATE, I'm getting a result as if the >> > UPDATE hasn't yet occurred. >> >> Looks like the first updating statement is not committed, assuming >that >> the two statements run in different transactions. >> > >I did try to prefix the message chain with an explicit transaction >BEGIN >(with the several different isolation levels) without a difference in >behavior. > >> The exact messages send by Npgsql are: >> > >> > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed >> > Describe (statement=unnamed) >> > Bind (statement=unnamed, portal=MQ0) >> > Parse (SELECT * FROM data WHERE id=1), statement=unnamed >> > Describe (statement=unnamed) >> > Bind (statement=unnamed, portal=MQ1) >> > Execute (portal=MQ0) >> > Close (portal=MQ0) >> > Execute (portal=MQ1) >> > Close (portal=MQ1) >> > Sync >> >> I never used Npgsql so I don't know if there is something missing >there. >> Would you need an explicit commit before closing MQ0? >> > >I guess this is exactly my question to PostgreSQL... But unless I'm >misunderstanding the transaction semantics I shouldn't need to commit >the >first UPDATE in order to see its effect in the second SELECT... Try adding a sync before the second execute. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
Shay Rojansky <roji@roji.org> writes: > Npgsql supports sending multiple SQL statements in a single packet via the > extended protocol. This works fine, but when the second query SELECTs a > value modified by the first's UPDATE, I'm getting a result as if the UPDATE > hasn't yet occurred. > The exact messages send by Npgsql are: > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ0) > Parse (SELECT * FROM data WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ1) > Execute (portal=MQ0) > Close (portal=MQ0) > Execute (portal=MQ1) > Close (portal=MQ1) > Sync I'm fairly sure that the query snapshot is established at Bind time, which means that this SELECT will run with a snapshot that indeed does not see the effects of the UPDATE. To my mind there is not a lot of value in performing Bind until you are ready to do Execute. The only reason the operations are separated in the protocol is so that you can do multiple Executes with a row limit on each one, to retrieve a large query result in chunks. regards, tom lane
Try adding a sync before the second execute.
I tried inserting a Sync right before the second Execute, this caused an error with the message 'portal "MQ1" does not exist'.
This seems like problematic behavior on its own, regardless of my issues here (Sync shouldn't be causing an implicit close of the portal, should it?).
I'm fairly sure that the query snapshot is established at Bind time,
which means that this SELECT will run with a snapshot that indeed
does not see the effects of the UPDATE.
To my mind there is not a lot of value in performing Bind until you
are ready to do Execute. The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.
So you would suggest changing my message chain to send Bind right after Execute, right? This would yield the following messages:
P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)
This would mean that I would switch to using named statements and the unnamed portal, rather than the current unnamed statement
and named portals. If I recall correctly, I was under the impression that there are some PostgreSQL performance benefits to using the
unnamed statement over named statements, although I admit I can't find any documentation backing that. Can you confirm that the two
are equivalent performance-wise?
Shay
Shay Rojansky <roji@roji.org> writes: >> To my mind there is not a lot of value in performing Bind until you >> are ready to do Execute. The only reason the operations are separated >> in the protocol is so that you can do multiple Executes with a row limit >> on each one, to retrieve a large query result in chunks. > So you would suggest changing my message chain to send Bind right after > Execute, right? This would yield the following messages: > P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current > P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S) > This would mean that I would switch to using named statements and the > unnamed portal, rather than the current unnamed statement > and named portals. If I recall correctly, I was under the impression that > there are some PostgreSQL performance benefits to using the > unnamed statement over named statements, although I admit I can't find any > documentation backing that. Can you confirm that the two > are equivalent performance-wise? Hmm. I do not recall exactly what performance optimizations apply to those two cases; they're probably not "equivalent", though I do not think the difference is major in either case. TBH I was a bit surprised on reading your message to hear that the system would take that sequence at all; it's not obvious that it should be allowed to replace a statement, named or not, while there's an open portal that depends on it. I think you might have more issues with lifespans, since portals go away at commit whereas named statements don't. regards, tom lane
Shay Rojansky <roji@roji.org> writes: >> Try adding a sync before the second execute. > I tried inserting a Sync right before the second Execute, this caused an > error with the message 'portal "MQ1" does not exist'. > This seems like problematic behavior on its own, regardless of my issues > here (Sync shouldn't be causing an implicit close of the portal, should > it?). Sync results in closing the transaction, if you've not explicitly executed a BEGIN. regards, tom lane
<div dir="ltr">Thanks for the help Tom and the others, I'll modify my sequence and report if I encounter any further issues.</div><divclass="gmail_extra"><br /><div class="gmail_quote">On Sun, Oct 4, 2015 at 7:36 PM, Tom Lane <span dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">ShayRojansky <<a href="mailto:roji@roji.org">roji@roji.org</a>> writes:<br /> >> To my mind there isnot a lot of value in performing Bind until you<br /> >> are ready to do Execute. The only reason the operationsare separated<br /> >> in the protocol is so that you can do multiple Executes with a row limit<br /> >>on each one, to retrieve a large query result in chunks.<br /><br /> > So you would suggest changing my messagechain to send Bind right after<br /> > Execute, right? This would yield the following messages:<br /><br /> >P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current<br /> > P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)<br /><br /> > Thiswould mean that I would switch to using named statements and the<br /> > unnamed portal, rather than the current unnamedstatement<br /> > and named portals. If I recall correctly, I was under the impression that<br /> > there aresome PostgreSQL performance benefits to using the<br /> > unnamed statement over named statements, although I admitI can't find any<br /> > documentation backing that. Can you confirm that the two<br /> > are equivalent performance-wise?<br/><br /></span>Hmm. I do not recall exactly what performance optimizations apply to<br /> those twocases; they're probably not "equivalent", though I do not think<br /> the difference is major in either case. TBH I wasa bit surprised on<br /> reading your message to hear that the system would take that sequence at<br /> all; it's notobvious that it should be allowed to replace a statement,<br /> named or not, while there's an open portal that dependson it.<br /><br /> I think you might have more issues with lifespans, since portals go away<br /> at commit whereasnamed statements don't.<br /><br /> regards, tom lane<br /></blockquote></div><br /></div>
> So you would suggest changing my message chain to send Bind right after
> Execute, right? This would yield the following messages:
> P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current
> P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)
> This would mean that I would switch to using named statements and the
> unnamed portal, rather than the current unnamed statement
> and named portals. If I recall correctly, I was under the impression that
> there are some PostgreSQL performance benefits to using the
> unnamed statement over named statements, although I admit I can't find any
> documentation backing that. Can you confirm that the two
> are equivalent performance-wise?
Hmm. I do not recall exactly what performance optimizations apply to
those two cases; they're probably not "equivalent", though I do not think
the difference is major in either case. TBH I was a bit surprised on
reading your message to hear that the system would take that sequence at
all; it's not obvious that it should be allowed to replace a statement,
named or not, while there's an open portal that depends on it.
One more important piece of information...
The reason Npgsql currently sends P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S is to avoid deadlocks, I've already discussed this with you in http://www.postgresql.org/message-id/CADT4RqB+fbtQpTE5YLZ0hKb-2K-nGZHM2YbVj0TMC8rQBGfUxA@mail.gmail.com.
Unfortunately, the alternative I proposed above, P1/P2/D1/B1/E1/D2/B2/E2/S, suffers from the same issue: any sequence in which a Bind is sent after a previous Execute is deadlock-prone - Execute causes PostgreSQL to start writing a potentially large dataset, while Bind means the client may be writing a potentially large parameter value.
In other words, unless I'm mistaken it seems there's no alternative but to implement non-blocking I/O at the client side - write until writing would block, switching to reading when that happens. This adds some substantial complexity, especially with .NET's SSL/TLS implementation layer.
Or does anyone see some sort of alternative which I've missed?