Обсуждение: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

Поиск
Список
Период
Сортировка

Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

От
Thorsten Schöning
Дата:
Hi all,

I have an app exposing web services to generate reports. Those web
services accept multiple reports per request and calculate them
concurrently. There's one transaction spanning each entire request and
used by ALL spawned threads. The app makes sure that e.g. committing
transactions is handled by ONE thread only and individual statements
are NEVER reused by multiple threads.

There are two types of reports: Some need to read data from the DB
themself, others can work on the data provided by the former ones. The
current implementation always creates temporary tables for results of
some query, so that one can simply check if some temporary table is
available and either only query that or create it before as necessary.

I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
associated queries can take a long time. So the following lists some
questions about executing those concurrently, even thouzgh I've
already read threads like the following:

> The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> to handle concurrency issues any better than regular old CREATE
> TABLE, which is to say not very well.[...]

https://www.postgresql.org/message-id/CA+TgmoZAdYVtwBfp1FL2sMZbiHCWT4UPrzRLNnX1Nb30Ku3-gg@mail.gmail.com

When the table needs to be created, when is it visible to other
threads using the same transaction, before or after executing the
additional query?

Am I correct that with using "IF NOT EXISTS" the associated query is
only executed as well if the table needs to be created? In theory
those two things could be independent of each other, maybe resulting
in duplicated rows or stuff like that per execution. But doesn't seem
so according to my tests.

Am I correct that in my described setup I need to make sure on my own
that only one thread creates each individual temporary table and
executes the associated query? Otherwise it might happen that multiple
attempts creating the table at the same time simply fails and the only
question is if this happens before or after the associated query.

Or would creating the table itself succeeds even with many threads,
but some of them simply wouldn't read any data, becauser the
associated query is executed in a second step by that thread that
created the table in the first place? I don't think so, though.

Am I correct that because of the same transaction used by multiple
threads I need to synchronize them on web service-level? E.g. things
like advisory locks won't work because they have session or
transaction level and would be granted to all threads instantly.

Thanks for your insights!

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
"Daniel Verite"
Дата:
    Thorsten Schöning wrote:

> I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
> associated queries can take a long time. So the following lists some
> questions about executing those concurrently, even thouzgh I've
> already read threads like the following:

> > The bottom line is that CREATE TABLE IF NOT EXISTS doesn't pretend
> > to handle concurrency issues any better than regular old CREATE
> > TABLE, which is to say not very well.[...]
>
> https://www.postgresql.org/message-id/CA+TgmoZAdYVtwBfp1FL2sMZbiHCWT4UPrzRLNnX1Nb30Ku3-gg@mail.gmail.com


The caveat you mention about IF NOT EXISTS does not apply to
temporary tables, as they're not shared across sessions.
That is, if two concurrent transactions execute at the same time
CREATE TEMP TABLE IF NOT EXISTS foo(...)
it can't fail as described above because that creates two distinct tables,
each private to their session.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

От
Thorsten Schöning
Дата:
Guten Tag Daniel Verite,
am Montag, 8. Juni 2020 um 11:05 schrieben Sie:

> The caveat you mention about IF NOT EXISTS does not apply to
> temporary tables, as they're not shared across sessions.[...]

That's what I understood as well, but I'm creating those concurrently
WITHIN one and the same session and transaction. :-) And that's where
I wanted to make sure that I properly understood things using my
questions.

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
"Daniel Verite"
Дата:
    Thorsten Schöning wrote:

> > The caveat you mention about IF NOT EXISTS does not apply to
> > temporary tables, as they're not shared across sessions.[...]
>
> That's what I understood as well, but I'm creating those concurrently
> WITHIN one and the same session and transaction. :-)

But a SQL session on the server takes its statements from a FIFO queue
and processes them serially, so there's no intra-session concurrency.
In fact multi-threaded SQL clients *must* make sure that they don't
send concurrent queries to the same connection. The best they can
do in terms of throughput is to queue up a new query while
the server is busy executing a previous one, but that's pipelining,
not parallelism.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
Ravi Krishna
Дата:

> That's what I understood as well, but I'm creating those concurrently
> WITHIN one and the same session and transaction. :-) 

Did I interpret this as "two different sessions via application threads within the same transactions of PG".
Does the thread create its own PG session for each thread or share it across threads.

I would be wary of running concurrent threads within the same transactions. RDBMSs are not designed
to allow multiple SQLstatements running concurrently within the same session, if at all possible.

Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
Ron
Дата:
On 6/8/20 3:14 AM, Thorsten Schöning wrote:
> Hi all,
>
> I have an app exposing web services to generate reports. Those web
> services accept multiple reports per request and calculate them
> concurrently. There's one transaction spanning each entire request and
> used by ALL spawned threads. The app makes sure that e.g. committing
> transactions is handled by ONE thread only and individual statements
> are NEVER reused by multiple threads.
>
> There are two types of reports: Some need to read data from the DB
> themself, others can work on the data provided by the former ones. The
> current implementation always creates temporary tables for results of
> some query, so that one can simply check if some temporary table is
> available and either only query that or create it before as necessary.
>
> I'm using "CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]" and the
> associated queries can take a long time.

Would a set of GLOBAL temporary tables be a better fit for your problem 
(eliminating the need to create temporary tables each time)?

-- 
Angular momentum makes the world go 'round.



Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

От
Thorsten Schöning
Дата:
Guten Tag Daniel Verite,
am Montag, 8. Juni 2020 um 12:07 schrieben Sie:

> But a SQL session on the server takes its statements from a FIFO queue
> and processes them serially, so there's no intra-session concurrency.
> In fact multi-threaded SQL clients *must* make sure that they don't
> send concurrent queries to the same connection.[...]

I was looking at JDBC-docs and those explicitly allow sharing one
connection by multiple threads, while at the same time not explicitly
forbidding concurrent access.

https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

Therefore I had assumed that concurrent access is available as long as
one follows the recommened practices, which is what I do. Though, that
concurrent access seems to not be available because of additional
synchronization in the JDBC-driver of Postgres.

https://stackoverflow.com/a/52798543/2055163

https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276

Is that the case with all JDBC-drivers most likely or simply some
implementation detail of some and others might allow truly concurrent
access within one session?

Nevertheless, I think I can at least partly answer some of my
questions now. So thanks for forcing digging deeper into this than
I obviously did before. :-)

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

От
Thorsten Schöning
Дата:
Guten Tag Ron,
am Montag, 8. Juni 2020 um 15:35 schrieben Sie:

> Would a set of GLOBAL temporary tables be a better fit for your problem
> (eliminating the need to create temporary tables each time)?

Do you mean plain tables simply created using CREATE TABLE or is there
some additional concept of temporary tables I'm not aware of? Because I
thought truly temporary tables are always session-specific:

> Temporary tables are automatically dropped at the end of a session,
> or optionally at the end of the current transaction (see ON COMMIT
> below).

https://www.postgresql.org/docs/9.5/sql-createtable.html

It totally depends on runtime and users if those tables are necessary
at all in theory. So the benefit of using session-private once
automatically maintained in my opinion is that I can introduce them
easily as necessary without customizing my schema. My schema and all
changes to it are versioned and I would like to avoid the maintenance
overhead adding temp-tables which can easily change with the
implementation using them.

OTOH, if they are necessary at all at some point, the schema might be
a good place to make aware of them, document their usage and stuff. i
will think about that, so thanks for the hint!

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS [...]

От
Thorsten Schöning
Дата:
Guten Tag Thorsten Schöning,
am Montag, 8. Juni 2020 um 10:14 schrieben Sie:

> When the table needs to be created, when is it visible to other
> threads using the same transaction, before or after executing the
> additional query?

There is a misconception here: Multiple concurrent exec doesn't seem
to be implemented at all, even though the JDBC-docs seem to allow the
sharing of connections. Execution of statements seem to lock the
underlying connection, preventing truly concurrent access by multiple
threads.

https://stackoverflow.com/a/52798543/2055163

https://github.com/pgjdbc/pgjdbc/blob/ecbc48ff6ad293a2555465d4f7fb6a33a403af26/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L276
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts23499.html
https://docs.oracle.com/javadb/10.8.3.0/devguide/cdevconcepts89498.html

So while things don't work as I originally expected, this makes
answering my question easy: Because "CREATE TEMP TABLE IF NOT EXISTS
[...] AS [...]" is ONE statement only forwarded to the driver, its
execution should be atomic with respect to other threads using the
same connection.

It shouldn't make any difference how long the calculation of the
associated query takes, because all other threads need to wait for the
whole statement anyway. And if some other thread comes with the query
subsequently, "IF NOT EXISTS" comes into play and the query should
succeed instantly.

> Am I correct that with using "IF NOT EXISTS" the associated query is
> only executed as well if the table needs to be created?[...]

An answer to that question would still be of interest to me.

> Am I correct that in my described setup I need to make sure on my own
> that only one thread creates each individual temporary table and
> executes the associated query?[...]

No, not unless truly concurrent access is available in the
JDBC-driver. If it would be OTOH, I guess the answer would be yes.

> Am I correct that because of the same transaction used by multiple
> threads I need to synchronize them on web service-level?[...]

Yes and this should be the most performant implementation anyway. In
case of really one and the same connection, everything is handled by
the same JVM already, so can be synchronized within that JVM without
roundtrips to Postgres as well.

[1]: https://stackoverflow.com/a/24089729/2055163

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow




Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
Ron
Дата:
On 6/8/20 9:23 AM, Thorsten Schöning wrote:
> Guten Tag Ron,
> am Montag, 8. Juni 2020 um 15:35 schrieben Sie:
>
>> Would a set of GLOBAL temporary tables be a better fit for your problem
>> (eliminating the need to create temporary tables each time)?
> Do you mean plain tables simply created using CREATE TABLE or is there
> some additional concept of temporary tables I'm not aware of? Because I
> thought truly temporary tables are always session-specific:

Hmm...  Postgres doesn't support global temporary tables.  Never mind. :)

-- 
Angular momentum makes the world go 'round.



Re: Concurrenctly running CREATE TEMP TABLE IF NOT EXISTS [...] AS[...]

От
Adrian Klaver
Дата:
On 6/8/20 3:31 PM, Ron wrote:
> On 6/8/20 9:23 AM, Thorsten Schöning wrote:
>> Guten Tag Ron,
>> am Montag, 8. Juni 2020 um 15:35 schrieben Sie:
>>
>>> Would a set of GLOBAL temporary tables be a better fit for your problem
>>> (eliminating the need to create temporary tables each time)?
>> Do you mean plain tables simply created using CREATE TABLE or is there
>> some additional concept of temporary tables I'm not aware of? Because I
>> thought truly temporary tables are always session-specific:
> 
> Hmm...  Postgres doesn't support global temporary tables.  Never mind. :)
> 

There is a newly released extension that offers that:

https://www.postgresql.org/message-id/4a390375-0a69-8901-fc5a-4a0336c5c6b4%40darold.net

-- 
Adrian Klaver
adrian.klaver@aklaver.com