Обсуждение: How to create a cursor that is independent of transactions and doesn't calculated when created ?

Поиск
Список
Период
Сортировка

How to create a cursor that is independent of transactions and doesn't calculated when created ?

От
boraldomaster
Дата:
I wonder how quick is cursor without hold.
I have a data set with 10^7 rows.

*begin;
declare mycursor cursor for select * from z;
commit;
* - this takes 3 ms.

*begin;
declare mycursor cursor with hold for select * from z;
commit;
* - this takes 3 s.

Thus, holdable cursor is getting calculated, obviously.
I cannot use cursor without hold as it requires a connection per query +
client.
But I also don't understand - why cursor with hold should start so much
longer than cursor without hold.

If there any possiblity to have a cursor that as fast as cursor without hold
and as transaction-independent as cursor with hold ?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


boraldomaster wrote
> If there any possiblity to have a cursor that as fast as cursor without
> hold and as transaction-independent as cursor with hold ?

Depends...

From the documentation - the first source of potential understanding:

http://www.postgresql.org/docs/9.0/interactive/sql-declare.html

"In the current implementation, the rows represented by a held cursor are
copied into a temporary file or memory area so that they remain available
for subsequent transactions."

Can you make that copy take considerably less time OR come up with a better
implementation?

I guess, in theory, you could reduce the startup cost by only creating the
temporary file at commit instead of at the declare.  In the case where you
scroll through the whole cursor once during the transaction as each row is
emitted it could also be cached at that point so only un-emitted rows would
have to be spooled before the commit could finish.

If you provide more of the how/why of what you are doing people may be able
to suggest alternative solutions - though 3 seconds for a 10-million row
cursor/temporary-table does not seem that hideous.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401p5762403.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


My use-case is just creating paginated list for a large table.
The first obvious option is offset limit but it works too slow for great
offset.
A lot of topics propose using cursors for that, so I am learning this
possibility.
You will say that there are other possibilities. Yes - but I am trying to
compare all them, therefore learning cursors at the moment. If cursors don't
fit - I will learn other options.

So - for now I see the following.
Unheld cursors are not usable at all for this purpose, as they require a
connection per client + table.
Held cursor - is a bit better.
But bad things.
1. It is created as long as creating temp table. 3s is not an acceptable
reponse time. Even for the 1st query.
2. Held cursor is visible only to connection. So I need to create it for
every connection. This means 3s per connection.
3. To ensure effective cursor usage I need to ensure that any web-client
session wroks with the same DB connection while listing table pages.

Besides - cursor is obviously shows not-up-time results as this is in fact
some old snapshot of data.

All this moves me away from using cursors.

But if held cursor was created as fast as unheld - I could change my
opinion.
I don't understand why is this really impossible.
When I create unheld cursor - it takes 1 ms. Why cannot held cursor do the
same (but store in session - or even better in whole db - anything it stores
in transaction when being unheld).
Even algorythmically - this should be possible.
If I make *select * from z* - it actually shouldn't fetch anything - just
save this query.
When I do *fetch 10 from mycursor*  - it should fetch first 10 records but
not more.
And so on.
So - why is this really impossible ?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401p5762543.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


boraldomaster wrote
> But if held cursor was created as fast as unheld - I could change my
> opinion.
> I don't understand why is this really impossible.
> When I create unheld cursor - it takes 1 ms. Why cannot held cursor do the
> same (but store in session - or even better in whole db - anything it
> stores in transaction when being unheld).
> Even algorythmically - this should be possible.
> If I make
*
> select * from z
*
>  - it actually shouldn't fetch anything - just save this query.
> When I do
*
> fetch 10 from mycursor
*
>   - it should fetch first 10 records but not more.
> And so on.
> So - why is this really impossible ?

Your original examples only create the cursor and do not actually use it.
You should be comparing how long it takes both examples to fetch the first
10 pages of records to get a meaningful comparison.  It won't matter if the
DECLARE only takes 3ms in the non-hold case if retrieving the first page of
records take 2s.

Very few things are impossible, and this likely is not one of them, but
unless you don't want to go live with your solution for another couple of
years it likely does not matter whether the held cursor performance can be
improved by re-engineering the code.  The current implementation saves the
hold cursor's results to a temporary area as a snapshot so that it can be
used consistently outside of transactions without the hassle of creating an
actual persistent temporary table.  Basically you save having to send the
entire result-set to the caller but instead cache the results and feed only
a fraction of them at each request.  It does not seem intended to solve the
problem of dynamic fetching in that if it hasn't executed the query then how
is it supposed to know what the "next 10" records are?  This also ignores
the fact that held cursor wants to return the same data that existed at the
time of declaration - which is why a non-held cursor can only work in a
transaction where that state information is saved.  Since that state is
discarded the held version has to compromise by creating a snapshot of all
the data and persist it.

In theory the held cursor could save the meaningful state information and
during fetch the system could pretend it is operating some time in the past.
You are welcome to try and convince someone to explore this and other
theories - I am not that person nor do I have the time/need to perform the
convincing.

I do not have any meaningful experience with your scenario but if a page is
a very small fraction of the total table size then using indexes, limits,
and offsets should give you solid performance probably 98%+ of the time.
Yes, large offsets can be problematic but reverse ordering can help AND in
most use cases the frequency of high page numbers compared to lower ones is
significantly less.

David J.








--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401p5762567.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


David Johnston wrote
> Your original examples only create the cursor and do not actually use it.
> You should be comparing how long it takes both examples to fetch the first
> 10 pages of records to get a meaningful comparison.  It won't matter if
> the DECLARE only takes 3ms in the non-hold case if retrieving the first
> page of records take 2s.

Certainly -
*close all;
begin;
declare mycursor cursor for select * from z order by name;
fetch 10 from mycursor;
commit;
*
 - takes 3 ms all.
Each *fetch 10 from mycursor* takes 1-2 ms.
So this is perfectly optimized in transaction. And I wonder why isn't it
optimized for a whole session.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401p5762592.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


boraldomaster wrote
>
> David Johnston wrote
>> Your original examples only create the cursor and do not actually use it.
>> You should be comparing how long it takes both examples to fetch the
>> first 10 pages of records to get a meaningful comparison.  It won't
>> matter if the DECLARE only takes 3ms in the non-hold case if retrieving
>> the first page of records take 2s.
> Certainly -
*
> close all;
> begin;
> declare mycursor cursor for select * from z order by name;
> fetch 10 from mycursor;
> commit;
*
>  - takes 3 ms all.
> Each
*
> fetch 10 from mycursor
*
>  takes 1-2 ms.
> So this is perfectly optimized in transaction. And I wonder why isn't it
> optimized for a whole session.

I am sorry I cannot answer this kind of "why was it implemented in
such-and-such way" question.  Maybe someone from the core programming team
will chime in.

David J.







--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-create-a-cursor-that-is-independent-of-transactions-and-doesn-t-calculated-when-created-tp5762401p5762623.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.