Обсуждение: refreshRow is slow

Поиск
Список
Период
Сортировка

refreshRow is slow

От
"John T. Dow"
Дата:
I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows.

I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset?

Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY.

The user might select one row or 1,000 rows and scroll back and forth at will.

The user might choose to change one of the rows. I do this with a separate FOR UPDATE query.

The update query will fetch the current row, which might be different from the row in the original resultset. My
programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to
refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated
information.

I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many seconds
overa remote connection. Can I fix that? What's going on? 

Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I
wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE
doesn'twork reliably.) 


Thanks

John


Re: refreshRow is slow

От
Oliver Jowett
Дата:
John T. Dow wrote:
> I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows.
>
> I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset?
>
> Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY.
>
> The user might select one row or 1,000 rows and scroll back and forth at will.
>
> The user might choose to change one of the rows. I do this with a separate FOR UPDATE query.
>
> The update query will fetch the current row, which might be different from the row in the original resultset. My
programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to
refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated
information.
>
> I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many
secondsover a remote connection. Can I fix that? What's going on? 

What's your original query?
What's the query that the driver ends up synthesizing when you call
refreshRow()?
Have you run that query separately under EXPLAIN ANALYZE etc?

Do you have knowledge of the structure of the original query (i.e. are
you building it yourself, or is it user-supplied?) If you do, have you
considered running suitable update/refresh query yourself directly? (The
driver has limited knowledge of your database structure and may not be
able to produce 'good' queries for update/refresh via the resultset
interface)

> Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I
wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE
doesn'twork reliably.) 

 From memory the driver doesn't implement TYPE_SCROLL_SENSITIVE at all
(that type of resultset is quite strange..)

-O


Re: refreshRow is slow

От
"John T. Dow"
Дата:
On Fri, 15 Jan 2010 10:59:00 +1300, Oliver Jowett wrote:

>John T. Dow wrote:
>> I am having no luck getting responses to a question I posted earlier about optimistic locking with multiple rows.
>>
>> I'll ask a more targeted question. How do I efficiently refresh a single row in a resultset?
>>
>> Suppose I allow users to view multiple records with TYPE_SCROLL_INSENSITIVE, CONCUR_READ_ONLY.
>>
>> The user might select one row or 1,000 rows and scroll back and forth at will.
>>
>> The user might choose to change one of the rows. I do this with a separate FOR UPDATE query.
>>
>> The update query will fetch the current row, which might be different from the row in the original resultset. My
programmingcompares the two rows and can report to the user the differences. After the row has been updated, I need to
refreshthat row in the result set so that if the user continues to scroll back and forth, it will see the updated
information.
>>
>> I am using refreshRow. I do setFetchSize(1) first. This works, but sometimes it is very slow. It can take many
secondsover a remote connection. Can I fix that? What's going on? 
>



>What's your original query?
>What's the query that the driver ends up synthesizing when you call
>refreshRow()?
>Have you run that query separately under EXPLAIN ANALYZE etc?
>
>Do you have knowledge of the structure of the original query (i.e. are
>you building it yourself, or is it user-supplied?) If you do, have you
>considered running suitable update/refresh query yourself directly? (The
>driver has limited knowledge of your database structure and may not be
>able to produce 'good' queries for update/refresh via the resultset
>interface)




The original query is user supplied. It can range from a query using a primary key that selects a single row to a query
selectinghalf a dozen rows using their primary keys to a complicated query testing ranges on several columns and
resultingin a thousand rows.  If I knew the query would only return a small number of rows, I'd just redo the query.
(That'swhat I have to do with Derby.) But you'd think that if there are a thousand rows, refreshing one row would be
betterthan rereading all of them. That's the theory anyway. That's why I'm hoping refreshrow would be one solution that
fitsall user supplied queries. 

Each row in a postgres database has a unique identifier. I would think that refreshrow could use that to target the
specificrow to be refreshed. 

I realize that I only want to refresh one row, and that refreshrow actually uses the fetch size and may get more than
oneif required. Refreshing many rows using their unique identifiers might not be desirable. 

My program of course knows what the refreshed row should look like. Too bad there's not a way to switch out a row in
theresult set. 



>
>> Note: I don't want to use TYPE_SCROLL_SENSITIVE because it would update the row in the result set silently and I
wouldn'tbe able to tell the user that somebody had changed anything. (Besides, I understand that TYPE_SCROLL_SENSITIVE
doesn'twork reliably.) 
>
> From memory the driver doesn't implement TYPE_SCROLL_SENSITIVE at all
>(that type of resultset is quite strange..)

I kind of got that impression regarding postgres. For another database I was reading about (Oracle?), it was said to
notbe reliable as to whether it would pick up all changes. 


>
>-O
>



Re: refreshRow is slow

От
Oliver Jowett
Дата:
John T. Dow wrote:

> Each row in a postgres database has a unique identifier. I would think that refreshrow could use that to target the
specificrow to be refreshed. 

refreshRow() uses the primary key of the table to synthesize a SELECT to
pick out just the one row, yes. If you explicitly select a column called
"oid" in your query, it'll use that instead of the PK columns.

(I looked at that code and there are a few ways to confuse it. That's
why I want to know what the actual query that's causing problems is.)

>
> I realize that I only want to refresh one row, and that refreshrow actually uses the fetch size and may get more than
oneif required. Refreshing many rows using their unique identifiers might not be desirable. 

Huh? Fetch size should have no effect on refreshRow() - it should only
be selecting a single row anyway.

Again: What is the actual query being synthesized by refreshRow(), and
have you tried analysing that query to work out why it's slow?

Also, if you just want to do an UPDATE and have the freshly updated data
appear in your resultset, why don't you use the ResultSet update
interface? updateRow() populates the ResultSet row with the data sent to
the backend as part of the update.

-O


Re: refreshRow is slow

От
"John T. Dow"
Дата:
I observe the following in the jdbc log.

There's the original read only query, with 722 rows. The actual query was "SELECT * FROM doctor WHERE 1 = 1". Yes, I
knowit could just be "SELECT * FROM doctor" but the query is generated based on user input, and that's my version of a
nullwhere clause. 

After the first row of the result set was displayed, I changed a field and updated the table. The log shows the query
"forupdate" and later a commit. 

Following that, where my program would do the refreshRow on the original resultset, I find a 21 instances of this block
(somelines omitted): 

17:25:50.812 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1142196,
maxRows=0,fetchSize=0, flags=17 
17:25:50.812 (1)  FE=> Parse(stmt=null,query="SELECT attname FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND
attnum= $2",oids={23,23}) 
17:25:50.812 (1)  FE=> Bind(stmt=null,portal=null,$1=<16404>,$2=<1>)
17:25:50.812 (1)  FE=> Describe(portal=null)
17:25:50.812 (1)  FE=> Execute(portal=null,limit=0)
17:25:50.812 (1)  FE=> Sync
17:25:50.812 (1)  <=BE ParseComplete [null]
17:25:50.812 (1)  <=BE BindComplete [null]
17:25:50.812 (1)  <=BE RowDescription(1)
17:25:50.812 (1)  <=BE DataRow
17:25:50.812 (1)  <=BE CommandStatus(SELECT)
17:25:50.812 (1)  <=BE ReadyForQuery(I)

The only difference between the 21 blocks is that the $2 parameter goes 1, 2, 3, 4, ... 21

After the 21st block comes this, and that ends the log.

17:38:47.031 (1) selecting select id, address, citystatezip, combined, degree, email, fax, firstname, groupname,
lastname,nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate,
updatetimefrom doctor where id= ? 
17:38:47.046 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d3c6a3,
maxRows=0,fetchSize=0, flags=17 
17:38:47.046 (1)  FE=> Parse(stmt=null,query="select id, address, citystatezip, combined, degree, email, fax,
firstname,groupname, lastname, nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby,
updatedate,updatetime from doctor where id= $1",oids={1043}) 
17:38:47.046 (1)  FE=> Bind(stmt=null,portal=null,$1=<0984>)
17:38:47.046 (1)  FE=> Describe(portal=null)
17:38:47.046 (1)  FE=> Execute(portal=null,limit=0)
17:38:47.046 (1)  FE=> Sync
17:38:47.046 (1)  <=BE ParseComplete [null]
17:38:47.046 (1)  <=BE BindComplete [null]
17:38:47.046 (1)  <=BE RowDescription(21)
17:38:47.046 (1)  <=BE DataRow
17:38:47.046 (1)  <=BE CommandStatus(SELECT)
17:38:47.046 (1)  <=BE ReadyForQuery(I)
17:38:47.046 (1) done updates
17:38:49.765 (1)  FE=> Terminate

The value 0984 is the primary key value for the row I changed.  I repeated this with the same original resultset, but
movedto the 25th row before making a change. Again it did 21 instances of this block (the primary key value was of
coursedifferent). 

To me, who knows nothing, the mystery is why 21 repetitions?

I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This
timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. 

So why 408 repetitions this time?

John


Re: refreshRow is slow

От
Oliver Jowett
Дата:
John T. Dow wrote:

> Following that, where my program would do the refreshRow on the original resultset, I find a 21 instances of this
block(some lines omitted): 
>
> 17:25:50.812 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1142196,
maxRows=0,fetchSize=0, flags=17 
> 17:25:50.812 (1)  FE=> Parse(stmt=null,query="SELECT attname FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND
attnum= $2",oids={23,23}) 
> 17:25:50.812 (1)  FE=> Bind(stmt=null,portal=null,$1=<16404>,$2=<1>)

This is a metadata query, getting the canonical name of each
field/column in your resultset so that updateRow() knows how to build
the refresh query.

> 17:38:47.031 (1) selecting select id, address, citystatezip, combined, degree, email, fax, firstname, groupname,
lastname,nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby, updatedate,
updatetimefrom doctor where id= ? 
> 17:38:47.046 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@d3c6a3,
maxRows=0,fetchSize=0, flags=17 
> 17:38:47.046 (1)  FE=> Parse(stmt=null,query="select id, address, citystatezip, combined, degree, email, fax,
firstname,groupname, lastname, nfiid, pabsid, practicetype, ssn, telephone, insertby, insertdate, inserttime, updateby,
updatedate,updatetime from doctor where id= $1",oids={1043}) 
> 17:38:47.046 (1)  FE=> Bind(stmt=null,portal=null,$1=<0984>)
> 17:38:47.046 (1)  FE=> Describe(portal=null)
> 17:38:47.046 (1)  FE=> Execute(portal=null,limit=0)
> 17:38:47.046 (1)  FE=> Sync

And this is the actual refreshRow() query. I assume that this WHERE
clause looks correct for your table and that query should be pretty fast?

> To me, who knows nothing, the mystery is why 21 repetitions?

21 columns.

> I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This
timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. 
>
> So why 408 repetitions this time?

408 columns. Is this a very wide SELECT?

I can see that 408 roundtrips would slow things down a lot over a
high-latency connection such as over the internet. Perhaps that's the
cause of the slowdown you see..

-O


Re: refreshRow is slow

От
Oliver Jowett
Дата:
Oliver Jowett wrote:

> This is a metadata query, getting the canonical name of each
> field/column in your resultset so that updateRow() knows how to build
> the refresh query.

Sorry, I meant refreshRow() not updateRow()

-O

Re: refreshRow is slow

От
"John T. Dow"
Дата:
>> I repeated this with another table. The original query had 792 rows. I went to the last row and made a change. This
timethere were 408 repetitions. Again, it used the primary key for the $1 parameter shown above. 
>>
>> So why 408 repetitions this time?
>
>408 columns. Is this a very wide SELECT?
>
>I can see that 408 roundtrips would slow things down a lot over a
>high-latency connection such as over the internet. Perhaps that's the
>cause of the slowdown you see..


While your email was on its way to me, I was looking at the source for refreshrow and came to the conclusion that it's
thenumber of columns. (Actually, 408 is a bit high, but close. Not important.) 

Yes, it is a wide select. Virtually every column in a table is displayable so they all have to be retrieved.

The user who has been complaining (virtually identical code) has a table with only 93 columns, but still that's a lot
ofround trips. 

Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys.

I guess something like this won't work, unless "pkey" is reserved: select * from pg_indexes where tablename = 'patinfo'
andindexname = 'patinfo_pkey'. That returns indexdev, which can be parsed to get the columns if needed. 

John


Re: refreshRow is slow

От
Oliver Jowett
Дата:
John T. Dow wrote:

> Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys.

It's not the PKs it is after (it's already worked those out earlier),
it's the column names to select to populate the rest of the resultset.

But yes, it could be more efficient. We'd need a static String[]
Field.getColumnNames(Connection,Field[]) method or similar.

-O

Re: refreshRow is slow

От
"John T. Dow"
Дата:
On Sat, 16 Jan 2010 14:23:33 +1300, Oliver Jowett wrote:

>John T. Dow wrote:
>
>> Certainly that piece of code can be made more efficient. One query ought to be able to return all the primary keys.
>
>It's not the PKs it is after (it's already worked those out earlier),
>it's the column names to select to populate the rest of the resultset.



I realized that when I looked more carefully at the code.



>But yes, it could be more efficient. We'd need a static String[]
>Field.getColumnNames(Connection,Field[]) method or similar.


That would be really nice.

Another approach is to reread the entire resultset, but with hundreds of rows that's not pretty. Probably faster than
manyround trips to get the column names. I could have my programming look at the size of the resultset and the number
ofcolumns and decide whether to execute the query again or refreshRow. 

John


Re: refreshRow is slow

От
"John T. Dow"
Дата:
Having decided that refreshRow is going to be too slow sometimes (because it does a query for each column to see if
it'sa primary key), I am for the time being revising my standard application logic to execute the original query again. 

A small complication has developed. There is no guarantee that the rows will be returned in the same order, unless the
orderis explicitly stated in the query. Since my queries are usually user-defined (that is, generated by the
applicationbased on user requests in a non-SQL syntax), I'll have to do something like sort on the primary keys in all
cases.Not necessarily a bad thing to do I suppose. 

But another complication might be confusing to the users. Suppose someone else has added a row. Executing the original
querybrings up more rows than originally. If the user has been scrolling back and forth through the result set,
suddenlya new row will appear. Also possible, someone deletes a row, so it disappears. That is likely to generate a
phonecall to me ("What happened?"). 

It sure would be nice if refreshRow was more efficient in learning which are the primary keys.

John


refreshRow is slow - revisited

От
"John T. Dow"
Дата:
Several months ago, on January 14, I posted a question about JDBC's refreshRow being slow.

The problem is that the resultset code is accessing the server for each column. It needs to know the column names so it
canbuild a query to get the values for that single row. 

It knows the position of the resultset columns in the actual table, but not the column names. Oliver Jowett suggested
thata method like this might help because it would get all column names at once: static String[]
Field.getColumnNames(Connection,Field[])

Having poked around in the code, I notice that not only refreshing rows but also updating column values suffers from
thisproblem, namely a visit to the server for each column to be updated. 

I should think that the proper fix would be to get the names of all the columns when the resultset is created so that
anynumber of updates and refreshes could be performed without accessing the server for names. 

In the meantime, since the names returned by the result set agree with the actual table names if there are no aliases
inthe original list of columns, I am about to try a small modification -- to use the column label (which is in memory)
insteadof the base column name (which isn't). It seems to work in the lab, haven't tried it in the real world yet. 

Question: should I expect something as simple as this to work, or am I overlooking something?

John


Re: refreshRow is slow

От
Kris Jurka
Дата:

On Sat, 16 Jan 2010, Oliver Jowett wrote:

> John T. Dow wrote:
>
>> Certainly that piece of code can be made more efficient. One query ought to
>> be able to return all the primary keys.
>
> It's not the PKs it is after (it's already worked those out earlier), it's
> the column names to select to populate the rest of the resultset.
>
> But yes, it could be more efficient. We'd need a static String[]
> Field.getColumnNames(Connection,Field[]) method or similar.
>

I've adjusted the fetching of all of the ResultSetMetaData attributes to
happen in one query for the entire ResultSet instead of one attribute of
one column at a time.  I've adjusted the updateable ResultSet code to use
ResultSetMetaData to take advantage of this, so refreshRow should be much
faster now.

Kris Jurka