Discussion: Firebird 1.0 released
The Firebird guys have gotten around to releasing 1.0. If you read this front page spiel, you'll notice that they use MVCC, but with an overwriting storage manager. http://www.ibphoenix.com/ibp_act_db.html The relevant extract:

"Multi-version concurrency control uses back versions of modified and deleted records to maintain a consistent view of data for read transactions. Each record version is tagged with the identifier of the transaction that created it. When a record is modified, the old version of the record is reduced to a "delta record" - a set of differences from the new version - and written to a new location, ordinarily on the same page where it was originally stored. Then the new record overwrites the old. The new record points to the old record.

Unless the values of indexed fields are changed, there's no need to update the index. Even if the values have changed, the old values remain in the index to keep the record available to older transactions.

The transaction identifier also permits update transactions to recognize updates by concurrent transactions and allows Firebird to dispense with write locks on records. When a transaction encounters a record updated by a concurrent transaction, it waits for the other transaction to complete. If the competing transaction commits, the waiting transaction gets an error. If the competing transaction rolls back, the waiting transaction succeeds. If the competing transaction attempts to update a record that the waiting transaction has modified, a deadlock exists and one or the other will receive an error.

Multi-version concurrency replaces a before-image (rollback) log with versions stored in the database. When a transaction fails, its changes remain in the database. The next transaction that reads that record recognizes that the record version is invalid. Depending on the version of Firebird and architecture, that transaction either replaces the invalid record version with its back version or invokes a garbage collect thread."

Chris
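The back-version scheme in the extract can be sketched in a few lines. This is a toy illustration of the field-level delta idea, assuming a dict-of-fields record model; it is not Firebird's actual on-disk delta format:

```python
# Toy sketch of the "delta record" idea: on update, the old version is
# reduced to the set of fields that differ from the new version, so a
# reader on an older snapshot can reconstruct it from the new record.
# Record layout (plain dicts) is illustrative, not Firebird's format.

def make_delta(old: dict, new: dict) -> dict:
    """Keep only the fields of `old` that differ from `new`."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def reconstruct_old(new: dict, delta: dict) -> dict:
    """Rebuild the old version by applying the delta on top of the new one."""
    rec = dict(new)
    rec.update(delta)
    return rec

old = {"id": 7, "name": "alice", "balance": 100}
new = {"id": 7, "name": "alice", "balance": 250}
delta = make_delta(old, new)                 # only the changed field is kept
assert delta == {"balance": 100}
assert reconstruct_old(new, delta) == old    # older readers see the old row
```

The point of the delta is that an unchanged wide row costs almost nothing to version: only the modified fields are written to the back version.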
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > The Firebird guys have gotten around to releasing 1.0. If you read this > front page spiel, you'll notice that they use MVCC, but with an overwriting > storage manager. Yup. I've had a couple of long chats with Ann Harrison at the recent "OSDB summit" meetings. I think we each came away enlightened about the other implementation, but not in any large hurry to change our own. I did steal at least one idea from her, though. (rummages in CVS logs) ah, here's a hit:

2001-09-29 19:49 tgl

* src/backend/access/nbtree/nbtinsert.c: Tweak btree page split logic so that when splitting a page that is rightmost on its tree level, we split 2/3 to the left and 1/3 to the new right page, rather than the even split we use elsewhere. The idea is that when faced with a steadily increasing series of inserted keys (such as sequence or timestamp values), we'll end up with a btree that's about 2/3rds full not 1/2 full, which is much closer to the desired steady-state load for a btree. Per suggestion from Ann Harrison of IBPhoenix.

regards, tom lane
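The split rule in that commit message can be modeled with a little arithmetic. This sketch splits by item count only; the real nbtinsert.c code chooses the split point based on item sizes and free space:

```python
# Simplified model of the commit above: an even split for ordinary pages,
# but a 2/3-left / 1/3-right split when the page being split is rightmost
# on its tree level - the common case under monotonically increasing keys
# (sequences, timestamps), where new entries always land on the right edge.
# Splitting by item count is an illustrative simplification.

def split_keys(keys, is_rightmost):
    split = (2 * len(keys)) // 3 if is_rightmost else len(keys) // 2
    return keys[:split], keys[split:]

keys = list(range(12))
left, right = split_keys(keys, is_rightmost=True)
assert (len(left), len(right)) == (8, 4)   # roughly 2/3 stay on the left
left, right = split_keys(keys, is_rightmost=False)
assert (len(left), len(right)) == (6, 6)   # even split elsewhere
```

With ever-increasing keys, the right page keeps getting split again, so leaving it 1/3 full (rather than 1/2) makes the steady-state fill of the whole tree approach 2/3.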
Hi, I was interested in this: Firebird's indexes are very dense because they compress both the prefix and the suffix of each key. Suffix compression is simply the elimination of trailing blanks or zeros, depending on the data type. Suffix compression is performed on each segment of a segmented key. Prefix compression removes the leading bytes that match the previous key. Thus a duplicate value has no key stored at all. Dense storage in indexes minimizes the depth of the btrees, eliminating the advantage of other index types for most data. Do we do this? How feasible is this? On Tuesday 16 April 2002 00:35, Christopher Kings-Lynne wrote: > The Firebird guys have gotten around to releasing 1.0. If you read this > front page spiel, you'll notice that they use MVCC, but with an overwriting > storage manager. -- Denis
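The compression Denis quotes can be sketched roughly as follows. The (prefix-length, suffix-bytes) entry format and the zero-byte padding are assumptions for illustration, not Firebird's actual index layout:

```python
# Rough sketch of the prefix/suffix compression described above: trailing
# padding is stripped (suffix compression), then each key stores only the
# count of bytes shared with the previous key plus the remaining bytes
# (prefix compression). An exact duplicate stores no key bytes at all.

def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def compress_keys(keys, pad=b"\x00"):
    out, prev = [], b""
    for k in keys:
        k = k.rstrip(pad)                    # suffix compression
        p = common_prefix_len(prev, k)       # prefix compression
        out.append((p, k[p:]))
        prev = k
    return out

keys = [b"apple\x00\x00", b"apply\x00", b"apply\x00", b"banana"]
# The duplicate "apply" stores prefix length 5 and zero key bytes.
assert compress_keys(keys) == [(0, b"apple"), (4, b"y"), (5, b""), (0, b"banana")]
```

Denser entries mean more keys per page, hence shallower trees - which is the "minimizes the depth of the btrees" claim in the quoted text.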
I just had an interesting idea. It sounds too easy to believe, but hear me out and correct me if I'm wrong. Currently, during an update, PostgreSQL takes the existing record, modifies it, and adds it as a new row. The previous record has a pointer to the new version. If the row is updated twice, the original row is hit first, followed by the next version, then the last version. Do I understand this correctly? Now, what if we did it another way: copy the old version of the row into a new row and update the tuple in place? (Space permitting, of course.) That way, performance does not degrade over time. Also, vacuum should be easier and less cumbersome because it simply lops off the end of the list and marks tuples which are not in any transaction path. Is this a lot of work? Is it inherently wrong?
mlw <markw@mohawksoft.com> writes: > Now, what if we did it another way, copy the old version of the row into the > new row and update the tuple in place? I don't think we can get away with moving the extant tuple. If we did, a concurrent scan that should have found the old tuple might miss it. (This is why VACUUM FULL needs exclusive lock to move tuples.) It's fairly unclear whether this would actually buy any performance gain, anyway. In the case of a seqscan I don't see that it makes any difference on average, and in the case of an indexscan what matters is the index ordering not the physical location. (In this connection, btree indexes already do the "right thing", cf comments for _bt_insertonpg.) regards, tom lane
Denis Perchine <dyp@perchine.com> writes: > I was interested in this: > Firebird's indexes are very dense because they compress both the prefix and > the suffix of each key. Suffix compression is simply the elimination of > trailing blanks or zeros, depending on the data type. Suffix compression is > performed on each segment of a segmented key. Prefix compression removes the > leading bytes that match the previous key. Thus a duplicate value has no key > stored at all. Dense storage in indexes minimizes the depth of the btrees, > eliminating the advantage of other index types for most data. > Do we do this? How feasible is this? No, and it seems very bogus to me. With a storage scheme like that, you could not do a random-access search --- the only way to know the true value of each key is if you are doing a linear scan of the page, so that you can keep track of the previous key value by dead reckoning. (This assumes that the first key on each page is stored in full; otherwise it's even worse.) Another problem is that key inserts and deletes get more complex since you have to recompute the following tuple. Life will get interesting if the following tuple expands; you might have to split the page to hold it. (Hmm, is it possible for a *delete* to force a page split under the Firebird scheme? Not sure.) The actual value of leading-byte suppression seems very data-dependent to me, anyway. For example, for an integer key column on a little-endian machine, leading-byte suppression would buy you nothing at all. (Perhaps Firebird's implementation has enough datatype-specific knowledge to trim integer keys at the proper end; I don't know. But I wouldn't want to see us try to push datatype dependencies into our index access methods.) regards, tom lane
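Tom's objection can be seen concretely: recovering full keys from prefix-compressed entries requires a linear walk that carries the previous key along. The (prefix-length, suffix) entry format here is an illustrative assumption:

```python
# Illustration of the "dead reckoning" problem: with prefix compression,
# each entry stores only (shared-prefix length, remaining bytes), so the
# full value of entry i can only be recovered by walking from the start of
# the page and reconstructing every key in order. You cannot binary-search
# within the page, because a middle entry by itself does not contain the
# prefix bytes it omitted.

def decompress(entries):
    keys, prev = [], b""
    for prefix_len, suffix in entries:
        key = prev[:prefix_len] + suffix    # prefix comes from the prior key
        keys.append(key)
        prev = key
    return keys

# Compressed page holding "apple", "apply", "banana":
page = [(0, b"apple"), (4, b"y"), (0, b"banana")]
assert decompress(page) == [b"apple", b"apply", b"banana"]
# Taken alone, entry (4, b"y") is meaningless - its first 4 bytes live in
# earlier entries, which is exactly why random access within the page fails.
```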
On 7.1.x it definitely gets slower even for indexscans, e.g. 60 updates/sec dropping to 30 and then to 20 over time. Is this fixed for 7.2? If not, is it possible to make the pointer point to the latest row instead of the most obsolete one, and have the newer rows point to the older ones, instead of the other way round (which seems to be happening with 7.1)? I suppose this could make updates slower - you'd have to update the indexes? But selects would be faster (other than cases where there are a lot of uncommitted updates outstanding). If that is not possible (or updating the index is too painful), how about having the first pointer point to the first row, which then points to the latest row, which then points to the subsequent older rows? That way the miss penalty is reduced. It seems reasonable to me that newer rows should be more visible - unless more people update rows and then roll back, rather than update and then commit. I'm missing something, right? :) Regards, Link. At 09:15 AM 4/16/02 -0400, Tom Lane wrote:
>mlw <markw@mohawksoft.com> writes:
> > Now, what if we did it another way, copy the old version of the row into the
> > new row and update the tuple in place?
>
>I don't think we can get away with moving the extant tuple. If we did,
>a concurrent scan that should have found the old tuple might miss it.
>(This is why VACUUM FULL needs exclusive lock to move tuples.)
>
>It's fairly unclear whether this would actually buy any performance
>gain, anyway. In the case of a seqscan I don't see that it makes any
>difference on average, and in the case of an indexscan what matters is
>the index ordering not the physical location. (In this connection,
>btree indexes already do the "right thing", cf comments for
>_bt_insertonpg.)
While it is true that you can't do binary searches on compressed indexes, you may still get a large payoff: the index fits in fewer pages and so may be more efficiently cached in the buffer pool. Even a small reduction in I/O load may compensate for the higher computational demands of a compressed index. Note also that insertion of a key can never cause following entries to increase in size; they can only remain the same or decrease. Deletion of an entry may cause the following entry to increase in size, but never by more than the size of the entry deleted, so deletes can't cause page splits. -regards richt

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, April 16, 2002 9:48 AM
To: Denis Perchine
Cc: Hackers
Subject: Re: [HACKERS] Firebird 1.0 released
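The claim that a delete can never grow the page can be checked numerically under a simple model. The (prefix-length, suffix) entry format is an illustrative assumption, not Firebird's actual layout:

```python
# Numeric check of the deletes-can't-split-pages argument: removing a
# prefix-compressed entry can make the *next* entry's stored bytes grow
# (it now shares its prefix with an earlier key instead), but never by
# more than the bytes the deleted entry occupied, so total stored bytes
# cannot increase. Entry format is illustrative.

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def compress(keys):
    out, prev = [], b""
    for k in keys:
        p = common_prefix_len(prev, k)
        out.append((p, k[p:]))
        prev = k
    return out

def stored_bytes(entries):
    return sum(len(suffix) for _, suffix in entries)

keys = [b"app", b"apple", b"applesauce", b"banana"]
before = stored_bytes(compress(keys))
after = stored_bytes(compress(keys[:1] + keys[2:]))   # delete "apple"
# "applesauce" now stores "lesauce" instead of "sauce", but the two bytes
# it gained are exactly the two bytes "apple" used to store.
assert after <= before
```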