diff -c -r -N doc.orig/src/sgml/config.sgml doc/src/sgml/config.sgml
*** doc.orig/src/sgml/config.sgml Wed Sep 13 13:57:21 2006
--- doc/src/sgml/config.sgml Wed Sep 13 13:57:44 2006
***************
*** 2172,2178 ****
!
--- 2172,2191 ----
!
!
! gin_fuzzy_search_limit (integer)
!
! gin_fuzzy_search_limit> configuration parameter
!
!
!
! Soft upper limit of the size of the returned set by GIN index. For more
! information see .
!
!
!
!
diff -c -r -N doc.orig/src/sgml/filelist.sgml doc/src/sgml/filelist.sgml
*** doc.orig/src/sgml/filelist.sgml Wed Sep 13 13:57:19 2006
--- doc/src/sgml/filelist.sgml Wed Sep 13 13:57:44 2006
***************
*** 78,83 ****
--- 78,84 ----
+
diff -c -r -N doc.orig/src/sgml/geqo.sgml doc/src/sgml/geqo.sgml
*** doc.orig/src/sgml/geqo.sgml Wed Sep 13 13:57:19 2006
--- doc/src/sgml/geqo.sgml Wed Sep 13 13:57:44 2006
***************
*** 49,56 ****
methods (e.g., nested loop, hash join, merge join in
PostgreSQL) to process individual joins
and a diversity of indexes (e.g.,
! B-tree, hash, GiST in PostgreSQL) as access
! paths for relations.
--- 49,56 ----
methods (e.g., nested loop, hash join, merge join in
PostgreSQL) to process individual joins
and a diversity of indexes (e.g.,
! B-tree, hash, GiST and GIN in PostgreSQL) as
! access paths for relations.
diff -c -r -N doc.orig/src/sgml/gin.sgml doc/src/sgml/gin.sgml
*** doc.orig/src/sgml/gin.sgml Thu Jan 1 03:00:00 1970
--- doc/src/sgml/gin.sgml Wed Sep 13 13:57:44 2006
***************
*** 0 ****
--- 1,231 ----
+
+
+
+ GIN Indexes
+
+
+ index
+ GIN
+
+
+
+ Introduction
+
+
+ GIN stands for Generalized Inverted Index. It is
+ an index structure storing a set of (key, posting list) pairs, where
+ 'posting list' is a set of rows in which the key occurs. The
+ row may contains a lot of keys.
+
+
+
+ It is generalized in the sense that a GIN index
+ does not need to be aware of the operation that it accelerates.
+ Instead, it uses custom strategies defined for particular data types.
+
+
+
+ One advantage of GIN is that it allows the development
+ of custom data types with the appropriate access methods, by
+ an expert in the domain of the data type, rather than a database expert.
+ This is much the same advantage as using GiST.
+
+
+
+ The GIN
+ implementation in PostgreSQL is primarily
+ maintained by Teodor Sigaev and Oleg Bartunov, and there is more
+ information on their
+ website.
+
+
+
+
+
+ Extensibility
+
+
+ The GIN interface has a high level of abstraction,
+ requiring the access method implementer to only implement the semantics of
+ the data type being accessed. The GIN layer itself
+ takes care of concurrency, logging and searching the tree structure.
+
+
+
+ All it takes to get a GIN access method working
+ is to implement four user-defined methods, which define the behavior of
+ keys in the tree. In short, GIN combines extensibility
+ along with generality, code reuse, and a clean interface.
+
+
+
+
+
+ Implementation
+
+
+ Internally, GIN consists of a B-tree index constructed
+ over keys, where each key is an element of the indexed value
+ (element of array, for example) and where each tuple in a leaf page is
+ either a pointer to a B-tree over heap pointers (PT, posting tree), or a
+ list of heap pointers (PL, posting list) if the tuple is small enough.
+
+
+
+ There are four methods that an index operator class for
+ GIN must provide (prototypes are in pseudocode):
+
+
+
+
+ int compare( Datum a, Datum b )
+
+
+ Compares keys (not indexed values!) and returns an integer less than
+ zero, zero, or greater than zero, indicating whether the first key is
+ less than, equal to, or greater than the second.
+
+
+
+
+
+ Datum* extractValue(Datum inputValue, uint32 *nkeys)
+
+
+ Returns an array of keys of value to be indexed, nkeys should
+ contain the number of returned keys.
+
+
+
+
+
+ Datum* extractQuery(Datum query, uint32 nkeys,
+ StrategyNumber n)
+
+
+ Returns an array of keys of the query to be executed. n contains
+ strategy number of operation (see ).
+ Depending on n, query may be different type.
+
+
+
+
+
+ bool consistent( bool check[], StrategyNumber n, Datum query)
+
+
+ Returns TRUE if indexed value satisfies query qualifier with strategy n
+ (or may satisfy in case of RECHECK mark in operator class).
+ Each element of the check array is TRUE if indexed value has a
+ corresponding key in the query: if (check[i] == TRUE ) the i-th key of
+ the query is present in the indexed value.
+
+
+
+
+
+
+
+
+
+ GIN tips and trics
+
+
+
+ Create vs insert
+
+
+ In most cases, insertion into GIN index is slow enough
+ due to a lot keys should be inserted per one value. So, for bulk upload
+ data in table it will be useful to drop index and create it
+ after finishing upload.
+
+
+
+
+
+ gin_fuzzy_search_limit
+
+
+ The primary goal of development GIN indices was
+ support for highly scalable, full-text search in
+ PostgreSQL and there are often situations when
+ a full-text search returns a very large set of results. Since reading
+ tuples from the disk and sorting them could take a lot of time, this is
+ unacceptable for production. (Note that the index search itself is very
+ fast.)
+
+
+ Such queries usually contain very frequent words, so the results are not
+ very helpful. To facilitate execution of such queries
+ GIN has a configurable soft upper limit of the size
+ of the returned set, determined by the
+ gin_fuzzy_search_limit GUC variable. It is set to 0 by
+ default (no limit).
+
+
+ If a non-zero search limit is set, then the returned set is a subset of
+ the whole result set, chosen at random.
+
+
+ "Soft" means that the actual number of returned results could slightly
+ differ from the specified limit, depending on the query and the quality
+ of the system's random number generator.
+
+
+
+
+
+
+
+
+ Limitations
+
+
+ GIN doesn't support full scan of index due to it's
+ extremely inefficiency: because of a lot of keys per value,
+ each heap pointer will returned several times.
+
+
+
+ When extractQuery returns zero number of keys, GIN will
+ emit a error: for different opclass and strategy semantic meaning of void
+ query may be different (for example, any array contains void array,
+ but they aren't overlapped with void one), and GIN can't
+ suggest reasonable answer.
+
+
+
+ GIN searches keys only by equality matching. This may
+ be improved in future.
+
+
+
+ Examples
+
+
+ The PostgreSQL source distribution includes
+ GIN classes for one-dimensional arrays of all internal
+ types. The following
+ contrib> modules also contain GIN
+ operator classes:
+
+
+
+
+ intarray
+
+ Enhanced support for int4[]
+
+
+
+
+ tsearch2
+
+ Support for inverted text indexing. This is much faster for very
+ large, mostly-static sets of documents.
+
+
+
+
+
diff -c -r -N doc.orig/src/sgml/indices.sgml doc/src/sgml/indices.sgml
*** doc.orig/src/sgml/indices.sgml Wed Sep 13 13:57:19 2006
--- doc/src/sgml/indices.sgml Wed Sep 13 13:57:44 2006
***************
*** 115,121 ****
PostgreSQL provides several index types:
! B-tree, Hash, and GiST. Each index type uses a different
algorithm that is best suited to different types of queries.
By default, the CREATE INDEX command will create a
B-tree index, which fits the most common situations.
--- 115,121 ----
PostgreSQL provides several index types:
! B-tree, Hash, GIN and GiST. Each index type uses a different
algorithm that is best suited to different types of queries.
By default, the CREATE INDEX command will create a
B-tree index, which fits the most common situations.
***************
*** 236,241 ****
--- 236,272 ----
Many other GiST operator
classes are available in the contrib> collection or as separate
projects. For more information see .
+
+
+
+ index
+ GIN
+
+
+ GIN
+ index
+
+ GIN is a inverted index and it's usable for values which have more
+ than one key, arrays for example. Like to GiST, GIN may support
+ many different user-defined indexing strategies and the particular
+ operators with which a GIN index can be used vary depending on the
+ indexing strategy.
+ As an example, the standard distribution of
+ PostgreSQL includes GIN operator classes
+ for one-dimentional arrays, which support indexed
+ queries using these operators:
+
+
+ <@
+ @>
+ =
+ &&
+
+
+ (See for the meaning of
+ these operators.)
+ Another GIN operator classes are available in the contrib>
+ tsearch2 and intarray modules. For more information see .
diff -c -r -N doc.orig/src/sgml/mvcc.sgml doc/src/sgml/mvcc.sgml
*** doc.orig/src/sgml/mvcc.sgml Wed Sep 13 13:57:20 2006
--- doc/src/sgml/mvcc.sgml Wed Sep 13 13:57:44 2006
***************
*** 987,992 ****
--- 987,1007 ----
+
+
+
+ GIN indexes
+
+
+
+ Short-term share/exclusive page-level locks are used for
+ read/write access. Locks are released immediately after each
+ index row is fetched or inserted. But note, that GIN index
+ usually requires produce several inserts per one row, so,
+ GIN makes more work per one value's insertion.
+
+
+
***************
*** 995,1001 ****
applications; since they also have more features than hash
indexes, they are the recommended index type for concurrent
applications that need to index scalar data. When dealing with
! non-scalar data, B-trees are not useful, and GiST indexes should
be used instead.
--- 1010,1016 ----
applications; since they also have more features than hash
indexes, they are the recommended index type for concurrent
applications that need to index scalar data. When dealing with
! non-scalar data, B-trees are not useful, and GiST or GIN indexes should
be used instead.
diff -c -r -N doc.orig/src/sgml/ref/create_opclass.sgml doc/src/sgml/ref/create_opclass.sgml
*** doc.orig/src/sgml/ref/create_opclass.sgml Wed Sep 13 13:57:17 2006
--- doc/src/sgml/ref/create_opclass.sgml Wed Sep 13 13:57:44 2006
***************
*** 192,198 ****
The data type actually stored in the index. Normally this is
the same as the column data type, but some index methods
! (only GiST at this writing) allow it to be different. The
STORAGE> clause must be omitted unless the index
method allows a different type to be used.
--- 192,198 ----
The data type actually stored in the index. Normally this is
the same as the column data type, but some index methods
! (GIN and GiST for now) allow it to be different. The
STORAGE> clause must be omitted unless the index
method allows a different type to be used.
diff -c -r -N doc.orig/src/sgml/xindex.sgml doc/src/sgml/xindex.sgml
*** doc.orig/src/sgml/xindex.sgml Wed Sep 13 13:57:21 2006
--- doc/src/sgml/xindex.sgml Wed Sep 13 14:01:06 2006
***************
*** 243,248 ****
--- 243,286 ----
+ GIN indexes are similar to GiST in flexibility: it hasn't a fixed set
+ of strategies. Instead, the consistency> support routine
+ interprets the strategy numbers accordingly with operator class
+ definition. As an example, strategies of operator class over arrays
+ is shown in .
+
+
+
+ GiST Two-Dimensional R-tree> Strategies
+
+
+
+ Operation
+ Strategy Number
+
+
+
+
+ overlap
+ 1
+
+
+ contains
+ 2
+
+
+ is contained by
+ 3
+
+
+ equal
+ 4
+
+
+
+
+
+
Note that all strategy operators return Boolean values. In
practice, all operators defined as index method strategies must
return type boolean, since they must appear at the top
***************
*** 349,380 ****
! consistent
1
! union
2
! compress
3
! decompress
4
! penalty
5
! picksplit
6
! equal
7
--- 387,465 ----
! consistent - determine whether key satifies the
! query qualifier
1
! union - compute union of of a set of given keys
2
! compress - computes a compressed representation of a key or value
! to be indexed
3
! decompress - computes a decompressed representation of a
! compressed key
4
! penalty - compute penalty for inserting new key into subtree
! with given subtree's key
5
! picksplit - determine which entries of a page are to be moved
! to the new page and compute the union keys for resulting pages
6
! equal - compare two keys and returns true if they are equal
!
7
+
+
+
+
+
+
+ GIN indexes require four support functions,
+ shown in .
+
+
+
+ GIN Support Functions
+
+
+
+ Function
+ Support Number
+
+
+
+
+
+ compare - Compare two keys and return an integer less than zero, zero, or
+ greater than zero, indicating whether the first key is less than, equal to,
+ or greater than the second.
+
+ 1
+
+
+ extractValue - extract keys from value to be indexed
+ 2
+
+
+ extractQuery - extract keys from query
+ 3
+
+
+ consistent - determine whether value matches by the
+ query
+ 4
+