There are several types of indexes in PostgreSQL. Beginners or those who have little to do with the databases use the index that PostgreSQL creates by default (B-tree) or do not create them at all. It is necessary to know what indexes are used in which situations to speed up the execution of queries. In this post, I will try to examine this briefly and simply, without going into details of the internal structure of indexes.
The hash index stores the hash value of the indexed field VALUE, and only supports equal-value queries.
Usage
The hash index is particularly suitable for scenarios where the field VALUE is very long (not suitable for b-tree indexing, because a PAGE of b-tree needs to store at least 3 ENTRY, so it does not support particularly long VALUE), such as very long strings, and users Only equivalent search is required, and hash index is recommended.
Example
B-tree
Usage
B-tree is suitable for all data types, supports sorting, and supports searches greater than, less than, equal to, greater than or equal to, less than or equal to.
The combination of indexing and recursive queries also enables fast sparse retrieval.
Example
GIN
Gin is an inverted index that stores the VALUE or VALUE element of the indexed field, and the list or tree of the row number.
Usage
When you need to search for VALUE in a multi-valued type, it is suitable for multi-valued types, such as arrays, full-text search, TOKEN. (Depending on the type, search for intersect, contain, greater than, left, right, etc.)
When the user’s data is sparse, if you want to search for a value of VALUE, you can adapt the type supported by btree_gin to support ordinary btree. (Supports btree operators)
When the user needs to search by any column, gin supports multi-column expansion to separately establish index fields, and at the same time supports bitmapAnd and bitmapOr merge of internal multi-domain indexes to quickly return the requested data by any column.
Example. Multi-value type search.
Example. Single value sparse data search.
Example. Multi-column arbitrary search.
GIST
GiST stands for Generalized Search Tree.
It is a balanced, tree-structured access method, that acts as a base template in which to implement arbitrary indexing schemes.
B-trees, R-trees and many other indexing schemes can be implemented in GiST.
Usage
GiST is a universal indexing interface. You can use GiST to implement b-tree, r-tree, and other index structures.
Different types support different index retrievals. E.g:
Geometry type, support location search (include, intersect, up, down, left, etc.), sort by distance.
range type, support location search (include, intersect, in left and right, etc.).
IP type, support location search (include, intersect, left and right, etc.).
Spatial type (PostGIS), support location search (include, intersect, up, down, left, etc.), sorted by distance.
scalar type, supports sorting by distance
Example. Geometry type retrieval.
Example. Sort by scalar type
SP-GIST
SP-GiST is an abbreviation for space-partitioned GiST.
SP-GiST supports partitioned search trees, which facilitate development of a wide range of different non-balanced data structures, such as quad-trees, k-d trees, and radix trees (tries).
The common feature of these structures is that they repeatedly divide the search space into partitions that need not be of equal size.
Searches that are well matched to the partitioning rule can be very fast.
SP-GiST is similar to GiST and is a general indexing interface, but SP-GIST uses the method of spatial partitioning, making SP-GiST better support unbalanced data structures, such as quad-trees, k-d tree, radis tree.
Usage
Geometry type, support location search (include, intersect, up, down, left, etc.), sort by distance.
Range type, support location search (include, intersect, in left and right, etc.).
IP type, support location search (include, intersect, left and right, etc.).
Example. Range type search.
BRIN
BRIN index is a block-level index. Unlike B-TREE and other indexes, BRIN records do not record the index details in units of line numbers but record the statistical information of each data block or each continuous data block. Therefore, the BRIN index space occupation is particularly small, and the impact on data writing, updating, and deleting is also small.
Usage
BRIN is a LOSSLY index. When the value of the indexed column is strongly related to physical storage, the BRIN index works very well.
For example, time-series data, creating a BRIN index on a time or sequence field, and performing an equivalent value and range query are very effective.
Example
RUM
Rum is an indexing plug-in, open-source by Postgrespro, suitable for full-text retrieval, and belongs to an enhanced version of GIN.
Enhancements include:
In the RUM index, the position information of lexeme is stored, so when ranking is calculated, there is no need to return to the table query (and GIN needs to return to the table query).
RUM supports phrase search, but GIN cannot.
In a RUM index, users are allowed to store fields VALUE other than ctid (line number) in the posting tree, such as timestamps.
This makes RUM not only support the full-text search supported by GIN, but also supports the calculation of text similarity values, sorting by similarity, etc. At the same time support position matching, for example (speed and passion, you can use “speed” <2> “passion” to match, but GIN index can not do it)
Example
Bloom
The bloom index interface is an index interface constructed by PostgreSQL based on bloom filters. It belongs to the loss index and can converge the result set (exclude results that absolutely do not meet the conditions, and then select the results that meet the conditions from the remaining results). Supports equivalent query for any combination of columns.
Bloom stores signatures. The larger the signature, the more space it consumes, but the more accurate the exclusion. There are pros and cons.
Bloom provides an index access method based on Bloom filters.
A Bloom filter is a space-efficient data structure that is used to test whether an element is a member of a set. In the case of an index access method, it allows fast exclusion of non-matching tuples via signatures whose size is determined at index creation.
This type of index is most useful when a table has many attributes and queries test arbitrary combinations of them.
Usage
The bloom index is suitable for multiple columns and any combination of queries.
Example
Summary
In this post, we looked at the following PostgreSql indexes: