PostgreSQL 索引扫描排序运算符
最近看到一个问题,说是在 ExecIndexBuildScanKeys()
函数中的 isorderby
参数在有 ORDER BY
字句时,IndexScan
中的 indexorderby
也为空。例如下面两个查询:
1 | SELECT col FROM table_name ORDER BY index_key op const LIMIT 5; |
这两个查询在执行 ExecIndexBuildScanKeys()
时 IndexScan
结构中的 indexorderby
均为空。然而,IndexScan->indexorderby
的注释为 list of index ORDER BY exprs
,即 ORDER BY
语句后面的索引表达式,第一个 SQL 语句明显有 ORDER BY
字句,那么为什么此时的 indexorderby
为空呢?
分析
我们首先来看看 IndexScan
结构体的定义(代码比较长,仅截取部分内容):
1 | /* ---------------- |
从上面来看好像没有什么问题,ORDER BY
字句中的表达式包含索引时,那么 indexorderby
就不应该为空。那么这个是不是一个 bug 呢?我们接着分析。既然这个地方 indexorderby
为空,那么我们来看看何时创建的 IndexScan
结构,针对 indexorderby
又进行了什么处理。
通过调试,我发现在 create_indexscan_plan()
函数中有处理,如下所示:
1 | static Scan * |
函数 fix_indexorderby_references()
定义如下:
1 | /* |
然而,这里的 index_path->indexorderbys
和 index_path->indexorderbycols
均为空,我们接着分析 index_path
。IndexPath
定义如下所示:
1 | /* |
从上面我们可以看到一个关键的信息 ordering operators
,在官网上搜索 ordering operators
关键字,我们可以找到 Index Scanning 和 Interfacing Extensions to Indexes 里面有相关的描述。
Some access methods return index entries in a well-defined order, others do not. There are actually two different ways that an access method can support sorted output:
Access methods that always return entries in the natural ordering of their data (such as btree) should set amcanorder to true. Currently, such access methods must use btree-compatible strategy numbers for their equality and ordering operators.
Access methods that support ordering operators should set amcanorderbyop to true. This indicates that the index is capable of returning entries in an order satisfying ORDER BY index_key operator constant. Scan modifiers of that form can be passed to amrescan as described previously.
一些访问方法以明确定义的顺序返回索引条目,而另一些则没有。实际上,访问方法可以支持排序输出有两种不同的方式:
- 始终以数据的自然顺序返回条目的访问方法(例如 btree)应将
amcanorder
设置为true
。目前,此类访问方法必须使用与 btree 兼容的策略编号作为其相等和排序运算符 - 支持排序运算符的访问方法应将
amcanorderbyop
设置为true
。这表明索引能够以满足ORDER BY index_key
运算符常量的顺序返回条目。
Some index access methods (currently, only GiST and SP-GiST) support the concept of ordering operators. What we have been discussing so far are search operators. A search operator is one for which the index can be searched to find all rows satisfying
WHERE indexed_column operator constant
. Note that nothing is promised about the order in which the matching rows will be returned. In contrast, an ordering operator does not restrict the set of rows that can be returned, but instead determines their order. An ordering operator is one for which the index can be scanned to return rows in the order represented byORDER BY indexed_column operator constant
. The reason for defining ordering operators that way is that it supports nearest-neighbor searches, if the operator is one that measures distance. For example, a query likeSELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;
finds the ten places closest to a given target point. A GiST index on the location column can do this efficiently because <-> is an ordering operator.
While search operators have to return Boolean results, ordering operators usually return some other type, such as float or numeric for distances. This type is normally not the same as the data type being indexed. To avoid hard-wiring assumptions about the behavior of different data types, the definition of an ordering operator is required to name a B-tree operator family that specifies the sort ordering of the result data type. As was stated in the previous section, B-tree operator families define PostgreSQL’s notion of ordering, so this is a natural representation. Since the point
<->
operator returns float8, it could be specified in an operator class creation command like this:OPERATOR 15 <-> (point, point) FOR ORDER BY float_ops
where
float_ops
is the built-in operator family that includes operations on float8. This declaration states that the index is able to return rows in order of increasing values of the<->
operator.
一些索引访问方法(目前只有 GiST 和 SP-GiST)支持排序运算符的概念。搜索运算符是一种可以搜索索引以找到满足 WHERE indexed_column operator constant
的所有行的运算符。对于返回的记录其顺序可能是随机的。相反,排序运算符不限制可以返回的记录,而是确定它们的顺序。排序运算符是一种可以扫描其索引以按 ORDER BY indexed_column operator constant
表示的顺序返回行的运算符。以这种方式定义排序运算符的原因是,如果运算符是测量距离的运算符,它可以支持最近邻搜索。例如:
1 | SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10; |
找到最接近给定目标点的十个地点。location
列上的 GiST 索引可以有效地做到这一点,因为 <->
是一个排序运算符。
搜索运算符必须返回 boolean
类型的值,排序运算符通常返回其它类型,如代表距离的 float
或 numeric
类型。这种类型通常与被索引的数据类型不同。
分析到这里我们大致可以知道为什么之前的 ORDER BY index_key op constant
这里在 IndexScan
中的 indexorderby
为空了,因为之前的示例中的索引上面的操作并不支持排序运算符,因此为空。我们可以通过下面的示例来进行验证。
1 | CREATE TABLE tbl01(id int primary key, location point); |
基于上面的示例,当我们执行 SELECT * FROM tbl01 ORDER BY location <-> point '(0, 0)'
查询时,IndexScan
中的 indexorderby
将不为空。
总结
按照文档来说,PostgreSQL 索引有查找运算符(Search Operator)和排序运算符(Ordering Operator)。我们通常使用到的是查找运算符,排序运算符接触的比较少(至少对于我来说是这样的)。
- 查找运算符:用于查询符合条件的记录,返回记录的顺序是随机的,即有多个记录的值等于查找键,这些记录在第一次可能以
a, b, c
的顺序返回,第二次则以c, a, b
的顺序返回。查找运算符的返回值是boolean
类型。 - 排序运算符:用于对记录进行排序,返回记录的顺序是确定的,多次执行都是一样的。排序运算符的返回值通常是其它类型,如代表距离的
float
或numeric
类型。
参考
[1] https://www.postgresql.org/docs/current/index-scanning.html
[2] https://www.postgresql.org/docs/current/xindex.html
[3] 《PostgreSQL 修炼之道 - 从小工到专家》
一监生过国学门,闻祭酒方盛怒两生而治之,问门上人者,然则打欤?罚欤?镦锁欤?
答曰:“出题考文。”
生即咈然,曰:“咦,罪不至此。”