网站建设开题报告ppt,wordpress title description,wordpress服务器搬迁,做网站江门标签 PostgreSQL , Oracle , index skip scan , 非驱动列条件 , 递归查询 , 子树 背景 对于输入条件在复合索引中为非驱动列的#xff0c;如何高效的利用索引扫描#xff1f; 在Oracle中可以使用index skip scan来实现这类CASE的高效扫描#xff1a; INDEX跳跃扫描一般用在W… 标签 PostgreSQL , Oracle , index skip scan , 非驱动列条件 , 递归查询 , 子树 背景 对于输入条件在复合索引中为非驱动列的如何高效的利用索引扫描 在Oracle中可以使用index skip scan来实现这类CASE的高效扫描 INDEX跳跃扫描一般用在WHERE条件里面没有使用到引导列但是用到了引导列以外的其他列并且引导列的DISTINCT值较少的情况。 在这种情况下数据库把这个复合索引逻辑上拆散为多个子索引依次搜索子索引中非引导列的WHERE条件里面的值。 使用方法如下: /* INDEX_SS ( [ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */ The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition.Each parameter serves the same purpose as in INDEX Hint. For example: SELECT /* INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name Steven; 下面是来自ORACLE PERFORMANCE TUNING里的原文 Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks. Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped. The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index. Example 13-5 Index Skip Scan Consider, for example, a table employees(
sex,
employee_id,
address
) with a composite index on (sex, employee_id). Splitting this composite index would result in two logical subindexes, one for M and one for F. For this example, suppose you have the following index data: (F,98)(F,100)(F,102)(F,104)(M,101)(M,103)(M,105) The index is split logically into the following two subindexes: The first subindex has the keys with the value F. The second subindex has the keys with the value M The column sex is skipped in the following query: SELECT * FROM employeesWHERE employee_id 101; A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M. PostgreSQL 非skip scan PostgreSQL支持非驱动列的索引扫描但是需要扫描整个索引。 例子 1、创建测试表 postgres# create table t(id int, c1 int);
CREATE TABLE 2、写入1000万测试数据 postgres# insert into t select random()*1 , id from generate_series(1,10000000) id;
INSERT 0 10000000 3、创建多列索引 postgres# create index idx_t on t(id,c1);
CREATE INDEX 4、非驱动列查询测试如下 index only scan postgres# explain (analyze,verbose,timing,costs,buffers) select * from t where c11; QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using idx_t on public.t (cost10000000000.43..10000105164.89 rows1 width8) (actual time0.043..152.288 rows1 loops1) Output: id, c1 Index Cond: (t.c1 1) Heap Fetches: 0 Buffers: shared hit27326 Execution time: 152.328 ms
(6 rows) index scan postgres# explain (analyze,verbose,timing,costs,buffers) select * from t where c11; QUERY PLAN
----------------------------------------------------------------------------------------------------------------------- Index Scan using idx_t on public.t (cost0.43..105165.99 rows1 width8) (actual time0.022..151.845 rows1 loops1) Output: id, c1 Index Cond: (t.c1 1) Buffers: shared hit27326 Execution time: 151.881 ms
(5 rows) bitmap scan postgres# explain (analyze,verbose,timing,costs,buffers) select * from t where c11; QUERY PLAN
------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on public.t (cost105164.88..105166.00 rows1 width8) (actual time151.731..151.732 rows1 loops1) Output: id, c1 Recheck Cond: (t.c1 1) Heap Blocks: exact1 Buffers: shared hit27326 - Bitmap Index Scan on idx_t (cost0.00..105164.88 rows1 width0) (actual time151.721..151.721 rows1 loops1) Index Cond: (t.c1 1) Buffers: shared hit27325 Execution time: 151.777 ms
(9 rows) seq scan(全表扫描) postgres# explain (analyze,verbose,timing,costs,buffers) select * from t where c11; QUERY PLAN
--------------------------------------------------------------------------------------------------------- Seq Scan on public.t (cost0.00..169248.41 rows1 width8) (actual time0.014..594.535 rows1 loops1) Output: id, c1 Filter: (t.c1 1) Rows Removed by Filter: 9999999 Buffers: shared hit44248 Execution time: 594.568 ms
(6 rows) 使用索引扫因为不需要FILTER同时扫描的BLOCK更少所以性能比全表扫略好。但是还是扫了整个索引的PAGE所以并不能算skip scan。 那么如何让PostgreSQL支持index skip scan呢 PostgreSQL skip scan 实际上原理和Oracle类似可以输入驱动列条件然后按多个条件扫描这样就能达到SKIP SCAN的效果。即多颗子树扫描。 同样也更加适合于驱动列DISTINCT值较少的情况。 用PostgreSQL的递归查询语法可以实现这样的加速效果。这种方法也被用于获取count(distinct), distinct值等。 《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》 例如我们通过这个方法可以快速的得到驱动列的唯一值 with recursive skip as ( ( select min(t.id) as id from t where t.id is not null ) union all ( select (select min(t.id) as id from t where t.id s.id and t.id is not null) from skip s where s.id is not null ) -- 这里的where s.id is not null 一定要加,否则就死循环了.
)
select id from skip ; 然后封装到如下SQL实现skip scan的效果 explain (analyze,verbose,timing,costs,buffers) select * from t where id in
(
with recursive skip as ( ( select min(t.id) as id from t where t.id is not null ) union all ( select (select min(t.id) as id from t where t.id s.id and t.id is not null) from skip s where s.id is not null ) -- 这里的where s.id is not null 一定要加,否则就死循环了.
)
select id from skip
) and c11
union all
select * from t where id is null and c11; 或者 explain (analyze,verbose,timing,costs,buffers) select * from t where id any(array
(
with recursive skip as ( ( select min(t.id) as id from t where t.id is not null ) union all ( select (select min(t.id) as id from t where t.id s.id and t.id is not null) from skip s where s.id is not null ) -- 这里的where s.id is not null 一定要加,否则就死循环了.
)
select id from skip
)) and c11
union all
select * from t where id is null and c11; 看执行计划 效果好多了 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Append (cost55.00..215.22 rows2 width8) (actual time0.127..0.138 rows1 loops1) Buffers: shared hit21 - Nested Loop (cost55.00..213.64 rows1 width8) (actual time0.126..0.127 rows1 loops1) Output: t.id, t.c1 Buffers: shared hit18 - HashAggregate (cost54.57..55.58 rows101 width4) (actual time0.108..0.109 rows3 loops1) Output: skip.id Group Key: skip.id Buffers: shared hit11 - CTE Scan on skip (cost51.29..53.31 rows101 width4) (actual time0.052..0.102 rows3 loops1) Output: skip.id Buffers: shared hit11 CTE skip - Recursive Union (cost0.46..51.29 rows101 width4) (actual time0.050..0.099 rows3 loops1) Buffers: shared hit11 - Result (cost0.46..0.47 rows1 width4) (actual time0.049..0.049 rows1 loops1) Output: $1 Buffers: shared hit4 InitPlan 3 (returns $1) - Limit (cost0.43..0.46 rows1 width4) (actual time0.045..0.046 rows1 loops1) Output: t_3.id Buffers: shared hit4 - Index Only Scan using idx_t on public.t t_3 (cost0.43..205165.21 rows10000033 width4) (actual time0.045..0.045 rows1 loops1) Output: t_3.id Index Cond: (t_3.id IS NOT NULL) Heap Fetches: 0 Buffers: shared hit4 - WorkTable Scan on skip s (cost0.00..4.88 rows10 width4) (actual time0.015..0.015 rows1 loops3) Output: (SubPlan 2) Filter: (s.id IS NOT NULL) Rows Removed by Filter: 0 Buffers: shared hit7 SubPlan 2 - Result (cost0.46..0.47 rows1 width4) (actual time0.018..0.019 rows1 loops2) Output: $3 Buffers: shared hit7 InitPlan 1 (returns $3) - Limit (cost0.43..0.46 rows1 width4) (actual time0.018..0.018 rows0 loops2) Output: t_2.id Buffers: shared hit7 - Index Only Scan using idx_t on public.t t_2 (cost0.43..76722.42 rows3333344 width4) (actual time0.017..0.017 rows0 loops2) Output: t_2.id Index Cond: ((t_2.id s.id) AND (t_2.id IS NOT NULL)) Heap Fetches: 0 Buffers: shared hit7 - Index Only Scan using idx_t on public.t (cost0.43..1.56 rows1 width8) (actual time0.005..0.005 rows0 loops3) Output: t.id, t.c1 Index Cond: ((t.id skip.id) AND (t.c1 1)) Heap Fetches: 0 Buffers: shared hit7 - Index Only Scan using idx_t on public.t t_1 (cost0.43..1.56 rows1 width8) (actual time0.010..0.010 rows0 loops1) Output: t_1.id, t_1.c1 Index Cond: ((t_1.id IS NULL) AND (t_1.c1 1)) Heap Fetches: 0 Buffers: shared hit3 Execution time: 0.256 ms
(56 rows) 从150多毫秒降低到了0.256毫秒 内核层面优化 与Oracle做法类似或者说与递归的做法类似。 使用这种方法来改进优化器可以达到index skip scan的效果而且不用改写SQL。 参考 《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》