Greenplum Bitmap Index Explained (Part 1)

荣光因缘来

Published on 2022-8-3 17:45

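The bitmap index is an index type specific to Greenplum (as opposed to PostgreSQL). It is well suited to analytical (OLAP) workloads with large data volumes and few data modifications. A Bitmap index delivers good query speed while using less space, which saves disk space in large-data environments and lowers operating costs.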

What Is a Bitmap Index?

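A Bitmap Index closely resembles an inverted index: it uses a bitmap structure to record every position at which a given distinct key appears in the table. Reading that bitmap yields all occurrences of the query key in one pass.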

Take the following table as an example. It is a person table made up of ID, name, sex, and work city:

[Table image not preserved: person table with columns ID, Name, Sex, City]

If we build Bitmap indexes on the Sex and City columns, the index extracts every distinct value in each column and uses a bitmap to record where each value occurs:

Sex-M: 11110011
Sex-F: 00001100

City-Shanghai: 10000000
City-Beijing:  01101000
City-Chengdu:  00010101
City-Shenzhen: 00000010

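Take Sex-F as an example: female employees appear only in rows 5 and 6, so only the 5th and 6th bits of the bitmap are set to 1. The remaining bits are 0, meaning the tuples at those positions are not female.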
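The construction described above can be sketched in a few lines of Python. This is only an illustration, not Greenplum's implementation: the function name build_bitmaps is hypothetical, bitmaps are kept as plain strings, and the Sex column values are inferred from the Sex-M and Sex-F bitmaps shown earlier.

```python
from collections import defaultdict

def build_bitmaps(column):
    """Map each distinct value to a bit string; bit i is '1' if row i holds that value."""
    positions = defaultdict(list)
    for row, value in enumerate(column):
        positions[value].append(row)
    bitmaps = {}
    for value, rows in positions.items():
        bits = ["0"] * len(column)
        for r in rows:
            bits[r] = "1"
        bitmaps[value] = "".join(bits)
    return bitmaps

# Sex column for rows 1..8, inferred from the example bitmaps (an assumption)
sex = ["M", "M", "M", "M", "F", "F", "M", "M"]
sex_bitmaps = build_bitmaps(sex)
print(sex_bitmaps["M"])  # 11110011
print(sex_bitmaps["F"])  # 00001100
```

Each distinct value gets exactly one bitmap, so the number of bitmaps grows with the number of distinct values, not with the number of rows.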

When we look for female employees working in Beijing:

select name from person where city = 'Beijing' and sex = 'F';

the Bitmap index can directly intersect the bitmap entries for the two query values and obtain, in one step, the bitmap of all tuple positions that satisfy the query:

City-Beijing & Sex-F = 01101000 & 00001100 = 00001000

Therefore only one row satisfies the query, and it is the 5th row (ID=5).
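The intersection can be reproduced with ordinary integer arithmetic. The sketch below is illustrative only; bitmap_and and matching_rows are hypothetical helper names, and the bitmaps are the uncompressed 8-bit strings from the example.

```python
def bitmap_and(a, b):
    """Bitwise AND of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bitmaps must cover the same number of rows")
    return format(int(a, 2) & int(b, 2), "0{}b".format(len(a)))

def matching_rows(bitmap):
    """1-based row numbers whose bit is set."""
    return [i + 1 for i, bit in enumerate(bitmap) if bit == "1"]

city_beijing = "01101000"
sex_f = "00001100"
result = bitmap_and(city_beijing, sex_f)
print(result)                 # 00001000
print(matching_rows(result))  # [5]
```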

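Each bit of a Bitmap index corresponds, in order, to one tuple in the table; if the bit is set, the corresponding tuple matches the key value. Bitmap indexes are also very effective for queries with multiple conditions in the where clause: tuples that satisfy only some of the conditions are filtered out before the table is accessed, which greatly speeds up the query.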

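As a result, compared with a traditional B-Tree index, a Bitmap Index tends to perform better when the index key has between 100 and 10,000 distinct values. The following simple example compares a B-Tree Index with a Bitmap Index: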

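First create two identical tables with 4 columns each: id is the key column (used as the distribution key); msg stores a random string; foo is filled with random integers in [0, 100), and bar with random integers in [0, 1000). After inserting the test data, build Bitmap indexes on foo and bar in one table and B-Tree indexes on foo and bar in the other: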

create table test_bitmap(
    id int, msg text, foo int, bar int
) distributed by (id);

insert into test_bitmap (id, msg, foo, bar)
select g, md5(g::text), random() * 100, random() * 1000 from generate_series(1, 10000000) g;

create table test_btree(
    id int, msg text, foo int, bar int
) distributed by (id);
insert into test_btree select * from test_bitmap;

-- create index
create index on test_bitmap using bitmap(foo);
create index on test_bitmap using bitmap(bar);

create index on test_btree using btree(foo);
create index on test_btree using btree(bar);

Then run the same query against test_bitmap and test_btree:

postgres=# explain (analyze, costs off) select * from test_bitmap where foo = 52 or bar = 520; 
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual time=5.303..987.683 rows=110011 loops=1)
   ->  Bitmap Heap Scan on test_bitmap (actual time=1.176..1052.844 rows=36869 loops=1)
         Recheck Cond: ((foo = 52) OR (bar = 520))
         ->  BitmapOr (actual time=0.888..0.895 rows=1 loops=1)
               ->  Bitmap Index Scan on test_bitmap_foo_idx (actual time=0.634..0.641 rows=1 loops=1)
                     Index Cond: (foo = 52)
               ->  Bitmap Index Scan on test_bitmap_bar_idx (actual time=0.237..0.260 rows=1 loops=1)
                     Index Cond: (bar = 520)
 Planning Time: 0.207 ms
   (slice0)    Executor memory: 34K bytes.
   (slice1)    Executor memory: 2167K bytes avg x 3 workers, 2167K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 1420.923 ms
(14 rows)

postgres=# explain (analyze, costs off) select * from test_btree where foo = 52 or bar = 520; 
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual time=195.580..1272.439 rows=110011 loops=1)
   ->  Bitmap Heap Scan on test_btree (actual time=219.470..1352.689 rows=36869 loops=1)
         Recheck Cond: ((foo = 52) OR (bar = 520))
         Rows Removed by Index Recheck: 1740525
         ->  BitmapOr (actual time=216.796..216.803 rows=1 loops=1)
               ->  Bitmap Index Scan on test_btree_foo_idx (actual time=180.544..180.552 rows=33551 loops=1)
                     Index Cond: (foo = 52)
               ->  Bitmap Index Scan on test_btree_bar_idx (actual time=30.162..30.169 rows=3351 loops=1)
                     Index Cond: (bar = 520)
 Planning Time: 0.224 ms
   (slice0)    Executor memory: 98K bytes.
   (slice1)    Executor memory: 197132K bytes avg x 3 workers, 197132K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 1726.559 ms
(15 rows)

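With the same data and the same query, the Bitmap Index returns results faster than the B-Tree Index (1420.923 ms versus 1726.559 ms in this run). The Bitmap Index also takes less disk space: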

postgres=# \di+ test_bitmap_foo_idx  
                                 List of relations
 Schema |        Name         | Type  | Owner |    Table    |  Size  | Description 
--------+---------------------+-------+-------+-------------+--------+-------------
 public | test_bitmap_foo_idx | index | smart | test_bitmap | 102 MB | 
(1 row)

postgres=# \di+ test_btree_foo_idx
                                List of relations
 Schema |        Name        | Type  | Owner |   Table    |  Size  | Description 
--------+--------------------+-------+-------+------------+--------+-------------
 public | test_btree_foo_idx | index | smart | test_btree | 213 MB | 
(1 row)

Bitmap Index Storage Structure

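In principle, a Bitmap Index is very simple. Building one only requires scanning all the data in the column and creating a bitmap for each distinct value. At query time, we only need to find the bitmaps for the queried key values and read them into memory one by one.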

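However, the total size of a Bitmap index depends on both the number of distinct values in the column and the total number of rows. Suppose we create an index on a postal-code column (about 100,000 distinct values). With 10 million rows, a naive Bitmap index would occupy 100000 * 10000000 / 8 bytes = 125 GB, which is a very large space overhead. We therefore have to compress the bitmaps to reduce disk usage. As we will see, it is precisely this compression that makes the Bitmap index implementation so complex.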
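The size estimate for 100,000 distinct values and 10 million rows is easy to check. The helper below is a hypothetical back-of-the-envelope calculation that assumes one uncompressed bit per row for every distinct value, with no page or header overhead.

```python
def naive_bitmap_bytes(distinct_values, rows):
    """Uncompressed size in bytes: one bit per row for each distinct value."""
    return distinct_values * rows // 8

size = naive_bitmap_bytes(100_000, 10_000_000)
print(size)           # 125000000000
print(size / 10**9)   # 125.0 (decimal GB)
```

Almost all of those bits are 0, because each row sets a bit in exactly one of the 100,000 bitmaps. That sparsity is what makes run-length compression attractive.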

2.1 HRL Compression Encoding

The bitmaps in a Greenplum Bitmap index are stored using HRL (Hybrid Run Length) encoding. In HRL encoding, a bitmap is divided into 2 parts: the header section (Header) and the content section (Content).

Each bit in the header indicates whether the corresponding "word" in the content section is compressed: 1 means compressed, 0 means uncompressed.

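The content section stores both compressed and uncompressed words. For a compressed word, the first bit indicates whether the original value was 0 or 1, and the remaining bits give the original length of the run (measured in words).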

Assume a word length of 8 bits and look at a concrete example:

00000000 00000000 01000000 01110100 11111111 11111111 11111111

After HRL encoding, the bitmap above becomes:

Header Section:  1001
Content Section: 00000010 01000000 01110100 10000011

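HRL compresses consecutive 11111111 or 00000000 words, that is, whole words of all 1s or all 0s. The Header shows that the 1st and 4th words of the Content section are compressed and the others are not. The first compressed word has a leading bit of 0 and the remaining bits hold the value 2, so it stands for 2 compressed 00000000 words. The second compressed word has a leading bit of 1 and the remaining bits hold the value 3, so it stands for 3 compressed 11111111 words. The other, uncompressed words in the Content are the original data and need no further processing.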
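The encoding walked through above can be sketched as a small encoder and decoder. This is a simplified model, not the Greenplum code: words are 8-character bit strings, the names hrl_encode and hrl_decode are hypothetical, and the sketch assumes that every run of all-0 or all-1 words is compressed, including a run of length 1.

```python
WORD_BITS = 8
MAX_RUN = (1 << (WORD_BITS - 1)) - 1  # longest run that fits in the 7 length bits

def hrl_encode(words):
    """Return (header, content): header bit '1' marks a compressed content word."""
    header, content = [], []
    i = 0
    while i < len(words):
        word = words[i]
        if word in ("0" * WORD_BITS, "1" * WORD_BITS):
            run = 1
            while i + run < len(words) and words[i + run] == word and run < MAX_RUN:
                run += 1
            # first bit: fill value; remaining bits: run length in words
            content.append(word[0] + format(run, "0{}b".format(WORD_BITS - 1)))
            header.append("1")
            i += run
        else:
            content.append(word)  # literal word, stored unchanged
            header.append("0")
            i += 1
    return "".join(header), content

def hrl_decode(header, content):
    """Expand compressed words back into the original word sequence."""
    words = []
    for flag, word in zip(header, content):
        if flag == "1":
            words.extend([word[0] * WORD_BITS] * int(word[1:], 2))
        else:
            words.append(word)
    return words

raw = "00000000 00000000 01000000 01110100 11111111 11111111 11111111".split()
header, content = hrl_encode(raw)
print(header)             # 1001
print(" ".join(content))  # 00000010 01000000 01110100 10000011
```

Because the header says which content words are compressed, a literal word that happens to look like a run length is never misread during decoding.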

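As this shows, HRL encoding compresses very well for bitmap vectors with long runs of 0s or 1s. In the actual implementation, a Greenplum word is 64 bits long, so a single compressed word can represent up to about $2^{63}$ consecutive all-1 or all-0 words of 64 bits each, which is enough to cover any real-world case.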

2.2 Index Storage Structure

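In Greenplum, both index data and table data are stored in pages, and Bitmap indexes are no exception. In actual storage, the data is kept in each page in HRL-encoded form. Because a bitmap can be very long and may not fit in one Page, a long bitmap is split into smaller bitmaps that are connected as a linked list.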

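So in addition to PostgreSQL's standard PageHeaderData, a Page must also store a pointer to the next Page and the number of Words the current Page holds, which is what separates the Header part from the Content part. This information is encapsulated in BMBitmapOpaqueData: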

typedef struct BMBitmapOpaqueData  {
  uint32    bm_hrl_words_used;      /* number of words stored in the current page */
  BlockNumber  bm_bitmap_next;         /* pointer to the next page, i.e. its Block Number */
  uint64    bm_last_tid_location;   /* TID Location represented by the last bit of the current page */
} BMBitmapOpaqueData;
typedef BMBitmapOpaqueData *BMBitmapOpaque;

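TID Location needs some explanation. In the earlier examples we kept referring to "the n-th row"; for 00010001, say, the 4th and 8th rows match. In a real Heap Table, however, tuples are stored unordered, and the number of tuples in a Heap Page is not fixed, so a Bitmap Index cannot use a notion like "the n-th tuple". The only thing that pins down a tuple's position is its TID, that is, (BlockNumber, Offset). We therefore use the following formula to convert the two-dimensional TID into a one-dimensional TID Location, which serves as the tuple's position in the Bitmap Vector: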

TID Location = BlockNumber * BM_MAX_TUPLES_PER_PAGE + Offset
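A small sketch makes the mapping and its inverse concrete. The value of BM_MAX_TUPLES_PER_PAGE below is a made-up number chosen for readability, not the constant from the Greenplum source, and the inverse assumes offsets are 1-based and never exceed that constant. The function names are hypothetical.

```python
# Hypothetical page capacity for illustration only; the real value lives in the Greenplum source.
BM_MAX_TUPLES_PER_PAGE = 1000

def tid_to_location(block_number, offset):
    """Flatten (BlockNumber, Offset) into a one-dimensional TID Location."""
    if not 1 <= offset <= BM_MAX_TUPLES_PER_PAGE:
        raise ValueError("offset must be in 1..BM_MAX_TUPLES_PER_PAGE")
    return block_number * BM_MAX_TUPLES_PER_PAGE + offset

def location_to_tid(location):
    """Recover (BlockNumber, Offset) from a TID Location, assuming 1-based offsets."""
    block_number, zero_based_offset = divmod(location - 1, BM_MAX_TUPLES_PER_PAGE)
    return block_number, zero_based_offset + 1

loc = tid_to_location(3, 7)
print(loc)                   # 3007
print(location_to_tid(loc))  # (3, 7)
```

Reserving a fixed number of slots per heap page means unused slots simply become 0 bits, which the run-length encoding absorbs cheaply.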

Now let's look at what a Bitmap Index Page looks like. It can be divided into 4 parts:

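The standard PostgreSQL Page Header (containing the LSN, checksum, and so on);
The Header Section of the HRL encoding;
The Content Section of the HRL encoding;
Special data, which holds the next-Page pointer and the number of words stored in the current Page, encapsulated in BMBitmapOpaqueData.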

The code uses the read_words() function to read the contents of a Bitmap Index Page into the BMBitmapOpaque and BMBitmap structures. BMBitmapOpaque was described above. BMBitmap has only two fields, the Header and the Content:

/* A page of a compressed bitmap */
typedef struct BMBitmapData {
  BM_HRL_WORD hwords[BM_NUM_OF_HEADER_WORDS];
  BM_HRL_WORD cwords[BM_NUM_OF_HRL_WORDS_PER_PAGE];
} BMBitmapData;
typedef BMBitmapData *BMBitmap;

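Bitmap Index Pages are also managed by the BufferPool: to read data, the on-disk page is first read into the BufferPool, and the data is then fetched from there. Index updates likewise modify the Buffer Page first; after the WAL record is written, the system decides when to flush the page to disk.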

2.3 Overall Storage Structure of a Bitmap Index

In a Bitmap index, every distinct value has its own Bitmap List. City-Shanghai and City-Shenzhen, for example, are two different column values, so the Bitmap Index file must store at least 2 Bitmap Lists, as shown in the figure below:

[Figure not preserved: Bitmap Lists stored in the Bitmap Index file]

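The Next Page pointer lets us quickly find the page that follows a given Bitmap Page, but how do we find the head of this "linked list"? One option is to collect the head of every Bitmap List into a list and traverse it whenever we need the Bitmap for a given key value. As the number of distinct values in the index grows, though, that traversal becomes inefficient, so we can instead put each key value and the location of its Bitmap List Header into a B-Tree to speed up the lookup.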

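However, the only ready-made B-Tree API in PostgreSQL is the one for Heap Tables. That means we need to create an additional heap table that stores the key -> Bitmap Index First Page mapping, and then build a B-Tree Index on key. In addition, to make Bitmap Index writes more efficient and the index easier to manage, Greenplum introduces another structure: BMLOVItemData.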

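LOV stands for List Of Value, that is, the array of distinct values of the column. An LOVItem represents one specific distinct value in that array, such as City-Shanghai. BMLOVItemData stores the first and last pages of the Bitmap Index, along with the last word of the Bitmap Index, which is used to make tail updates efficient. So the heap table actually stores the key -> LOV Item mapping, the LOV Item holds the pointer to the Bitmap Index First Page, and the B-Tree speeds up the lookup from Key to LOV Item. The overall structure is shown in the figure below: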

[Figure not preserved: overall structure of the Bitmap Index]

In other words, when we use a Bitmap Index to execute

select * from person where city = 'Shenzhen';

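the key "Shenzhen" is first looked up in the B-Tree Index to get the TID of the LOV item. That TID is used to find the tuple in the heap table and extract the LOV item. The Bitmap List Header in the LOV item then leads to the first index page for the key "Shenzhen", and reading of the Bitmap Index Pages begins. This still follows the Volcano model's "One-Tuple-At-A-Time": each call returns one TID to the parent node. When all words on a page have been read, the Next Page pointer leads to the next page, and reading continues until there are no more index pages.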
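The lookup path (key, then LOV item, then the chain of bitmap pages) can be modeled with plain dictionaries. Everything in this sketch is hypothetical: the class and function names, the block numbers, and the use of a dict in place of the B-Tree and the heap table. It also stores uncompressed 8-bit words and leaves out the tail words kept in the LOV item, so it shows only the traversal order.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

@dataclass
class BitmapPage:
    words: List[str]            # bitmap words on this page (uncompressed in this sketch)
    next_block: Optional[int]   # block number of the next page, None at the tail

@dataclass
class LovItem:
    head_block: Optional[int]   # first page of this key's Bitmap List

def read_bitmap_words(key, key_index, lov_table, pages) -> Iterator[str]:
    """Yield the bitmap words for one key by following the page chain."""
    lov_tid = key_index[key]        # step 1: B-Tree lookup, modeled as a dict
    lov_item = lov_table[lov_tid]   # step 2: fetch the LOV item from the heap table
    block = lov_item.head_block     # step 3: start at the first bitmap page
    while block is not None:
        page = pages[block]
        for word in page.words:
            yield word
        block = page.next_block     # step 4: follow the Next Page pointer

pages = {
    10: BitmapPage(words=["00000010"], next_block=12),
    11: BitmapPage(words=["10000000"], next_block=None),
    12: BitmapPage(words=["00000000", "00000010"], next_block=None),
}
lov_table = {(0, 1): LovItem(head_block=11), (0, 2): LovItem(head_block=10)}
key_index = {"Shanghai": (0, 1), "Shenzhen": (0, 2)}

print(list(read_bitmap_words("Shenzhen", key_index, lov_table, pages)))
# ['00000010', '00000000', '00000010']
```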

2.4 The Crucial LOV Item

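The LOV item plays a very important role in a Bitmap index. It not only stores the starting position of each Bitmap List but also helps speed up queries and inserts. Here is its implementation: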

typedef struct BMLOVItemData  {
    /* BlockNumbers of the first and last pages of the Bitmap List */
    BlockNumber    bm_lov_head;
    BlockNumber    bm_lov_tail;

    /* Last complete word of the Bitmap Index, used to optimize tail inserts */
    BM_HRL_WORD    bm_last_compword;
    /* Last word of the Bitmap Index, used to optimize tail inserts */
    BM_HRL_WORD    bm_last_word;

    /* tid location of the last bit in bm_last_compword */
    uint64         bm_last_tid_location;

    /* tid location of the last bit that was set to 1 */
    uint64         bm_last_setbit;

    /* compressed-word flags for bm_last_compword and bm_last_word */
    uint8          lov_words_header;

} BMLOVItemData;
typedef BMLOVItemData *BMLOVItem;

Some of the fields are explained in more detail below:

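bm_last_compword and bm_last_word: a. These two fields optimize tail inserts (in effect, updates) into the Bitmap Index. In OLAP-style applications data is rarely deleted, so tuple TIDs keep increasing. Updating the LOV item directly is then more efficient than updating a bitmap page (there is no need to open the Bitmap Index file). Consequently, when reading a bitmap vector, these two words from the LOV item must be read at the end as well, or the data will be incomplete. b. As tail inserts proceed, once bm_last_word becomes a complete word it is merged into bm_last_compword.
bm_last_setbit: when inserting at the tail of a bitmap vector, this field directly tells how many 0s lie in between (where possible, those 0s are compressed).
lov_words_header: indicates whether bm_last_word and bm_last_compword are compressed words. A value of 1 means bm_last_word is a compressed word, and a value of 2 means bm_last_compword is a compressed word.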
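Two of these fields lend themselves to a short illustration. The sketch below is not taken from the Greenplum source: the constant and function names are hypothetical, and treating lov_words_header as two independent bit flags is an assumption, since the text above only states the meaning of the values 1 and 2.

```python
# Flag values as stated in the text; combining them bitwise is an assumption.
LAST_WORD_COMPRESSED = 1      # bm_last_word is a compressed word
LAST_COMPWORD_COMPRESSED = 2  # bm_last_compword is a compressed word

def describe_header(lov_words_header):
    """Decode lov_words_header into one boolean per tail word."""
    return {
        "bm_last_word": bool(lov_words_header & LAST_WORD_COMPRESSED),
        "bm_last_compword": bool(lov_words_header & LAST_COMPWORD_COMPRESSED),
    }

def zeros_before_new_bit(bm_last_setbit, new_tid_location):
    """Number of 0 bits between the last set bit and a newly appended set bit."""
    if new_tid_location <= bm_last_setbit:
        raise ValueError("a tail insert expects an increasing TID location")
    return new_tid_location - bm_last_setbit - 1

print(describe_header(2))              # {'bm_last_word': False, 'bm_last_compword': True}
print(zeros_before_new_bit(100, 300))  # 199
```

Knowing the gap up front lets the writer emit the zeros as a run instead of setting them one bit at a time.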

Bitmap Index Read Path

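Reading a Bitmap Index is much simpler than writing one. There are two forms of reading: returning one tuple at a time, and returning a bitmap stream. Returning one tuple at a time is used for an Index Scan; returning a bitmap stream is used for a Bitmap Index Scan. Two simple queries trigger an Index Scan and a Bitmap Index Scan, respectively: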

-- Index Scan
postgres=# explain (analyze, costs off, timing off, summary off) 
select * from test_bitmap where foo = 52;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual rows=99938 loops=1)
   ->  Index Scan using test_bitmap_foo_idx on test_bitmap (actual rows=33488 loops=1)
         Index Cond: (foo = 52)
 Optimizer: Postgres query optimizer
(4 rows)


-- Bitmap Index Scan
postgres=# explain (analyze, costs off, timing off, summary off) 
select * from test_bitmap where foo = 52 or bar = 520;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual rows=109896 loops=1)
   ->  Bitmap Heap Scan on test_bitmap (actual rows=36763 loops=1)
         Recheck Cond: ((foo = 52) OR (bar = 520))
         ->  BitmapOr (actual rows=1 loops=1)
               ->  Bitmap Index Scan on test_bitmap_foo_idx (actual rows=1 loops=1)
                     Index Cond: (foo = 52)
               ->  Bitmap Index Scan on test_bitmap_bar_idx (actual rows=1 loops=1)
                     Index Cond: (bar = 520)
 Optimizer: Postgres query optimizer
(9 rows)
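The difference between the two read forms can be sketched with two generators over the same bitmap words. This is a conceptual model only: the function names are hypothetical, the words are uncompressed 8-bit strings, bit positions stand in for TID Locations, and the batch size is arbitrary.

```python
def iter_tid_locations(words, word_bits=8):
    """Index Scan style: yield one position (1-based bit index) at a time."""
    for word_index, word in enumerate(words):
        for bit_index, bit in enumerate(word):
            if bit == "1":
                yield word_index * word_bits + bit_index + 1

def bitmap_stream(words, batch_size=2):
    """Bitmap Index Scan style: hand back the bitmap itself, a batch of words at a time."""
    for start in range(0, len(words), batch_size):
        yield words[start:start + batch_size]

words = ["00010001", "00000000", "10000000"]
print(list(iter_tid_locations(words)))  # [4, 8, 17]
print(list(bitmap_stream(words)))       # [['00010001', '00000000'], ['10000000']]
```

Returning the bitmap rather than individual positions lets an upper node such as BitmapOr combine several indexes before any heap page is touched.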

Greenplum Bitmap Index Explained (Part 2)

Reprinted from the official account: Greenplum中文社区

Last modified on 2022-8-3 17:45:54
