[My Story with openGauss] Testing openGauss Column Store, Ustore, MOT, and Parallel Query

老老老JR老北

Published 2023-9-6 15:18

I. Column Store

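Compared with traditional row-based storage, columnar storage has the following advantages:

Higher compression efficiency: a column store lays out the values of each column together instead of storing whole records row by row. Values in the same column share a type and often repeat, so they compress much better. This reduces storage space and lets each I/O operation read more useful data, which improves query performance.

Better query performance: when a query touches only some of a table's columns, a column store reads just those columns, and their values sit next to each other on disk, so fewer disk I/O operations are needed. Column filtering, sorting, grouping, and aggregation also tend to run more efficiently than on row storage.

Better scalability: a column store handles large data sets more effectively. As data grows, it copes better with many concurrent queries, because each query reads only the columns it needs rather than complete rows. This raises query concurrency and throughput and gives better support for scaling the database horizontally.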
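
The compression and column-pruning points can be illustrated with a small Python sketch. It is only a conceptual toy, not how openGauss stores data: the table contents are hypothetical, and run-length encoding stands in for whatever compression a real column store applies.

```python
from itertools import groupby

# Hypothetical toy table with columns (id, city, status); not from the article.
rows = [(i, "Beijing" if i < 6 else "Shanghai", "active") for i in range(10)]


def run_length_encode(values):
    """Collapse consecutive equal values into [value, count] pairs."""
    return [[value, len(list(group))] for value, group in groupby(values)]


# Row layout: values of different columns are interleaved, so runs stay short.
row_layout = [value for row in rows for value in row]

# Column layout: each column is stored contiguously.
column_layout = {
    "id": [r[0] for r in rows],
    "city": [r[1] for r in rows],
    "status": [r[2] for r in rows],
}

row_runs = run_length_encode(row_layout)
city_runs = run_length_encode(column_layout["city"])
status_runs = run_length_encode(column_layout["status"])

print(len(row_layout), "values in row layout ->", len(row_runs), "runs")
print("city column runs:", city_runs)
print("status column runs:", status_runs)

# Column pruning: an aggregate over "id" only needs the id column,
# while a row layout has to step over every value of every row.
values_read_column_store = len(column_layout["id"])
values_read_row_store = len(row_layout)
print("values read for sum(id):", values_read_column_store, "vs", values_read_row_store)
```

In the interleaved row layout no two neighbouring values are equal, so run-length encoding saves nothing, while the city and status columns collapse to a handful of runs.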

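Column storage offers better performance and scalability in some cases, but it also has limitations: update and delete operations and single-row lookups are slower. When choosing a storage engine and compression method, weigh the scenarios and requirements and pick whatever suits the workload best.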

openGauss's column-store feature is well suited to OLAP workloads.

1. Prepare the data

PanWeiDB=# create table t3 WITH (ORIENTATION = COLUMN) as select * from pg_tables where 1=2;
ERROR:  type "name" is not supported in column store

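Column-store tables place many restrictions on data types, so check them beforehand. The storage orientation must be specified when the table is created.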

PanWeiDB=# create table t3 WITH (ORIENTATION = COLUMN) as select * from t1 where 1=2;
INSERT 0 0
PanWeiDB=# insert into t3 select * from t1;
INSERT 0 2
PanWeiDB=# insert into t3 select * from t3;
INSERT 0 2

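In my test, CTAS could not convert a row-store table directly into a column-store table (the attempt above failed on an unsupported data type). What does work is using CTAS with "where 1=2" to create only the table structure and then loading the data with insert into, which is one way to convert a row-store table into a column-store table.

2. Comparison

t1 is a row-store table; t3 is a column-store table.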

PanWeiDB=# explain select count(*) from t1;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate  (cost=671391.64..671391.65 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..609663.51 rows=24691251 width=0)
(2 rows)
   
PanWeiDB=# explain select count(*) from t3;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Row Adapter  (cost=341186.03..341186.03 rows=1 width=8)
   ->  Vector Aggregate  (cost=341186.02..341186.03 rows=1 width=8)
         ->  CStore Scan on t3  (cost=0.00..173413.86 rows=67108864 width=0)
(3 rows)

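For count(*), the estimated cost is 671391.64 for the row-store table t1 and 341186.03 for the column-store table t3, roughly half. These are planner estimates from explain, not measured run times.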
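
The arithmetic behind this comparison can be checked with a short Python sketch that parses the plan lines quoted above. The helper names are hypothetical. Because the two plans estimate different row counts (24691251 for t1, 67108864 for t3), the sketch also computes the cost per estimated row, under the assumption that normalizing by row count is a fair way to compare the two plans.

```python
import re


def total_cost(plan_line):
    """Return the total cost (second number) from a 'cost=startup..total' pair."""
    match = re.search(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)", plan_line)
    return float(match.group(2))


def estimated_rows(plan_line):
    """Return the planner's row estimate from an explain line."""
    return int(re.search(r"rows=(\d+)", plan_line).group(1))


# Plan lines copied from the explain output above.
row_top = "Aggregate  (cost=671391.64..671391.65 rows=1 width=8)"
row_scan = "Seq Scan on t1  (cost=0.00..609663.51 rows=24691251 width=0)"
col_top = "Row Adapter  (cost=341186.03..341186.03 rows=1 width=8)"
col_scan = "CStore Scan on t3  (cost=0.00..173413.86 rows=67108864 width=0)"

cost_ratio = total_cost(row_top) / total_cost(col_top)

# Normalize by the number of rows each scan is estimated to read.
row_cost_per_row = total_cost(row_top) / estimated_rows(row_scan)
col_cost_per_row = total_cost(col_top) / estimated_rows(col_scan)
per_row_ratio = row_cost_per_row / col_cost_per_row

print(f"total cost ratio (row / column): {cost_ratio:.2f}")
print(f"cost per estimated row ratio:    {per_row_ratio:.2f}")
```

The total cost ratio comes out just under 2, matching the "roughly half" reading. Per estimated row the gap is larger, since the column-store plan is costed for more rows.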

PanWeiDB=# explain select sum(id) from t1;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Aggregate  (cost=671391.64..671391.65 rows=1 width=64)
   ->  Seq Scan on t1  (cost=0.00..609663.51 rows=24691251 width=32)
(2 rows)

PanWeiDB=# explain select sum(id) from t3;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Row Adapter  (cost=341186.03..341186.03 rows=1 width=43)
   ->  Vector Aggregate  (cost=341186.02..341186.03 rows=1 width=43)
         ->  CStore Scan on t3  (cost=0.00..173413.86 rows=67108864 width=11)
(3 rows)

For sum, the estimated cost of the column-store plan is again about half.

PanWeiDB=#  explain select * from t1;
                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..609663.51 rows=24691251 width=90)
(1 row)

PanWeiDB=#  explain select * from t3;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Row Adapter  (cost=173413.86..173413.86 rows=67108864 width=13)
   ->  CStore Scan on t3  (cost=0.00..173413.86 rows=67108864 width=13)
(2 rows)

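For a full-table scan that returns every column (select *), a column store loses its main advantage: all columns have to be read and reassembled into rows by the Row Adapter node. Note that the estimated cost shown here is still lower for t3 (173413.86) than for t1 (609663.51), so confirming that the column-store table is really slower in this case requires measuring actual run time, for example with explain analyze.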

II. Ustore

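The Ustore storage engine, also called the In-place Update storage engine, is a storage mode newly added to the openGauss kernel. The row-store engine used in earlier versions works in Append Update mode.

Append Update and In-place Update are two different storage-engine strategies for handling updates in a database.

Append Update: this strategy treats an update as an append, writing the new data after the existing data. It suits workloads with frequent writes but few updates. The old data is not modified or deleted directly; it stays where it is, and the new version is appended at the end of the data set. This avoids moving and rebuilding data, improves write performance, and allows fast rollback and queries on historical data.

In-place Update: this strategy treats an update as a modification in place, changing the data directly at its original location. It suits workloads that need frequent updates and random access. The database modifies the updated data where it lives instead of appending new data, which reduces storage consumption and supports higher concurrency. However, an in-place update may involve moving and rebuilding data; in particular, when an update changes the size of the data, storage may have to be reallocated and adjusted.

In summary, the main difference between the two strategies is how they handle updates. Append Update suits workloads with frequent writes but few updates, and supports fast rollback and historical queries. In-place Update suits workloads with frequent updates and random access, and reduces storage consumption.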
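
The difference between the two strategies can be sketched with two toy Python classes. This is a conceptual model under simplifying assumptions, not the openGauss implementation: the class and method names are hypothetical, there are no transactions or pages, and neither cleanup of old versions nor recycling of undo records is modelled.

```python
class AppendUpdateTable:
    """Toy append-update store: an update appends a new version; old ones stay."""

    def __init__(self):
        self.versions = []   # every row version ever written, in write order
        self.current = {}    # key -> index of the live version in self.versions

    def upsert(self, key, value):
        self.versions.append((key, value))
        self.current[key] = len(self.versions) - 1

    def read(self, key):
        return self.versions[self.current[key]][1]


class InPlaceUpdateTable:
    """Toy in-place store: an update overwrites the row and logs the old value."""

    def __init__(self):
        self.rows = {}       # key -> current value, one slot per row
        self.undo_log = []   # (key, previous value) records used for rollback

    def upsert(self, key, value):
        if key in self.rows:
            self.undo_log.append((key, self.rows[key]))
        self.rows[key] = value

    def read(self, key):
        return self.rows[key]

    def rollback_last(self):
        key, old_value = self.undo_log.pop()
        self.rows[key] = old_value


append_table = AppendUpdateTable()
in_place_table = InPlaceUpdateTable()
for table in (append_table, in_place_table):
    table.upsert(1, "v1")
    for n in range(2, 6):
        table.upsert(1, f"v{n}")   # four updates of the same row

print("append-update versions kept in the table:", len(append_table.versions))
print("in-place rows kept in the table:", len(in_place_table.rows))
print("in-place undo records:", len(in_place_table.undo_log))
```

After four updates of one row, the append-update table holds five versions in the table itself, while the in-place table holds a single row and keeps the old values in a separate undo log.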

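1. Create a table

The USTORE storage engine keeps an undo log. Before creating a USTORE table, set undo_zone_count in postgresql.conf. This parameter is the number of a resource used by the undo log; the recommended value is 16384, i.e. "undo_zone_count=16384". Restart the database after changing it.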

PanWeiDB=# create table t4 with (storage_type=ustore) as select * from t1 where 1=2;
INSERT 0 0

2. Make Ustore the default

enable_default_ustore_table=on

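3. Create an index

create index ubt_idx on test(age);

If the table uses Ustore, an index created on it uses the Ustore index type by default.

create index ubt_idx on test using ubtree(age);

This form specifies the index type explicitly.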

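III. MOT

MOT (memory-optimized tables) is similar to Oracle's In-Memory feature and is optimized around managing concurrent memory use. Its data storage, access, and processing algorithms were designed from scratch to exploit the latest advances in in-memory and high-concurrency computing. Consider MOT for frequently accessed tables to improve efficiency.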

PanWeiDB=# GRANT USAGE ON FOREIGN SERVER mot_server TO cy;
GRANT

create FOREIGN table test(x int) server mot_server;

create foreign table t6(id number,name char(20)) server mot_server;


PanWeiDB=> create foreign table t6(id number,name char(20)) server mot_server;
ERROR:  Cannot create MOT tables while incremental checkpoint is enabled.

gs_guc set -N all -I all -c "enable_incremental_checkpoint=off"



PanWeiDB=> insert into t6 select * from t4;
ERROR:  Cross storage engine query is not supported
PanWeiDB=> insert into t6 select * from t3;
ERROR:  Cross storage engine query is not supported
PanWeiDB=> insert into t6 values(1,'A');
INSERT 0 1


PanWeiDB=> \d+
                                                     List of relations
 Schema |  Name   |     Type      | Owner |    Size    |                       Storage                        | Description 
--------+---------+---------------+-------+------------+------------------------------------------------------+-------------
 cy     | t6      | foreign table | cy    | 127 kB     |                                                      | 
 public | student | table         | omm   | 8192 bytes | {orientation=row,compression=no}                     | 
 public | t1      | table         | omm   | 2835 MB    | {orientation=row,compression=no}                     | 
 public | t2      | table         | omm   | 605 MB     | {orientation=row,compression=no}                     | 
 public | t3      | table         | omm   | 18 MB      | {orientation=column,compression=low}                 | 
 public | t4      | table         | omm   | 1821 MB    | {orientation=row,storage_type=ustore,compression=no} | 
 public | t5      | table         | omm   | 2693 MB    | {orientation=row,compression=no}                     | 
(7 rows)

drop FOREIGN table test;

create index text_index1 on test(x);

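IV. Parallel Query

Parallelism is one of the three classic tools of SQL tuning. openGauss now supports parallel query; before using it, enable it at session level by setting query_dop, similar to Oracle's "alter session force parallel query parallel xxx".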

openGauss=# SET query_dop = 4;
openGauss=# SELECT COUNT(*) FROM t1 GROUP BY a;
......
openGauss=# SET query_dop = 1;
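
The general shape of a parallel GROUP BY count, with a degree of parallelism like the query_dop = 4 used above, can be sketched in Python: split the input, aggregate each slice in a worker, then merge the partial results. This is a conceptual illustration only, not how openGauss executes the query. The function names and the sample values for column a are hypothetical, and Python threads are used just to show the split-and-merge structure, not to obtain a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(chunk):
    """Each worker counts the group keys in its own slice of the data."""
    return Counter(chunk)


def parallel_group_count(values, dop):
    """Split the input into `dop` slices, count each in a worker, then merge."""
    slices = [values[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(partial_count, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)   # Counter.update adds counts together
    return merged


# Hypothetical values of column a: 0, 1, 2 repeated four times.
a_column = [i % 3 for i in range(12)]

serial_result = Counter(a_column)
parallel_result = parallel_group_count(a_column, dop=4)

print("serial:  ", dict(serial_result))
print("parallel:", dict(parallel_result))
```

Both paths produce the same counts per group; only the way the work is divided differs.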

Reposted from the WeChat official account: openGauss

Last modified 2023-9-6 15:18:16
