技术分享 | check(col_name<>'')为何把空格拒之门外

ywz888
发布于 2022-8-29 15:35
浏览
0收藏

1、问题描述

 

前两天在群里看到同事反馈一个空格问题,大致现象如下:

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)

mysql> create table t1(
    -> c1 int,
    -> c2 varchar(4) check(c2<>'')  #单引号之间无空格
    -> )engine=innodb;
Query OK, 0 rows affected (0.21 sec)

mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.

check定义c2<>'',往c2字段插入空格,提示违反check约束。

 

为什么insert语句中的' '(单引号之间有一个或多个空格)会被判断为''(单引号之间无空格),导致插入失败?

 

2、涉及知识


2.1、Stored and Retrieved

 

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled. 

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

参考官网说明:https://dev.mysql.com/doc/refman/8.0/en/char.html

 

CHAR(N):当插入的字符数小于N,它会在字符串的右边补充空格,直到总字符数达到N再进行存储;当查询返回数据时默认会将字符串尾部的空格去掉,除非SQL_MODE设置PAD_CHAR_TO_FULL_LENGTH(手册显示8.0.13 deprecated,8.0.25还能使用)。

 

VARCHAR(N):当插入的字符数小于N,它不会在字符串的右边补充空格,insert内容原封不动的进行存储;如果原本字符串右边有空格,在存储和查询返回时都会保留空格。

 

2.2、Collation Pad Attribute

 

Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.

MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.

参考官网说明:https://dev.mysql.com/doc/refman/8.0/en/char.html

 

对于CHAR、VARCHAR、TEXT字段,排序和比较运算依赖字段上的Collation,Collation的Pad属性控制字符串尾部空格处理方式。

 

可以通过INFORMATION_SCHEMA.COLLATIONS表,查看Collation所使用的Pad属性:

mysql> select collation_name,pad_attribute from information_schema.collations;
+----------------------------+---------------+
| collation_name             | pad_attribute |
+----------------------------+---------------+
| armscii8_general_ci        | PAD SPACE     |
...
| utf8mb4_0900_bin           | NO PAD        |
+----------------------------+---------------+
272 rows in set (0.01 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

 

2.3、Trailing Space Handling in Comparisons

 

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:

• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces. 

• NO PAD collations treat trailing spaces as significant in comparisons, like any other character. 

"Comparison" in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant, regardless of collation.

 

参考官网说明:https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons

 

PAD SPACE:在排序和比较运算中,忽略字符串尾部空格。

 

NO PAD:在排序和比较运算中,字符串尾部空格当成普通字符,不能忽略。

 

3、问题解决


以下操作基于MySQL 8.0.25 社区版

 
3.1、查看字段使用的Collation

mysql> show full fields in t1;
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type       | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| c1    | int        | NULL               | YES  |     | NULL    |       | select,insert,update,references |         |
| c2    | varchar(4) | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.

 

c2列的Collation是utf8mb4_unicode_ci。

 

3.2、查看Collation的Pad属性

mysql> select COLLATION_NAME,PAD_ATTRIBUTE from INFORMATION_SCHEMA.COLLATIONS where COLLATION_NAME in('utf8mb4_unicode_ci','utf8mb4_0900_ai_ci');
+--------------------+---------------+
| COLLATION_NAME     | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_ai_ci | NO PAD        |
| utf8mb4_unicode_ci | PAD SPACE     |
+--------------------+---------------+
2 row in set (0.00 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.

 

utf8mb4_unicode_ci的Pad属性是PAD SPACE,由2.3可知c2列在排序和比较运算中,忽略字符串尾部空格。

 

因此check比较时,会将插入的' '中的空格忽略,显然忽略空格后和check约束存在冲突,插入失败。

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> select '  ' = '';
+--------+
| ' '='' |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.

 

3.3、如何让check约束按常规逻辑生效


这里的常规是指空格就是空格,不应该把空格忽略。

 

只需将c2字段修改为NO PAD的Collation后,就能将空格正常插入:

mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

mysql> alter table t1 modify c2 varchar(4) collate utf8mb4_0900_ai_ci;  #修改为NO PAD的Collation
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格
Query OK, 1 row affected (0.12 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t1 select 1,'';  #''之间无空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

mysql> select c1,c2,hex(c2) from t1;
+------+------+---------+
| c1   | c2   | hex(c2) |
+------+------+---------+
|    1 |      | 2020    |
+------+------+---------+
1 row in set (0.01 sec)
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.

 

4、扩展


4.1、如果c2列是CHAR类型,和前面的问题表现一样吗


一样。CHAR、VARCHAR、TEXT在做排序和比较运算时,都是依据列的Collation的Pad属性处理字符串尾部的空格。此时拿来做比较运算的字符串是insert中的内容。

 

4.2、WHERE条件中表现形式是怎样的


创建一张新表并插入数据

mysql> create table t3(
    -> c1 int,
    -> c2 char(4) collate utf8mb4_unicode_ci,
    -> c3 char(4) collate utf8mb4_0900_ai_ci,
    -> c4 varchar(4) collate utf8mb4_unicode_ci,
    -> c5 varchar(4) collate utf8mb4_0900_ai_ci
    -> )engine=innodb;
Query OK, 0 rows affected (0.29 sec)

mysql> insert into t3 select 1,'a','a','a','a';
Query OK, 1 row affected (0.09 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 2,'a ','a ','a ','a ';  #各列包含1个空格
Query OK, 1 row affected (0.20 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 3,'a   ','a   ','a  ','a  ';  #前两列3个空格,后两列2个空格
Query OK, 1 row affected (0.17 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 4,'a  ','a  ','a   ','a   ';  #前两列2个空格,后两列3个空格
Query OK, 1 row affected (0.14 sec)
Records: 1  Duplicates: 0  Warnings: 0
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.

 

观察WHERE条件返回结果,CHAR类型的返回受PAD_CHAR_TO_FULL_LENGTH影响(参考2.1)

mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)
c2 char->返回数据去掉字符串尾部的空格
c2 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.01 sec)
c3 char->返回数据去掉字符串尾部的空格
c3 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)
c4 varchar->返回数据保留插入时的空格
c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+---------+---------+---------+---------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+---------+---------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61      | 61      |
+------+------+------+------+------+---------+---------+---------+---------+
1 row in set (0.00 sec)
c5 varchar->返回数据保留插入时的空格
c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符


mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |
+------+------+------+------+------+----------+----------+----------+----------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       |
|    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)
c2 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串右边补充空格
c2 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
Empty set (0.00 sec)
c3 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串右边补充空格
c3 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符
1~4行c3列返回值都包含空格,且c3列的Collation是NO PAD,字符串尾部空格不能忽略,where过滤找不到记录

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |
+------+------+------+------+------+----------+----------+----------+----------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       |
|    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)
c4 varchar->返回数据保留插入时的空格
c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+----------+----------+---------+---------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4) | hex(c5) |
+------+------+------+------+------+----------+----------+---------+---------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61      | 61      |
+------+------+------+------+------+----------+----------+---------+---------+
1 row in set (0.00 sec)
c5 varchar->返回数据保留插入时的空格
c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
  • 88.
  • 89.
  • 90.
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.

 

此时拿来做比较运算的字符串是Retrieved的内容,CHAR和VARCHAR 返回数据时对字符串尾部的空格处理方式不同,并且PAD_CHAR_TO_FULL_LENGTH只影响CHAR类型。

 

4.3、对唯一索引的影响 

 

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

参考官网说明:https://dev.mysql.com/doc/refman/8.0/en/char.html

 

如果存在唯一索引(单列、字符类型),插入的数据仅在尾部空格个数不同,有可能会报duplicate-key错误:

mysql> select c1,c4,c5,hex(c4),hex(c5) from t3;
+------+------+------+----------+----------+
| c1   | c4   | c5   | hex(c4)  | hex(c5)  |
+------+------+------+----------+----------+
|    1 | a    | a    | 61       | 61       |
|    2 | a    | a    | 6120     | 6120     |
|    3 | a    | a    | 612020   | 612020   |
|    4 | a    | a    | 61202020 | 61202020 |
+------+------+------+----------+----------+
4 rows in set (0.00 sec)

mysql> alter table t3 add unique(c4);
ERROR 1062 (23000): Duplicate entry 'a' for key 't3.c4'

mysql> alter table t3 add unique(c5);
Query OK, 0 rows affected (0.44 sec)
Records: 0  Duplicates: 0  Warnings: 0
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.

 

可以看到c4列创建唯一索引失败,c5列创建唯一索引成功。

 

c4 utf8mb4_unicode_ci->PAD SPACE->排序和比较运算,忽略字符串尾部空格,4行数据重复。

 

c5 utf8mb4_0900_ai_ci->NO PAD->排序和比较运算,字符串尾部空格当成普通字符,4行数据不同。

 

5、总结


Stored

 

Retrieved

 

Comparison(不包括like)

 

注意区分是取Stored的值还是Retrieved的值进行Comparison。

 


Enjoy GreatSQL :)

 

文章转载自公众号:GreatSQL社区

分类
标签
已于2022-8-29 15:35:29修改
收藏
回复
举报
回复
    相关推荐