文章詳情頁

MySQL之select in 子查詢優化的實現

瀏覽：2日期：2023-10-11 11:20:39

下面的演示基于MySQL5.7.27版本

一、關于MySQL子查詢的優化策略介紹:

子查詢優化策略

對于不同類型的子查詢，優化器會選擇不同的策略。

1. 對于 IN、=ANY 子查詢，優化器有如下策略選擇：

semijoin Materialization exists

2. 對于 NOT IN、<>ALL 子查詢，優化器有如下策略選擇：

Materialization exists

3. 對于 derived 派生表，優化器有如下策略選擇：derived_merge，將派生表合并到外部查詢中（5.7 引入）；將派生表物化為內部臨時表，再用于外部查詢。注意：update 和 delete 語句中子查詢不能使用 semijoin、materialization 優化策略

二、創建數據進行模擬演示

為了方便分析問題先建兩張表并插入模擬數據：

CREATE TABLE `test02` ( `id` int(11) NOT NULL, `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`)) ENGINE=InnoDB;drop procedure idata;delimiter ;;create procedure idata()begin declare i int; set i=1; while(i<=10000)do insert into test02 values(i, i, i); set i=i+1; end while;end;;delimiter ;call idata();create table test01 like test02;insert into test01 (select * from test02 where id<=1000)

三、舉例分析SQL實例

子查詢示例：

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

大部分人可定會簡單的認為這個 SQL 會這樣執行：

SELECT test02.b FROM test02 WHERE id < 10

結果:1,2,3,4,5,6,7,8,9

SELECT * FROM test01 WHERE test01.a IN (1,2,3,4,5,6,7,8,9);

但實際上 MySQL 并不是這樣做的。MySQL 會將相關的外層表壓到子查詢中，優化器認為這樣效率更高。也就是說，優化器會將上面的 SQL 改寫成這樣：

select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);

提示：針對mysql5.5以及之前的版本

查看執行計劃如下，發現這條SQL對表test01進行了全表掃描1000，效率低下：

root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| 1 | PRIMARY | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where || 2 | DEPENDENT SUBQUERY | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 10.00 | Using where |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+2 rows in set, 2 warnings (0.00 sec)

但是此時實際執行下面的SQL，發現也不慢啊,這不是自相矛盾嘛，別急，咱們繼續往下分析:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

查看此條SQL的執行計劃如下：

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where || 1 | SIMPLE | test01 | NULL | ref | a | a | 5 | <subquery2>.b | 1 | 100.00 | NULL || 2 | MATERIALIZED | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where |+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+3 rows in set, 1 warning (0.00 sec)

發現優化器使用到了策略MATERIALIZED。于是對此策略進行了資料查詢和學習。https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

原因是從MySQL5.6版本之后包括MySQL5.6版本，優化器引入了新的優化策略:materialization=[off|on],semijoin=[off|on],(off代表關閉此策略，on代表開啟此策略)可以采用show variables like ’optimizer_switch’; 來查看MySQL采用的優化器策略。當然這些策略都是可以在線進行動態修改的set global optimizer_switch=’materialization=on,semijoin=on’;代表開啟優化策略materialization和semijoin

MySQL5.7.27默認的優化器策略：

root@localhost [dbtest01]>show variables like ’optimizer_switch’; +------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Variable_name | Value |+------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on |+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

所以在MySQL5.6及以上版本時

執行下面的SQL是不會慢的。因為MySQL的優化器策略materialization和semijoin 對此SQL進行了優化

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

然而咱們把mysql的優化器策略materialization和semijoin 關閉掉測試，發現SQL確實對test01進行了全表的掃描(1000):

set global optimizer_switch=’materialization=off,semijoin=off’;

執行計劃如下test01表確實進行了全表掃描:

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| 1 | PRIMARY | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where || 2 | DEPENDENT SUBQUERY | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 10.00 | Using where |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+2 rows in set, 1 warning (0.00 sec)

下面咱們分析下這個執行計劃:

?。。。≡俅翁崾?如果是mysql5.5以及之前的版本，或者是mysql5.6以及之后的版本關閉掉優化器策略materialization=off,semijoin=off，得到的SQL執行計劃和下面的是相同的

root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+| 1 | PRIMARY | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using where || 2 | DEPENDENT SUBQUERY | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 10.00 | Using where |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+2 rows in set, 2 warnings (0.00 sec)

不相關子查詢變成了關聯子查詢（select_type:DEPENDENT SUBQUERY），子查詢需要根據 b 來關聯外表 test01，因為需要外表的 test01 字段，所以子查詢是沒法先執行的。執行流程為：

掃描 test01，從 test01 取出一行數據 R；從數據行 R 中，取出字段 a 執行子查詢，如果得到結果為 TRUE，則把這行數據 R 放到結果集；重復 1、2 直到結束。

總的掃描行數為 1000+1000*9=10000（這是理論值，但是實際值比10000還少，怎么來的一直沒想明白，看規律是子查詢結果集每多一行，總掃描行數就會少幾行）。

Semi-join優化器:

這樣會有個問題，如果外層表是一個非常大的表，對于外層查詢的每一行，子查詢都得執行一次，這個查詢的性能會非常差。我們很容易想到將其改寫成 join 來提升效率：

select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;

# 查看此SQL的執行計劃:

desc select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;root@localhost [dbtest01]>EXPLAIN extended select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref| rows | filtered | Extra |+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+| 1 | SIMPLE | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where || 1 | SIMPLE | test01 | NULL | ref | a | a | 5 | dbtest01.test02.b | 1 | 100.00 | NULL |+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+2 rows in set, 2 warnings (0.00 sec)

這樣優化可以讓 t2 表做驅動表，t1 表關聯字段有索引，查找效率非常高。

但這里會有個問題，join 是有可能得到重復結果的，而 in(select ...) 子查詢語義則不會得到重復值。而 semijoin 正是解決重復值問題的一種特殊聯接。在子查詢中，優化器可以識別出 in 子句中每組只需要返回一個值，在這種情況下，可以使用 semijoin 來優化子查詢，提升查詢效率。這是 MySQL 5.6 加入的新特性，MySQL 5.6 以前優化器只有 exists 一種策略來“優化”子查詢。

經過 semijoin 優化后的 SQL 和執行計劃分為：

select `test01`.`id`,`test01`.`a`,`test01`.`b` from `test01` semi join `test02` where ((`test01`.`a` = `<subquery2>`.`b`) and (`test02`.`id` < 10));

##注意這是優化器改寫的SQL，客戶端上是不能用 semi join 語法的

semijoin 優化實現比較復雜，其中又分 FirstMatch、Materialize 等策略,上面的執行計劃中 select_type=MATERIALIZED 就是代表使用了 Materialize 策略來實現的 semijoin這里 semijoin 優化后的執行流程為：

先執行子查詢，把結果保存到一個臨時表中，這個臨時表有個主鍵用來去重；從臨時表中取出一行數據 R；從數據行 R 中，取出字段 b 到被驅動表 t1 中去查找，滿足條件則放到結果集；重復執行 2、3，直到結束。這樣一來，子查詢結果有 9 行，即臨時表也有 9 行（這里沒有重復值），總的掃描行數為 9+9+9*1=27 行，比原來的 10000 行少了很多。

MySQL 5.6 版本中加入的另一種優化特性 materialization，就是把子查詢結果物化成臨時表，然后代入到外查詢中進行查找，來加快查詢的執行速度。內存臨時表包含主鍵（hash 索引），消除重復行，使表更小。如果子查詢結果太大，超過 tmp_table_size 大小，會退化成磁盤臨時表。這樣子查詢只需要執行一次，而不是對于外層查詢的每一行都得執行一遍。不過要注意的是，這樣外查詢依舊無法通過索引快速查找到符合條件的數據，只能通過全表掃描或者全索引掃描，

semijoin 和 materialization 的開啟是通過 optimizer_switch 參數中的 semijoin={on|off}、materialization={on|off} 標志來控制的。上文中不同的執行計劃就是對 semijoin 和 materialization 進行開/關產生的總的來說對于子查詢，先檢查是否滿足各種優化策略的條件（比如子查詢中有 union 則無法使用 semijoin 優化）然后優化器會按成本進行選擇，實在沒得選就會用 exists 策略來“優化”子查詢，exists 策略是沒有參數來開啟或者關閉的。

下面舉一個delete相關的子查詢例子：

把上面的2張測試表分別填充350萬數據和50萬數據來測試delete語句

root@localhost [dbtest01]>select count(*) from test02;+----------+| count(*) |+----------+| 3532986 |+----------+1 row in set (0.64 sec)root@localhost [dbtest01]>create table test01 like test02;Query OK, 0 rows affected (0.01 sec)root@localhost [dbtest01]>insert into test01 (select * from test02 where id<=500000)root@localhost [dbtest01]>select count(*) from test01;+----------+| count(*) |+----------+| 500000 |

執行delete刪除語句執行了4s

root@localhost [dbtest01]>delete FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);Query OK, 9 rows affected (4.86 sec)

查看執行計劃，對test01表進行了幾乎全表掃描:

root@localhost [dbtest01]>desc delete FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+| 1 | DELETE | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 499343 | 100.00 | Using where || 2 | DEPENDENT SUBQUERY | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 10.00 | Using where |+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+2 rows in set (0.00 sec)

于是修改上面的delete SQL語句偽join語句

root@localhost [dbtest01]>desc delete test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref| rows | filtered | Extra |+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+| 1 | SIMPLE | test02 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using where || 1 | DELETE | test01 | NULL | ref | a | a | 5 | dbtest01.test02.b | 1 | 100.00 | NULL |+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------------+2 rows in set (0.01 sec)執行非常的快root@localhost [dbtest01]>delete test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;Query OK, 9 rows affected (0.01 sec)root@localhost [dbtest01]>select test01.* from test01 join test02 on test01.a=test02.b and test02.id<10;Empty set (0.00 sec)

下面的這個表執行要全表掃描，非常慢，基本對表test01進行了全表掃描:

root@lcalhost [dbtest01]>desc delete FROM test01 WHERE id IN (SELECT id FROM test02 WHERE id=’350000’);+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+| 1 | DELETE | test01 | NULL | ALL | NULL | NULL | NULL | NULL | 499343 | 100.00 | Using where || 2 | DEPENDENT SUBQUERY | test02 | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |+----+--------------------+--------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+2 rows in set (0.00 sec)

然而采用join的話，效率非常的高:

root@localhost [dbtest01]>desc delete test01.* FROM test01 inner join test02 WHERE test01.id=test02.id and test02.id=350000 ;+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+| 1 | DELETE | test01 | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL || 1 | SIMPLE | test02 | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+2 rows in set (0.01 sec) root@localhost [dbtest01]> desc delete test01.* from test01 join test02 on test01.a=test02.b and test02.id=350000;+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+| 1 | SIMPLE | test02 | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL || 1 | DELETE | test01 | NULL | ref | a | a | 5 | const | 1 | 100.00 | NULL |+----+-------------+--------+------------+-------+---------------+---------+---------+-------+------+----------+-------+2 rows in set (0.00 sec)

參考文檔:

https://www.cnblogs.com/zhengyun_ustc/p/slowquery1.htmlhttps://www.jianshu.com/p/3989222f7084https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

到此這篇關于MySQL之select in 子查詢優化的實現的文章就介紹到這了,更多相關MySQL select in 子查詢優化內容請搜索好吧啦網以前的文章或繼續瀏覽下面的相關文章希望大家以后多多支持好吧啦網！

上一條：Mysql NULL導致的神坑下一條：簡單了解mysql存儲字段類型查詢效率

相關文章：

1. Oracle817 版本不同字符集之間的數據庫導入2. MySQL全文搜索之布爾搜索3. sql索引失效的情況以及超詳細解決方法4. MySQL收歸Oracle 開源數據庫或將很受傷5. 個人經驗總結：DB2數據庫邏輯卷的復制6. mysql 判斷是否為子集的方法步驟7. 詳解MySQL alter ignore 語法8. 精細分析Oracle分布式系統數據復制技術9. Oracle的PDB數據庫創建DIRECTORY時遇到ORA-65254問題及解決方法10. Microsoft Office Access設置字體顏色的方法

排行榜

					
					Mysql入門系列：建立MYSQL客戶機程序的一般過程
MySQL存儲過程例子（包含事務、參數、嵌套調用、游標循環等）
精細分析Oracle分布式系統數據復制技術
MySQL全文搜索之布爾搜索
MySQL收歸Oracle 開源數據庫或將很受傷
Mysql入門系列：安排預防性的維護MYSQL數據庫服務器
mysql 判斷是否為子集的方法步驟
Oracle的PDB數據庫創建DIRECTORY時遇到ORA-65254問題及解決方法
Oracle817 版本 不同字符集之間的數據庫導入
sql索引失效的情況以及超詳細解決方法
個人經驗總結：DB2數據庫邏輯卷的復制
				

熱門標簽

亚洲精品久久久中文字幕-亚洲精品久久片久久-亚洲精品久久青草-亚洲精品久久婷婷爱久久婷婷-亚洲精品久久午夜香蕉

MySQL之select in 子查詢優化的實現