This is a great question. InnoDB is a row level locking engine, but it has to set additional locks to ensure safety with the binary log (used for replication; point in time recovery). To start explaining it, consider the following (naive) example:
session1> START TRANSACTION;
session1> DELETE FROM users WHERE is_deleted = 1; # 1 row matches (user_id 10), deleted.
session2> START TRANSACTION;
session2> UPDATE users SET is_deleted = 1 WHERE user_id = 5; # 1 row matches.
session2> COMMIT;
session1> COMMIT;
Because statements are only written to the binary log once committed, on the slave session#2 would apply first, and would produce a different result, leading to data corruption.
So what InnoDB does, is sets additional locks. If is_deleted
is indexed, then before session1 commits nobody else will be able to modify or insert into the range of records where is_deleted=1
. If there are no indexes on is_deleted
, then InnoDB needs to lock every row in the entire table to make sure the replay is in the same order. You can think of this as locking the gap, which is different concept to grasp from row-level locking directly.
In your case with that ORDER BY position ASC
, InnoDB needs to make sure that no new rows could be modified between the lowest key value and a "special" lowest possible value. If you did something like ORDER BY position DESC
.. well, then nobody could insert into this range.
So here comes the solution:
Statement based binary logging sucks. I really look forward to a future where we all switch to row based binary logging (available from MySQL 5.1, but not on by default).
With Row-based replication, if you change the isolation level to read-committed, then only the one row that matches needs to be locked.
If you want to be a masochist, you can also turn on innodb_locks_unsafe_for_binlog with statement-based replication.
Update 22 April: To copy + paste my improved version of your testcase (it was not searching 'in the gap'):
session1> CREATE TABLE test (id int not null primary key auto_increment, data1 int, data2 int, INDEX(data1)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)
session1> INSERT INTO test VALUES (NULL, 1, 2), (NULL, 2, 1), (5, 2, 2), (6, 3, 3), (3, 3, 4), (4, 4, 3);
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
session1> start transaction;
Query OK, 0 rows affected (0.00 sec)
session1> SELECT id FROM test ORDER BY data1 LIMIT 1 FOR UPDATE;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
session2> INSERT INTO test values (NULL, 0, 99); # blocks - 0 is in the gap between the lowest value found (1) and the "special" lowest value.
# At the same time, from information_schema:
localhost information_schema> select * from innodb_locksG
*************************** 1. row ***************************
lock_id: 151A1C:1735:4:2
lock_trx_id: 151A1C
lock_mode: X,GAP
lock_type: RECORD
lock_table: `so5694658`.`test`
lock_index: `data1`
lock_space: 1735
lock_page: 4
lock_rec: 2
lock_data: 1, 1
*************************** 2. row ***************************
lock_id: 151A1A:1735:4:2
lock_trx_id: 151A1A
lock_mode: X
lock_type: RECORD
lock_table: `so5694658`.`test`
lock_index: `data1`
lock_space: 1735
lock_page: 4
lock_rec: 2
lock_data: 1, 1
2 rows in set (0.00 sec)
# Another example:
select * from test where id < 1 for update; # blocks