git bisect, the best skill I learned working with TiDB
September 22, 2021

For TiDB developers, it is not uncommon that a bug that occurs on a certain release branch does not exist on the master branch, this is usually because it has been fixed on master, but the fix PR has not been cherry-picked, or, conversely, the master branch contains but that only seems to be introduced recently. Whenever you are facing such scenarios that the behavior of TiDB changed unexpectedly, git bisect could always give you a hand.

Put it simply, by specifying two commit IDs that form a linear commit log history, git bisect could help you binary search the history between the commits and find the first commit where the behavior has changed.

So, formally speaking, if we have two git commit A and B, and certain behavior on the two commits differs, we want to find the first commit between A and B that changes the behavior.

First of all, we need to make sure the commit history is linear, i.e., commit B must be an ancestor of commit A. Most of the time, A is the HEAD of the master branch, and B is the HEAD of a release branch, and since we cherry-pick bug fixes into the release branch, B might not be the direct ancestor. A tip here is that you can use git merge-base A B to figure out the lowerest common ancestor (LCA) of A and B. You can then replace B with LCA as long as the LCA has the same behavior as B, and this often holds. Now B must be an ancestor of A.

For example, in https://github.com/pingcap/tidb/issues/18652, a bug appeared on the master branch at that time, however, I found it does not exist on release 3.0, so I figured out the LCA commit by

$ git merge-base master release-3.0                                                                                                                                                                                                                                                141412f6ef9783d03d3992a469284e0e3c57fe5b31d

So, the bug must be introduced between the master's HEAD commit and the LCA commit. Note by the time the post is written, I've already fixed the bug in https://github.com/pingcap/tidb/pull/19482, so, the commit before the fix is 357d432d4e6e835c8064028e72532d41473c2aa4, and we can use this commit as commit A, and LCA as commit B.

git bisect works by assuming that the earlier commit B is a good commit and later commit A is a bad commit (i.e. contains a bug).

So let's tell git bisect about it.

$ git bisect start
$ git bisect good 412f6ef9783d03d3992a469284e0e3c57fe5b31d
$ git bisect bad 357d432d4e6e835c8064028e72532d41473c2aa4
HEAD is now at 46f5f2177 ddl: fix ddl hang over when it meets panic in cancelling path (#23204)
Bisecting: 1625 revisions left to test after this (roughly 11 steps)
[e7554597886e94c1b368714b6ea0ad6362225e26] bindinfo: change the way of generate hints from sql (#14095)

Now you can tell from the message, there are 1625 commits to be checked, however using binary search, only 11 steps are needed, and git automatically picked the middle commit for you to be tested.

You can now perform the behavior test. If the current commit contains the bug, you can tell git by

git bisect bad

otherwise, you should tell git that the commit is good

git bisect good

whenever you provide git the goodness of each commit, git will then advance to the later half or early half of the commit history, and automatically checkout the next commit to be checked, until there is only one revision remained, and you know which commit is wrong.

11 steps is a lot better than 1625 steps, but is still needs you to manually typing commands to test the commit if it is good or bad.

In the above example, the testing includes recompiling the TiDB server, spinning it up and connecting it with MySQL client and executing the given SQL.

What's nicer about git bisect is it allows you to provide an executable script to it, and it will use the result of the script to tell the goodness of each commit to be checked.

So if you have a script named bisect.sh, and executing it yields no error on good commit and exits with error on bad commit, you can provide it to git bisect

git bisect run bisect.sh

and the whole binary search process can then be automated.

In the example above, the following script would suffice

#!/bin/sh
# compile
cd /home/ichn/go/src/github.com/pingcap/tidb
make server

# spin up the tidb server
./bin/tidb-server &
pid=$!
sleep 3

# test the SQL
mysql -h 127.0.0.1 -u root -P 4000 -D test -e 'DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( `pk` int not null primary key auto_increment, `col_smallint_key_signed` smallint  , key (`col_smallint_key_signed`)); INSERT INTO `t1` VALUES (1,0),(2,NULL),(3,NULL),(4,0),(5,0),(6,NULL),(7,NULL),(8,0),(9,0),(10,0); SELECT * FROM t1 WHERE ( LOG( `col_smallint_key_signed`, -8297584758403770424 ) ) DIV 1;'
result=$?

# clean up the enviroment
kill -9 $pid
git clean -df
git reset --hard HEAD

# exit with the result
exit $result

Suppose the path to the script is ../bisect.sh

git bisect run ../bisect.sh

will stop at the first trouble-making commit after 11 steps. Grab a coffee and watching it does the heavy lifting is very pleasing.

Another common use case of git bisect is to find out the first commit that fixes a certain bug. You need to manually reverse the result, i.e. for a commit that does not contain the bug, you should tell git that it is bad, and the bugging one good. Probably new/old is a better pair of term than bad/good.

For example, in https://github.com/pingcap/tidb/issues/20027, TiKV panicked on a certain commit, but it is fixed on a later commit. In this case, using TiDB alone can't reproduce the problem, so I used tiup to deploy a mini cluster locally. The script looks like this

#!/bin/sh
# set up
cd /home/ichn/Projects/pingcap/tiup
yes | tiup cluster deploy tidb-timeout nightly ./mini.yaml --user ichn
cd /home/ichn/Projects/pingcap/tidb
make server
sudo cp ./bin/tidb-server /home/tidb/tmp/mini-deploy/tidb-4101/bin
tiup cluster start tidb-timeout

# wait for TiDB to serve
while true; do
    mysql -h 0.0.0.0 -u root -P 4101 -D test -e "select tidb_version()"
    if [[ $? -eq 0 ]]; then
        break
    fi
    sleep 1
done

# test the SQL
mysql -h 0.0.0.0 -u root -P 4101 -D test -e "use test; drop table if exists t1, t2; create table t1 (c_int int, c_str varchar(40), c_datetime datetime, c_timestamp timestamp, primary key (c_int, c_datetime)  , key(c_str), key(c_timestamp)) partition by range (to_days(c_datetime)) ( partition p0 values less than (to_days('2020-02-01')), partition p1 values less than (to_days('2020-04-01')), partition p2 values less than (to_days('2020-06-01')), partition p3 values less than maxvalue) ; create table t2 like t1 ; insert into t1 values (1, 'elastic borg', '2020-04-11 01:01:02', '2020-01-13 17:17:47'), (2, 'hopeful swanson', '2020-06-01 21:48:06', '2020-01-14 11:55:42'), (3, 'fervent nightingale', '2020-01-25 05:03:12', '2020-01-15 20:24:00'); insert into t2 values (1, 'agitated chandrasekhar', '2020-03-26 00:38:16', '2020-06-13 07:44:39'), (2, 'optimistic mclaren', '2020-01-17 11:06:35', '2020-03-27 12:53:29'), (3, 'clever nobel', '2020-03-28 10:06:23', '2020-03-26 18:29:45');"
mysql -h 0.0.0.0 -u root -P 4101 -D test -e "use test; begin; delete from t2 where c_int = 2; select * from t1 where c_timestamp in (select c_timestamp from t2 where t1.c_int = t2.c_int or t1.c_int = t2.c_int and t1.c_str <= t2.c_str); commit;"
res=$?

# reverse the result
if [[ $res -eq 0 ]]; then
        result=1
    echo "query okay, going backward"
else
        result=0
    echo "query bad, going forward"
fi

# clean up
yes | tiup cluster destroy tidb-timeout

# tell git bisect
exit $result

and the mini cluster's configuration looks like this

$ cat mini.yaml
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/home/tidb/tmp/mini-deploy"
  data_dir: "/home/tidb/tmp/mini-data"

pd_servers:
  - host: 127.0.0.1
    peer_port: 13380
    client_port: 13379

tidb_servers:
  - host: 127.0.0.1
    port: 4101
    status_port: 11081

tikv_servers:
  - host: 127.0.0.1
    port: 21161

You can try it yourself

╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect start                                                                                                                                                                                                                                      1 ↵
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect good 51d365fc45fdfc039eb204a96268c5bd1c55075f
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹release-4.0›
╰─$ git bisect bad release-4.0
Bisecting: 86 revisions left to test after this (roughly 7 steps)
[639d4596bc32571376390095bfd6556d7f61866c] 2pc: fix backoffer and cancel in batchExecutor (#20542) (#20548)
╭─ichn@menheraArch ~/Projects/pingcap/tidb ‹639d4596b›
╰─$ git bisect run ../bisect-20027.sh

In conclusion, if you need to find the change of behavior, git bisect is a killing tool to help. The above two scripts should be helpful in most scenarios and you can adapt it to cater your own cases easily.

Knowing the code change that makes the behavior different opens the door to resolving problems, because you can narrow the problem into a limited range. And sometimes, problem is resolved once you find the commit (all you need to do is to cherry-pick).

I would value git bisect the best skill I learned working with TiDB.

Let me know if you have any questions!