Thursday, April 17, 2008

Division Between Users And Kernel Hackers On Git Bisect

The source code management tool git has come under the scanner again, this time for a different reason. Flame wars are pretty common in the Linux community. Every time opinion is divided on something, it unleashes a flurry of mails from the community gurus. What happened this time is no different. It all started when Mark Lord reported a regression in the network stack. Even after a few mail exchanges it was not clear what the cause of the problem was, so the netdev gurus asked Mark to "bisect" and arrive at the culprit patch. Mark responded furiously, saying that he did not have the time or the inclination to do such a thing. He argued that he was only a bug reporter and that it was not his job to do the bisection. This triggered the whole debate over who does what. What followed was an exchange of heated arguments over mail, with a few humorous stories thrown in to support the claims. No one can forget the "Doctor Patient" story. To sum up, the main focus of this whole episode was who is responsible for tracking down such regressions.

Let's look a little at git bisect. Git bisect is used to find a possible cause of a problem. It works on a simple principle: the bug hunter has to know one kernel that works well and one that has the bug. For example, if 2.6.24 does not have the problem but 2.6.25 does, one can use bisect to pick a version somewhere in the middle of these two releases. That version is then tested for the bug. If the bug is found, git bisect has to be run again on the first half of the range; if not, on the second half. To make things easy, take the same example as above. Suppose bisecting showed that 2.6.25-rc4 had the issue. It is now clear that the bug was introduced somewhere between 2.6.24 and 2.6.25-rc4. Running git bisect again narrows the range even further, and this continues until we arrive at a particular commit. This helps identify the bug, but the process is time consuming. To state the fact plainly, git bisect does not necessarily narrow down on the patch that actually causes the problem.
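The halving procedure described above can be sketched in a few lines of Python. This is only an illustration of the principle, not how git implements it: it assumes a strictly linear history (real kernel history has merges), and the commit names, the `first_bad_commit` function and the `is_bad` test are all hypothetical stand-ins for "build this kernel, boot it, and see whether the bug shows up".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad (linear history).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # bug already present: look in the first half
        else:
            good = mid   # bug not present yet: look in the second half
    return commits[bad]


# Hypothetical history of 1001 commits between a good and a bad release.
commits = [f"commit-{i:04d}" for i in range(1001)]
culprit_index = 640      # made-up position of the offending commit
tested = []              # records every commit we had to "build and boot"


def is_bad(commit: str) -> bool:
    tested.append(commit)
    return int(commit.split("-")[1]) >= culprit_index


culprit = first_bad_commit(commits, is_bad)
print(culprit, "found after", len(tested), "tests")
```

With about a thousand commits in the range, the search needs at most ten tests. That sounds cheap, but each test here stands for a full kernel build and reboot, which is why the process is time consuming for a reporter. With git itself the session is started with `git bisect start`, followed by `git bisect good` and `git bisect bad` to mark the known endpoints and then each version that git checks out for testing.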
Bisection is also not a sure-shot way to identify a problem: the bug may have been created elsewhere and only come to light when the "culprit" patch was introduced.

In this dispute, Mark states that a user who identifies a problem need only report it, and that is where his or her duty ends. Whether to do some more homework and help the developers fix the bug is up to the individual user. The developers argue that a user will be asked to bisect only as a last resort. Forcing users to do more work than just reporting a bug can cause them to stop reporting bugs, which is not a good thing for the community. On the other hand, well-known kernel hackers like David Miller claim that bisection by the user is sometimes unavoidable, because the developers do not have the hardware the user had in his environment. This requires the user to cooperate in the effort. As one can see, both sides have a strong point to argue, and it is difficult to take sides here. This can become a major issue if not resolved quickly. Al Viro and James Morris suggested that subsystem maintainers need to be more careful when committing patches, which could avoid most regressions.

Another question that was discussed was how a user decides which bugs are "real". In code as complex as the kernel, bugs can arise from faulty hardware that nobody else has. When that happens it is virtually impossible to fix, and the bug remains unfixed. It only comes to light if multiple users complain of the same problem. The community urged its users to test properly before posting bugs on the mailing list. The story has not concluded yet, as there is no clear solution to this problem. It remains to be seen how the community will tackle this issue.
