Posts Tagged ‘backend’

Better Faster IMAP in Thunderbird: Pseudo Offline Delete/Copy/Move

June 2, 2008

In the context of pseudo offline delete/copy/move implementation – Bug 435153, I have submitted my first experimental patch last week. Surprisingly, most of the pseudo offline code was already in Thunderbird, but the glue code to integrate this feature was missing. Missing parts were; the code that replaces the on-line operation with the offline one, and the code that plays back pending requests on time. My patch filled the blanks.

In Thunderbird, the default delete model for IMAP is to move messages from the source folder into the trash on the same server. Unless it is set otherwise by the user, the delete operation and move operation are essentially the same. The move operation consists of two combined operations; copy and delete. Because of that, it was a natural decision to implement the code that stores the pseudo offline operations in nsImapMailFolder::CopyMessages() method. The idea was to create a playback request object for each copy operation and play it back after a predefined delay to allow UI to fetch the next message in the list. For timing requirements, the first prototype included a timer into each request object. Since this was a proof of concept for the timer approach, I didn’t pay attention to performance issues. Timer delay is set to 500 ms by default. The prototype worked well in general and allowed me to identify potential problems. Mainly 3 design, and 3 unforeseen problems are identified. Design problems were;

1) Not having a management mechanism to monitor the beginning, and ending of each request posed control problems. This was important, especially for error handling.

2) Creating a timer in each request object looked far from optimal, especially when the user deletes messages one-by-one repeatedly.

3) The scope of the playback operation was rather large. It was iterating all servers and all folders to execute the pending operations, every time it is called.

Unforeseen problems were;

1) Bug 435259. Thanks to Bienvenu, it has been fixed.

2) We have discovered that unless it is handled silently, IDLE messages sent by the imap server after each batch playback operation would cause unnecessary folder updates, which eventually causes UI noise.

3) Consecutive move and delete operations cause empty and duplicated messages – looks very similar to Bug 414723.

In the next prototype I made some changes for the reasons mentioned above. I added a manager class to control request objects. Since we have already problems with timer events on Mac OS X, I moved the timer into manager class to lessen the burden on Gecko event loop. One shot timers in playback request objects are consolidated into one REPEATING_SLACK timer in manager class. State management, request grouping and IDLE handling mechanism are implemented as well. And finally, with Bienvenu’s help, the scope of the playback operations are reduced from server level to folder level.

These changes solved most of the stated problems. Grouping helped to optimize the number of playbacks executed, IDLE message handling reduced folder update noise, changing playback scope increased the performance. Also, I have found  out that  although copy operation is managed by the destination IMAP folders, the playback should be executed on the source folders for proper behavior. And surprisingly, IDLE messages are not sent to source folders, but they are sent to destination folders, therefore they should be handled there.

This patch also has revealed couple more problems that should be addressed:

1- Copy operations covering folders belong to different servers cause infinite recursion, eventually leading to a crash

2- The offline playback state machine still doesn’t send any error messages

3- After delete/move operations, when the user selects the destination folder, the content of the folder gets updated unnecessarily – from the user perspective. Another side effect of this behavior is if TB goes offline before the playback is  executed on the server, TB gives an error to the user and shows the previous state of the folder. Not a show stopper but definitely annoying.

4- Very rarely, consecutive delete/move/copy operations cause view index jumps, duplicate or empty messages

Well, in the final patch I tried to address all these problems as much as possible; the first problem is solved by simply limiting pseudo offline operations to in-server folders. If the operation is cross-server, then it is executed on-line or offline, as it is right now. The second one is tricky and requires more pluming. Currently error messages are not handled differently than it was before, and maybe it is better to push this work into  the next feature implementation. I have discussed the third problem with Bienvenu and he pointed out the necessity  of such behavior in the existing code. It looks like it is not trivial at all to change it. And finally, I can’t reproduce the last problem with the new code. This is good news but not convinced yet that it doesn’t exist anymore. I think needs more investigation before feeling safe.

Implementation wise, I limited my code with two classes, and tried to isolate them from the rest of the system as much as possible. In order to accomplish the isolation, I did use an unnamed “namespace” hoping that it will be ok for the module owner. Also, followed the DeCOMtamination guidelines to use nsRefPtr smart pointer instead of nsCOMPtr.

I must admit that it took me longer than I thought to get to this point, and still counting – reviews, changes etc.. The feeling of pushing the limits of the current design is a little bit uncomfortable, especially in the absence of regression tests.

This week I am going to start working on the second item of this series: Bug 436615.  I will continue to keep you posted about the developments.