- Type: Task
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 0.7
- Component/s: None
- Environment: Linux Fedora 2.6.35.14-97.fc14.x86_64, mongodb version 1.6.4
I have an implementation of tailable cursors and wanted to test how the cursor behaves when the connection to mongo is lost. My observation is that the cursor stays in the MONGO_CURSOR_PENDING state even after the connection has been lost. At this point conn->err is MONGO_IO_ERROR because there is no connection.
Should the cursor state not be changed to MONGO_CURSOR_INVALID at this point?
I also tried another approach. Since I want automatic reconnection, I added some test auto-reconnect logic at the lowest API level, in the function mongo_message_send. This lets reconnection take place without adding any wrappers around the higher-level APIs. With this change the client reconnects successfully, but the tailable cursor still stays in the MONGO_CURSOR_PENDING state as described above. Because mongodb has gone through a restart cycle (I am trying to mimic recovery from a crash), it does not recognize the old cursor ID that the client is still using, so it returns a zero cursorID in the response. The mongodb log shows:
Wed Apr 24 11:25:24 [conn1] getMore: cursorid not found admin.configupdates 2515604382749781657
This causes the following code path to be activated in the function mongo_cursor_next:
if ( cursor->reply->fields.num == 0 ) {
    /* Special case for tailable cursors. */
    if( cursor->reply->fields.cursorID ) {    /* <-- this check fails */
        if( ( mongo_cursor_get_more( cursor ) != MONGO_OK ) ||
                cursor->reply->fields.num == 0 ) {
            return MONGO_ERROR;
        }
    }
    else
        return MONGO_ERROR;    /* <-- returned without updating the cursor state */
}
Here the check 'if( cursor->reply->fields.cursorID )' fails and MONGO_ERROR is returned without changing the state of the cursor. Shouldn't the cursor state be changed to MONGO_CURSOR_INVALID, so that the application layer can look at this error, recognize that something went wrong, and refresh its cursor?
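To make the request concrete, here is a minimal sketch of the behaviour I would expect. It is not the driver's code: sketch_cursor, sketch_cursor_next and sketch_needs_requery are hypothetical stand-ins, the struct only keeps the three fields that matter here, and the constants are redefined locally with arbitrary values. No network call is made, so the get-more step is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch only; not the driver's definitions. */
#define MONGO_OK      0
#define MONGO_ERROR (-1)

typedef enum {
    SKETCH_CURSOR_NONE = 0,   /* no error recorded */
    MONGO_CURSOR_PENDING,     /* tailable cursor is alive, no data yet */
    MONGO_CURSOR_INVALID      /* server no longer knows this cursor */
} sketch_cursor_err;

/* Hypothetical, flattened view of the reply fields used above. */
typedef struct {
    int               num;       /* documents in the last reply */
    int64_t           cursorID;  /* 0 when the server has dropped the cursor */
    sketch_cursor_err err;       /* state the application can inspect */
} sketch_cursor;

/* Mirrors the shape of the code path above, but records WHY it failed. */
int sketch_cursor_next( sketch_cursor *cursor ) {
    assert( cursor != NULL );

    if ( cursor->num == 0 ) {
        if ( cursor->cursorID ) {
            /* The real driver would issue a get-more here; the sketch
             * just reports that the cursor is still waiting for data. */
            cursor->err = MONGO_CURSOR_PENDING;
            return MONGO_ERROR;
        }
        /* Proposed change: a zero cursorID marks the cursor invalid. */
        cursor->err = MONGO_CURSOR_INVALID;
        return MONGO_ERROR;
    }

    cursor->err = SKETCH_CURSOR_NONE;
    return MONGO_OK;
}

/* Application-side check: should the query be issued again? */
int sketch_needs_requery( const sketch_cursor *cursor ) {
    return cursor->err == MONGO_CURSOR_INVALID;
}
```

With something like this, the application could tell "no new data yet" (MONGO_CURSOR_PENDING) apart from "the server forgot the cursor after a restart" (MONGO_CURSOR_INVALID) and only re-issue the query in the second case.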
I am attaching a patch for the naive auto-reconnect that I used. Kindly let me know whether I should raise a separate issue for this.