C Driver / CDRIVER-3563

Cursor may use an invalidated server description

      Because a mongoc_cursor_t ties itself to a server_id when it is created, it can later attempt to send a command using a server description that has since been marked as UNKNOWN.

      I have been able to reproduce such a situation by modifying example-client.c. Here is the salient bit:

         printf ("initializing agg, this ties cursor to the server, but does not send the command\n");
         cursor = mongoc_collection_aggregate (collection, MONGOC_QUERY_NONE, &empty_doc, NULL /* opts */, NULL /* read_prefs */);
         if (mongoc_cursor_error (cursor, &error)) {
            printf ("agg error: %s\n", error.message);
         }
      
         printf ("simulating a server being marked as UNKNOWN\n");
         mongoc_topology_invalidate_server (client->topology, cursor->server_id, &error);
         printf ("done\n");
         
         printf ("sending aggregate command\n");
         mongoc_cursor_next (cursor, &doc);
         if (mongoc_cursor_error (cursor, &error)) {
            printf ("error: %s\n", error.message);
         }
      

      This prints:

      initializing agg, this ties cursor to the server, but does not send the command
      simulating a server being marked as UNKNOWN
      sending aggregate command
      error: "aggregate" command does not support readConcern with wire version 0, wire version 4 is required
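
      The failure mode can be modeled in isolation. The sketch below is not libmongoc code: every type and function name in it is a hypothetical stand-in, and it assumes (as the error message above suggests) that invalidating a server description resets its recorded wire version to 0. A cursor pinned to the server id then fails the wire version check even though the actual server has not changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for driver internals; not the libmongoc API. */
typedef enum { SERVER_UNKNOWN, SERVER_STANDALONE } server_type_t;

typedef struct {
   uint32_t id;
   server_type_t type;
   int32_t max_wire_version;
} server_description_t;

typedef struct {
   uint32_t server_id; /* fixed when the cursor is created */
   bool sent;          /* true once the command has gone out */
} cursor_t;

#define WIRE_VERSION_READ_CONCERN 4

/* Assumption: invalidation marks the description UNKNOWN and zeroes
 * its wire version, which matches "wire version 0" in the error. */
static void
invalidate_server (server_description_t *sd)
{
   assert (sd != NULL);
   sd->type = SERVER_UNKNOWN;
   sd->max_wire_version = 0;
}

/* Checks the wire version of the pinned description before "sending".
 * Returns false and fills err if the check fails. */
static bool
cursor_send_aggregate (cursor_t *cursor,
                       const server_description_t *sd,
                       char *err,
                       size_t errlen)
{
   assert (cursor->server_id == sd->id);
   if (sd->max_wire_version < WIRE_VERSION_READ_CONCERN) {
      snprintf (err,
                errlen,
                "\"aggregate\" command does not support readConcern with "
                "wire version %d, wire version %d is required",
                (int) sd->max_wire_version,
                WIRE_VERSION_READ_CONCERN);
      return false;
   }
   cursor->sent = true;
   return true;
}
```

      Calling cursor_send_aggregate before invalidate_server succeeds in this model; calling it afterwards fails with the same message as the reproduction.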
      

      The test case forces the server to be marked as UNKNOWN, but I believe this can happen in two situations:
      1. The driver receives a "not master" error from a server newer than 4.0. The server description is marked as UNKNOWN, but the connections are left open. For servers <= 4.0 I don't believe this is an issue, since the connections are also reset, so the next attempt to send a command on the cursor recreates the connection and performs another handshake.
      2. The background monitor receives a network error and marks the server as UNKNOWN. Because of CDRIVER-3529, this is more likely to happen on a transient network error.

      This bug is very similar to CDRIVER-3404. It is worth investigating whether this could surface in other parts of the codebase, since we have other wire version checks. It may also be worth rethinking how we invalidate server descriptions and perform wire version checks.

      Assignee: Kevin Albertson (kevin.albertson@mongodb.com)
      Reporter: Kevin Albertson (kevin.albertson@mongodb.com)