  1. Core Server
  2. SERVER-30276

Secondary crashes after querying a unique index containing duplicates

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Indexing, Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Case:

      Description

      Our general recommendation for replica sets is to build indexes in a rolling fashion. To the best of my knowledge, the same approach is used when Automation creates the indexes in a replica set.

      This is entirely appropriate for regular indexes; however, there is a problem if the index that needs to be created is unique. I tested different versions of MongoDB and found that the behaviour has changed over the years. The behaviour of the most recent release (3.4) still seems problematic to me, as I explain below.

      This is my test:

      1. In a replica set of 3 members, restart a secondary as a standalone and create a unique index on it
      2. Restart that secondary as a replica set member again
      3. Try inserting duplicate entries on the primary (that does not have the unique index)
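
      The steps above can be sketched in the mongo shell roughly as follows. This is only an illustration, not the exact commands used in the test: the function names are hypothetical, and the collection `test.c` and key `{a: 1}` are taken from the logs further down. Restarting the member (steps 1 and 2) happens outside the shell and is not shown. Shells older than 3.0 may need `ensureIndex` in place of `createIndex`.

```javascript
// Hypothetical helpers; `db` is the mongo shell handle for the "test" database.

// Step 1: run against the secondary while it is running as a standalone.
function buildUniqueIndex(db) {
  return db.c.createIndex({ a: 1 }, { unique: true });
}

// Step 3: run against the primary (which has no unique index on `a`)
// once the secondary has rejoined the replica set in step 2.
function insertDuplicates(db) {
  db.c.insert({ a: 1 });
  db.c.insert({ a: 1 });
}
```

      Taking `db` as a parameter keeps the two phases explicit, since each one must be run while connected to a different member.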

      2.6.12:

      • As soon as the duplicate document is inserted on the Primary, the secondary crashes with:

        "2.6.12"

        2017-07-24T14:25:06.265+1000 [repl writer worker 1] ERROR: writer worker caught exception:  :: caused by :: 11000 insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.c.$a_1  dup key: { : 1.0 } on: { ts: Timestamp 1500870306000|1, h: -7295089461209908464, v: 2, op: "i", ns: "test.c", o: { _id: ObjectId('597576a2cdfffab3d32a73bc'), a: 1.0 } }
        2017-07-24T14:25:06.265+1000 [repl writer worker 1] Fatal Assertion 16360
        

      3.0.14 (MMAPv1, WT):

      • The document with a duplicate index key can be inserted on the Primary without crashing the secondary
      • The document can be queried on the Secondary via the unique index with no issues; multiple matching documents are returned:

        "3.0.14"

        testRS2:SECONDARY> db.c.getIndexes()
        [
        	{
        		"v" : 1,
        		"key" : {
        			"_id" : 1
        		},
        		"name" : "_id_",
        		"ns" : "test.c"
        	},
        	{
        		"v" : 1,
        		"unique" : true,
        		"key" : {
        			"a" : 1
        		},
        		"name" : "a_1",
        		"ns" : "test.c"
        	}
        ]
        testRS2:SECONDARY> db.c.find({a:1}).hint({a:1})
        { "_id" : ObjectId("597573bef38745c49e94f6eb"), "a" : 1 }
        { "_id" : ObjectId("597573c5f38745c49e94f6ec"), "a" : 1 }
        

      3.2.14 (MMAPv1):

      • The document with a duplicate index key can be inserted on the Primary without crashing the secondary
      • The document can be queried on the Secondary via the unique index with no issues

      3.2.14 (WT):

      • The document with a duplicate index key can be inserted on the Primary without crashing the secondary
      • The document can be queried on the Secondary with no issues, unless the unique index is used
      • If the unique index is used to run the query, the secondary server crashes:

        "3.2.14"

        testRS3:SECONDARY> db.c.find()
        { "_id" : ObjectId("597575c1cdfffab3d32a73bb"), "a" : 1 }
        { "_id" : ObjectId("597576a2cdfffab3d32a73bc"), "a" : 1 }
        testRS3:SECONDARY> db.c.find({a:1})
        2017-07-24T15:22:20.510+1000 I NETWORK  [thread1] Socket say send() Broken pipe 127.0.0.1:27001
        Error: socket exception [SEND_ERROR] for 127.0.0.1:27001
        2017-07-24T15:22:20.523+1000 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27001 (127.0.0.1) failed
        2017-07-24T15:22:20.524+1000 I NETWORK  [thread1] reconnect 127.0.0.1:27001 (127.0.0.1) ok
         
        // in the mongod.log
        2017-07-24T13:59:28.584+1000 F STORAGE  [conn8] Unique index cursor seeing multiple records for key { : 1.0 }
        2017-07-24T13:59:28.610+1000 I -        [conn8] Fatal Assertion 28608
        2017-07-24T13:59:28.610+1000 I -        [conn8]
        

      3.4.4 (MMAPv1):

      • The behaviour is the same as with v3.2.14 (MMAPv1)

      3.4.4 (WT):

      • The behaviour is the same as with v3.2.14 (WT)

      I can understand why the behaviour of v2.6 was unwanted: from the point of view of the primary and the client, the secondaries should not crash, since the client is performing legitimate actions.

      Having said that, the behaviour of v3.0+ MMAPv1 is more troublesome, as the unique constraint is not enforced and duplicate entries are added to the unique index. Needless to say, it must be a defect that WT and MMAPv1 do not behave in the same way (WT aborts, but MMAPv1 does not).

      Versions 3.2 and 3.4 (with WT) are stricter, but they still leave a window of opportunity during rolling index builds (note that this is the procedure we officially recommend in our docs!): duplicates can be written to the primary while it does not yet have the unique index, and are then replicated to the secondaries that already have it. Those secondaries are now time bombs: they will crash as soon as the duplicate documents are queried via the unique index. That can happen 2 minutes after the unique index is created, or 5 years down the road.

      Of course, the user will be able to figure out that there is a problem when they try to build the unique index on the last replica set member (the index build should fail due to the presence of the duplicates). But it is also possible that this member is decommissioned without the index ever being built on it, leaving the remaining (rigged) replica set members running.
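
      As a rough illustration of how such duplicates could be found without waiting for the final index build to fail, the sketch below builds an aggregation pipeline that groups documents by the indexed field and keeps only values that occur more than once. The helper name is hypothetical and this is not part of the original test. Running it on the primary, which has no unique index yet, avoids any question of whether the plan touches the unique index.

```javascript
// Hypothetical helper: returns a pipeline listing values of `field`
// that appear in more than one document, with the matching _ids.
function duplicateKeyPipeline(field) {
  return [
    { $group: { _id: "$" + field, count: { $sum: 1 }, ids: { $push: "$_id" } } },
    { $match: { count: { $gt: 1 } } }
  ];
}

// Assumed usage in the mongo shell, connected to the primary:
//   db.c.aggregate(duplicateKeyPipeline("a"))
```

      An empty result would suggest that no duplicates slipped in during the window; any document returned identifies a key that the unique index build should reject.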

      I'm not sure what the best solution is here. Ideally the secondary should not allow duplicate entries to be added to a unique index, but having it crash (as with 2.6) is not good either. Regardless of the solution, we need to ensure that the outcome is the same no matter which storage engine is used (MMAPv1 or WT).

              People

              Assignee:
              backlog-server-execution Backlog - Execution Team
              Reporter:
              dmitry.ryabtsev Dmitry Ryabtsev
              Participants:
              Votes:
              3
              Watchers:
              13

                Dates

                Created:
                Updated:
                Resolved: