Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Index Maintenance, Replication
Labels: None
Assigned Teams: Storage Execution
Operating System: ALL
Case:
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Our general recommendation for replica sets is to build indexes in a rolling fashion. To the best of my knowledge, the same approach is used when Automation creates the indexes in a replica set.

This approach is entirely appropriate for regular indexes; however, there is a problem if the index to be created is unique. I tested several versions of MongoDB and found that the behaviour has changed over the years. The behaviour of the most recent release (3.4) still seems problematic to me, as I explain below.

This is my test:

In a replica set of 3 members, restart a secondary as a standalone and create a unique index on it
Restart that secondary as a replica set member again
Try inserting documents with duplicate keys on the primary (which does not have the unique index)
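
The steps above can be sketched as legacy mongo shell helpers. This is a minimal sketch, not the exact commands used in the test: the function names are hypothetical, the collection `c` and field `a` are taken from the logs below, and restarting the member with or without its replica set configuration happens outside the shell.

```javascript
// Hypothetical helpers; `db` is the shell's database handle for "test".

// First step: run while connected to the secondary restarted as a standalone.
// (The 2.6 shell uses ensureIndex() instead of createIndex().)
function buildUniqueIndexOnStandalone(db) {
  return db.getCollection("c").createIndex({ a: 1 }, { unique: true });
}

// Third step: run while connected to the primary, which has no index on `a`,
// so nothing on the primary rejects the second insert.
function insertDuplicatesOnPrimary(db) {
  var coll = db.getCollection("c");
  coll.insert({ a: 1 });
  coll.insert({ a: 1 });
}
```

In the shell these would be called as `buildUniqueIndexOnStandalone(db)` and `insertDuplicatesOnPrimary(db)` after `use test`.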

2.6.12:

As soon as the duplicate document is inserted on the primary, the secondary crashes with:

"2.6.12"

2017-07-24T14:25:06.265+1000 [repl writer worker 1] ERROR: writer worker caught exception:  :: caused by :: 11000 insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.c.$a_1  dup key: { : 1.0 } on: { ts: Timestamp 1500870306000|1, h: -7295089461209908464, v: 2, op: "i", ns: "test.c", o: { _id: ObjectId('597576a2cdfffab3d32a73bc'), a: 1.0 } }
2017-07-24T14:25:06.265+1000 [repl writer worker 1] Fatal Assertion 16360

3.0.14 (MMAPv1, WT):

A document with a duplicate index key can be inserted on the primary without crashing the secondary

The documents can be queried on the secondary via the unique index with no issues; multiple matching documents are returned:

"3.0.14"

testRS2:SECONDARY> db.c.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.c"
	},
	{
		"v" : 1,
		"unique" : true,
		"key" : {
			"a" : 1
		},
		"name" : "a_1",
		"ns" : "test.c"
	}
]
testRS2:SECONDARY> db.c.find({a:1}).hint({a:1})
{ "_id" : ObjectId("597573bef38745c49e94f6eb"), "a" : 1 }
{ "_id" : ObjectId("597573c5f38745c49e94f6ec"), "a" : 1 }

3.2.14 (MMAPv1):

A document with a duplicate index key can be inserted on the primary without crashing the secondary
The documents can be queried on the secondary via the unique index with no issues

3.2.14 (WT):

A document with a duplicate index key can be inserted on the primary without crashing the secondary
The documents can be queried on the secondary with no issues, unless the unique index is used

If the query uses the unique index, the secondary crashes:

"3.2.14"

testRS3:SECONDARY> db.c.find()
{ "_id" : ObjectId("597575c1cdfffab3d32a73bb"), "a" : 1 }
{ "_id" : ObjectId("597576a2cdfffab3d32a73bc"), "a" : 1 }
testRS3:SECONDARY> db.c.find({a:1})
2017-07-24T15:22:20.510+1000 I NETWORK  [thread1] Socket say send() Broken pipe 127.0.0.1:27001
Error: socket exception [SEND_ERROR] for 127.0.0.1:27001
2017-07-24T15:22:20.523+1000 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27001 (127.0.0.1) failed
2017-07-24T15:22:20.524+1000 I NETWORK  [thread1] reconnect 127.0.0.1:27001 (127.0.0.1) ok

// in the mongod.log
2017-07-24T13:59:28.584+1000 F STORAGE  [conn8] Unique index cursor seeing multiple records for key { : 1.0 }
2017-07-24T13:59:28.610+1000 I -        [conn8] Fatal Assertion 28608
2017-07-24T13:59:28.610+1000 I -        [conn8]

3.4.4 (MMAPv1):

The behaviour is the same as with v3.2.14 (MMAPv1)

3.4.4 (WT):

The behaviour is the same as with v3.2.14 (WT)

I can understand why the behaviour of v2.6 was unwanted: from the point of view of the primary and the client, the secondaries should not crash, since the client is performing legitimate operations.

Having said that, the behaviour of v3.0+ MMAPv1 is more troublesome, as the unique constraint is not enforced and duplicate entries are added to the unique index. It must also be a defect that WT and MMAPv1 do not behave in the same way (WT aborts, but MMAPv1 does not).

Versions 3.2 and 3.4 (on WT) are stricter, but they still leave a window of opportunity during a rolling index build (note that this is the procedure we officially recommend in our docs!): duplicates can be written on the primary while it does not yet have the unique index, and are then replicated to the secondaries that already have it. Those secondaries are now time bombs: they will crash as soon as the duplicate documents are queried via the unique index. That can happen 2 minutes after the unique index is created, or 5 years down the road.

Of course, the user will discover the problem when they try to build the unique index on the last replica set member (the index build should fail because of the duplicates). But it is also possible that this member is decommissioned before the index is built on it, leaving the remaining (rigged) replica set members running.
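
One way to look for such duplicates without waiting for an index build to fail is an aggregation that groups on the indexed field. The helper below is a minimal sketch under stated assumptions: `duplicateKeyPipeline` is a hypothetical name, it handles only a single-field index such as `{a: 1}`, and it only builds the pipeline rather than running it.

```javascript
// Hypothetical helper: builds an aggregation pipeline that lists every value
// of `field` that is shared by more than one document.
function duplicateKeyPipeline(field) {
  return [
    // Group documents by the indexed field, counting them and keeping their _ids.
    { $group: { _id: "$" + field, count: { $sum: 1 }, ids: { $push: "$_id" } } },
    // Keep only the values that occur more than once.
    { $match: { count: { $gt: 1 } } }
  ];
}
```

In the shell this would be used as `db.c.aggregate(duplicateKeyPipeline("a"))`. An empty result only shows that no duplicates existed when the check ran; new ones can still be written for as long as the primary lacks the unique index.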

I'm not sure what the best solution is here. Ideally, the secondary should not allow duplicate entries to be added to a unique index, but having it crash (as in 2.6) is not good either. Regardless of the solution, we need to ensure that the outcome is the same no matter which storage engine is used (MMAPv1 or WT).

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Dmitry Ryabtsev
Participants: [DO NOT USE] Backlog - Storage Execution Team, Brian Lane, Dmitry Ryabtsev, Eric Milkie
Votes: 3
Watchers: 14

Created: Jul 24 2017 04:56:26 AM UTC
Updated: Oct 27 2023 01:54:20 PM UTC
Resolved: Sep 16 2019 06:44:58 PM UTC