[SERVER-11908] Failure to rollback usePowerOf2Sizes should not cause fatal error Created: 30/Nov/13  Updated: 11/Jul/16  Resolved: 04/Dec/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.6, 2.4.8
Fix Version/s: 2.4.9, 2.5.5

Type: Task Priority: Critical - P2
Reporter: Cailin Nelson Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-19719 Failure to rollback noPadding should ... Closed

 Description   
Issue Status as of December 30th, 2013

ISSUE SUMMARY
This issue occurs only if a replica set member enters the ROLLBACK state and the operations being rolled back include a collMod command that modifies usePowerOf2Sizes. Under these conditions, the member stops replicating and enters the FATAL state in the replica set.

USER IMPACT
If a replica set member encounters this command in the portion of the oplog it is rolling back, it stops replicating and the following messages appear in the server log:

Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication

The member is left in the FATAL state and cannot be restarted successfully until this situation is cleared.

This issue affects all versions of MongoDB up to and including v2.4.8.
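To check whether a member has hit this problem, one option is to scan its log for the message sequence quoted above. The following is a minimal Python sketch, assuming the 2.4-era message wording shown in this issue; the function name find_unrollbackable_collmod is hypothetical, and other server versions may word these lines differently.

```python
import re
from typing import Iterable, Optional

# Message wording copied from the 2.4.x log excerpt in this issue (an assumption
# for other versions, which may phrase these lines differently).
CANT_ROLLBACK = re.compile(r"replSet error can't rollback this command yet: (?P<cmd>\{.*\})")
FATAL_STOP = "replSet error fatal, stopping replication"


def find_unrollbackable_collmod(lines: Iterable[str]) -> Optional[str]:
    """Return the collMod/usePowerOf2Sizes command text if it is followed by
    the fatal stop message in the log; otherwise return None."""
    pending: Optional[str] = None
    for line in lines:
        match = CANT_ROLLBACK.search(line)
        if match:
            cmd = match.group("cmd")
            # Track only this issue's signature; any other command resets it.
            pending = cmd if ("collMod" in cmd and "usePowerOf2Sizes" in cmd) else None
        elif pending is not None and FATAL_STOP in line:
            return pending
    return None
```

Because the function accepts any iterable of lines, an open log file can be passed directly, e.g. `with open(path) as f: find_unrollbackable_collmod(f)`, where `path` is whatever file the member logs to.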

SOLUTION
With the fix, rollback ignores the collMod call instead of stopping, and logs the warning: "replSet not rolling back change of usePowerOf2Sizes"
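The decision described above can be modeled roughly as follows. This is a hypothetical Python sketch, not the server's code (the actual change is in the C++ rollback code referenced by the commits in the comments). It assumes that a collMod whose only modification is usePowerOf2Sizes is skipped with a warning and that any other collMod still aborts rollback; names such as rollback_collmod and RollbackFatalError are illustrative.

```python
import logging
from typing import Any, Dict

log = logging.getLogger("rollback_sketch")


class RollbackFatalError(Exception):
    """Hypothetical stand-in for the error that moves the member to FATAL."""


def rollback_collmod(cmd: Dict[str, Any]) -> str:
    """Model only the collMod case of rolling back one oplog command.

    Assumption: a collMod whose sole modification is usePowerOf2Sizes is
    skipped with a warning; any other collMod still aborts rollback.
    """
    if next(iter(cmd), None) != "collMod":
        # Commands other than collMod are outside the scope of this sketch.
        raise ValueError("this sketch only models collMod")
    modified = set(cmd) - {"collMod"}
    if modified == {"usePowerOf2Sizes"}:
        # Behavior after the fix: keep the new allocation setting and continue.
        log.warning("replSet not rolling back change of usePowerOf2Sizes: %s", cmd)
        return "skipped"
    # Behavior for any other collMod (and for this case before the fix).
    log.error("replSet error can't rollback this command yet: %s", cmd)
    raise RollbackFatalError(cmd)
```

For example, `rollback_collmod({"collMod": "files", "usePowerOf2Sizes": True})` returns `"skipped"`. The rationale, as the original description puts it, is that a member running with a different allocation strategy than requested is preferable to one stuck in FATAL.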

WORKAROUNDS
The best workaround is to re-sync the affected replica set member. See the MongoDB documentation on re-syncing a member of a replica set.

PATCHES
Production release v2.4.9 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

If a replica set member attempts to roll back a period that contained

{ collMod: "files", usePowerOf2Sizes: true }

this causes a fatal error. The replica set member is thereafter left in the FATAL state.

While it seems reasonable that usePowerOf2Sizes cannot be rolled back, this is probably not the best user experience. I would prefer that my replica set member continue to function, even if the disk space allocation algorithm is different from what I asked for.

Instead, this op could be skipped (with a loud warning)?

Full log snippet demonstrating the problem:

Tue Sep 24 05:47:06.335 [rsBackgroundSync] replSet syncing to: brs7.ny1.10gen.cc:27010
Tue Sep 24 05:47:09.374 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 0
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet ROLLBACK
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 1
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback our last optime:   Sep 24 05:47:05:39
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback their last optime: Sep 24 05:46:48:ab
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback diff in end of log times: 17 seconds
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet WARNING ignoring op on rollback no _id TODO : backupstore.system.indexes { ts: Timestamp 1380001625000|3, h: 2707240384590046610, v: 2, op: "i", ns: "backupstore.system.indexes", o: { name: "_id_1_filename_1", ns: "backupstore.files", key: { _id: 1, filename: 1 } } }
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication
Tue Sep 24 05:47:09.755 [conn476406] end connection 10.10.0.135:59317 (6 connections now open)



 Comments   
Comment by Githook User [ 20/Dec/13 ]

Author: Eliot Horowitz (erh), eliot@10gen.com

Message: SERVER-11908: let rollback handle collmod
Branch: v2.4
https://github.com/mongodb/mongo/commit/8bc5cef78b701ee3165b3f97f86f7b97ca49e2b4

Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh), eliot@10gen.com

Message: SERVER-11908: let rollback handle collmod
Branch: master
https://github.com/mongodb/mongo/commit/85274e6dd5bd2b38983a43aa1e9af419b9d3f1b2

Generated at Thu Feb 08 03:27:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.