[SERVER-6984] Initial sync can fail, or break future replication, when updates shrink or grow docs in capped collections Created: 10/Sep/12  Updated: 06/Dec/22  Resolved: 15/Jan/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Matulef Assignee: Backlog - Storage Execution Team
Resolution: Done Votes: 11
Labels: repl1, sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DOCS-3288 Documents in Capped Collections shoul... Closed
Duplicate
is duplicated by SERVER-9994 Slave hits Fatal Assertion 16361 duri... Closed
is duplicated by SERVER-4939 capped collection replication may fai... Closed
Related
related to SERVER-4940 capped collection replication dedups ... Closed
related to SERVER-20529 WiredTiger allows capped collection o... Closed
is related to SERVER-8972 extent layout differences can cause i... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Minor Change
Operating System: ALL
Participants:

 Description   

Right now we allow updates on docs in capped collections, as long as docs don't grow past the size of their initial allocation. However, during initial sync, the data is cloned but the initial allocation size is lost on the secondary. So if inserts or updates that change document size occur during the cloning process, replaying those ops on the secondary can fail with the error message "objects in a capped ns cannot grow."

For instance, this happens if the following sequence of ops happens during initial sync:

  • db.foo.insert( { a : 1, b : "big" } )

  • db.foo.update( { a : 1 }, { $unset : { b : 1 } } )

If the smaller version of the doc is cloned during the initial sync, you will get an error message at the end of the initial sync when it goes to apply the ops:

Sun Sep  9 19:52:48 [repl writer worker 1] ERROR: exception: failing update: objects in a capped ns cannot grow on: { ts: Timestamp 1347234740000|1, h: -4246821095103890152, op: "i", ns: "test.zzz", o: { _id: ObjectId('504d2bb46bbaa186f1cc7566'), a: 1.0, b: "big" } }
Sun Sep  9 19:52:48 [repl writer worker 1]   Fatal Assertion 16361
0x109c45b1b 0x10a09bde7 0x10a072239 0x109df1bc8 0x109e2c915 0x7fff8c2c5782 0x7fff8c2b21c1 
 0   mongod                              0x0000000109c45b1b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010a09bde7 _ZN5mongo13fassertFailedEi + 151
 2   mongod                              0x000000010a072239 _ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 393
 3   mongod                              0x0000000109df1bc8 _ZN5mongo10threadpool6Worker4loopEv + 138
 4   mongod                              0x0000000109e2c915 thread_proxy + 229
 5   libsystem_c.dylib                   0x00007fff8c2c5782 _pthread_start + 327
 6   libsystem_c.dylib                   0x00007fff8c2b21c1 thread_start + 13
Sun Sep  9 19:52:48 [repl writer worker 1] 
 
***aborting after fassert() failure

Even if initial sync manages to succeed, it's possible that future updates which grow a document on the primary will break replication, because there is no space for the doc to grow on the secondary. I've verified that this can occur, and the error message is similar to above.
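The two failure modes described above can be illustrated with a small toy model. This is only a sketch, not MongoDB's storage code: `CappedModel`, `cloneData`, and `applyInsertOp` are hypothetical names, JSON string length stands in for BSON document size, and the assumption that an insert op replayed over an already-cloned document is applied as an update is inferred from the "failing update" log line on an `op: "i"` entry.

```javascript
// Toy model only. Size is the JSON string length, a stand-in for BSON size.
const sizeOf = (doc) => JSON.stringify(doc).length;

class CappedModel {
  constructor() {
    this.records = new Map(); // _id -> { doc, allocated }
  }
  // An insert allocates exactly as much space as the document needs.
  insert(doc) {
    this.records.set(doc._id, { doc, allocated: sizeOf(doc) });
  }
  // An update may shrink a document but may not exceed its allocation.
  replace(id, doc) {
    const rec = this.records.get(id);
    if (sizeOf(doc) > rec.allocated) {
      throw new Error("objects in a capped ns cannot grow");
    }
    rec.doc = doc;
  }
  // Initial-sync clone: copies the current documents only, so every record
  // is re-allocated at its current (possibly shrunken) size.
  cloneData() {
    const copy = new CappedModel();
    for (const { doc } of this.records.values()) copy.insert({ ...doc });
    return copy;
  }
}

// Assumption: replaying an insert op for an existing _id acts as an update.
function applyInsertOp(node, doc) {
  if (node.records.has(doc._id)) node.replace(doc._id, doc);
  else node.insert(doc);
}

const tryOp = (fn) => {
  try { fn(); return "ok"; } catch (e) { return e.message; }
};

const big = { _id: 1, a: 1, b: "big" };
const small = { _id: 1, a: 1 };

const primary = new CappedModel();
primary.insert(big);        // logged as an insert of the full document
primary.replace(1, small);  // the $unset shrinks the document

// Case 1: the clone sees only the smaller doc, then replays the insert op.
const syncing = primary.cloneData();
const replayResult = tryOp(() => applyInsertOp(syncing, big));

// Case 2: initial sync succeeded, then the doc grows back on the primary.
const synced = primary.cloneData();
const primaryGrow = tryOp(() => primary.replace(1, big));
const secondaryGrow = tryOp(() => synced.replace(1, big));

console.log({ replayResult, primaryGrow, secondaryGrow });
```

In this model the primary accepts the growth because the record still holds its original allocation, while both secondaries reject it because their copies were allocated at the smaller size.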



 Comments   
Comment by Eric Milkie [ 15/Jan/16 ]

Due to SERVER-20529, fixed in 3.2.0-rc0, documents in capped collections can no longer shrink, so the initial sync problem is no longer an issue.

Comment by Asya Kamsky [ 03/Jun/15 ]

rosmo - capped collections are most effective when they are used as append-only inserts and tailing reads - they are not as effective for updates, so certainly eliminating update operations would avoid this issue. If updates cannot be avoided then making sure that updates don't change the size of the document would be the next best thing.

Asya

Comment by Taneli Leppä [ 01/Jun/15 ]

This affects 2.4.12 as well. Any decent workaround?

Comment by David Burley [ 19/Feb/13 ]

Any update on this issue? Are there any workarounds for it in the meantime?

Comment by Aaron Staple [ 10/Sep/12 ]

Here's a (potentially old) test (from a duplicate ticket):

Author: Aaron (astaple) <aaron@10gen.com>

Message: SERVER-4939 test
Branch: master
https://github.com/mongodb/mongo/commit/921c3d74f30927ff49261cf79266fbbc9f37901a

Comment by Aaron Staple [ 10/Sep/12 ]

matulef Sure, I'll make the other a dup since you have more info here.

Comment by Aaron Staple [ 10/Sep/12 ]

Also see SERVER-4939 (with test).

Comment by Kevin Matulef [ 10/Sep/12 ]

Options for fixing this are:

  1. when cloning capped collections, allocate the same amount of space the doc was originally allocated with (don't compact).
  2. for capped collections, disallow updates which shrink or grow docs (strictly in-place updates only).
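As a rough illustration of how the two options differ, here is a toy sketch. It is not MongoDB code: all function names and the second error message are hypothetical, and JSON string length stands in for BSON document size.

```javascript
// Toy model only. Size is the JSON string length, a stand-in for BSON size.
const docSize = (doc) => JSON.stringify(doc).length;

// A record remembers how much space it was allocated.
const makeRecord = (doc, allocated = docSize(doc)) => ({ doc, allocated });

// Cloning that compacts (the behavior reported here) vs. option 1, which
// carries the source record's allocation over to the clone.
const cloneCompacting = (rec) => makeRecord({ ...rec.doc });
const clonePreserving = (rec) => makeRecord({ ...rec.doc }, rec.allocated);

// Existing rule: a document may not grow past its allocation.
function updateWithinAllocation(rec, doc) {
  if (docSize(doc) > rec.allocated) {
    throw new Error("objects in a capped ns cannot grow");
  }
  rec.doc = doc;
}

// Option 2: accept only updates that leave the document size unchanged.
function updateInPlaceOnly(rec, doc) {
  if (docSize(doc) !== docSize(rec.doc)) {
    throw new Error("capped update must not change document size"); // hypothetical message
  }
  rec.doc = doc;
}

const attempt = (fn) => {
  try { fn(); return "ok"; } catch (e) { return "rejected"; }
};

const full = { _id: 1, a: 1, b: "big" };
const shrunk = { _id: 1, a: 1 };

// Source record after insert + shrink: small doc, original allocation.
const source = makeRecord(full);
updateWithinAllocation(source, shrunk);

// Option 1: does regrowing to the original size work on each kind of clone?
const option1 = {
  compacting: attempt(() => updateWithinAllocation(cloneCompacting(source), full)),
  preserving: attempt(() => updateWithinAllocation(clonePreserving(source), full)),
};

// Option 2: size-changing updates never happen, so clones cannot diverge.
const option2 = {
  shrink: attempt(() => updateInPlaceOnly(makeRecord(full), shrunk)),
  sameSize: attempt(() => updateInPlaceOnly(makeRecord(full), { _id: 1, a: 2, b: "BIG" })),
};

console.log(option1, option2);
```

Under option 1 the secondary keeps the slack it needs to replay later growth; under option 2 the slack is never needed because the shrinking update is refused on the primary in the first place.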