[SERVER-15393] Renaming a collection with newly added background indexes may fail to replicate
Created: 25/Sep/14  Updated: 11/Jul/16  Resolved: 07/Nov/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 2.6.4, 2.7.4
Fix Version/s: 2.7.5

Type: Bug
Priority: Major - P3
Reporter: Kevin Pulo
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-4941 collection rename may not replicate /... Closed
related to SERVER-16274 secondary fasserts trying to replicat... Closed
Tested
tested by SERVER-16943 Add test for compatibility of replset... Closed
Operating System: ALL
Steps To Reproduce:

use db1
db.content.ensureIndex({a:1}, {background:true})
db.content.insert({})
db.adminCommand({renameCollection: 'db1.content', to: 'db2.content', dropTarget: true})

Participants:

 Description   

After renaming a collection that has background indexes, the rename succeeds on the primary and is correctly recorded in the oplog, but it does not replicate, i.e. the collection is not renamed on the secondaries. This is due to the following error on the secondaries:

2014-09-25T15:56:39.080+1000 [repl writer worker 1] build index on: db2.content properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "db2.content" }
2014-09-25T15:56:39.080+1000 [repl writer worker 1]      building index using bulk method
2014-09-25T15:56:39.080+1000 [repl writer worker 1] build index done.  scanned 1 total records. 0 secs
2014-09-25T15:56:39.080+1000 [repl writer worker 1] build index on: db2.content properties: { v: 1, key: { a: 1.0 }, name: "a_1", ns: "db2.content", background: true }
2014-09-25T15:56:39.083+1000 [repl writer worker 1] index build failed. spec: { v: 1, key: { a: 1.0 }, name: "a_1", ns: "db2.content", background: true } error: 13130 can't start bg index b/c in recursive lock (db.eval?)
2014-09-25T15:56:39.083+1000 [repl writer worker 1] restarting 0 index build(s)
2014-09-25T15:56:39.083+1000 [repl writer worker 1] warning: repl Failed command { renameCollection: "db1.content", to: "db2.content", dropTarget: true } on admin with status UnknownError Location13130 can't start bg index b/c in recursive lock (db.eval?) during oplog application
2014-09-25T15:56:48.965+1000 [repl writer worker 1] warning: repl Failed command { renameCollection: "db2.content", to: "db1.content", dropTarget: true } on admin with status UnknownError source namespace does not exist during oplog application
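Because the failure only shows up as a warning in the secondary's log, it can be easy to miss. A minimal sketch of a log scan follows, assuming the log text is already available as a string; `findFailedReplCommands` is a hypothetical helper written for this illustration, not part of MongoDB or its tooling, and the sample input is two lines copied from the log above.

```javascript
// Hypothetical helper (not a MongoDB API): returns the log lines that report
// a replicated command failing while the secondary applied the oplog.
function findFailedReplCommands(logText) {
  return logText
    .split("\n")
    .filter(
      (line) =>
        line.includes("repl Failed command") &&
        line.includes("during oplog application")
    );
}

// Sample input: two lines taken verbatim from the secondary's log in this report.
const sampleLog = [
  `2014-09-25T15:56:39.083+1000 [repl writer worker 1] restarting 0 index build(s)`,
  `2014-09-25T15:56:39.083+1000 [repl writer worker 1] warning: repl Failed command { renameCollection: "db1.content", to: "db2.content", dropTarget: true } on admin with status UnknownError Location13130 can't start bg index b/c in recursive lock (db.eval?) during oplog application`,
].join("\n");

const failures = findFailedReplCommands(sampleLog);
console.log(failures.length); // 1
```

The two substrings are matched separately because the command document and status text sit between them and vary from one failure to the next.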

The problem does not occur in 2.7.5, so it must have been fixed between 2.7.4 and 2.7.5. It's still present in 2.6.4.



 Comments   
Comment by Eric Milkie [ 07/Nov/14 ]

Fixed with commit 00913e47de5aced5267e44e82ac9e976bbaac089

Comment by Eric Milkie [ 07/Nov/14 ]

Renames across databases no longer build indexes in the background, so this issue is solved in the 2.7 series. We need to backport the changes to the 2.6 branch to fix it there.

Comment by Eric Milkie [ 25/Sep/14 ]

redbeard0531 can you investigate why renameCollection() on a secondary has different locking behavior when building indexes? I'm surprised that it would be recursive there but not on a primary.

Comment by Eric Milkie [ 25/Sep/14 ]

I believe the problem might be restricted by two more conditions:
1. You must be doing a rename across databases, which requires a copy-and-delete strategy; a rename within the same database is not affected.
2. A background index must not only exist on the collection, but must also still be in the process of building on the secondary when the secondary receives the replicated renameCollection command.
Can you confirm? I'm not certain about the second one.
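The first condition above can be checked mechanically from the two namespaces passed to renameCollection. A minimal sketch follows, assuming each namespace has the form `<database>.<collection>` with the database name ending at the first dot; `databaseOf` and `isCrossDatabaseRename` are hypothetical helpers for illustration, not server or shell functions.

```javascript
// Hypothetical helper: extracts the database part of a full namespace.
// Assumes the form "<database>.<collection>", split at the first dot.
function databaseOf(namespace) {
  const dot = namespace.indexOf(".");
  if (dot <= 0) {
    throw new Error("not a full namespace: " + namespace);
  }
  return namespace.slice(0, dot);
}

// Hypothetical helper: true when source and target live in different
// databases, i.e. the copy-and-delete case described in condition 1.
function isCrossDatabaseRename(from, to) {
  return databaseOf(from) !== databaseOf(to);
}

console.log(isCrossDatabaseRename("db1.content", "db2.content")); // true
console.log(isCrossDatabaseRename("db1.content", "db1.archive")); // false
```

The first call uses the namespaces from the reproduction steps in this report; the second uses a made-up target in the same database for contrast.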

Generated at Thu Feb 08 03:37:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.