[SERVER-27834] Index builds during initial sync should not implicitly create collections
Created: 27/Jan/17 | Updated: 05/Apr/17 | Resolved: 24/Mar/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.4, 3.5.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Repl 2017-03-27 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Implicit collection creation is never correct during initial sync. In index builds we call getOrCreateCollection, which creates the collection to index if we don't have one. Additionally, we do not call txn->setReplicatedWrites(false) on the operation context created for background index builds, so when those builds create a new collection, they also attempt to write to the oplog. The oplog write should probably be fixed as well, but the real problem is the implicit collection creation. |
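The interaction described above can be illustrated with a small toy model. This is only a sketch, not server code: ToyOperationContext, its replicated_writes flag, and the list-based catalog and oplog are hypothetical stand-ins, and the default of True is an assumption taken from the description (writes are logged unless explicitly turned off).

```python
class ToyOperationContext:
    """Hypothetical stand-in for an operation context (not MongoDB code)."""

    def __init__(self):
        # Assumed default: writes are replicated unless explicitly disabled.
        self.replicated_writes = True

    def set_replicated_writes(self, enabled):
        self.replicated_writes = enabled


def get_or_create_collection(catalog, oplog, ctx, name):
    """Return the collection's index list, implicitly creating it if absent.

    If the context still has replicated writes enabled, the implicit
    creation is also appended to the toy oplog.
    """
    if name not in catalog:
        catalog[name] = []
        if ctx.replicated_writes:
            oplog.append(("create", name))
    return catalog[name]


# A fresh context (as on a new background thread) logs the implicit create.
catalog, oplog = {}, []
get_or_create_collection(catalog, oplog, ToyOperationContext(), "A")
print(catalog, oplog)  # {'A': []} [('create', 'A')]
```

Disabling replicated writes on the context suppresses the oplog entry in this model, but the collection is still created, which is why the description treats the implicit creation as the underlying problem.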
| Comments |
| Comment by Githook User [ 31/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: (cherry picked from commit fd46b39bd957df28fa2273bf5e4dcbb1765e4026) |
| Comment by Githook User [ 31/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: This avoids relying on the implicit collection behavior in IndexBuilder. (cherry picked from commit 9bc30836f60a888c44de09c55a45c278b215a02b) |
| Comment by Githook User [ 31/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: (cherry picked from commit f54e89fa124ef679e25ca37b3bcf8e02572f38fd) |
| Comment by Githook User [ 24/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: |
| Comment by Githook User [ 24/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: This avoids relying on the implicit collection behavior in IndexBuilder. |
| Comment by Githook User [ 24/Mar/17 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: |
| Comment by Judah Schvimer [ 01/Feb/17 ] |
|
Node 2 fails to see the drop because the drop entry is after the createIndex entry in the oplog. Since we apply oplog entries in order of the oplog, Node 2 will create the index and then drop it when it hits the drop oplog entry. There certainly may be room for improvement here with respect to skipping unnecessary operations. |
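The in-order application described in this comment can be sketched with a toy oplog. The entry format and the apply_in_order helper are hypothetical, and the sketch assumes the collection already exists on the applying node when the createIndex entry is reached.

```python
import copy


def apply_in_order(catalog, oplog):
    """Apply toy oplog entries strictly in order.

    Returns a snapshot of the catalog after each entry, so the transient
    state between createIndex and drop is visible.
    """
    snapshots = []
    for entry in oplog:
        op, coll = entry["op"], entry["coll"]
        if op == "createIndex":
            # Assumes the collection exists at this point.
            catalog[coll].append(entry["index"])
        elif op == "drop":
            catalog.pop(coll, None)
        else:
            raise ValueError("unknown op: " + op)
        snapshots.append(copy.deepcopy(catalog))
    return snapshots


oplog = [
    {"op": "createIndex", "coll": "A", "index": "x_1"},
    {"op": "drop", "coll": "A"},
]
print(apply_in_order({"A": []}, oplog))  # [{'A': ['x_1']}, {}]
```

The index exists only between the two entries; the work of building it is the "unnecessary operation" that a smarter applier could in principle skip.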
| Comment by Nathan Myers [ 30/Jan/17 ] |
|
I assumed that any attempts to create an index during initial sync would have to come from mistaken user input. In the scenario, I don't see how node 2 fails to see the drop before it starts indexing. Is the indexing job queued separately, so it is not synchronous with oplog events? Maybe queued jobs operating on dropped tables should also be cleared from the queue synchronously with the drop. That wouldn't help with tasks already running, but there is probably a good chance that the job hasn't started yet. If it has, then the index already exists, and can be dropped too. We do drop the index when we drop the table, right? |
| Comment by Judah Schvimer [ 30/Jan/17 ] |
|
I agree that the index creation should fail if the collection does not exist. That is how we handle other idempotency issues in initial sync. nathan.myers, can you elaborate more on how the collection name can be misspelled during initial sync? In initial sync we just call listCollections on the upstream node and use that name.

I believe this occurs due to the following sequence of events. Node 2 is initial syncing from Node 1.
1. Node 1 creates collection A.
2. Node 2 then starts initial sync and starts buffering ops from Node 1.
3. Node 1 creates an index on the collection and Node 2 buffers that op.
4. Node 1 then drops the collection.
5. Node 2 never clones the collection.
6. Node 2 goes to apply its ops and has to apply a createIndex op on a collection that it never cloned.

By turning off the oplog, I assume you mean setReplicatedWrites(false). We can either create another ticket for it, or do them together. It's not that we need to turn off replicated writes while it creates the collection since it shouldn't be creating the collection in the first place. Rather we have to turn them off on the new thread that does the background index build since it has a new OperationContext. |
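The sequence of events in this comment can be replayed with a toy timeline. This is a sketch under stated assumptions, not the initial sync implementation: simulate_initial_sync, the (op, collection) tuples, and the sync_start and clone_at parameters are all hypothetical.

```python
def simulate_initial_sync(source_events, sync_start, clone_at):
    """Toy timeline of the race (not MongoDB code).

    source_events: ordered list of (op, collection) tuples run on Node 1.
    sync_start:    index of the first event that Node 2 buffers.
    clone_at:      number of events Node 1 has applied when Node 2 lists
                   and clones its collections.
    Returns (cloned, missing): the collections Node 2 cloned, and the
    buffered createIndex ops whose collection is absent when applied.
    """
    # Node 1's collections at the moment Node 2 clones.
    source = set()
    for op, coll in source_events[:clone_at]:
        if op == "create":
            source.add(coll)
        elif op == "drop":
            source.discard(coll)
    cloned = set(source)

    # Node 2 applies its buffered ops on top of what it cloned.
    local = set(cloned)
    missing = []
    for op, coll in source_events[sync_start:]:
        if op == "create":
            local.add(coll)
        elif op == "drop":
            local.discard(coll)
        elif op == "createIndex" and coll not in local:
            missing.append((op, coll))
    return cloned, missing


events = [("create", "A"), ("createIndex", "A"), ("drop", "A")]
# Buffering starts after the create; cloning happens after the drop.
print(simulate_initial_sync(events, sync_start=1, clone_at=3))
# (set(), [('createIndex', 'A')])
```

If cloning happens before the drop instead (clone_at=1), the collection is cloned and the buffered createIndex finds its target, so the problem depends on the timing of the drop relative to the clone.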
| Comment by Nathan Myers [ 28/Jan/17 ] |
|
Maybe we should fail index creation during initial sync. And, maybe we need another ticket about turning off the oplog while it creates the collection. I suggest it would be better for the index creation to fail any time there is no collection for it to index. It's overwhelmingly more likely that the collection name was misspelled; and then the misnamed collection and the index on it have to be cleaned up manually, if anybody notices they are there. |
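The difference between the current behavior and the suggestion here can be sketched as two lookup strategies. ToyDatabase, NamespaceNotFound, and both build functions are hypothetical names used for illustration only; they do not reflect the server's actual classes or error codes.

```python
class NamespaceNotFound(Exception):
    """Raised when an index build targets a collection that does not exist."""


class ToyDatabase:
    """Hypothetical in-memory catalog: collection name -> list of index names."""

    def __init__(self):
        self.collections = {}

    def get_or_create_collection(self, name):
        # Implicit creation: a misspelled name silently becomes a new collection.
        return self.collections.setdefault(name, [])

    def get_collection(self, name):
        # Strict lookup: never creates anything.
        if name not in self.collections:
            raise NamespaceNotFound(name)
        return self.collections[name]


def build_index_implicit(db, coll_name, index_name):
    db.get_or_create_collection(coll_name).append(index_name)


def build_index_strict(db, coll_name, index_name):
    db.get_collection(coll_name).append(index_name)


db = ToyDatabase()
build_index_implicit(db, "usres", "x_1")  # typo for "users" goes unnoticed
print(db.collections)  # {'usres': ['x_1']}
```

With the strict variant the same call raises NamespaceNotFound and leaves the catalog untouched, so nothing has to be cleaned up afterwards.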