[SERVER-20277] Better handling of failing long running index builds in initial sync Created: 03/Sep/15  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: Improvement Priority: Major - P3
Reporter: Andre de Frere Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Replication
Participants:

 Description   

When a document fails to index during initial sync, the index build fails and the initial sync is reattempted 9 times. For example, something like the following might be seen:

2015-08-29T08:46:34.920-0600 [rsSync] replSet initial sync exception: 16755 Can't extract geo keys from object, malformed geometry?: { _id: ObjectId('...'), loc: [ x, y ] } 9 attempts remaining

Because this index build takes a long time to fail, user intervention could be required (or the node will fall off the oplog) before the 9 attempts are exhausted.
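
To notice this failure before the remaining attempts run out, an operator could watch the log for the message shown in the description. The following is a minimal sketch only, assuming the log line has exactly the shape of the single example in this ticket; other server versions may format it differently. The function name parse_initial_sync_failure and the returned field names are hypothetical and are not part of any MongoDB tool.

```python
import re

# Pattern modeled on the one example line in this ticket (an assumption);
# it is not an official description of the server's log format.
_FAILURE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) \[rsSync\] replSet initial sync exception: "
    r"(?P<code>\d+) (?P<message>.*) (?P<remaining>\d+) attempts remaining$"
)


def parse_initial_sync_failure(line):
    """Return the parsed fields of an initial sync failure line, or None.

    Hypothetical helper: extracts the timestamp, numeric error code,
    error message, and number of attempts remaining.
    """
    match = _FAILURE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "code": int(match.group("code")),
        "message": match.group("message"),
        "remaining": int(match.group("remaining")),
    }
```

A monitoring script could alert on the first match for a given error code rather than waiting for the attempt counter to reach zero, since an index build that fails on the same document is likely to fail the same way on each retry.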



 Comments   
Comment by Eric Milkie [ 03/Sep/15 ]

Each initial sync attempt is independent; thus, there is no concern the node will fall off the oplog due to retry attempts.
I'm not certain about the example presented here; it would be a bug if the primary managed to build a geo index that a secondary cannot build.

Generated at Thu Feb 08 03:53:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.