[SERVER-33057] Indexing fatal assertion on secondary during sync Created: 01/Feb/18 Updated: 21/Mar/18 Resolved: 16/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | 3.4.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Chad Kreimendahl | Assignee: | William Schultz (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux |
||
| Description |
|
We recently added a new hidden secondary to one of our many clusters. The cluster, while on version 3.4.10, is running with featureCompatibilityVersion: 3.2. After initial sync, while rebuilding numerous indexes, the server crashed while attempting to create what appears to have been its 65th index. I suspect a fatal exception here is not ideal. I'm also curious how our other systems are still alive while on the same version.
|
| Comments |
| Comment by Chad Kreimendahl [ 16/Feb/18 ] | ||
|
It's certainly possible. There shouldn't have been any index changes, but they do happen occasionally. That specific collection didn't have any administrative updates in our console, suggesting it might have been a predictive analysis tool we have that picks which 64 fields get indexed. If you guys would update your limit on number of indexes to like 256, that tool wouldn't matter much to us, as we'd index almost everything. While I say that, I'm reminded that | ||
| Comment by William Schultz (Inactive) [ 15/Feb/18 ] | ||
|
Thanks for submitting the log files, sallgeud. This appears to be a known issue with initial sync, related to the fact that MongoDB enforces a maximum of 64 indexes per collection. This creates potential problems for initial sync because of the way we clone data and apply oplog operations. Initial sync consists of two main data transfer operations: the Clone phase, where we copy all documents from each collection on the sync source, and the Oplog Fetching phase, where we fetch and buffer all oplog operations that occurred on the sync source for the duration of the initial sync. These processes happen concurrently. We can imagine the following initial sync event sequence:
This is likely the issue you ran into. After looking at the logs, it looks like collection OnspringDemo2.D_22 was the collection that violated the 64 index limit. It appears that, when it was cloned, it had 63 indexes on it, and then, during oplog application, a new index was created on it (a 64th index):
and then another index creation op was applied, which fails:
Without the logs of the sync source, it is tough to tell exactly, but what I presume happened is that the Fx.646.RRIds index was created and then later deleted on the sync source before the collection was cloned (that index doesn't show up in the list of indexes built on OnspringDemo2.D_22 during the initial sync). Then, during oplog application, the error appeared because the node tried to apply that index creation op, violating the 64 index constraint. | ||
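The presumed sequence above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not MongoDB code: the index names (`p`, `fx`, `z1`, `z2`, `base_N`), the starting count of 61 indexes, and the helper functions are all hypothetical, chosen so that the clone sees 63 indexes, one replayed creation succeeds as the 64th, and the next one fails, even though the sync source itself never exceeded the limit.

```javascript
// Hypothetical model of the initial sync race; none of these names are MongoDB APIs.
const MAX_INDEXES = 64; // per-collection limit mentioned in the comment above

// Apply one buffered oplog entry to the syncing node's set of index names.
function applyOp(indexes, entry) {
  if (entry.op === "dropIndex") {
    indexes.delete(entry.name); // dropping a missing index is treated as a no-op here
    return;
  }
  if (indexes.has(entry.name)) return; // index already arrived with the clone
  if (indexes.size >= MAX_INDEXES) {
    throw new Error(`cannot create ${entry.name}: already at ${MAX_INDEXES} indexes`);
  }
  indexes.add(entry.name);
}

function simulateInitialSync() {
  // Assumed sync source state when oplog buffering begins: 61 indexes.
  const source = new Set(Array.from({ length: 61 }, (_, i) => `base_${i}`));
  const oplog = [];
  const run = (op, name) => {
    if (op === "createIndex") source.add(name);
    else source.delete(name);
    oplog.push({ op, name });
  };

  // Index churn on the source before the collection is cloned.
  run("createIndex", "p");  // source now has 62
  run("createIndex", "fx"); // 63
  run("dropIndex", "fx");   // 62
  run("dropIndex", "p");    // 61
  run("createIndex", "z1"); // 62
  run("createIndex", "z2"); // 63

  // Clone phase copies the collection as it looks now: 63 indexes, without p or fx.
  const syncing = new Set(source);
  const clonedCount = syncing.size;

  // Oplog application replays every buffered op from the beginning.
  for (const entry of oplog) {
    try {
      applyOp(syncing, entry);
    } catch (err) {
      return { clonedCount, failedOn: entry.name, message: err.message };
    }
  }
  return { clonedCount, failedOn: null, message: null };
}

console.log(simulateInitialSync());
```

In this model the replayed creation of `p` succeeds as the 64th index, and the replayed creation of `fx` is the op that trips the limit, because the cloned data already reflects later index creations that the replayed ops know nothing about.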
| Comment by Chad Kreimendahl [ 07/Feb/18 ] | ||
|
Submitted. Takes a while to anonymize the data, schedule the firewall rules for egress of this type of info, and get it your way. You should have it now. | ||
| Comment by Kelsey Schubert [ 07/Feb/18 ] | ||
|
sallgeud, would you please let us know when you've uploaded the files? Thanks, | ||
| Comment by William Schultz (Inactive) [ 06/Feb/18 ] | ||
|
As Spencer mentioned, the fact that the collection has 64 indexes on it and that this occurs during initial sync makes it highly likely that this is a known issue (SERVER-27122) that is not slated to be fixed at this point. To be sure, it would be worth seeing the full logs. I wasn't able to get them from the download portal, kelsey.schubert? | ||
| Comment by Kelsey Schubert [ 05/Feb/18 ] | ||
|
Hi sallgeud, Sorry for the delay, here's a secure portal for you to use. Kind regards, | ||
| Comment by Chad Kreimendahl [ 02/Feb/18 ] | ||
|
Can I get a secure upload site? I've already deleted half the indexes in that collection (for the time being), and will have our indexing tool recreate them after. I verified before I nuked some that it had exactly 64 on the primary and both existing secondaries. | ||
| Comment by Spencer Brody (Inactive) [ 01/Feb/18 ] | ||
|
I suspect this is a dupe of SERVER-27122 | ||
| Comment by Kelsey Schubert [ 01/Feb/18 ] | ||
|
Hi sallgeud, Would you please provide the complete logs starting when the initial sync began from the node that encountered this issue as well as the output of db.getSiblingDB("XDemo2").D_22.getIndexes()? Thank you, | ||
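As a rough aid for reading that output, the array returned by `getIndexes()` can be summarized to see how close a collection is to the 64 index limit discussed in this ticket. The helper name `indexHeadroom`, the constant `INDEX_LIMIT`, and the sample documents below are hypothetical; in the mongo shell you would pass the real `getIndexes()` result instead of the made-up sample.

```javascript
// Hypothetical helper; `indexes` is an array shaped like getIndexes() output,
// i.e. one document per index with at least a `name` field.
const INDEX_LIMIT = 64; // per-collection limit discussed in this ticket

function indexHeadroom(indexes, limit = INDEX_LIMIT) {
  return {
    count: indexes.length,
    remaining: limit - indexes.length,
    names: indexes.map((ix) => ix.name),
  };
}

// Made-up sample data standing in for db.getSiblingDB("XDemo2").D_22.getIndexes().
const sample = [
  { v: 2, key: { _id: 1 }, name: "_id_", ns: "XDemo2.D_22" },
  { v: 2, key: { a: 1 }, name: "a_1", ns: "XDemo2.D_22" },
];

console.log(indexHeadroom(sample));
```

A collection with `remaining` at or near zero is the kind that can run into the initial sync problem if indexes are created and dropped while a new member is syncing.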
| Comment by Chad Kreimendahl [ 01/Feb/18 ] | ||
|
Apologies. This should be a "bug" not an "improvement". Won't let me change now. |