[SERVER-32865] Dropped write during insert loop with retry_writes on Created: 23/Jan/18 Updated: 21/Feb/18 Resolved: 24/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Timothy Olsen (Inactive) | Assignee: | Max Hirschhorn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
|||
| Operating System: | ALL | |||
| Steps To Reproduce: | 1. Start a MongoDB 3.6.2 replica set with FCV 3.6
3. While the inserts are happening issue an rs.stepDown() command on the primary |
|||
| Participants: |
| Description |
|
max.hirschhorn suggested I file this ticket. If I start an insert loop that attempts to insert 10,000 documents with retry_writes on and then step down the primary, only 9,999 documents are inserted. MongoDB is running with FCV 3.6. |
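A count of 9,999 out of 10,000 is what one would expect if exactly one insert fails during the stepdown and nothing retries it or checks its result. The sketch below is a toy model of that reasoning only, not MongoDB code and not the reporter's script: `FakePrimary`, `runInsertLoop`, and the failing index 5000 are hypothetical names and values chosen for illustration.

```javascript
// Toy model only: FakePrimary is a hypothetical stand-in, not a MongoDB API.
class FakePrimary {
  constructor(failAt) {
    this.failAt = failAt; // index of the insert that collides with the stepdown
    this.docs = [];
  }

  insert(doc) {
    if (doc.i === this.failAt) {
      this.failAt = -1; // in this model the stepdown interrupts a single attempt
      return { ok: false, error: "not master" };
    }
    this.docs.push(doc);
    return { ok: true };
  }
}

// Insert `total` documents; optionally retry a failed insert exactly once.
function runInsertLoop(total, failAt, retryOnce) {
  const server = new FakePrimary(failAt);
  for (let i = 0; i < total; i++) {
    let result = server.insert({ i });
    if (!result.ok && retryOnce) {
      result = server.insert({ i });
    }
    // With no retry and no check of result.ok, the failure passes silently.
  }
  return server.docs.length;
}

console.log(runInsertLoop(10000, 5000, false)); // 9999
console.log(runInsertLoop(10000, 5000, true)); // 10000
```

In this model the only difference between the two runs is whether the single failed attempt is retried, which is the behaviour retryable writes are meant to provide when they are actually enabled.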
| Comments |
| Comment by Timothy Olsen (Inactive) [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yep, that was it! I found that parameter from someone's blog. I should have checked our official documentation. You can close this ticket. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yeah, that looks like a replica set connection string.
tim.olsen, it looks like you may not actually be enabling retryable writes, because the parameter is spelled "retryWrites" (camelCase), not "retry_writes" (snake_case). Additionally, the --retryWrites command line option is the intended user-facing way to enable retryable writes in the mongo shell. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
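A misspelled option of this kind is easy to miss by eye. As a minimal sketch, the hypothetical helper below (`checkRetryWritesOption` is not part of the mongo shell or any driver) scans the query portion of a connection string and reports whether `retryWrites=true` is present, plus any look-alike spellings such as `retry_writes`. It deliberately accepts only the exact spelling `retryWrites`, which is a strictness assumption of this sketch. The hosts and replica set name in the usage lines are placeholders, not the reporter's actual command line.

```javascript
// Hypothetical lint helper, not a MongoDB API: checks how the retryable-writes
// option is spelled in the query part of a mongodb:// connection string.
function checkRetryWritesOption(uri) {
  const queryStart = uri.indexOf("?");
  const query = queryStart === -1 ? "" : uri.slice(queryStart + 1);
  const params = new URLSearchParams(query);

  const result = { enabled: false, lookAlikes: [] };
  for (const [name, value] of params) {
    if (name === "retryWrites") {
      result.enabled = value === "true";
    } else if (name.replace(/[_-]/g, "").toLowerCase() === "retrywrites") {
      // Same letters, different spelling (for example retry_writes).
      result.lookAlikes.push(name);
    }
  }
  return result;
}

// Placeholder connection strings for illustration only.
const snakeCase =
  "mongodb://localhost:27017,localhost:27018/test?replicaSet=rs0&retry_writes=true";
const camelCase =
  "mongodb://localhost:27017,localhost:27018/test?replicaSet=rs0&retryWrites=true";

console.log(checkRetryWritesOption(snakeCase));
console.log(checkRetryWritesOption(camelCase));
```

For the snake_case string the helper reports `enabled: false` and lists `retry_writes` as a look-alike; for the camelCase string it reports `enabled: true` with no look-alikes.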
| Comment by Timothy Olsen (Inactive) [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I believe I'm using a replica set connection string. Here is the command line I used to start the mongo shell:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
tim.olsen, could you confirm that the mongo shell you were using to perform the insert operations was using a replica set connection string? I think the "not master" error you're seeing after having put in the assert.writeOK() call is potentially indicating that the mongo shell attempted to retry the failed insert operation on the same server rather than trying to discover the new primary. This would happen if you attempted to use retryable writes while directly connected to the primary you're stepping down. Below is a codified version of what I understood your procedure to be and I haven't seen it fail.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timothy Olsen (Inactive) [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is the shell output with TestData = {logRetryAttempts: true} and using assert.writeOK(). In this case, however, the final document count is 7,349. I assume that means the loop got interrupted somehow.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timothy Olsen (Inactive) [ 24/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've attached the logs from the old primary and new primary. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 23/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
kaloian.manassiev, I had Tim define TestData={logRetryAttempts: true} during his local setup to see if there'd be evidence of the mongo shell actually doing the retry following a stepdown, and we didn't see the print() statements that Jack had added from | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 23/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Would it be possible to attach the logs from both the new and the old primary? I am curious if there are any errors reported. max.hirschhorn, just curious - have you discounted a possible bug in the shell? CC jack.mulrow |