[SERVER-6975] $push idempotency issues with version 2.2.0 Created: 09/Sep/12 Updated: 15/Feb/13 Resolved: 28/Sep/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Braden Evans | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | ubuntu/3.0.0-17 |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
We first hit this crash during initial sync of a secondary node. Because the sync could not resume, we downgraded to 2.0, then upgraded back to 2.2 once the sync completed. Now, after three days of uptime, the crash has returned:
It also crashes on startup with the same exception:
|
| Comments |
| Comment by Eric Milkie [ 15/Nov/12 ] |
|
Hi Andy. |
| Comment by Andy O'Neill [ 15/Nov/12 ] |
|
This is marked as resolved, but it remains a crashing bug in the production version (2.2.1). Is there a tracking bug for a backport to 2.2? I do not agree that crashing during initial sync is preferable to keeping the 2.0 behavior. Consider it from my position: I now have to remain on 2.0 until 2.4 comes out, because I can't bring up new 2.2 servers. Is that any better than not aborting during the sync? |
| Comment by Eric Milkie [ 28/Sep/12 ] |
|
Please see |
| Comment by Eric Milkie [ 10/Sep/12 ] |
|
|
| Comment by Braden Evans [ 10/Sep/12 ] |
|
Eric, unfortunately this is our production system, and it is not viable to stop the $push operations while we sync. Indeed, the crash I opened this case for did not happen during a sync; it happened during normal operations. We are heavy users (abusers?) of $push. |
| Comment by Eric Milkie [ 10/Sep/12 ] |
|
Hi Braden. Thanks for the detailed problem description. |
| Comment by Braden Evans [ 10/Sep/12 ] |
|
Looking through the logs from the first time we encountered this, I noticed it always fails on a $push operation, across several different collections. The exception appears to involve the _id index, but to clarify: there are no indexes on the field being pushed or on the other field being updated.

The update that generated this op is:

Some other examples of this exception (different collections):

Mon Sep 3 15:00:38 [repl writer worker 7] ERROR: writer worker caught exception: E11000 duplicate key error index: [collection].$_id_ dup key: { : ObjectId('5044fe2d293c0b02a893576b') } on: { ts: Timestamp 1346698527000|3315, h: -2145615260524114526, op: "u", ns: "[collection]", o2: { _id: ObjectId('5044fe2d293c0b02a893576b'), pc: { $size: 35 }}, o: { $push: { pc: { u: ObjectId('0000000000000000002283a9'), c: [ 12, 20, 43, 51, 73, 5, 24, 31, 48, 67, 1, 23, 34, 58, 69, 3, 30, 36, 53, 71, 11, 19, 39, 59, 68 ] }} } }

Mon Sep 3 20:43:57 [repl writer worker 8] ERROR: writer worker caught exception: E11000 duplicate key error index: [collection].$_id_ dup key: { : ObjectId('50454763063c0b8da41b0558') } on: { ts: Timestamp 1346717426000|21, h: -2936717105214827886, op: "u", ns: "[collection]", o2: { _id: ObjectId('50454763063c0b8da41b0558'), i: { $size: 1 }}, o: { $push: { i: { a: ObjectId('50427caa083c0b958c01d055'), i: ObjectId('50454807463c0b8da42307ff') }} } } |
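The oplog entries in the log pair each $push with a $size condition on the same array in o2. The following is a minimal Python model of how that pairing could surface as E11000. It assumes, without confirmation from this ticket, that the secondary applies replicated "u" entries as upserts. Under that assumption, replaying an entry that has already been applied means the $size check no longer matches, so the upsert tries to insert a fresh document with the same _id, and the unique _id index rejects it. Collections are plain dicts keyed by _id. DuplicateKeyError, matches, push and apply_update are hypothetical names, not MongoDB APIs.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for MongoDB's E11000 duplicate key error."""


def matches(doc, query):
    """Return True if doc satisfies every condition in query.

    Supports only plain equality and {"$size": n} on array fields.
    """
    for field, cond in query.items():
        if isinstance(cond, dict) and "$size" in cond:
            value = doc.get(field)
            if not isinstance(value, list) or len(value) != cond["$size"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True


def push(doc, mods):
    """Apply a $push modifier: append each value to its array field."""
    for field, value in mods["$push"].items():
        doc.setdefault(field, []).append(value)


def apply_update(coll, op):
    """Apply a "u" oplog entry as an upsert (assumed semantics, see above)."""
    query, mods = op["o2"], op["o"]
    _id = query["_id"]
    doc = coll.get(_id)
    if doc is not None and matches(doc, query):
        # Normal case: the $size guard matches, so the push is applied once.
        push(doc, mods)
        return
    # No document matched (for example the array already grew on a replay),
    # so the upsert builds a new document from the query's equality fields.
    new_doc = {f: v for f, v in query.items() if not isinstance(v, dict)}
    push(new_doc, mods)
    # The _id index is unique: inserting an _id that already exists fails.
    if _id in coll:
        raise DuplicateKeyError(f"E11000 duplicate key error, dup key: {_id!r}")
    coll[_id] = new_doc
```

In this model the guard does what it is meant to do, since a replayed entry never pushes a second time, but the non-matching query is then turned into an insert that collides on _id. That matches the shape of the errors above; whether it is the actual cause of this ticket is not established here.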
| Comment by Braden Evans [ 09/Sep/12 ] |
|
Downgraded to 2.0, and the server came right back up.