[SERVER-12209] _id with $prefix field causes replication failure due to unvalidated insert Created: 27/Dec/13 Updated: 16/Nov/21 Resolved: 06/Feb/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.8, 2.4.9, 2.5.4 |
| Fix Version/s: | 2.4.11, 2.6.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Shane O'Donnell | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux ip-10-xxx-xxx-xxx 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux |
||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Minor Change | ||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Description |
|
Server died while using Genghis on primary server in two-node replication set. Both running on m1.large instances in EC2. Storage on EBS. Was testing scope of namespace for duplicate _id. Add was allowed and sync killed secondary. Fri Dec 27 01:10:33.835 [initandlisten] MongoDB starting : pid=19511 port=27017 dbpath=/usr/local/var/mongod/db 64-bit host=ip-10-238-205-161 Fri Dec 27 01:10:33.844 [initandlisten] journal dir=/usr/local/var/mongod/db/journal , api_key: "604f5baa-fbc2-453a-ad9d-6e09a7d5e713", billing: {}, customEvents: {}, defaultDS: "ds.4444444.sample.internetTraffic", filters: [ { _id: { $oid: "5283fe3e9d46cd49718b457b" }, days_in_week: {}, desc: "Filter out non-business hours", hours_in_day: [ 17, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], name: "Business Hours", type: "hourDayMonth", visibility: "account" }, { _id: { $oid: "5283fe3e9d46cd49718b457c" }, days_in_week: [ 0, 6 ], desc: "Filter out any data from Saturday and Sunday.", hours_in_day: {}, name: "No Weekends", type: "hourDayMonth", visibility: "account" } ], knId: "4945496", kn_subscription: { subscriptionType: "", startDate: "", endDate: "", autoRenew: false }, name: "Know Normal, Inc.", nr_account_id: "279085", nr_api_key: "4cdafbfa8a636aefc1be23e0a21bbc027605ee8ed754d4a", nr_data_key: "52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a", nr_event_feed_uri: "/account_feeds/52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a/events.rss", nr_subscription: { annual-renewal-on: { @attributes: { nil: "true" }}, expires-on: { @attributes: { nil: "true" }}, number-of-hosts: "1", starts-on: "2013-03-22", state: "paid", product-name: "Pro" }, owner: "nr@knownormal.com", sources: [ { knColName: "ds.4444444.sample.wx", knDesc: "Raleigh-Durham, North Carolina Hourly Weather for 2012", knCustomName: "Weather Data", knSourceType: "sample", knFieldNames: [ "TmpF", "DPf", "RH", "CC", "Wind" ], knDateField: "Date", knTimeField: "Time", knDateTimeField: -1 }, { knColName: "ds.4444444.sample.internetTraffic", knDesc: "Internet traffic data (in bits) from United Kingdom academic network backbone (19 November 2004 thru 27 January 2005). Hourly data.", knCustomName: "ISP Network Traffic", knSourceType: "sample", knFieldNames: [ "bits" ], knDateField: null, knTimeField: null, knDateTimeField: "Time" }, { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.kn.279085.summary", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App", knDesc: "", nrAppId: 1368487, nrAppName: "KN App", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1368487", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1368487/servers" }, { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1783546.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App Dev Maury", knDesc: "", nrAppId: 1783546, nrAppName: "KN App Dev Maury", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1783546", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1783546/servers" }, { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1942741.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "My Application", knDesc: "", nrAppId: 1942741, nrAppName: "My Application", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1942741", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1942741/servers" }], users: [ { knId: "4819339", role: "owner" } ] } } ***aborting after fassert() failure Fri Dec 27 01:11:56.583 Got signal: 6 (Aborted). Fri Dec 27 01:11:56.586 Backtrace: |
| Comments |
| Comment by Githook User [ 12/Aug/14 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}Message: Prohibit _id of type Array or Undefined. | ||||||||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 08/Aug/14 ] | ||||||||||||||||||||||||||
|
milkie, please shepherd this through for 2.4.11 | ||||||||||||||||||||||||||
| Comment by Rafael Gaspar [ 26/Jul/14 ] | ||||||||||||||||||||||||||
|
Just adding that this fix should be backported to 2.4.x, since it's a serious problem that kills all slaves. | ||||||||||||||||||||||||||
| Comment by Githook User [ 06/Feb/14 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by Thomas Rueckstiess [ 03/Feb/14 ] | ||||||||||||||||||||||||||
|
Eliot, this causes the bug in 2.5.5. Prior to that, _skipValidation wasn't available.
| ||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 03/Feb/14 ] | ||||||||||||||||||||||||||
|
I figured one out. | ||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 03/Feb/14 ] | ||||||||||||||||||||||||||
|
Is there a repro without godinsert? | ||||||||||||||||||||||||||
| Comment by Scott Hernandez (Inactive) [ 27/Jan/14 ] | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| Comment by Shane O'Donnell [ 13/Jan/14 ] | ||||||||||||||||||||||||||
|
http://pecl.php.net/package/mongo – | ||||||||||||||||||||||||||
| Comment by Ger Hartnett [ 13/Jan/14 ] | ||||||||||||||||||||||||||
|
AFAIKS the released version of the MongoDB PHP driver is 1.3.0 | ||||||||||||||||||||||||||
| Comment by Shane O'Donnell [ 12/Jan/14 ] | ||||||||||||||||||||||||||
>>> The primary crashed as well on the initial attempt to remove the document (via the mongo shell) The format { "_id" : { "$oid" : "52d0c341330e7982e125ad02" } } >>> Me too! What driver and version are you using? >>> In our current environment, we are accessing data in MongoDB via 4 different mechanisms: >>> 1) PHP driver (currently version 1.4.5 (dev), 1.3.4 & 1.4.2 (prod – now upgraded to 1.4.5), and records in MongoDB could have been installed by almost anything in between) Did you create them yourself through a driver that does not validate for this format? >>> See above. Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting). >>> Not that I'm aware of. How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with use kn1 >>> I can't run a query against it because it's still in ROLLBACK state. I don't know how to clear that. No rollback files were generated. That said, this collection only had six or so documents in it, and I'd looked at it numerous times and only noted the one document with the {$oid} key. In my tests, a repair did not fix or remove the documents. >>> In my test, a repair shrunk the size on disk so markedly (from 168GB to less than 1GB) that I was suspicious that I didn't lose production data, so I restored a backup (taken just prior to the --repair). | ||||||||||||||||||||||||||
| Comment by Thomas Rueckstiess [ 11/Jan/14 ] | ||||||||||||||||||||||||||
|
Hi Shane, I appears that secondaries are failing when trying to replicate documents whose _id field has a key starting with a $. The format
is a representation in our extended JSON format for ObjectIds in "strict" and "json" mode (mongoexport uses this for example to write valid JSON), but it is not a valid ObjectId in the server. Keys should not start with a $ sign, those are reserved operators in MongoDB. We'd like to find out how you ended up with such _id values. What driver and version are you using? Did you create them yourself through a driver that does not validate for this format? Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting). How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with
This query finds all documents with an ObjectId of type 3 (= sub-document), which would only target the problematic ones if the rest have a true ObjectId. If there are many of them and you want to retain the data, we may be able to come up with a script that converts these back to normal ObjectIds (or alternatively just remove them all, if you don't want to keep them). It's likely though that you would have to take your primary down, convert/remove the documents in standalone mode, then bring it back up and re-sync your secondaries afterwards. In my tests, a repair did not fix or remove the documents. Regards, | ||||||||||||||||||||||||||
| Comment by Shane O'Donnell [ 10/Jan/14 ] | ||||||||||||||||||||||||||
|
If I can edit the original ticket, I can't figure out how. Versions should be: Note that I have since upgraded PRIMARY to 2.4.8 as well and have a related-if-not-duplicate bug to file, where bad $oid reference kills PRIMARY and replication then replicates the reference to SECONDARYs and kills them too. |