[SERVER-12209] _id with $prefix field causes replication failure due to unvalidated insert Created: 27/Dec/13  Updated: 16/Nov/21  Resolved: 06/Feb/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.8, 2.4.9, 2.5.4
Fix Version/s: 2.4.11, 2.6.0-rc0

Type: Bug Priority: Critical - P2
Reporter: Shane O'Donnell Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux ip-10-xxx-xxx-xxx 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Depends
Duplicate
is duplicated by SERVER-12788 ObjectID error in replicaset Closed
Related
related to DRIVERS-133 Drivers should not allow sending _ids... Closed
related to PYTHON-623 Python driver should not allow sendin... Closed
is related to SERVER-18938 Inserting $id will crash Secondaries Closed
Tested
Backwards Compatibility: Minor Change
Operating System: ALL
Participants:

 Description   

Server died while using Genghis on primary server in two-node replication set. Both running on m1.large instances in EC2. Storage on EBS.

Was testing scope of namespace for duplicate _id. Add was allowed and sync killed secondary.

Fri Dec 27 01:10:33.835 [initandlisten] MongoDB starting : pid=19511 port=27017 dbpath=/usr/local/var/mongod/db 64-bit host=ip-10-238-205-161
Fri Dec 27 01:10:33.836 [initandlisten] db version v2.4.6
Fri Dec 27 01:10:33.836 [initandlisten] git version: b9925db5eac369d77a3a5f5d98a145eaaacd9673
Fri Dec 27 01:10:33.836 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Fri Dec 27 01:10:33.836 [initandlisten] allocator: tcmalloc
Fri Dec 27 01:10:33.836 [initandlisten] options:

{ bind_ip: "0.0.0.0", config: "/etc/mongod.conf", dbpath: "/usr/local/var/mongod/db", fork: "true", logappend: "true", logpath: "/log/mongodb-2.log", replSet: "knrs" }

Fri Dec 27 01:10:33.844 [initandlisten] journal dir=/usr/local/var/mongod/db/journal
Fri Dec 27 01:10:33.844 [initandlisten] recover begin
Fri Dec 27 01:10:33.844 [initandlisten] recover lsn: 0
Fri Dec 27 01:10:33.844 [initandlisten] recover /usr/local/var/mongod/db/journal/j._0
Fri Dec 27 01:10:33.852 [initandlisten] recover cleaning up
Fri Dec 27 01:10:33.852 [initandlisten] removeJournalFiles
Fri Dec 27 01:10:34.132 [initandlisten] recover done
Fri Dec 27 01:10:34.132 [initandlisten] preallocating a journal file /usr/local/var/mongod/db/journal/prealloc.0
Fri Dec 27 01:11:49.551 [initandlisten] waiting for connections on port 27017
Fri Dec 27 01:11:49.552 [websvr] admin web console waiting for connections on port 28017
Fri Dec 27 01:11:49.562 [rsStart] replSet I am ec2-54-227-255-57.compute-1.amazonaws.com:27017
Fri Dec 27 01:11:49.562 [rsStart] replSet STARTUP2
Fri Dec 27 01:11:49.564 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote
Fri Dec 27 01:11:49.565 [rsHealthPoll] replset info ec2-54-227-252-146.compute-1.amazonaws.com:27017 thinks that we are down
Fri Dec 27 01:11:49.565 [rsHealthPoll] replSet member ec2-54-227-252-146.compute-1.amazonaws.com:27017 is up
Fri Dec 27 01:11:49.565 [rsHealthPoll] replSet member ec2-54-227-252-146.compute-1.amazonaws.com:27017 is now in state SECONDARY
Fri Dec 27 01:11:50.564 [rsSync] replSet still syncing, not yet to minValid optime 52bcad0e:1
Fri Dec 27 01:11:50.857 [initandlisten] connection accepted from 10.138.56.84:41413 #1 (1 connection now open)
Fri Dec 27 01:11:50.857 [conn1] end connection 10.138.56.84:41413 (0 connections now open)
Fri Dec 27 01:11:50.858 [initandlisten] connection accepted from 10.138.56.84:41414 #2 (1 connection now open)
Fri Dec 27 01:11:56.563 [rsBackgroundSync] replSet syncing to: ec2-54-227-252-146.compute-1.amazonaws.com:27017
Fri Dec 27 01:11:56.565 [rsSync] replSet still syncing, not yet to minValid optime 52bcad0e:1
Fri Dec 27 01:11:56.579 [repl writer worker 1] ERROR: writer worker caught exception: invalid operator: $oid on: { ts: Timestamp 1388096782000|1, h: -6012491477194752701, v: 2, op: "i", ns: "kn1.accounts", o: { _id:

{ $oid: "5283fe3ea782c9aff9d96654" }

, api_key: "604f5baa-fbc2-453a-ad9d-6e09a7d5e713", billing: {}, customEvents: {}, defaultDS: "ds.4444444.sample.internetTraffic", filters: [ { _id:

{ $oid: "5283fe3e9d46cd49718b457b" }

, days_in_week: {}, desc: "Filter out non-business hours", hours_in_day: [ 17, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], name: "Business Hours", type: "hourDayMonth", visibility: "account" }, { _id:

{ $oid: "5283fe3e9d46cd49718b457c" }

, days_in_week: [ 0, 6 ], desc: "Filter out any data from Saturday and Sunday.", hours_in_day: {}, name: "No Weekends", type: "hourDayMonth", visibility: "account" } ], knId: "4945496", kn_subscription:

{ subscriptionType: "", startDate: "", endDate: "", autoRenew: false }

, name: "Know Normal, Inc.", nr_account_id: "279085", nr_api_key: "4cdafbfa8a636aefc1be23e0a21bbc027605ee8ed754d4a", nr_data_key: "52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a", nr_event_feed_uri: "/account_feeds/52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a/events.rss", nr_subscription: { annual-renewal-on: { @attributes:

{ nil: "true" }

}, expires-on: { @attributes:

{ nil: "true" }

}, number-of-hosts: "1", starts-on: "2013-03-22", state: "paid", product-name: "Pro" }, owner: "nr@knownormal.com", sources: [

{ knColName: "ds.4444444.sample.wx", knDesc: "Raleigh-Durham, North Carolina Hourly Weather for 2012", knCustomName: "Weather Data", knSourceType: "sample", knFieldNames: [ "TmpF", "DPf", "RH", "CC", "Wind" ], knDateField: "Date", knTimeField: "Time", knDateTimeField: -1 }

,

{ knColName: "ds.4444444.sample.internetTraffic", knDesc: "Internet traffic data (in bits) from United Kingdom academic network backbone (19 November 2004 thru 27 January 2005). Hourly data.", knCustomName: "ISP Network Traffic", knSourceType: "sample", knFieldNames: [ "bits" ], knDateField: null, knTimeField: null, knDateTimeField: "Time" }

,

{ knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.kn.279085.summary", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App", knDesc: "", nrAppId: 1368487, nrAppName: "KN App", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1368487", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1368487/servers" }

,

{ knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1783546.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App Dev Maury", knDesc: "", nrAppId: 1783546, nrAppName: "KN App Dev Maury", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1783546", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1783546/servers" }

,

{ knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1942741.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "My Application", knDesc: "", nrAppId: 1942741, nrAppName: "My Application", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1942741", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1942741/servers" }

], users: [

{ knId: "4819339", role: "owner" }

] } }
Fri Dec 27 01:11:56.579 [repl writer worker 1] Fatal Assertion 16360
0xdddd81 0xd9dc13 0xc26bfc 0xdab721 0xe26609 0x7fb086d29c6b 0x7fb0860cf5ed
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd9dc13]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib64/libpthread.so.0(+0x7c6b) [0x7fb086d29c6b]
/lib64/libc.so.6(clone+0x6d) [0x7fb0860cf5ed]
Fri Dec 27 01:11:56.583 [repl writer worker 1]

***aborting after fassert() failure

Fri Dec 27 01:11:56.583 Got signal: 6 (Aborted).

Fri Dec 27 01:11:56.586 Backtrace:
0xdddd81 0x6d0d29 0x7fb0860219c0 0x7fb086021945 0x7fb08602325b 0xd9dc4e 0xc26bfc 0xdab721 0xe26609 0x7fb086d29c6b 0x7fb0860cf5ed
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
/lib64/libc.so.6(+0x349c0) [0x7fb0860219c0]
/lib64/libc.so.6(gsignal+0x35) [0x7fb086021945]
/lib64/libc.so.6(abort+0x17b) [0x7fb08602325b]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd9dc4e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib64/libpthread.so.0(+0x7c6b) [0x7fb086d29c6b]
/lib64/libc.so.6(clone+0x6d) [0x7fb0860cf5ed]



 Comments   
Comment by Githook User [ 12/Aug/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-12209 _id prohibitions

Prohibit _id of type Array or Undefined.
Also, if _id is an Object, recursively check the object for $fieldnames.
Branch: v2.4
https://github.com/mongodb/mongo/commit/91e1ec144205e140272b2ed9560accf634c4e93b

Comment by Daniel Pasette (Inactive) [ 08/Aug/14 ]

milkie, please shepherd this through for 2.4.11

Comment by Rafael Gaspar [ 26/Jul/14 ]

Just adding that this fix should be backported to 2.4.x, since it's a serious problem that kills all slaves.

Comment by Githook User [ 06/Feb/14 ]

Author:

{u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: SERVER-12209: check _id for $fields at insert time
Branch: master
https://github.com/mongodb/mongo/commit/34d899304d971942e6ce4743489495d11a4d48c1

Comment by Thomas Rueckstiess [ 03/Feb/14 ]

Eliot, this causes the bug in 2.5.5. Prior to that, _skipValidation wasn't available.

// start replica set
var rs = new ReplSetTest({name: "insert_dollar_oid", nodes: 3})
rs.startSet()
rs.initiate()
rs.awaitReplication()
 
var conn = rs.getMaster()
var db = conn.getDB('test')
 
// disable shell validation
db._mongo._skipValidation = true
 
// insert bad document
db['coll'].insert({_id: {$oid: 1}})
 
// throws Fatal Assertion 16360 on secondaries
rs.awaitReplication()

Comment by Eliot Horowitz (Inactive) [ 03/Feb/14 ]

I figured one out.

Comment by Eliot Horowitz (Inactive) [ 03/Feb/14 ]

Is there a repro without godinsert?

Comment by Scott Hernandez (Inactive) [ 27/Jan/14 ]

// This insert works, which causes the problem below
> db.runCommand({godinsert:"u", obj:{_id:{$oid:1}}})
{ "ok" : 1 }
 
// Which turns into an update with replication
> db.u.update({_id:{$oid:1}}, {_id:{$oid:1}}, true)
SingleWriteResult({
        "writeErrors" : [
...                     "errmsg" : "could not canonicalize query { _id: { $oid: 1.0 } };  :: caused by :: unknown operator: $oid",
                        "op" : {
                                "q" : {
                                        "_id" : {
                                                "$oid" : 1
                                        }
                                },
                                "u" : {
                                        "a" : 1
                                },
                                "multi" : false,
                                "upsert" : false
                        }
...
// 2.4 has this error msg 
{ ...
"err" : "invalid operator: $oid",
... }

Comment by Shane O'Donnell [ 13/Jan/14 ]

http://pecl.php.net/package/mongo


Shane O.
========================
Shane O'Donnell
Know Normal, Inc.
cell: +1.919.323.5380
skype: shaneodonnell
email: shaneo@knownormal.com
========================
:wq!

Comment by Ger Hartnett [ 13/Jan/14 ]

AFAIKS the released version of the MongoDB PHP driver is 1.3.0
http://ie2.php.net/manual/en/changelog.mongo.php

Comment by Shane O'Donnell [ 12/Jan/14 ]
      • RESPONSES INLINE WITH >>> ***
        I appears that secondaries are failing when trying to replicate documents whose _id field has a key starting with a $.

>>> The primary crashed as well on the initial attempt to remove the document (via the mongo shell)

The format { "_id" :

{ "$oid" : "52d0c341330e7982e125ad02" }

}
is a representation in our extended JSON format for ObjectIds in "strict" and "json" mode (mongoexport uses this for example to write valid JSON), but it is not a valid ObjectId in the server. Keys should not start with a $ sign, those are reserved operators in MongoDB.
We'd like to find out how you ended up with such _id values.

>>> Me too!

What driver and version are you using?

>>> In our current environment, we are accessing data in MongoDB via 4 different mechanisms:

>>> 1) PHP driver (currently version 1.4.5 (dev), 1.3.4 & 1.4.2 (prod – now upgraded to 1.4.5), and records in MongoDB could have been installed by almost anything in between)
>>> 2) Java driver 2.11.0 (now, but previous versions were definitely used).
>>> 3) mongo shell
>>> 4) Genghis

Did you create them yourself through a driver that does not validate for this format?

>>> See above.

Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting).

>>> Not that I'm aware of.

How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with

use kn1
db.accounts.find({_id: {$type: 3}}).count()
This query finds all documents with an ObjectId of type 3 (= sub-document), which would only target the problematic ones if the rest have a true ObjectId. If there are many of them and you want to retain the data, we may be able to come up with a script that converts these back to normal ObjectIds (or alternatively just remove them all, if you don't want to keep them). It's likely though that you would have to take your primary down, convert/remove the documents in standalone mode, then bring it back up and re-sync your secondaries afterwards.

>>> I can't run a query against it because it's still in ROLLBACK state. I don't know how to clear that. No rollback files were generated. That said, this collection only had six or so documents in it, and I'd looked at it numerous times and only noted the one document with the {$oid} key.

In my tests, a repair did not fix or remove the documents.

>>> In my test, a repair shrunk the size on disk so markedly (from 168GB to less than 1GB) that I was suspicious that I didn't lose production data, so I restored a backup (taken just prior to the --repair).

Comment by Thomas Rueckstiess [ 11/Jan/14 ]

Hi Shane,

I appears that secondaries are failing when trying to replicate documents whose _id field has a key starting with a $. The format

{ "_id" : { "$oid" : "52d0c341330e7982e125ad02" } }

is a representation in our extended JSON format for ObjectIds in "strict" and "json" mode (mongoexport uses this for example to write valid JSON), but it is not a valid ObjectId in the server. Keys should not start with a $ sign, those are reserved operators in MongoDB.

We'd like to find out how you ended up with such _id values. What driver and version are you using? Did you create them yourself through a driver that does not validate for this format? Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting).

How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with

use kn1
db.accounts.find({_id: {$type: 3}}).count()

This query finds all documents with an ObjectId of type 3 (= sub-document), which would only target the problematic ones if the rest have a true ObjectId. If there are many of them and you want to retain the data, we may be able to come up with a script that converts these back to normal ObjectIds (or alternatively just remove them all, if you don't want to keep them). It's likely though that you would have to take your primary down, convert/remove the documents in standalone mode, then bring it back up and re-sync your secondaries afterwards.

In my tests, a repair did not fix or remove the documents.

Regards,
Thomas

Comment by Shane O'Donnell [ 10/Jan/14 ]

If I can edit the original ticket, I can't figure out how.

Versions should be:
PRIMARY: 2.4.6
SECONDARY: 2.4.8
SECONDARY: 2.4.8

Note that I have since upgraded PRIMARY to 2.4.8 as well and have a related-if-not-duplicate bug to file, where bad $oid reference kills PRIMARY and replication then replicates the reference to SECONDARYs and kills them too.

Generated at Thu Feb 08 03:27:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.