[SERVER-19361] Insert of document with duplicate _id fields should be forbidden Created: 10/Jul/15  Updated: 08/Apr/23  Resolved: 16/Sep/15

Status: Closed
Project: Core Server
Component/s: Write Ops
Affects Version/s: None
Fix Version/s: 3.0.7, 3.1.9

Type: Bug Priority: Critical - P2
Reporter: David Golden Assignee: YunHe Wang
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-22327 Replication fails with error on docum... Closed
is duplicated by TOOLS-148 Produce error on duplicate keys at th... Closed
Related
related to SERVER-75879 Upsert permits document to contain mu... Closed
related to SERVER-6439 Duplicate fields at the same level sh... Backlog
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Construct a BSON document with multiple _id fields and send it to the server.

E.g. with MongoDB Perl driver v0.708.2.0:

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use MongoDB;
use Tie::IxHash;
 
my $client     = MongoDB::MongoClient->new;
my $database   = $client->get_database('test');
my $collection = $database->get_collection('duptest');
$collection->drop;
 
# Due to the driver bug, each of these inserts ends up stored with
# two _id fields (see the find() output in the Description):
$collection->insert( Tie::IxHash->new( _id => 1 ) );
$collection->insert( [ _id => 2 ] );

Then look at the test.duptest collection in the mongo shell.
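For readers without the Perl driver: the essence of the repro is that a driver can emit raw BSON whose top level repeats the _id name, which no language-level dict/hash can express. A minimal sketch in Python, hand-encoding the BSON bytes (helper names are illustrative, not from any driver):

```python
import struct

def bson_int64(name: str, value: int) -> bytes:
    """Encode one int64 BSON element: type byte 0x12, cstring name, 8-byte LE value."""
    return b"\x12" + name.encode() + b"\x00" + struct.pack("<q", value)

def bson_doc(*elements: bytes) -> bytes:
    """Wrap elements in a BSON document: int32 total length + payload + trailing NUL."""
    payload = b"".join(elements)
    return struct.pack("<i", len(payload) + 5) + payload + b"\x00"

# A document with TWO _id fields -- impossible as a dict, trivial as raw bytes.
# A buggy driver can produce exactly this and send it over the wire.
doc = bson_doc(bson_int64("_id", 1), bson_int64("_id", 1))

assert doc.count(b"\x12_id\x00") == 2  # both elements are named _id
```

Before the fix below, the server accepted such bytes as-is, which is why the shell later shows two _id keys per document.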

Sprint: Quint 9 09/18/15
Participants:

 Description   

(At behackett's suggestion, I'm opening this separate from SERVER-6439 to highlight the impact on _id, specifically.)

A bug in the Perl driver revealed that it's possible to insert a document with duplicate/multiple _id fields.

> db.duptest.find()
{ "_id" : NumberLong(1), "_id" : NumberLong(1) }
{ "_id" : NumberLong(2), "_id" : NumberLong(2) }

I reproduced this against 3.1.5, 3.0.4, 2.6.10, and 2.4.14.



 Comments   
Comment by Githook User [ 16/Sep/15 ]

Author:

{u'username': u'yhjw88', u'name': u'Yunhe (John) Wang', u'email': u'yunhe.wang@mongodb.com'}

Message: SERVER-19361 Prevent inserting of doc with more than one _id field

(cherry picked from commit 61baf779dacca5ae17ac3ea1c3338b039824c280)
Branch: v3.0
https://github.com/mongodb/mongo/commit/ef78d1b66f3f88ce4b68ea7c93496c2287341f98

Comment by Githook User [ 16/Sep/15 ]

Author:

{u'name': u'Yunhe (John) Wang', u'email': u'yunhe.wang@mongodb.com'}

Message: SERVER-19361 Prevent inserting of doc with more than one _id field
Branch: master
https://github.com/mongodb/mongo/commit/61baf779dacca5ae17ac3ea1c3338b039824c280
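The actual fix is C++ in the commits above; conceptually, the check amounts to iterating the top-level elements of the parsed document (BSON permits repeated names, so this must be a scan, not a map lookup) and rejecting the insert when more than one is named _id. A rough Python sketch, with illustrative function names and the error text taken from the mongorestore output below:

```python
def has_multiple_id_fields(elements):
    """elements: iterable of (name, value) pairs -- the top-level BSON
    elements in document order, which may legally repeat names."""
    return sum(1 for name, _ in elements if name == "_id") > 1

def validate_for_insert(elements):
    # Mirrors the post-fix behavior: reject the write up front,
    # before the document reaches storage or the oplog.
    if has_multiple_id_fields(elements):
        raise ValueError("can't have multiple _id fields in one document")
```

Rejecting at the write path (rather than silently deduplicating) keeps the primary, secondaries, and tools all seeing the same document.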

Comment by YunHe Wang [ 14/Sep/15 ]

New behavior:

zambonis-MacBook-Pro:test zamboni$ mongorestore dump
2015-09-14T10:15:29.864-0400	building a list of dbs and collections to restore from dump dir
2015-09-14T10:15:29.865-0400	reading metadata file from dump/test/duplicate.metadata.json
2015-09-14T10:15:29.866-0400	restoring test.duplicate from file dump/test/duplicate.bson
2015-09-14T10:15:29.867-0400	error: can't have multiple _id fields in one document
2015-09-14T10:15:29.867-0400	restoring indexes for collection test.duplicate from metadata
2015-09-14T10:15:29.867-0400	finished restoring test.duplicate (2 documents)
2015-09-14T10:15:29.867-0400	done
zambonis-MacBook-Pro:test zamboni$ mongo
MongoDB shell version: 3.0.6
connecting to: test
Server has startup warnings: 
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] 
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] ** NOTE: This is a development version (3.1.8-pre-) of MongoDB.
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] **       Not recommended for production.
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] 
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] 
2015-09-14T10:14:35.139-0400 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
> db.duplicate.find()
> exit
bye

Comment by J Rassi [ 10/Jul/15 ]

Ah, well in that case, I can confirm this as a legitimate bug separate from SERVER-6439.

After running a couple of test scenarios, I've determined that documents with duplicate _id fields only replicate one copy of the field (and that documents with other duplicate fields do not suffer from this issue). Scott suggests that SERVER-18982 may resolve this issue. However, I think that we've gathered enough evidence of problems in this area to warrant a fix where these inserts are rejected by the write path.

Updating issue title and moving issue to the "Needs Triage" state. We'll schedule this at our next triage meeting. Thanks again, folks.

Comment by Bernie Hackett [ 10/Jul/15 ]

Can you use the insert command directly to reproduce?

Comment by David Golden [ 10/Jul/15 ]

2015-07-10T17:14:27.401-0400 D -        [repl writer worker 15] User Assertion: 13596:in Collection::updateDocument _id mismatch
2015-07-10T17:14:27.401-0400 E REPL     [repl writer worker 15] writer worker caught exception:  :: caused by :: 13596 in Collection::updateDocument _id mismatch on: { ts: Timestamp 1436562867000|1, h: -4053498944266186361, v: 2, op: "u", ns: "test.duptest", o2: { _id: 1 }, o: { $unset: { _id: true } } }
2015-07-10T17:14:27.401-0400 I -        [repl writer worker 15] Fatal Assertion 16360
2015-07-10T17:14:27.401-0400 I -        [repl writer worker 15]

I can't test this for other fields, as the Perl bug was specific to _id.

Comment by J Rassi [ 10/Jul/15 ]

David, could you post the log from the failed secondary? Also, I suspect that the same behavior will be exhibited when the duplicated field has a name other than "_id"; would you mind confirming this?

Comment by David Golden [ 10/Jul/15 ]

With a replica set, the duplicate _id doesn't replicate; unsetting the _id field then cleans up the duplicate on the primary and crashes the secondary. Here's an example with MongoDB 3.0.4. I used the repro Perl code above to create the duplicate fields and did the rest with the shell. The replica set had one primary, one secondary, and one arbiter.

On the primary:

foo:PRIMARY> db.duptest.find()
{ "_id" : NumberLong(1), "_id" : NumberLong(1) }
{ "_id" : NumberLong(2), "_id" : NumberLong(2) }

On the secondary:

foo:SECONDARY> db.duptest.find()
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }

Unsetting the _id on the primary:

foo:PRIMARY> db.duptest.update({_id:1},{$unset:{_id:1}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> db.duptest.find()
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2), "_id" : NumberLong(2) }

Secondary crashed:

foo:PRIMARY> rs.status()
{
        "set" : "foo",
        "date" : ISODate("2015-07-10T20:50:52.665Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "metis.local:50037",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 176,
                        "optime" : Timestamp(1436561419, 1),
                        "optimeDate" : ISODate("2015-07-10T20:50:19Z"),
                        "electionTime" : Timestamp(1436561280, 1),
                        "electionDate" : ISODate("2015-07-10T20:48:00Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "metis.local:51188",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2015-07-10T20:50:52.411Z"),
                        "lastHeartbeatRecv" : ISODate("2015-07-10T20:50:18.344Z"),
                        "pingMs" : 0,
                        "lastHeartbeatMessage" : "Failed attempt to connect to metis.local:51188; couldn't connect to server metis.local:51188 (192.168.168.1), connection attempt failed",
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "metis.local:51261",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 175,
                        "lastHeartbeat" : ISODate("2015-07-10T20:50:52.409Z"),
                        "lastHeartbeatRecv" : ISODate("2015-07-10T20:50:52.409Z"),
                        "pingMs" : 0,
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}

Comment by Bernie Hackett [ 10/Jul/15 ]

There are definitely replication implications for not having an _id at all. I don't know what the implications are for having multiple _id fields. MongoDB gives _id special treatment in various ways, which may make this a distinct issue from having duplicates of any other arbitrary field.

Comment by J Rassi [ 10/Jul/15 ]

From the server's perspective, are the consequences of inserting a document with duplicate _id fields any different than the consequence of inserting a document with some other duplicate field (for example, do updates on these documents replicate correctly)? If not, I'd be inclined to close this as a dup of SERVER-6439.

Generated at Thu Feb 08 03:50:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.