[SERVER-8663] Slow performance on duplicate key exception Created: 22/Feb/13  Updated: 14/May/15  Resolved: 14/May/15

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Antoine Girbal Assignee: Mathias Stearn
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PDF File dups.pdf     Text File stacks.txt    
Participants:

 Description   

While testing with the C++ driver, I noticed that when all keys were duplicates, performance was much slower.
That doesn't make sense, since no real work is supposed to be done in that case...

I tried the following very simple loop in the JS shell, and it gives an even worse result:

var start = new Date();
for (var count = 0; count < 1000000; ++count) {
    db.foo.insert({_id: count});
}
print(new Date() - start);

First run, collection is empty (elapsed time in ms):
antoine@ag410:~/Downloads/mongodb-linux-x86_64-2.2.2$ ./bin/mongo ~/adobe/testwrites.js
MongoDB shell version: 2.2.2
connecting to: test
12527

Second run, without dropping the collection (every insert now hits a duplicate key):
antoine@ag410:~/Downloads/mongodb-linux-x86_64-2.2.2$ ./bin/mongo ~/adobe/testwrites.js
MongoDB shell version: 2.2.2
connecting to: test
78153

This is always reproducible.



 Comments   
Comment by Mathias Stearn [ 22/Feb/13 ]

Antoine, that path is only taken in the rare case where we can't do two-phase indexing. In the normal case we never need to roll back the insert, since it hasn't occurred yet (we check whether the insert will succeed before doing it).

My profiling suggests that all of the overhead comes from exception unwinding. Most C++ compilers use "zero-cost" exceptions, which make it free to enter a try block and push all of the work of figuring out how to handle an exception into the throw path. That extra work is why the dup-key case is slower than the clean insert case. We may be able to optimize some of this away, but I don't know whether it is worth it, as I'm able to do ~20k dup-key inserts per second on my machine. Anything even approaching that rate is likely to be a serious bug in user code anyway.
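For illustration, here is a standalone sketch (not MongoDB code) of that cost model: entering the try block adds nothing to the clean path, while every throw pays for table lookup and stack unwinding, so a loop that throws on each iteration runs far slower than one that never throws.

#include <chrono>
#include <cstdio>
#include <stdexcept>

// One "insert" attempt: throws when asked to simulate a duplicate key.
static long attempt(bool duplicate) {
    if (duplicate)
        throw std::runtime_error("E11000 duplicate key");   // unwind machinery runs on every throw
    return 1;
}

int main() {
    const int N = 1000000;
    for (int pass = 0; pass < 2; ++pass) {
        const bool dup = (pass == 1);
        const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        long ok = 0;
        for (int i = 0; i < N; ++i) {
            try {
                ok += attempt(dup);               // entering the try block is free on the clean path
            } catch (const std::runtime_error&) {
                // the "dup key" path: the unwinder has already done its work by the time we land here
            }
        }
        const long ms = static_cast<long>(std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count());
        std::printf("%s: %ld ms (ok=%ld)\n", dup ? "all throws" : "no throws", ms, ok);
    }
    return 0;
}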

Comment by Antoine Girbal [ 22/Feb/13 ]

I've tried profiling with gprof, but mongod keeps hanging after a few thousand records when testing duplicate keys. Attached are the stack traces of the hung process (stacks.txt) as well as the gprof output for a few records' worth (dups.pdf).

Looking at our code a bit, I see that on a dup key we do this:

// normal case – we can roll back
_deleteRecord(d, ns, r, loc);
throw;

Is it expected to be slower, given that we actually insert and then remove the record on a dup key? I guess we need to insert first in order to have a document location for the index entries to point to..
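As a toy model of that flow (standalone code with made-up names, not our actual storage code): write the record first so the index has a location to point to, then undo the write and rethrow if the unique check fails.

#include <cstdio>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

struct Doc { int id; };

static std::vector<Doc> records;        // stand-in for the data file
static std::map<int, size_t> idIndex;   // stand-in for the unique _id index

static size_t insertWithUniqueCheck(const Doc& d) {
    records.push_back(d);               // insert first: the index entry needs a "location"
    const size_t loc = records.size() - 1;
    try {
        if (!idIndex.insert(std::make_pair(d.id, loc)).second)
            throw std::runtime_error("duplicate key");
    } catch (...) {
        records.pop_back();             // roll back the record we just wrote
        throw;                          // rethrow the dup-key error, as in the snippet above
    }
    return loc;
}

int main() {
    insertWithUniqueCheck(Doc{1});
    try {
        insertWithUniqueCheck(Doc{1});  // same _id again: takes the rollback-and-rethrow path
    } catch (const std::runtime_error& e) {
        std::printf("caught: %s (records=%zu)\n", e.what(), records.size());
    }
    return 0;
}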

Comment by Antoine Girbal [ 22/Feb/13 ]

Reproduced with 2.0 and 2.2.
I tried doing upserts instead; they're a bit faster, but still about 2x slower than straight inserts (see the sketch after the numbers below).
Rates per second below:

straight inserts (no preexisting documents):
20095: aggregate rate = 16902 17273 17236 16697 16437 16788

inserts all dups:
20195: aggregate rate = 5804 7951 5793 5506 5536 4993

upserts:
20061: aggregate rate = 9541 10811 8811 8852 6541 8242 8125
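
The upsert variant looks roughly like this with the legacy C++ driver (a sketch only, not the actual benchmark code; host, namespace, and loop count are illustrative):

#include <iostream>
#include "mongo/client/dbclient.h"

int main() {
    mongo::DBClientConnection c;
    try {
        c.connect("localhost");   // host is illustrative; throws mongo::DBException on failure
    } catch (const mongo::DBException& e) {
        std::cerr << "connect failed: " << e.what() << std::endl;
        return 1;
    }

    // Upsert form: the document is inserted when missing and replaced (a no-op)
    // when already present, so the server never takes the dup-key exception path.
    for (int count = 0; count < 1000000; ++count) {
        c.update("test.foo",
                 QUERY("_id" << count),   // match on _id
                 BSON("_id" << count),    // replacement document
                 true /* upsert */,
                 false /* multi */);
    }
    std::cout << c.getLastError() << std::endl;   // check the outcome of the last write
    return 0;
}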
