[SERVER-31694] 17% throughput regression in insert workload Created: 24/Oct/17 Updated: 06/Dec/22 Resolved: 28/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage, Write Ops |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Incomplete | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Replication |
| Sprint: | Storage 2017-11-13, Storage 2017-12-04, Storage 2018-01-29, Repl 2018-02-26, Repl 2018-03-12, Repl 2018-03-26 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
Simple insert workload:
- standalone shows ~11% throughput reduction
- 1-node replica set shows ~50% throughput reduction

Metrics for the 1-node replica set case suggest the issue may be related to the large number of write-lock calls, about 3 per insert:
|
| Comments |
| Comment by Judah Schvimer [ 28/Oct/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
This regression is between two EOL versions of MongoDB, and the Server has diverged significantly since this occurred. Bulk insert performance continues to be something we track and aim to improve over time. Future performance improvements will be made using new analyses on active versions of the Server. Closing this as "Incomplete". | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 20/Mar/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Great analysis, bartle. setMyLastAppliedOpTimeForward and, more generally, the replication coordinator mutex are known contention points in the system today. That said, I don't see a ticket specifically tracking it, and I think this ticket actually covers a separate case involving w:1 bulk insert performance. I have filed a new ticket to track it. CC alyson.cabral and sheeri.cabral
| Comment by David Bartley [ 20/Mar/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Are there plans to improve performance of setMyLastAppliedOpTimeForward? On a write-majority, insert-heavy workflow we basically see single-core contention on setMyLastAppliedOpTimeForward (based on a CPU profile). That particular function takes an exclusive mutex, so it's unsurprising that if you push enough write-majority writes through you'd contend on a single core (in practice we're hitting a bottleneck of 12k wps on a 16-core machine, with ~7% CPU usage). Ultimately, all of the CPU ends up in _wakeReadyWaiters_inlock. That particular implementation seems rather naive; it ends up recomputing a bunch of things (again, under a global, exclusive lock) for every replication waiter. Instead, it seems like you should structure this code so that it determines the largest optime that satisfies the various write concern modes (basically "majority" and w="N") once, and then passes that information down into _doneWaitingForReplication_inlock.

Beyond this, reading through the code, it's fairly concerning how coarse-grained _mutex on ReplicationCoordinatorImpl is. Is there a reason more work hasn't been invested in finer-grained locks, or even reader-writer locks? As-is, it's really difficult to make any perf improvements.
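For illustration, here is a minimal C++ sketch of the restructuring suggested above: compute the newest satisfied optime per write-concern mode once, then compare each waiter against it. All names here (Waiter, SatisfiedOpTimes, wakeReadyWaiters) are hypothetical stand-ins, not the actual ReplicationCoordinatorImpl code.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real replication types; names are illustrative only.
using OpTime = std::uint64_t;
enum class WriteConcernMode { kMajority, kWNumNodes };

struct Waiter {
    WriteConcernMode mode;
    int wNumNodes = 0;   // only meaningful for kWNumNodes
    OpTime opTime = 0;   // optime the client is waiting on
    void notify() {}     // wake the waiting client thread (stubbed here)
};

// Computed once per wake-up pass: the newest optime each write-concern mode can
// consider satisfied, instead of re-deriving it for every waiter under the mutex.
struct SatisfiedOpTimes {
    OpTime majorityCommitted = 0;
    std::map<int, OpTime> byNodeCount;   // w:N -> newest optime replicated to >= N nodes
};

void wakeReadyWaiters(const SatisfiedOpTimes& satisfied, std::vector<Waiter*>& waiters) {
    // Assumed to be called with the coordinator mutex held, as today; the
    // per-waiter work is now just a comparison against precomputed values.
    for (auto it = waiters.begin(); it != waiters.end();) {
        Waiter* w = *it;
        OpTime threshold = satisfied.majorityCommitted;
        if (w->mode == WriteConcernMode::kWNumNodes) {
            auto found = satisfied.byNodeCount.find(w->wNumNodes);
            threshold = (found != satisfied.byNodeCount.end()) ? found->second : OpTime{0};
        }
        if (w->opTime <= threshold) {
            w->notify();
            it = waiters.erase(it);
        } else {
            ++it;
        }
    }
}
```

With this shape, the per-waiter cost inside the lock is a lookup and a comparison; the heavier progress computation happens once before the loop.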
| Comment by VictorGP [ 08/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, this is our case; I guess upserts are also affected. We have a heavy write pattern, so bulk inserts are needed. Isn't this usage of bulk inserts a common operation among MongoDB users? I find it hard to believe most users would use simple inserts when they have the option of doing them in bulk.
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 08/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
victorgp and stutiredboy@gmail.com, this ticket tracks a very particular performance regression when clients are using the Bulk.insert or related driver functionality to efficiently insert a large volume of documents. Is this the action you're performing when you experience your performance problem? If not, I would suggest filing another ticket. As for this particular performance regression, we hope to address it this year. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Adun [ 07/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Seconded. The declining performance is a big problem for us. We have already decided to skip the 3.6.x version and stay on 3.4.
| Comment by VictorGP [ 06/Jun/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can we get an update on how this is going and when a fix is going to be released? Such a performance impact in a new MongoDB version should be a big concern. This was reported in October 2017 with Major priority, but after ~8 months there is still no fix, which is a bit concerning. In our case, we can't upgrade our infrastructure to 3.6; the performance degradation is too big for the database to perform optimally. Also, the last comment is from 1 Feb; I hope this has not been forgotten...
| Comment by Andy Schwerin [ 01/Feb/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Moving to "Needs Triage" so replication team can assess when/how to schedule this. Given that it was in 3.7 Required, I expect it'll be going back to 3.7 Required as part of the re-triage. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 01/Feb/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have made changes in WiredTiger develop to alleviate blocking on the timestamp queue locks. In WiredTiger-only timestamp tests I see a 12% improvement. When I run that against this workload I see no improvement. Pmp is still showing a lot of contention, and I frequently see the thread holding the lock in the now call (presumably gettimeofday). I don't believe WiredTiger is the underlying cause; reducing the lock contention in repl::getNextOpTimes and ReplicationCoordinatorImpl::setMyLastAppliedOpTimeForward seems to be the next step.
| Comment by Susan LoVerso [ 18/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Running the mongodb 3.6 branch against the WiredTiger 3.8 branch shows a small but very consistent 3% degradation in the insert workload (about 96.7k) compared to using the WiredTiger 3.6 branch. I am going to do some bisecting to see if a single change accounts for it.

ETA: the change that causes the degradation was the addition of lock-tracking statistics around the three rwlocks in WiredTiger's transaction subsystem: the global rwlock and the two rwlocks around the two timestamp queues. Those changes were added in a recent WiredTiger change. If I turn off tracking on those three locks, then the WiredTiger 3.8 branch is on par with the WiredTiger 3.6 branch when run with the MongoDB 3.6 branch.
| Comment by Susan LoVerso [ 17/Jan/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Since this is back on my plate, I did baseline runs to see what the current situation is. Given the above discussion, I ran on 3.4 with majority read concern turned on. Here are my results (avg of 3 runs):
So the 3.6 performance has not changed since my last runs back in November. I ran pmp on several runs to see if any patterns emerge from the stacks. I continue to see lots of contention in setMyLastAppliedOpTimeForward and getNextOpTime. In particular, in every run I see the thread holding the lock from setMyLastAppliedOpTimeForward in _replExecutor->now(). Here's a good illustration: the test uses 15 insert threads, and here are 14 of them:
In another run, I also see some contention on the WiredTiger commit timestamp queue write lock. Eight threads are waiting on the mutex in getNextOpTimes and the thread holding it has the commit timestamp queue lock but other threads in WiredTiger are blocked on that lock:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 11/Dec/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Situation on a 2-node repl set is similar:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 11/Dec/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
In the test on this ticket, enabling majority read concern in 3.4.10 does decrease performance somewhat, by about 5%, but not by the 17% seen at the default settings in 3.6.0.
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geert Bosch [ 11/Dec/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
The picture changes a bit when you compare with 3.4 with majority reads enabled. IIRC, we're about at parity there. Of course, we have now enabled something by default that users may care about less than a 17% perf difference.
| Comment by Bruce Lucas (Inactive) [ 11/Dec/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Status update: 3.6.0 is much improved compared to the test on 3.6.0-rc0 in the original description, but still shows ~17% performance regression compared to 3.4.10:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 28/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Reopening – since this workload isn't automated (yet), we need to keep this ticket open to make sure we drive down the remaining regression. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 28/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'm closing this ticket now, as we have no remaining immediate server changes planned to improve this general performance issue. We will continue working on the individual linked BF tickets, and such work may spawn new SERVER tickets if the work is to be done in the server. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 27/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have run with the current mongodb master as of this morning (61122064) and WT develop (00374e0c7) and I'm seeing a solid 10%+ improvement from last week. These results are from a single-node replica set:
Running pmp (twice) on the current code during the insert test now shows consistently about 5 insert threads waiting on the mutex in getNextOpTimes, 1 or 2 other insert threads waiting on the ReplicationCoordinator mutex and all the other threads doing real work. That is a big improvement! | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 21/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've been working on making the calls to _setStableTimestampForStorage_inlock() more efficient on the primary write path, but am seeing mixed results. In some cases it performs better, and in some cases, especially when there is cache pressure, the performance is much worse. I'm continuing to investigate.
| Comment by Susan LoVerso [ 21/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
That is correct. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 21/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
sue.loverso, just to make sure we're on the same page: MongoDB 3.6 is not setting WiredTiger's stable_timestamp at all – all checkpoints write with a current transaction snapshot and no read timestamp. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 20/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
Internally to WiredTiger the stable timestamp is meaningful on every machine as it is a visibility mechanism for checkpoints and eviction. So it must be updated frequently enough on a primary as well. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geert Bosch [ 17/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
There are various slowdowns, but the one that stands out is ReplicationCoordinatorImpl::_setStableTimestampForStorage_inlock, as it is quite expensive (about 20% as expensive as _insertRecords, which inserts and indexes the actual data), but executes inside of the ReplicationCoordinator mutex. If I understand correctly, the stable timestamp is not meaningful when we're primary, as any majority confirmed timestamp is stable. When we're secondary, we only try to advance the stable timestamp once per batch, so it is not performance critical. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 13/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I reran with the latest changes in mongodb master from several tickets and the latest drop of WiredTiger. Running changeset 9fbb13033 with the internal WiredTiger code, I see the same performance gains that we saw with the hacked changes, about 91K ops/sec. Running pmp shows 13 of 15 workload threads in one stack:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 10/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I linked two more tickets to help improve performance. After these tickets go in, we should reevaluate the throughput and then decide if we need to do more. With the improvements made thus far, I don't think this needs to block the next rc. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 10/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
milkie, is there anything to do in this ticket? Now that we have | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 09/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I filed | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 09/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have now pushed two commits into the master branch that should have a positive effect on performance of this. Continuing to look at perf numbers for the next bottleneck to address. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 08/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have attached the full pmp output | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 08/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
The second run shows very different stacks. I see a lot of contention on the SessionCache mutex including threads in Michael's patch ensureSession:
There are also two threads waiting on the repl coordinator mutex in setMyLastAppliedOpTimeForward and one waiting on the newOpMutex in getNextOpTimes. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 08/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is very interesting. I ran pmp on two separate runs of the test and saw very different output. On one run, I still see more than half the workload threads waiting on the repl coordinator mutex in setMyLastAppliedOpTimeForward even with the small change Michael added. Also it would make sense for the same change to go into setMyLastDurableOpTimeForward:
The thread holding it is at the beginning of the worker function:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 08/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Interestingly, Michael and I both ran with his patch installed and saw different stacks. He frequently saw threads in synchronizeOnOplogInFlightResource, and my runs saw setMyLastAppliedOpTimeForward and getTerm. I am running on AWS; I don't know where Michael is running. These numbers are all with a production build of WiredTiger (my earlier numbers were with a diagnostic build). Here are my results today:
With those changes we've regained a large portion of the performance, but we're still below 3.5.8. I'll keep looking into the stacks of the patched runs and see if I can find more bottlenecks. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 08/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Nice find. I agree that’s a good change to make. Tomorrow morning I’ll take Michael’s patch and prepare it for commit. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 07/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
While I focused yesterday on WiredTiger locking statistics, I think the biggest problem is that nearly all threads are waiting on a single lock here:
Mongod runs 50% faster in 3.5.12 than it does in 3.5.13 (and later, up to today's master): 3.5.12 gets about 67k ops/sec while 3.5.13+ gets 45k ops/sec. Current master also gets 45k ops/sec. milkie the stacks above show where the bottleneck is. I see threads waiting on the mutex while one thread is setting its first commit timestamp on a transaction. michael.cahill has a diff that pulls the starting of a transaction outside the mutex in getNextOpTimes. That has a large positive effect on performance. I have attached it as ensure-session.patch.
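The general shape of that fix, moving the expensive session/transaction setup out of the critical section, looks roughly like the sketch below. The names are hypothetical stand-ins; this is not the attached patch itself.

```cpp
#include <cstdint>
#include <mutex>

// Illustrative stand-ins only; the real code lives in getNextOpTimes() and the
// storage engine's session machinery, and the actual patch may differ in detail.
struct StorageSession {
    void ensureSession() {}   // may check out a WT session and start a transaction (slow)
};

std::mutex newOpMutex;        // protects optime assignment, as described above
std::uint64_t nextTimestamp = 1;

// Before the patch, the slow session/transaction setup happened while holding
// newOpMutex, so every insert thread serialized behind it. The sketch below
// does the setup first and keeps only the optime assignment inside the lock.
std::uint64_t getNextOpTimeSketch(StorageSession& session) {
    session.ensureSession();                    // expensive work, now outside the mutex

    std::lock_guard<std::mutex> lk(newOpMutex); // short critical section
    return nextTimestamp++;
}
```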
| Comment by Michael Cahill (Inactive) [ 06/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
sue.loverso, thanks for the update. The changes you are suggesting to __wt_txn_global_set_timestamp are in the | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 06/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Last week I ran this test against successive releases from 3.5.9 through 3.5.13. Each showed a gradual decline, but there was a large drop between 3.5.12 and 3.5.13. More code that uses timestamps has also been added gradually over that period.
Today I instrumented a whole bunch of temporary statistics (on 3.5.13) to characterize this. Here are some results. This builds on the write locks Bruce mentioned above, but with a lot more detail.
The lowest-hanging fruit here seems to be in __wt_txn_global_set_timestamp. That is called from replication code, and it always, unconditionally acquires the txn_global->rwlock for writing even though it does not need to update anything 98.3% of the time. If we could avoid taking that lock until we actually need to update, that should help, particularly if the replication stack is holding a lock in its own code while setting the timestamp.
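As a generic sketch of that "avoid the write lock when no update is needed" idea, one could check an atomically readable copy of the current value first and only then take the exclusive lock. This is not WiredTiger's actual __wt_txn_global_set_timestamp implementation; the struct and names below are assumptions for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <shared_mutex>

// Generic "check before write-locking" pattern, not WiredTiger code.
struct GlobalTimestampState {
    std::shared_mutex rwlock;                       // stand-in for txn_global->rwlock
    std::atomic<std::uint64_t> oldestTimestamp{0};

    void setOldestTimestamp(std::uint64_t candidate) {
        // Cheap unlocked check first: if most calls would not move the
        // timestamp, they skip the exclusive lock entirely.
        if (candidate <= oldestTimestamp.load(std::memory_order_acquire))
            return;

        std::unique_lock<std::shared_mutex> lk(rwlock);
        // Re-check under the lock; another thread may have advanced it meanwhile.
        if (candidate > oldestTimestamp.load(std::memory_order_relaxed))
            oldestTimestamp.store(candidate, std::memory_order_release);
    }
};
```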
| Comment by Susan LoVerso [ 03/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the explanations! While | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by William Schultz (Inactive) [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
As Andy mentioned, my commit c7661b14867cd058e1a67986b8e05a7020fc0a5e did introduce a considerable regression, which was noted by BF-6399. It was resolved, however, by commit bf8ae1b3edbf7470e92b7407f76a042cc2246d48. Even with that fix, though, it could still be investigated as a possible source of the regression against 3.4. Perhaps we should determine if the slowest part of the "set stable timestamp" logic lies at the replication layer (i.e. ReplicationCoordinatorImpl::_setStableTimestampForStorage_inlock), or at the storage layer (StorageInterface::setStableTimestamp)? | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
There is also | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
In those stacks, there is a thread blocked on a WT spinlock trying to set the "oldest timestamp" / "stable timestamp for storage" while holding the replication coordinator mutex. If that's actually taking a long time, and not a coincidence, then it is causing other inserts to block waiting to do the same (7 threads) or to assign a term to their oplog entries (2 threads). That makes c7661b14867cd058e1a67986b8e05a7020fc0a5e by william.schultz a candidate. It was committed on August 8, so the first release it appeared in was 3.5.12.

sue.loverso, is it hard to run your test at and immediately before this commit? If not, could you?

If this is the problem, then our goal is probably to speed up _storage->setStableTimestamp, to shorten the critical section. redbeard0531, did you look into this earlier when optimizing? Does this affect benchmarks other than bulk insert?
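One generic way to shorten such a critical section is to decide on the new value while holding the mutex and make the slow storage call only after releasing it. The sketch below uses hypothetical names and ignores the real code's invariants; whether the reordering is actually safe is exactly the kind of question the comments above are weighing.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the storage-layer call discussed above.
struct Storage {
    void setStableTimestamp(std::uint64_t) {}   // potentially slow (takes storage-engine locks)
};

struct StableTimestampSketch {
    std::mutex mutex;                  // stand-in for the ReplicationCoordinator mutex
    std::uint64_t stableTimestamp = 0;
    Storage* storage = nullptr;

    // Compute the new value under the mutex, then make the slow storage call
    // after dropping it, keeping the critical section short.
    void advanceStableTimestamp(std::uint64_t candidate) {
        std::uint64_t toSet = 0;
        {
            std::lock_guard<std::mutex> lk(mutex);
            if (candidate > stableTimestamp) {
                stableTimestamp = candidate;
                toSet = candidate;
            }
        }
        if (toSet != 0)
            storage->setStableTimestamp(toSet);   // outside the critical section
    }
};
```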
| Comment by Susan LoVerso [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have attached the full pmp output files showing all threads that contains the threads above for both 3.5.9 and 3.5.13. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ok, since 3.5.13 the replica set initiation changed such that running just rs.initiate() was failing, and running the script then produced the 14k ops/sec in a hobbled mode. With replication correctly started, I'm seeing about 46k ops/sec. That's much better, but still far less than the 80k I see consistently in 3.5.9. Running pmp.sh during the test, I see most/all threads in the WiredTiger library inserting items in 3.5.9, and most insert threads waiting for a replication lock in 3.5.13. If I grep pmp.out for receivedInsert to see what the 15 application threads are doing, in 3.5.9 I see:
But in 3.5.13, 11 out of the 15 threads are waiting on one or more locks in mongoDB and only a single thread is inside WiredTiger:
spencer can you take a look and re-assign if needed? | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
Running on that branch with MongoDB master yields the same throughput, about 14100 ops/sec. So there is no change compared to the earlier run I did last week. Continuing to try to isolate when in the timeline the large drop happened and then get stacks and isolate further. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, I've been running tests today trying to isolate when it started dropping so much, but I'll retest with the branch first. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 02/Nov/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
sue.loverso, could you please retest against the Also, the lookaside eviction changes probably explain most of the additional regression when replication is enabled, so that should also be better on the branch. Thanks! | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 26/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here are the results I ran today. I ran 3.5.8 with the current WiredTiger develop branch, but it showed the same numbers as straight 3.5.8. I will further investigate the timestamp and replication issue above, and given my much worse numbers, I'm sure it'll be easy to get stacks. I will run with successive releases because the timestamp work was rolled out gradually. The numbers below are the average of 3 runs, but they were not statistically far apart run-to-run.
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Susan LoVerso [ 26/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
bruce.lucas thank you for those stacks. I ran eight variations today and I'll post the full results soon. But your stacks give a strong clue for where we should start. FYI I ran MongoDB-3.5.8+current WiredTiger-develop and MongoDB-master+WiredTiger-develop. Master is running 84% slower than 3.5.8 (14100 ops/sec versus 88000 ops/sec). | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 26/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
I looked at the ftdc data from the test and confirmed that in 3.6.0-rc0 there are a large number of write-lock calls for the replica set case but not for the standalone case, which makes them a possible suspect for the regression seen in the former but not the latter. Using gdb, I determined that we're calling __wt_writelock in the following 5 places in equal numbers, apparently once each per update:
| ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
12 cores / 24 cpus | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 24/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
What's your machine setup?
| Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
It can use either depending on value of writeCmd option; ops/s is different depending on which you choose but regression is essentially the same. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eric Milkie [ 24/Oct/17 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
How does this workload compare to the insert Microbenchmark tests and the YCSB load sys-perf tests? |