[SERVER-10723] Bulk insert is slow in sharded environment Created: 10/Sep/13  Updated: 09/Jul/16  Resolved: 06/Nov/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Anton Kozak Assignee: Asya Kamsky
Resolution: Duplicate Votes: 9
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux CentOS 2.3, sharded environment (12 shards); sharding was done on a hashed key with pre-splitting.


Attachments: Zip Archive bulk_insert-bin.zip     Zip Archive bulk_insert_src.zip     File init_sharded_env.sh    
Issue Links:
Duplicate
duplicates SERVER-12950 Ordered write commands in mongos do n... Closed
duplicates SERVER-9038 New write operation method for insert... Closed
Operating System: ALL
Participants:

 Description   

We've recently faced a performance issue: bulk inserts are slow in a sharded environment.
The average time for a single insert is 2.7 ms, while a bulk insert of 500 records takes 3418.8 ms; per record, bulk inserts are therefore about 2.5 times slower than single inserts. Initially we thought it was an issue in our source code, but investigation confirmed it is a mongos issue, and other developers have the same problem: http://cfc.kizzx2.com/index.php/slow-batch-insert-with-mongodb-sharding-and-how-i-debugged-it/
It's very strange that the bulk insert operation wasn't optimized for this case.



 Comments   
Comment by Anton Kozak [ 04/Mar/14 ]

That's very good news, thank you very much!

Comment by Daniel Pasette (Inactive) [ 04/Mar/14 ]

For those watching this ticket: we just committed a very large performance improvement for ordered bulk writes, which will be in 2.6.0-rc1: SERVER-12950

Comment by Anton Kozak [ 24/Feb/14 ]

Hi Asya,

Yes, all shards were started on the same machine; the idea was to understand what performance the new feature actually gives and to provide early feedback, since my current project really needs fast bulk inserts/updates.
I used a simplified environment for the testing, but it usually shows the trend (both approaches have the same conditions) and saves time.

Today I reran the tool, trying to avoid external impact (antivirus, updates, other processes, etc.); the results are better but not excellent:
unordered bulk
10:44:52.997 [main] INFO sharding.test.Starter - ========= Started unordered bulk inserts ==============
10:52:28.393 [main] INFO sharding.test.Starter - ======= total time: 454449ms
11:04:04.094 [main] INFO sharding.test.Starter - ======= total time: 695391ms
11:17:57.261 [main] INFO sharding.test.Starter - ======= total time: 832561ms
11:37:56.397 [main] INFO sharding.test.Starter - ======= total time: 1198663ms
11:37:56.764 [main] INFO sharding.test.Starter - ========= Finished unordered bulk inserts ==============

single inserts
09:01:47.801 [main] INFO sharding.test.Starter - ========= Started single inserts ==============
09:09:13.885 [main] INFO sharding.test.Starter - ======= total time: 446081ms
09:21:37.211 [main] INFO sharding.test.Starter - ======= total time: 743326ms
09:35:31.183 [main] INFO sharding.test.Starter - ======= total time: 833971ms
09:53:57.469 [main] INFO sharding.test.Starter - ======= total time: 1106167ms
09:53:57.770 [main] INFO sharding.test.Starter - ========= Finished bulk inserts ==============

Unfortunately I cannot deploy rc-0 to any of the existing environments, but I'm going to create a simplified environment on AWS for this testing.

We haven't tested ACKNOWLEDGED on 2.6.0, but on 2.4.8 this level causes serious performance degradation with many shards (12 in our case):
ACKNOWLEDGED
1mln 1,715,806
+1mln 1,716,493
+1mln 1,770,679
+1mln 1,848,524

UNACKNOWLEDGED/NORMAL
1mln 204,997
+1mln 196,042
+1mln 197,971
+1mln 230,566

It would be great to find the results of the performance testing done by MongoDB (synthetic testing in an optimized environment). Is it available anywhere?

Thank you,
Anton

Comment by D.H.J. Takken [ 24/Feb/14 ]

Could anyone please add a remark in the db.collection.insert() documentation about this issue? That would clear up the performance issues for quite a few people, I guess.

Comment by Asya Kamsky [ 24/Feb/14 ]

It looks like you are running all the servers (shards, config server, mongos) and your Java process on the same machine (localhost). Can you try it on a regular setup?

A realistic test should probably insert with more than a single thread, use separate machines, and not use the smallfiles option on the shards, since that will slow things down significantly when inserting a lot of new data.

"Acknowledged" having higher latency than "unacknowledged" is to be expected; is it significantly different than in previous versions? Do you have a separate test for that?

Comment by Anton Kozak [ 23/Feb/14 ]

Please reopen this issue; unfortunately 2.6.0-rc0 / mongo-java-driver 2.12.0-rc0 do not resolve it.
According to my observations, single inserts are still faster than the new unordered bulk operations.
I attached the script which creates 3 shards and shards the collection, plus the sources and a binary for testing the different modes:

2.6.0- old bulk (collection.insert(values))
21:47:03.411 [main] INFO sharding.test.Starter - ========= Started bulk inserts ==============
21:55:28.372 [main] INFO sharding.test.Starter - ======= total time: 504082ms
22:10:52.592 [main] INFO sharding.test.Starter - ======= total time: 923134ms
22:26:36.025 [main] INFO sharding.test.Starter - ======= total time: 940696ms
22:46:04.166 [main] INFO sharding.test.Starter - ======= total time: 1166059ms
22:46:04.525 [main] INFO sharding.test.Starter - ========= Finished bulk inserts ==============

2.6.0-unordered-bulk
00:09:59.682 [main] INFO sharding.test.Starter - ========= Started unordered bulk inserts ==============
00:19:40.667 [main] INFO sharding.test.Starter - ======= total time: 580061ms
00:36:22.097 [main] INFO sharding.test.Starter - ======= total time: 1000626ms
00:52:58.655 [main] INFO sharding.test.Starter - ======= total time: 991383ms
01:13:43.546 [main] INFO sharding.test.Starter - ======= total time: 1244285ms
01:13:43.685 [main] INFO sharding.test.Starter - ========= Finished unordered bulk inserts ==============

2.6.0-single inserts
11:54:27.220 [main] INFO sharding.test.Starter - ========= Started single inserts ==============
12:02:37.279 [main] INFO sharding.test.Starter - ======= total time: 490031ms
12:19:25.769 [main] INFO sharding.test.Starter - ======= total time: 1008434ms
12:32:53.076 [main] INFO sharding.test.Starter - ======= total time: 807055ms
12:48:20.562 [main] INFO sharding.test.Starter - ======= total time: 927485ms
12:48:20.641 [main] INFO sharding.test.Starter - ========= Finished bulk inserts ==============

There is another issue with the ACKNOWLEDGED write concern: it is extremely slow in a sharded environment (8-9 times slower than UNACKNOWLEDGED/NORMAL).

Please share the results of the performance testing you've done (different modes and write concerns).

Thanks,
Anton

Comment by David Storch [ 06/Nov/13 ]

Resolving as duplicate. If you feel that the issues brought up in this ticket have not been adequately addressed, feel free to re-open.

Comment by David Storch [ 05/Nov/13 ]

shadowman131/avish: order can indeed matter, and preserving the current behavior is important to make this as safe as possible. For example, a single operation in a batch of updates can depend on previous operations, and in the insert scenario, duplicate keys depend on the order inserted. The pre-2.4 behavior did not provide a way to express that order matters; however, the default behavior was that it does matter (the behavior of a standalone vs. sharded collection should be the same). In 2.6, unordered execution is being made opt-in, as it is less restrictive and opens opportunities for parallelization, which is exactly what we take advantage of when the user indicates "ordered: false". We will make sure that this is well documented, so that the expected behavior is clear in both the "ordered: true" and "ordered: false" cases.

Strict ordering has always been the default for batch inserts, so unfortunately changing this default would be backwards breaking. Some applications rely on an ordering guarantee. Consider the example of an event logging application where successive documents contain monotonically increasing timestamps. In order to maintain consistency of the data, the documents must be inserted in order. Imagine taking a snapshot of the data during an unordered batch insert: the snapshot will not necessarily represent a point in time on the event timeline that the application intends to record.

I definitely understand the point that an unordered (and therefore as-fast-as-possible) batch operation could be a better default. However, it is certainly safer to have ordered as the default, which is why the system currently works as it does. We are hearing your feedback, and I will continue to discuss this with the feature developers to ensure that we take the best approach moving forward.

Comment by Walt Woods [ 05/Nov/13 ]

If batches aren't sent in parallel in an unordered implementation, that's a pretty big oversight.

Comment by Avishay Lavie [ 05/Nov/13 ]

I agree with the previous comments that the requirement to preserve document order during a batch insert was not really there in the first place (i.e., it wasn't clearly documented or communicated); in fact, this issue exists only because users implicitly expected batch inserts to be unordered and to happen as fast as possible, and were surprised to discover this wasn't the case.

So +1 from me for having the default be "make this batched insert as fast as you can regardless of order", especially if the write concern is able to report exactly which inserts worked and which didn't, which avoids any use case for preserving order. I realize that this might be considered a breaking change but I think it's worthwhile to try to understand how many users actually rely on this behavior; my guess would be that most users of batch inserts in sharded environment either don't care about this issue or would be surprised to discover this isn't already the default behavior.

A more important point IMO is that in unordered mode (regardless of whether it is the default), batches sent to different shards should probably be sent in parallel to gain a significant performance boost similar to that gained when making a non-targeted query in a sharded cluster.

Comment by Thanos Angelatos [ 05/Nov/13 ]

Hello David,

Appreciate you taking the time to respond; unfortunately, your answer focuses on the new write commands, which are internal to how the drivers work, and not on existing user code and patterns that rely on the defined behaviour of your drivers. Since "ever" (is this too big a word? maybe) the purpose of a batch insert has been to fire a barrage of writes against MongoDB as fast as possible. Is ordering necessary? Maybe, on specific occasions, but usually not. Here I agree with the previous comment, and I do not understand why "MongoDB must honor the requirement that the inserts happen in the order that they are specified" should apply in this case; it was never there in the first place for bulk insertions, so why introduce it now?
The whole argument behind my comment is about "sensible defaults." I am for sensible defaults; I just don't understand why we need to specify "ordered": false to get what we already have in 2.4.4. Please clarify why you think applications expect batch inserts to be ordered in a sharded environment (using hash indexing!).

Comment by Walt Woods [ 05/Nov/13 ]

Hi David, I can't think of any situation in which the ordering of a batch insert would matter. Could you please enlighten us as to how a system would rely on batch inserts happening in the order they are specified? What I find particularly weird about this (seeming) double standard is that when reading from shards (without a sort specified), the data can come back in any order; I'd go so far as to say that is the expected behavior in a sharded environment, for speed reasons. It seems very unreasonable to assume (or care) that a batch insert happens exactly in the order specified.

The only situation I can think of where this would matter is when a single bulk insert call contains the same _id twice. That seems very easy to rectify: just have the client drivers reject the second, duplicate _id insert.

Comment by David Storch [ 05/Nov/13 ]

Hi Thanos,

In 2.6, all drivers will use the new write commands for their existing batch insert API. You will not need to change your application's batch insert in order to make use of the new write commands.

The "ordered" field will be optional with a default of "true", so applications will have to specify "ordered: false" in order to take advantage of unordered fast batch inserts. Unfortunately we must keep the default as "ordered: true": we do not want the behavior of existing applications that rely on strict ordering of batch inserts to change in unexpected ways.

Comment by Thanos Angelatos [ 05/Nov/13 ]

Hello David,

Thanks for the detailed response. We do face this problem, and what I do not understand (and you have not explained) is why you're introducing behaviour that did not exist in the past (ordered inserts, getLastError); I am referring to batch updates pre-2.4.5. Why make it the default in the new fix version rather than an option, so that existing systems which expect unordered but fast batch inserts would not need to be changed?

Thanks for your take on this.

Comment by David Storch [ 04/Nov/13 ]

Hi all,

Apologies for the delayed response. The short answer is that this will be fixed in version 2.6 with SERVER-9038, so I have marked this as a duplicate. Read on for a more detailed explanation.

For batch inserts, MongoDB must honor the requirement that the inserts happen in the order they are specified. The consequence is that consecutive inserts can be batched only if they correspond to the same shard. First consider the case where "k" is the shard key. There are two shards, corresponding to the ranges

[MinKey, 10], (10, MaxKey]

Now suppose we batch insert the following documents:

[{k: 1}, {k: 25}, {k: 2}]

A call to getLastError is required after each document in this case, because no two consecutive documents are sent to the same shard. If MongoDB tried to combine k:1 and k:2 into one batch, then it is possible that k:2 would be inserted before k:25, which is not allowed. Suppose instead that we first sort by the shard key, as in

[{k: 1}, {k: 2}, {k: 25}]

then k:1 and k:2 can be sent to shard 1 as a batch, reducing the number of calls to getLastError required and improving insert throughput.

In the case of hashed shard keys, inserted documents are distributed more or less randomly among the shards, which means that very little batching can take place. The new bulk write commands added for SERVER-9038 fix this problem by allowing you to specify "ordered: false". This tells the database that you do not care about strictly preserving the order in which insertions take place. With "ordered: false", mongos creates a single batch per shard, obviating the extra getLastError calls. Each batch can then be executed on the appropriate shard concurrently, without waiting for the getLastError response from the previous batch.
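The grouping constraint described above can be sketched with a short simulation (a simplified model, not actual mongos code; shard_for and the two-shard layout are assumptions taken from the example for illustration):

```python
from itertools import groupby

def shard_for(doc):
    # Hypothetical two-shard layout from the example above:
    # shard 0 owns [MinKey, 10], shard 1 owns the rest.
    return 0 if doc["k"] <= 10 else 1

def consecutive_batches(docs):
    """Group only *consecutive* documents targeting the same shard,
    as an ordered batch insert must (each group costs a getLastError)."""
    return [list(group) for _, group in groupby(docs, key=shard_for)]

docs = [{"k": 1}, {"k": 25}, {"k": 2}]
print(len(consecutive_batches(docs)))                         # 3 round trips
print(len(consecutive_batches(sorted(docs, key=shard_for))))  # 2 round trips
```

Sorting by target shard first collapses the per-document round trips into one group per shard, which is exactly the effect reported by commenters who pre-sorted documents before inserting.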

Comment by Christos Soulios [ 15/Oct/13 ]

We have seen exactly the same performance degradation when we upgraded our database from 2.2 to 2.4, while changing our sharding strategy from range to hash.

Is there any plan for solving this issue soon?
Thank you.

Comment by Walt Woods [ 03/Oct/13 ]

For what it's worth, I ran some tests on our system and verified this incredibly bizarre behavior. I'm running a larger sample right now, but for a small set of documents on a 3-shard cluster, I got a 10x speedup just by sorting the documents before passing them to PyMongo's insert() function.

Never thought I'd spend part of a day coding in a custom look-at-what-chunk-each-document-will-be-in-and-sort function for my database driver...

Comment by Avishay Lavie [ 03/Oct/13 ]

Looking at strategy_shard.cpp, I see that mongos groups together inserts to the same shard only as long as they are consecutive (in the original list of documents given to the batch insert command), i.e. it maintains the original document order. I would expect a sharded batch insert to do a parallel scatter-gather operation, so order shouldn't matter anyway.

So, the question is: is there a reason to maintain the original ordering of documents in the batch, at the expense of performance? I assume it wouldn't be too hard to pre-sort the documents in the batch according to their assigned shards so as to minimize the number of insert groups (ideally no more than one group per shard). Would you be willing to look at a pull request implementing such pre-ordering, or is the current implementation the desired behavior?
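A client-side version of that pre-sort might look like the following sketch (hash_shard and NUM_SHARDS are stand-ins for the cluster's real hashed-chunk map, which real code would have to obtain from the config servers):

```python
from collections import defaultdict

NUM_SHARDS = 3

def hash_shard(doc):
    # Stand-in for the hashed-shard-key chunk lookup (an assumption for
    # illustration; real code must consult the cluster's chunk ranges).
    return hash(doc["k"]) % NUM_SHARDS

def presort_for_bulk_insert(docs):
    """Bucket documents by target shard and concatenate the buckets,
    so mongos sees at most one consecutive group per shard."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[hash_shard(doc)].append(doc)
    return [doc for shard in sorted(buckets) for doc in buckets[shard]]

docs = [{"k": i} for i in range(100)]
ordered = presort_for_bulk_insert(docs)
# Count consecutive same-shard groups in the reordered list:
groups = 1 + sum(hash_shard(a) != hash_shard(b)
                 for a, b in zip(ordered, ordered[1:]))
print(groups)  # at most NUM_SHARDS, instead of ~100 for random order
```

The reordered list can then be handed to the usual batch insert call; since this is a reordering only, exactly the same documents are written.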

Comment by Avishay Lavie [ 03/Oct/13 ]

I'm suffering from this as well: whenever a batch insert targets more than a single shard, I see about a 10x drop in performance.
I assumed mongos splits a batch-insert operation into N batch inserts, one per shard; but from the code and the blog post linked above it appears that it splits it into individual inserts, which is indeed a huge performance drain.

Comment by D.H.J. Takken [ 19/Sep/13 ]

Just checked the effect of sorting on bulk insert performance. I can confirm that sorting the documents on the sharding key in the application improves bulk insert speed quite a bit. Note that my sharding key is a compound key, which has a hash as its rightmost component. So, the sharding keys of the documents generated by the application have a high degree of randomness.

Comment by D.H.J. Takken [ 19/Sep/13 ]

I am seeing the exact same thing on a cluster with four shards, running MongoDB 2.4.6 on Ubuntu Linux 12.10. Each shard runs on its own physical machine, as does the mongos instance. I use pymongo to insert documents. Simply replacing batch insert commands with for loops doubles insert speed.

Bulk inserting on an empty cluster is fast at first. As chunks get distributed across the shards, insert speed drops rapidly. The more shards that mongos needs to distribute the data to, the slower the batch insert operations get. Inserting documents one by one shows a higher insert speed, which remains the same during the run of the application.

I also verified that the effect is not caused by a stressed-out balancer: I manually split the chunks and distributed them before inserting the documents, with the balancer disabled. This yields a low insert speed right from the start.

In my application, the documents are small, about 1 kB. The mongos instance and the shards show little CPU load while inserting documents. Running multiple instances of the application in parallel grows the total insert speed significantly, which suggests that the mongos threads spend most of their time waiting for something.

Generated at Thu Feb 08 03:23:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.