[SERVER-13574] performance/index problem on concurrent search/insert Created: 14/Apr/14  Updated: 10/Dec/14  Resolved: 20/May/14

Status: Closed
Project: Core Server
Component/s: Concurrency, Performance, Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mike Pasieka Assignee: Thomas Rueckstiess
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Win 8.1 x64, i7, 16GB of RAM, 10GB available for Mongo.


Operating System: Windows
Steps To Reproduce:

Have 2 concurrent read/write processes and fairly big collections (A is 60M records, B is 4M records).


 Description   

I have a db and 2 processes that aggregate the data. They do the following:
1. find oldest record in collection B, if any
2. select some records from collection A for aggregation
3. aggregate (outside mongo) and insert into collection B

The issue was not present in 2.4; it's 2.6 only. Both collections have the proper indexes ({s: 1, p: 1, d: 1} and {d: 1, s: 1}), and I query by s and p and sort by d. No sharding.
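
Roughly, the queries on each iteration look like this (a sketch only; the s/p values are illustrative, and the exact filter on collection B is my assumption):

    // 1. oldest record for a given (s, p) in collection B (weekly.products), if any
    db.weekly.products.find({ s: 3259, p: 551 }).sort({ d: 1 }).limit(1)
    // 2. records from collection A (daily.products) for the same (s, p), oldest first
    db.daily.products.find({ s: 3259, p: 551 }).sort({ d: 1 })
    // 3. aggregate outside mongo, then insert the result into collection B
    db.weekly.products.insert({ s: 3259, p: 551, d: ISODate() /* plus aggregated fields */ })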

When 1 worker is doing its job, everything is OK.
When 2 workers are doing the job, it seems that the indexes stop working, because the find() operation is very slow (at least 50 times slower); details below. The disk is barely used the whole time.

When I kill one worker and only one remains, everything works as usual, i.e. fast.

I'm running Windows; data + indexes are slightly bigger than RAM.

Console log while 2 workers are running, then one is killed:

2014-04-14T18:49:12.406+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 551 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:4441837 nscannedObjects:4441836 keyUpdates:0 numYields:34642 locks(micros) r:12437794 nreturned:1 reslen:158 6518ms
2014-04-14T18:49:16.774+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 552 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:3001134 nscannedObjects:3001133 keyUpdates:0 numYields:23445 locks(micros) r:8551123 nreturned:1 reslen:158 4319ms
2014-04-14T18:49:21.931+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 556 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:3536380 nscannedObjects:3536379 keyUpdates:0 numYields:27541 locks(micros) r:10049331 nreturned:1 reslen:158 5081ms
2014-04-14T18:49:26.713+0200 [clientcursormon] mem (MB) res:5723 virt:86414
2014-04-14T18:49:26.713+0200 [clientcursormon]  mapped (incl journal view):85946
2014-04-14T18:49:26.713+0200 [clientcursormon]  connections:6
2014-04-14T18:49:30.227+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 582 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:4441850 nscannedObjects:4441849 keyUpdates:0 numYields:34688 locks(micros) r:12389182 nreturned:1 reslen:158 6361ms
2014-04-14T18:49:35.156+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 583 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:3486873 nscannedObjects:3486872 keyUpdates:0 numYields:27241 locks(micros) r:9676540 nreturned:1 reslen:158 4884ms
2014-04-14T18:49:41.679+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 603 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:3721757 nscannedObjects:3721756 keyUpdates:0 numYields:29036 locks(micros) r:10548485 nreturned:1 reslen:165 5340ms
2014-04-14T18:49:47.215+0200 [conn106] query kwitdb.daily.products query: { $query: { s: 3259, p: 607 }, $orderby: { d: 1 } } planSummary: IXSCAN { s: 1.0, d: 1.0 } ntoreturn:1 ntoskip:0 nscanned:3721759 nscannedObjects:3721758 keyUpdates:0 numYields:28665 locks(micros) r:10396267 nreturned:1 reslen:158 5254ms
2014-04-14T18:49:47.216+0200 [conn106] end connection 192.168.1.19:61191 (5 connections now open)
2014-04-14T18:50:10.551+0200 [conn99] insert kwitdb.weekly.products ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:19 196ms
2014-04-14T18:51:10.727+0200 [conn99] insert kwitdb.weekly.products ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:31 352ms

Could you please explain this? Is it fixable, or is it just the global write lock?



 Comments   
Comment by Thomas Rueckstiess [ 20/May/14 ]

Hi Mike,

I haven't heard back from you for some time, so I'll assume that you were able to resolve or work around the issue. I'll close the ticket now, but if you'd like to follow up feel free to re-open it. In that case, we'd still need the additional information requested in my previous reply.

Kind Regards,
Thomas

Comment by Thomas Rueckstiess [ 22/Apr/14 ]

Hi Mike,

From the output in the log file, I can see that the slow queries are using the wrong index:

planSummary: IXSCAN { s: 1.0, d: 1.0 }

This should be {s: 1.0, p: 1.0, d: 1.0}. That's why the query has to scan a large number of documents and also page them from disk if they do not reside in RAM already:

nscanned:4441837 nscannedObjects:4441836

I've tried to reproduce this situation locally with the same indexes and queries as you describe, but in my test, it always picked the correct index. The different behavior could be explained by the data distribution. Mine was uniformly random, but perhaps you have certain values that appear a lot more than others. MongoDB's query planner is empirical, and occasionally runs short experiments to see which of the possible indexes is most effective. It's possible that it picked the wrong index based on the experiment, but without knowing your exact data distribution it's hard to tell.
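
(For completeness: one way to check which index the planner ends up picking for this query shape is to run explain() on it; the s/p values below are taken from your log lines, and explain(true) would additionally list the candidate plans that were considered.)

    use kwitdb
    db.daily.products.find({ s: 3259, p: 551 }).sort({ d: 1 }).explain()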

If you would like to investigate this further, I'd need to get an idea of the distribution of your data. If this is a non-production environment, you could run an aggregation task like this:

use kwitdb
db.daily.products.aggregate({"$group": {_id: {s:"$s", p:"$p"}, count: {$sum: 1}}}, {$out: "datadistr"})

which would generate a new collection "datadistr" and count the number of documents per (s, p) pair. Once completed, you can mongodump this collection and attach it to the ticket.

Note: If this is a production environment, I do not recommend running this aggregation task because it is computationally expensive and can have serious impact on the performance of your system!
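
For reference, a mongodump invocation limited to just that collection could look like this (assuming a local mongod on the default port; adjust host/port as needed):

    mongodump --db kwitdb --collection datadistr --out datadistr_dump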

I'd also like to look at some example documents from this collection, so I can more accurately reproduce this here. Could you also attach the output of:

use kwitdb
db.daily.products.find().limit(10)

If you are looking for a short-term solution, I recommend using a .hint() on these queries as a workaround, since you know exactly which index they should use. This will force the planner to use the correct index.
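
A minimal sketch of that workaround, using the query shape from your logs (hinting by key pattern or by index name both work):

    db.daily.products.find({ s: 3259, p: 551 }).sort({ d: 1 }).hint({ s: 1, p: 1, d: 1 })
    // or, equivalently, by index name:
    db.daily.products.find({ s: 3259, p: 551 }).sort({ d: 1 }).hint("s_1_p_1_d_1")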

Please let me know how you'd like to proceed.

Thanks,
Thomas

Comment by Mike Pasieka [ 15/Apr/14 ]

Hi! Thanks for the quick reply.

1. Both collections have the same index structure, as listed below:

> use kwitdb
switched to db kwitdb
> db.daily.products.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "kwitdb.daily.products",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"s" : 1,
			"d" : 1
		},
		"ns" : "kwitdb.daily.products",
		"name" : "s_1_d_1"
	},
	{
		"v" : 1,
		"key" : {
			"s" : 1,
			"p" : 1,
			"d" : 1
		},
		"ns" : "kwitdb.daily.products",
		"name" : "s_1_p_1_d_1",
		"background" : true
	}
]
> db.weekly.products.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "kwitdb.weekly.products",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"s" : 1,
			"d" : 1
		},
		"ns" : "kwitdb.weekly.products",
		"name" : "s_1_d_1",
		"background" : true
	},
	{
		"v" : 1,
		"key" : {
			"s" : 1,
			"p" : 1,
			"d" : 1
		},
		"ns" : "kwitdb.weekly.products",
		"name" : "s_1_p_1_d_1",
		"background" : true
	}
]

2. Yes.

3. I have a slow query threshold of 100ms. The absence of slow entries in the log means that all queries run under 100ms.

I tried to grab a log of this with full profiling, but after restarting the OS and MongoDB the issue is gone - maybe the query plans got updated? Now it's fast enough.
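
For reference, the slow-query threshold and full profiling mentioned here can be set from the shell like this (per database):

    db.setProfilingLevel(1, 100)   // log only operations slower than 100ms
    db.setProfilingLevel(2)        // full profiling: record all operations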

I'll try to reproduce this issue later today, with all the operations I was performing yesterday.

Comment by Thomas Rueckstiess [ 14/Apr/14 ]

Hi Mike,

Thanks for reporting this issue. I have some additional questions before we can determine what the problem is.

From the log lines you attached, it appears that the index being used to retrieve the products is {s: 1, d: 1} instead of the more suitable index {s: 1, p: 1, d: 1}, hence the large number of scanned documents.

  1. Do both collections have both indexes, or did you mean that collection A has {s: 1, p: 1, d: 1} and collection B has {d: 1, s: 1}? Can you please attach the output of

    use kwitdb
    db.daily.products.getIndexes()
    db.weekly.products.getIndexes()

  2. In your example, is collection A kwitdb.daily.products and collection B kwitdb.weekly.products?
  3. In the log file snippet you attached, all the queries are very slow (several seconds). Can you also show us queries from after one of the workers is killed, when the queries are running fast? Ideally, a much larger portion of the log file (covering several hours) would be helpful to see if anything else stands out.

Thanks,
Thomas
