[SERVER-18179] "top" reports inaccurate counts for multi-document operations Created: 22/Apr/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 3.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Adinoyi Omuya Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DOCS-10628 (Server) The documentation should bet... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:

Both mongorestore and mongod are version 3.0.2, and the restore was run against a standalone.

Here's some data collected while the restore was running:

mongorestore - Ran:

mongorestore dump/

2015-04-22T15:36:14.077-0400	building a list of dbs and collections to restore from /Users/bedrich/Downloads/dump/ dir
2015-04-22T15:36:14.084-0400	reading metadata file from /Users/bedrich/Downloads/dump/support/issues.metadata.json
2015-04-22T15:36:14.087-0400	restoring support.issues from file /Users/bedrich/Downloads/dump/support/issues.bson
2015-04-22T15:36:17.080-0400	[#.......................]  support.issues  190.0 MB/2.8 GB  (6.5%)
2015-04-22T15:36:20.082-0400	[###.....................]  support.issues  439.3 MB/2.8 GB  (15.1%)
2015-04-22T15:36:23.083-0400	[#####...................]  support.issues  637.2 MB/2.8 GB  (21.9%)
2015-04-22T15:36:26.081-0400	[#######.................]  support.issues  861.1 MB/2.8 GB  (29.7%)
2015-04-22T15:36:29.079-0400	[########................]  support.issues  1.0 GB/2.8 GB  (36.3%)
2015-04-22T15:36:32.080-0400	[########................]  support.issues  1.0 GB/2.8 GB  (36.3%)
2015-04-22T15:36:35.082-0400	[########................]  support.issues  1.1 GB/2.8 GB  (37.4%)
2015-04-22T15:36:38.081-0400	[##########..............]  support.issues  1.2 GB/2.8 GB  (44.0%)
2015-04-22T15:36:41.080-0400	[###########.............]  support.issues  1.4 GB/2.8 GB  (48.4%)
2015-04-22T15:36:44.083-0400	[###########.............]  support.issues  1.4 GB/2.8 GB  (48.4%)
2015-04-22T15:36:47.080-0400	[############............]  support.issues  1.5 GB/2.8 GB  (53.9%)
2015-04-22T15:36:50.084-0400	[##############..........]  support.issues  1.7 GB/2.8 GB  (61.6%)
2015-04-22T15:36:53.081-0400	[################........]  support.issues  1.9 GB/2.8 GB  (67.1%)
2015-04-22T15:36:56.079-0400	[################........]  support.issues  1.9 GB/2.8 GB  (67.1%)
2015-04-22T15:36:59.083-0400	[#################.......]  support.issues  2.0 GB/2.8 GB  (71.5%)
2015-04-22T15:37:02.083-0400	[#################.......]  support.issues  2.0 GB/2.8 GB  (71.5%)
2015-04-22T15:37:05.081-0400	[##################......]  support.issues  2.2 GB/2.8 GB  (75.9%)
2015-04-22T15:37:08.080-0400	[###################.....]  support.issues  2.3 GB/2.8 GB  (82.5%)
2015-04-22T15:37:11.082-0400	[#####################...]  support.issues  2.5 GB/2.8 GB  (89.1%)
2015-04-22T15:37:14.080-0400	[#######################.]  support.issues  2.7 GB/2.8 GB  (96.2%)

mongostat - Ran:

mongostat

 1574    *0     *0     *0       0     3|0       0   8.5G 19.8G 4.1G      0          0   0|1   0|1   34m    11k    6 15:36:45
  5702    *0     *0     *0       0     4|0       0   8.5G 19.8G 4.2G      0          0   0|0   0|1  101m    11k    6 15:36:46
  3438    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.6G      0          0   0|0   0|0   67m    11k    6 15:36:47
  3780    *0     *0     *0       0     3|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|1   67m    10k    6 15:36:48
  5989    *0     *0     *0       0     6|0       0   8.5G 19.8G 3.8G      0          0   0|1   0|1  101m    11k    6 15:36:49
insert query update delete getmore command flushes mapped vsize  res faults idx miss % qr|qw ar|aw netIn netOut conn     time
  4683    *0     *0     *0       0     3|0       0   8.5G 19.8G 3.6G      0          0   0|0   0|0   67m    10k    6 15:36:50
  6036    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  101m    11k    6 15:36:51
  1855    *0     *0     *0       0     3|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   34m    10k    6 15:36:52
    *0    *0     *0     *0       0     1|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   79b    10k    6 15:36:53
    *0    *0     *0     *0       0     2|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  133b    10k    6 15:36:54
    *0    *0     *0     *0       0     1|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   79b    10k    6 15:36:55
    *0    *0     *0     *0       0     1|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   79b    10k    6 15:36:56
  1692    *0     *0     *0       0     3|0       0   8.5G 19.8G 3.6G      0          0   0|0   0|0   33m    10k    6 15:36:57
  4976    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  101m    11k    6 15:36:58
    *0    *0     *0     *0       0     2|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  133b    10k    6 15:36:59
insert query update delete getmore command flushes mapped vsize  res faults idx miss % qr|qw ar|aw netIn netOut conn     time
    *0    *0     *0     *0       0     1|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   79b    10k    6 15:37:00
    *0    *0     *0     *0       0     1|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0   79b    10k    6 15:37:01
    *0    *0     *0     *0       0     2|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  133b    10k    6 15:37:02
    *0    *0     *0     *0       0     2|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  136b    23k    6 15:37:03
  4990    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.8G      0          0   0|0   0|0  101m    23k    6 15:37:04
  1691    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.7G      0          0   0|1   0|1   33m    23k    6 15:37:05
  3341    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.6G      0          0   0|1   0|1   67m    23k    6 15:37:06
  3298    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.7G      0          0   0|1   0|1   67m    23k    6 15:37:07
  3172    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.5G      0          0   0|1   0|1   67m    23k    6 15:37:08
  3022    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.6G      0          0   0|1   0|1   67m    23k    6 15:37:09
insert query update delete getmore command flushes mapped vsize  res faults idx miss % qr|qw ar|aw netIn netOut conn     time
  5145    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|0  101m    23k    6 15:37:10
  2655    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.6G      0          0   0|0   0|0   67m    23k    6 15:37:11
  3253    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|1   67m    23k    6 15:37:12
  2962    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.7G      0          0   0|1   0|1   67m    23k    6 15:37:13
  4958    *0     *0     *0       0     6|0       0   8.5G 19.8G 3.6G      0          0   0|0   0|0  101m    23k    6 15:37:14
  3284    *0     *0     *0       0     5|0       0   8.5G 19.8G 3.7G      0          0   0|0   0|1   67m    23k    6 15:37:15
  1914    *0     *0     *0       0     4|0       0   8.5G 19.8G 3.9G      1          0   1|0   1|1   30m    23k    6 15:37:16

top - Ran:

while(true){printjson(db.runCommand("top")["totals"]["support.issues"]["insert"]);printjson(new Date());sleep(1000)}

Wisdoms-MacBook-Pro(mongod-3.0.2) admin> while(true){printjson(db.runCommand("top")["totals"]["support.issues"]["insert"]);printjson(new Date());sleep(1000)}
{
  "count": 65,
  "time": 13880654
}
ISODate("2015-04-22T19:37:03.192Z")
{
  "count": 67,
  "time": 14114514
}
ISODate("2015-04-22T19:37:04.198Z")
{
  "count": 69,
  "time": 14350828
}
ISODate("2015-04-22T19:37:05.205Z")
{
  "count": 71,
  "time": 14569963
}
ISODate("2015-04-22T19:37:06.211Z")
{
  "count": 73,
  "time": 15274439
}
ISODate("2015-04-22T19:37:07.215Z")
{
  "count": 75,
  "time": 15605750
}
ISODate("2015-04-22T19:37:08.218Z")
{
  "count": 77,
  "time": 15843429
}
ISODate("2015-04-22T19:37:09.222Z")
{
  "count": 79,
  "time": 16439871
}
ISODate("2015-04-22T19:37:10.228Z")
{
  "count": 81,
  "time": 16820034
}
ISODate("2015-04-22T19:37:11.230Z")
{
  "count": 83,
  "time": 17090708
}
ISODate("2015-04-22T19:37:12.235Z")
{
  "count": 86,
  "time": 17589414
}
ISODate("2015-04-22T19:37:13.237Z")
{
  "count": 88,
  "time": 17866561
}
ISODate("2015-04-22T19:37:14.242Z")
{
  "count": 91,
  "time": 18252756
}
ISODate("2015-04-22T19:37:15.244Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:16.246Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:17.251Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:18.254Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:19.259Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:20.261Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:21.262Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:22.266Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:23.269Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:24.273Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:25.278Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:26.280Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:27.281Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:28.284Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:29.285Z")
{
  "count": 92,
  "time": 18443123
}
ISODate("2015-04-22T19:37:30.286Z")
{
  "count": 92,
  "time": 18443123
}

Participants:
Case:

 Description   

For example, while a bulk insert of several documents is running, simultaneously running db.runCommand("top") against the admin database indicates that only 1 insert is occurring, even though a number of documents are being inserted.

Monitoring applications that rely on this count are therefore unable to report accurate counts when clients perform multi-document operations.
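
For illustration, here is a minimal shell sketch of the same effect. The database, collection, and field names are illustrative, not taken from the original report, and in a current shell the array insert below is sent as a single insert command:

// Ensure the collection exists so that top has a "test.items" entry to read.
var testDB = db.getSiblingDB("test");
testDB.items.insert({ sku: "seed", qty: 0 });

var before = db.getSiblingDB("admin").runCommand("top").totals["test.items"].insert;

// One multi-document insert: five documents sent in a single write command.
testDB.items.insert([
  { sku: "a1", qty: 1 },
  { sku: "a2", qty: 2 },
  { sku: "a3", qty: 3 },
  { sku: "a4", qty: 4 },
  { sku: "a5", qty: 5 }
]);

var after = db.getSiblingDB("admin").runCommand("top").totals["test.items"].insert;

// Per the description above, top's insert counter is expected to move by 1
// (one command) rather than 5 (five documents).
printjson({ before: before, after: after, delta: after.count - before.count });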



 Comments   
Comment by Ian Whalen (Inactive) [ 09/Mar/18 ]

bernard.gorman to resolve this if fixed via outstanding work.

Comment by Asya Kamsky [ 03/Mar/18 ]

Note that my previous statement:

[top counts are] same as serverStatus opcounters just not the same as serverStatus document counts.

is not accurate in a sharded cluster. Top (correctly, it seems) only counts write commands initiated by the client; i.e., it does not count migration inserts/deletes. The serverStatus command seems to count some system operations as writes, leading top and serverStatus counts for write commands to diverge in sharded clusters.

However, there are scenarios in which serverStatus is inconsistent and counts the documents inserted by a single insert command as multiple inserts, so it seems more likely to me that top is consistent and serverStatus is not. The mongostat command appears to benefit from this by showing documents inserted based on serverStatus opcounters, but technically it seems to be relying on a mistake, and the fix should probably be to make mongostat use more correct/consistent metrics (and then serverStatus could perhaps be fixed as well, though I'm guessing other things rely on it, so the "fix" would have to be non-breaking). In fact, it looks like db.serverStatus().metrics.commands.insert is consistent with top, which I think shows that the inconsistency is mainly in the way mongostat uses serverStatus data.
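
For illustration, a rough shell sketch of that comparison. The collection name is illustrative, and the .total field path under metrics.commands.insert is an assumption about the serverStatus layout rather than something verified in this ticket; also note that top is per-collection while the serverStatus counters are server-wide, so compare trends rather than exact values:

var status = db.serverStatus();
printjson({
  // per-command counter that should track top, per the observation above
  commandsInsertTotal: status.metrics.commands.insert.total,
  // server-wide operation counter, which mongostat displays
  opcountersInsert: status.opcounters.insert,
  // top's per-collection, command-level insert counter
  topInsertCount: db.getSiblingDB("admin").runCommand("top").totals["test.items"].insert.count
});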

Comment by Asya Kamsky [ 01/Mar/18 ]

I did a bit of research, and the top command seems consistent: it always increments the counter once each time a command is run. So running an update command increments the counter once whether it changed 0 documents or 100, and an insert with multiple documents also increments the insert counter once.

This seems consistent, if perhaps not what someone wants when comparing it to mongostat or some counters from db.serverStatus. But it's the same as the serverStatus opcounters, just not the same as the serverStatus document counts.
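
As a quick illustration of the per-command behavior described above (collection and field names are illustrative, and the collection is assumed to exist already):

var coll = db.getSiblingDB("test").items;
var before = db.getSiblingDB("admin").runCommand("top").totals["test.items"].update;

// One update command, however many documents it ends up modifying.
coll.update({}, { $inc: { qty: 1 } }, { multi: true });

var after = db.getSiblingDB("admin").runCommand("top").totals["test.items"].update;

// Expected per the comment above: the update counter moves by 1 regardless of
// how many documents matched.
printjson({ delta: after.count - before.count });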

Comment by Daniel Pasette (Inactive) [ 20/Mar/16 ]

The integration team is already considering related issues. Adding to their backlog.

Comment by Andy Schwerin [ 04/Jun/15 ]

A lot of the "top" command behavior is surprising. Some of it has probably not worked as originally intended since 2.2. I know that the lock information overcounts time spent write locked. adinoyi.omuya@10gen.com, let's sit down and talk about what the intended meaning of mongotop's output is, and adjust this ticket to account for all of the needed changes to the top command.

Comment by Adinoyi Omuya [ 22/Apr/15 ]

I used both; it was with the wire protocol - specifically OP_INSERT - that I ran into this. You can easily reproduce it using mongorestore v3.0.2.

The counts are accurate using the shell's bulk API though:

var bulk = db.issues.initializeUnorderedBulkOp();
bulk.insert( { item: "abc123", defaultQty: 100, status: "A", points: 100 } );
bulk.insert( { item: "ijk123", defaultQty: 200, status: "A", points: 200 } );
bulk.insert( { item: "mop123", defaultQty: 0, status: "P", points: 0 } );
bulk.insert( { item: "abc123", defaultQty: 100, status: "A", points: 100 } );
bulk.insert( { item: "ijk123", defaultQty: 200, status: "A", points: 200 } );
bulk.execute();

Before running:

ISODate("2015-04-22T20:06:49.735Z")
{
  "count": 779290,
  "time": 176921520
}

After running (note that the count increases by 5, matching the five documents inserted):

ISODate("2015-04-22T20:06:50.737Z")
{
  "count": 779295,
  "time": 176921784
}

Comment by Scott Hernandez (Inactive) [ 22/Apr/15 ]

Are you using write commands or the wire protocol? I believe you are seeing a by-product of how we process write commands, with new internal operations for each item.

Comment by Adinoyi Omuya [ 22/Apr/15 ]

When I run the serverStatus command, I get the actual count I expect - regardless of whether I use bulk inserts - which can be seen from the mongostat output I included (mongostat leverages the serverStatus command). This suggests that we actually keep track of the number of documents affected. So I'm unclear on why top behaves differently.

Comment by Scott Hernandez (Inactive) [ 22/Apr/15 ]

I believe it monitors client-issued operations (a single multi-document insert command), not the documents being affected - just like the serverStatus opcounters. If one wanted to monitor document-based counters, they could use those serverStatus metrics.
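
For reference, a small shell sketch of where the document-based counters live in serverStatus output. The field paths are as they appear on recent server versions and should be treated as an assumption rather than something verified in this ticket:

var ss = db.serverStatus();
printjson({
  // per-operation counters (what mongostat's columns are derived from)
  opcountersInsert: ss.opcounters.insert,
  // per-document counters
  documentsInserted: ss.metrics.document.inserted,
  documentsUpdated:  ss.metrics.document.updated,
  documentsDeleted:  ss.metrics.document.deleted
});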
