[SERVER-20126] Count on GridFS fs.files and fs.chunks slow Created: 26/Aug/15  Updated: 03/Oct/15  Resolved: 03/Oct/15

Status: Closed
Project: Core Server
Component/s: GridFS
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: carl dong Assignee: Sam Kleinman (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File all_logs.tar.gz    
Operating System: ALL
Steps To Reproduce:

Run counts against fs.files and fs.chunks under heavy load with the command:
db.runCommand({ count: "fs.files", query: {} })
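
A minimal sketch for reproducing and timing the count directly from the mongo shell (assuming the UserFileSource database name seen in the logs below; any GridFS database with fs.files would do):

// sketch: time a single count against fs.files on the affected secondary
rs.slaveOk()                                          // allow reads on a secondary connection
var fileSource = db.getSiblingDB("UserFileSource");
var start = new Date();
printjson(fileSource.runCommand({ count: "fs.files", query: {} }));
print("count took " + (new Date() - start) + " ms");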

Participants:

 Description   

The issue happens on a secondary instance that we use for read operations. There are only around 200+ documents in the fs.chunks and fs.files collections; our application counts both collections, but the count takes a long time to return a result.

Sample log:

2015-08-26T17:12:06.304+0800 I COMMAND  [conn43] command UserFileSource.$cmd command: count { count: "fs.files", query: {} } planSummary: COUNT keyUpdates:0 writeConflicts:0 numYields:0 reslen:44 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 272849 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 272ms
2015-08-26T17:12:06.304+0800 I COMMAND  [conn116] command ADFileSource.$cmd command: count { count: "fs.files", query: {} } planSummary: COUNT keyUpdates:0 writeConflicts:0 numYields:0 reslen:44 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 160138 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 160ms
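
In both log lines nearly all of the reported time is timeAcquiringMicros on the Global lock, which suggests the count is queued behind another operation rather than scanning slowly. A rough way to look for the blocking operation on the secondary (field names as in the 3.0 currentOp output) would be:

// sketch: list in-progress operations that hold or are waiting for locks
db.currentOp(true).inprog.forEach(function (op) {
    if (op.waitingForLock || (op.secs_running && op.secs_running >= 1)) {
        printjson({ opid: op.opid, op: op.op, ns: op.ns,
                    secs_running: op.secs_running,
                    waitingForLock: op.waitingForLock });
    }
});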

The db.stats() output looks like this:

> db.stats()
{
        "db" : "ADFileSource",
        "collections" : 3,
        "objects" : 490,
        "avgObjSize" : 55058.23673469388,
        "dataSize" : 26978536,
        "storageSize" : 26263552,
        "numExtents" : 0,
        "indexes" : 8,
        "indexSize" : 319488,
        "ok" : 1
}

ADFileSource is not a big database and CPU usage is not high, so the bottleneck should not be the CPU.
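
If the CPU really is idle, a quick check of global lock queueing on the secondary may help confirm whether the counts are simply waiting behind other operations; a sketch using serverStatus (fields present in 3.0):

// sketch: snapshot of global lock queueing and active clients
var s = db.serverStatus();
printjson(s.globalLock.currentQueue);    // readers/writers currently queued for the global lock
printjson(s.globalLock.activeClients);   // readers/writers currently running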

I have attached some status logs (dblog, serverStatus, and iostat output) for your analysis.
Please let me know if you need any further information.

Thanks

Carl



 Comments   
Comment by Sam Kleinman (Inactive) [ 11/Sep/15 ]

Sorry for the delay in getting back to you, and thanks for the data. I have a couple of questions about this issue:

  1. Can you characterize the kind of load on your cluster? Are the operations predominantly or exclusively inserts? Do you have a mixed insert and update load?
  2. Are the primaries able to return the count operation more quickly than the secondaries?
  3. Are there differences in configuration, most importantly of the storage systems, for the primary and secondary systems?
  4. Are the secondaries able to replicate without becoming delayed? Could you provide the output of rs.status() (run on the primary) when you're experiencing this issue? (A sketch of the relevant shell helpers follows below.)
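
For question 4, a minimal sketch of the shell helpers that capture the requested replication state (run on the primary while the slow counts are occurring):

rs.status()                        // member states, optimes, and health
rs.printSlaveReplicationInfo()     // replication lag for each secondary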

Regards,
sam
