[SERVER-23605] 100% CPU usage in update one field in one document Created: 07/Apr/16  Updated: 22/Feb/22  Resolved: 11/Apr/16

Status: Closed
Project: Core Server
Component/s: WiredTiger, Write Ops
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Ivan Artemov Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Azure VM D2 size


Attachments: PNG File 1.png     PNG File 1.png     PNG File 2.png     PNG File 3.png     PNG File 5.png     PNG File OneSample.png     PNG File Screenshot.png    
Operating System: ALL
Steps To Reproduce:

Fill collection with more than 400.000 documents with many text data (5KB) in some properties. Try to update one field at some object.

Participants:

 Description   

I have a simple social crawler written in C# for testing MongoDB performance and decide - use it in another projects.
It collected 750.000 documents in collection Post.
Document post have "IsApproved" property that I want to update.

Posts collection have _id index and

{ "SiteId" : 1, "IsApproved" : -1, "Date" : -1 }

I make project to find one with

{Id, IsApproved}

and UpdateOne with !Approved (see screenshot).

Inserting documents fine and my Azure VM (D2 (2 cores Xeon with SSD) CPU costs about 2-8 percent.
But if I press approve button in my crawler - it "think" 3-5 seconds and costs 40% CPU.
If i press Approve button multiple times without pause - it kill my server and CPU load at 99% for each update time (see screenshot).
Don't say than i need be toggle boolean property with XOR command - i approximate it to other updates and i worry.

How I will use database, that only one update costs so many CPU times ?
It's not normal behaviour. Maybe that index rebuild ?



 Comments   
Comment by Hamza Shakir [ 22/Feb/22 ]

I am facing issue. I am streaming stock trades from from alpaca but issue that db usage 100% during streaming

my system spceicification:

Instance type: t3.xlarge
RAM: 16GB
CPU: 4
Operating system: Ubuntu 20.04

Kindly let me know system requirement for above task. bellow is my piece of code

await models.Stocks.updateOne({ symbol: trade.Symbol }, [{
     $set:

{            las t_price: \{ $toDouble:"$price" }

,
           price: { $toDouble: trade.Price },
          difference_price_number: { $toDouble: { $subtract: [

{ $toDouble:"$last_price" }

, { $toDouble: trade.Price }] } },
         difference_price_percentage: { $toDouble: { $abs: { $toInt: { $multiply: [{ $divide: [{ $toDouble: { $subtract: [

{ $toDouble:"$last_price" }

, {                  $toDouble:"$price" }] } }, { $toDouble:"$price" }] }, 100] } } } },
last_update_price_date:newDate()
}
}])
console.log(`trades updated on ${new Date()}`);

Comment by Kelsey Schubert [ 11/Apr/16 ]

Hi ZOXEXIVO,

You may be interested in SERVER-19153, which tracks the aggregation optimization to reorder project and match stages in circumstances like this. Feel free to vote for it and watch it for updates. Until this optimization is implemented, I would recommend following Robert's suggestion.

Thank you for your help identifying the cause of this behavior,
Thomas

Comment by Ivan Artemov [ 11/Apr/16 ]

Robert, Thank you!
I come to MongoDB from EntityFramework just now and i will be carefully with LINQ syntax.
But in the future, i think, it need to be optimized with Linq Expression priorities, because many people will try to work with Mongo and catch that problems.
I think, this issue can be closed. Thanks for all !

Comment by Ivan Artemov [ 11/Apr/16 ]

Thomas, can you move this issue to C# Driver ? Server works ok, but C# driver work as not expected

Comment by Robert Stam [ 11/Apr/16 ]

LINQ queries are always executed as aggregation framework commands.

But now that we see what the LINQ query got converted to it is obvious what the problem is. The resulting aggregation framework query pipeline requires a full collection scan. The reason is that it does a projection before a match so no index can be used (the server does not rearrange the order of the pipeline).

The LINQ query can be easily rewritten so that it generates an efficient aggregation framework pipeline:

var postStatus = collection.AsQueryable()
    .Where(x => x.Id == postId)
    .Select(x => new { x.Id, x.IsApproved })
    .FirstOrDefault();

The resulting aggregation framework pipeline now is:

    [0]: {{ "$match" : { "_id" : CSUUID("00000000-0000-0000-0000-000000000000") } }}
    [1]: {{ "$project" : { "Id" : "$_id", "IsApproved" : "$IsApproved", "_id" : 0 } }}
    [2]: {{ "$limit" : 1 }}

The important change is to filter before projecting.

Since the $match stage is now before the $project stage the server would be able to use an index to execute the pipeline more efficiently.

Comment by Ivan Artemov [ 10/Apr/16 ]

Hello, Thomas. I find problem!

var postStatus = await collection.AsQueryable().Select(x => new { x.Id, x.IsApproved }).FirstOrDefaultAsync(x => x.Id == postId);

convert to

"query" : {
                "aggregate" : "Posts",
                "pipeline" : [ 
                    {
                        "$project" : {
                            "Id" : "$_id",
                            "IsApproved" : "$IsApproved",
                            "_id" : 0
                        }
                    }, 
                    {
                        "$match" : {
                            "Id" : UUID("22bcb784-efdb-45c7-ad6a-4d98a856de4f")
                        }
                    }, 
                    {
                        "$limit" : 1
                    }
                ],
                "cursor" : {}
            },

I think it is not standart behaviour and we need ask C# Driver developer, why it convert to aggregate query.
Thank you!

Comment by Ivan Artemov [ 08/Apr/16 ]

I attach 2 screenshots with system monitors. iostat not available on Windows, but we have Perfrormance Monitor

Comment by Kelsey Schubert [ 08/Apr/16 ]

Hi ZOXEXIVO,

Thank you for reporting this issue. So we can continue to investigate, can you please provide the complete logs for the affected server?

Additionally, can you please execute the following shell script:

delay=1
iostat -k -t -x ${delay:?} >iostat.log &

This script will collect iostat data each second, and will help us to correlate the CPU numbers to events recorded in the diagnostic.data.

After running this script and reproducing this issue, please upload both the iostat.log and the contents of the diagnostic.data to this ticket.

Thank you,
Thomas

Comment by Ivan Artemov [ 07/Apr/16 ]

Indexes on Posts collection.
I drop compound index with IsApproved and nothing... Still 3-5 secods to update

Generated at Thu Feb 08 04:03:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.