Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Performance
Labels: None
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Environment: MongoDB 3.4.19, sharded cluster with 2 shards.

The collection `foo` is sharded on the hashed key "UID":

> sh.enableSharding("foo")
> db.adminCommand({"shardCollection": "foo.foo", key: {"UID": "hashed"}})
> use foo
> db.foo.createIndex({"name": 1})

We wrote a test case with mongo-cxx-driver to measure the latency of each operation:

auto nStart = std::chrono::high_resolution_clock::now();
oDB[DB_TABLE_NAME].update_one(document{} << "UID" << nUID << finalize, oValue.view(), opt);
auto nEnd = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> diff = nEnd - nStart;
printf("uid %d timediff %f\n", nUID, diff.count());

Sample output:

uid 10030116 timediff 1.249126
uid 10000124 timediff 1.021864
uid 10050127 timediff 1.172118
uid 10020116 timediff 1.223791
uid 10040115 timediff 1.408828
uid 10070114 timediff 1.526046
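
To find the slow operations in this output without reading it by eye, each line can be parsed and compared against a threshold. The following is a minimal sketch, not part of the original test case: the names `Sample`, `parse_line` and `count_spikes` are hypothetical, and the threshold is whatever the caller chooses (for example 10 ms, to match the slowms setting mentioned below).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One parsed line of test output (hypothetical helper type).
struct Sample {
    long uid;
    double ms;
};

// Parses a line in the format printed by the test case:
// "uid <number> timediff <milliseconds>". Returns false if it does not match.
bool parse_line(const std::string& line, Sample& out) {
    return std::sscanf(line.c_str(), "uid %ld timediff %lf", &out.uid, &out.ms) == 2;
}

// Counts the lines whose latency is strictly above thresholdMs.
// Lines that do not match the expected format are ignored.
std::size_t count_spikes(const std::vector<std::string>& lines, double thresholdMs) {
    std::size_t spikes = 0;
    for (const std::string& line : lines) {
        Sample s{};
        if (parse_line(line, s) && s.ms > thresholdMs) {
            ++spikes;
        }
    }
    return spikes;
}
```

Applied to the six sample lines above with a 10 ms threshold, no line would be counted, since all of them are between 1 and 2 ms.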

What we did:

The test case runs 8 threads; each thread performs 10000 upserts on the collection in a loop.
Each upsert operation targets a different UID.
Apart from UID/ID, all documents are identical.
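
The shape of this workload can be sketched in standard C++ as follows. This is an assumption-laden illustration, not the actual test program: `run_benchmark` is a hypothetical name, the `op` callback stands in for the `update_one` upsert shown earlier, and the way each UID is derived from the thread and loop index is only one way to keep UIDs distinct.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Runs op(uid) opsPerThread times on each of numThreads threads and returns
// every per-call latency in milliseconds. `op` is a placeholder for the real
// upsert; it must be safe to call from several threads at once.
std::vector<double> run_benchmark(int numThreads, int opsPerThread,
                                  const std::function<void(int)>& op) {
    std::vector<std::vector<double>> perThread(numThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&perThread, &op, opsPerThread, t] {
            perThread[t].reserve(opsPerThread);
            for (int i = 0; i < opsPerThread; ++i) {
                // Assumption: a distinct uid per (thread, iteration) pair.
                int uid = t * opsPerThread + i;
                auto start = std::chrono::steady_clock::now();
                op(uid);
                auto end = std::chrono::steady_clock::now();
                std::chrono::duration<double, std::milli> diff = end - start;
                perThread[t].push_back(diff.count());
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    // Merge the per-thread results once all workers have finished.
    std::vector<double> all;
    for (const std::vector<double>& v : perThread) {
        all.insert(all.end(), v.begin(), v.end());
    }
    return all;
}
```

With the parameters from this report the call would be `run_benchmark(8, 10000, op)`, yielding 80000 latency samples per run. Each thread writes only to its own vector, so no locking is needed around the measurements.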

What we found:

1. The latency has many spikes, e.g. 4000+ ms for some update_one operations, but nothing appears in the mongod slow log (slowms is set to 10 ms).
2. As a consequence of 1, the total timediff per run is unstable (from: ./testmongo | awk '{s+=$NF}END{print s}'):

149726
129175
124993
219767
137422
156674
162410
119684
117086
116885

3. After we ran `rs.stepDown()` on the config server primary, the spikes went away (from: ./testmongo | awk '{s+=$NF}END{print s}'):

106808
107225
107228
108055
106660
108690
105993
107037
106226
104789
106494
105178
107428
108650
106535
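
The difference between the two sets of totals can be quantified by their spread (maximum minus minimum). The following is a small sketch for doing that; `Spread` and `spread_of` are hypothetical names introduced for illustration, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <vector>

// Minimum, maximum and their difference for a set of per-run totals.
struct Spread {
    long min;
    long max;
    long range;
};

// Computes the spread of the given totals. Assumes `totals` is non-empty.
Spread spread_of(const std::vector<long>& totals) {
    auto mm = std::minmax_element(totals.begin(), totals.end());
    return Spread{*mm.first, *mm.second, *mm.second - *mm.first};
}
```

For the ten totals listed under item 2 this gives a minimum of 116885, a maximum of 219767 and a range of 102882. For the fifteen totals after the step-down it gives a minimum of 104789, a maximum of 108690 and a range of 3901.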

What we tried:

Initialized the test db without sharding, but with indexes on UID (hashed) and name (1): the latency is normal and stable.
Ran the benchmark against each replica set: the latency is normal and stable.
Stepped down a replica set: the latency may return to normal (not always).
Stepped down the config primary again: the latency may become abnormal or normal again (not always).
Replaced the WiredTiger storage engine with MMAPv1: the situation is the same (it can return to normal after stepping something down).
Stopped the balancer: it did not help.
Downgraded to MongoDB 3.2.22: the latency is normal and stable.
There are no abnormal logs; we only found split point lookup and network connection logs.

Attachments:

4.0_primary.shard2.diagnostic.data.tgz (292 kB, Mar 14 2019 02:19:09 AM UTC)
4.0_primary.shard1.diagnostic.data.tgz (378 kB, Mar 14 2019 02:19:09 AM UTC)
4.0_primary.config.diagnostic.data.tgz (305 kB, Mar 14 2019 02:19:09 AM UTC)
3.4_primary.shard2.diganostic.data.tgz (2.37 MB, Mar 14 2019 09:37:48 AM UTC)
3.4_primary.shard1.diganostic.data.tgz (3.00 MB, Mar 14 2019 09:37:48 AM UTC)
3.4_primary.config.diganostic.data.tgz (2.49 MB, Mar 14 2019 09:37:48 AM UTC)

duplicates

SERVER-40707 Secondary couldn't signal OplogWaiters to advance the lastCommittedOpTime in Chained replication mode

Closed

Assignee: Eric Sedor
Reporter: Adun
Participants: Adun, Eric Sedor
Votes: 0
Watchers: 5

Created: Mar 11 2019 08:27:13 AM UTC
Updated: Apr 22 2019 08:49:38 PM UTC
Resolved: Apr 22 2019 08:40:04 PM UTC
