[SERVER-42034] Remove numStr from BSONObjBuilder Created: 01/Jul/19  Updated: 29/Oct/23  Resolved: 29/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: ADAM Martin (Inactive) Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done after SERVER-41989 BSONObjBuilder::asTempObj is not exce... Closed
Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2019-07-15, Dev Tools 2019-07-29
Participants:

 Comments   
Comment by Githook User [ 29/Jul/19 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-42034 Optimize ItoA

(~13% faster, remove the 1kLoC static table)
Reverts commit 7c9edfd4b931f3d3208142b411010b6b4afee21e.
Python-generate table-emitting macro (compiler workaround).
Branch: master
https://github.com/mongodb/mongo/commit/dae371c478e1a828ac911096d85f94be8e936ef9

Comment by Githook User [ 22/Jul/19 ]

Author:

{'name': 'Ian Boros', 'username': 'puppyofkosh', 'email': 'puppyofkosh@gmail.com'}

Message: Revert "SERVER-42034 Optimize ItoA (~13% faster, remove the 1kLoC static table)"

This reverts commit be18bd1133d6f8118ebb79a2251c4a49b1c84ec1.
Branch: master
https://github.com/mongodb/mongo/commit/7c9edfd4b931f3d3208142b411010b6b4afee21e

Comment by Githook User [ 22/Jul/19 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-42034 Optimize ItoA (~13% faster, remove the 1kLoC static table)
Branch: master
https://github.com/mongodb/mongo/commit/be18bd1133d6f8118ebb79a2251c4a49b1c84ec1

Comment by Githook User [ 09/Jul/19 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-42034 remove BSONObjBuilder::numStr

Callers are better with either DecimalCounter or std::to_string.
Branch: master
https://github.com/mongodb/mongo/commit/f2bb23436a46afacfc8e3dc65fda743fafe6b2c3

Comment by Billy Donahue [ 03/Jul/19 ]

I LGTMed the CR to implement this change because I see several stylistic, performance, and even correctness reasons to get rid of these static strings. I doubt they were providing a realistic performance benefit, but that's not the most important thing.

The most pressing issue to me is the undefined init-time behavior of the numStr() function, relying on assumptions about the C++ implementation that don't hold up to close scrutiny. Specifically, the early read of numStrsReady in BSONObjBuilder::numStr() is undefined behavior invoked on purpose in an effort to protect us from other undefined behavior. The comment above numStrsReady's definition is untrue. Reading it early will yield undefined behavior, not a false value.

Would an implementation be allowed to "prove" to itself that numStrsReady is always true, and just static-initialize it as an optimization? As a template, all of std::basic_string is visible to the compiler, so I think it's possible that numStrsReady offers no protection at all from early accesses to the numStrs array, and it's just a waste of CPU time to look at it. But it's a non-const static duration bool that's initialized to true, but it isn't guaranteed to remain true, so numStr() would still have to waste time looking at it.

Even if this static-init optimization isn't happening, we would still have a race. The initialization of the numStrs array is not ordered wrt potentially early callers of the inline function BSONObjBuilder::numStr(). We are not guaranteed any order at all between definitions in different translation units and we are not guaranteed by C++ that pre-main() is single-threaded. BSONObjBuilder::numStr can be invoked by a static initializer anywhere. It is even allowed to run concurrently with the initialization of numStrsReady or numStrs, and there's no synchronization provided by the numStr function, so it could be read while being updated.

There are other fixes available for these problems, but I think the simplest is to just remove the static strings altogether.
It's not okay to leave the UB and init races in the code, however, so I disagree with the proposal to close this one.

Performance-wise, I think the 0-99 numStrs table will make short arrays quicker to write, but long arrays slower to write. But the benefits for short arrays would be swamped by the other processing that happens while preparing the enclosing object. That is, I think we'd be optimizing a drop in the bucket for short arrays. But for long arrays, we're making the whole bucket slower.

We should just rip BSONObjBuilder::numStr out. The uses internal to BSONObjBuilder are better served by DecimalCounter.

https://docs.google.com/spreadsheets/d/1eOsD8EPCmVTLLfl3_HRG1yYrw2I79oJiDJRb_BdCyks/edit?usp=sharing

 

The 20 or so uses outside BSONObjBuilder have nothing to do with BSONObj. They usually are incrementing a field name and should use DecimalCounter as well.

 

Comment by Andrew Morrow (Inactive) [ 02/Jul/19 ]

I don't think we should do it. I suggest we close this ticket.

Comment by Eric Milkie [ 02/Jul/19 ]

What is the reasoning behind doing this work now – what is blocked behind it?

Generated at Thu Feb 08 04:59:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.