[SERVER-48348] ninja next is incorrectly generating header file dependencies Created: 21/May/20  Updated: 29/Oct/23  Resolved: 26/May/20

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: 4.4.0-rc7, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Kevin Pulo Assignee: Andrew Morrow (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File rebuild-install-core-ninja-next.txt     Text File rebuild-install-core-ninja-stable.txt    
Issue Links:
Backports
Depends
is depended on by SERVER-47776 Promote ninja-next to stable Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Dev Platform 2020-06-01
Participants:

 Description   

This is on 6696c3edd28, with ninja 1.8.2 on Ubuntu 18.04, and a scons cmdline of:

python3 buildscripts/scons.py CC=gcc CXX=g++ --variables-files=etc/scons/mongodbtoolchain_stable_gcc.vars 'CCFLAGS=-Wa,--compress-debug-sections -gsplit-dwarf -fdiagnostics-color' LINKFLAGS=-fuse-ld=gold MONGO_VERSION=4.5.0 MONGO_GIT_HASH=unknown --ssl --detect-odr-violations --link-model=dynamic --dbg=on --install-mode=hygienic ICECC=icecc CCACHE=ccache --ninja=next generate-ninja

and a full working build of install-core.

I made a change to VectorClock in db/vector_clock.h to add an unused protected pure virtual function, and then implementations in db/vector_clock_mongod.cpp, db/vector_clock_trivial.cpp, and s/commands/vector_clock_mongos.cpp:

diff --git a/src/mongo/db/vector_clock.h b/src/mongo/db/vector_clock.h
index b7381652e9c..e4887dcc3c1 100644
--- a/src/mongo/db/vector_clock.h
+++ b/src/mongo/db/vector_clock.h
@@ -122,6 +122,7 @@ protected:
     virtual void _gossipOutExternal(BSONObjBuilder* out) const = 0;
     virtual LogicalTimeArray _gossipInInternal(const BSONObj& in) = 0;
     virtual LogicalTimeArray _gossipInExternal(const BSONObj& in) = 0;
+    virtual bool _somethingUnused() const = 0;
 
     void _gossipOutComponent(BSONObjBuilder* out, VectorTime time, Component component) const;
     void _gossipInComponent(const BSONObj& in, LogicalTimeArray* newTime, Component component);
diff --git a/src/mongo/db/vector_clock_mongod.cpp b/src/mongo/db/vector_clock_mongod.cpp
index 69d2eb2e7c1..534cefbe624 100644
--- a/src/mongo/db/vector_clock_mongod.cpp
+++ b/src/mongo/db/vector_clock_mongod.cpp
@@ -57,6 +57,9 @@ protected:
     void _gossipOutExternal(BSONObjBuilder* out) const override;
     LogicalTimeArray _gossipInInternal(const BSONObj& in) override;
     LogicalTimeArray _gossipInExternal(const BSONObj& in) override;
+    bool _somethingUnused() const override {
+        return true;
+    }
 
 private:
     enum class ReplState {
diff --git a/src/mongo/db/vector_clock_trivial.cpp b/src/mongo/db/vector_clock_trivial.cpp
index 88f00a512ea..93a99a41f96 100644
--- a/src/mongo/db/vector_clock_trivial.cpp
+++ b/src/mongo/db/vector_clock_trivial.cpp
@@ -53,6 +53,9 @@ protected:
     void _gossipOutExternal(BSONObjBuilder* out) const override;
     LogicalTimeArray _gossipInInternal(const BSONObj& in) override;
     LogicalTimeArray _gossipInExternal(const BSONObj& in) override;
+    bool _somethingUnused() const override {
+        return true;
+    }
 };
 
 const auto vectorClockTrivialDecoration = ServiceContext::declareDecoration<VectorClockTrivial>();
diff --git a/src/mongo/s/commands/vector_clock_mongos.cpp b/src/mongo/s/commands/vector_clock_mongos.cpp
index a8d0b0dbc81..2dcb1b59701 100644
--- a/src/mongo/s/commands/vector_clock_mongos.cpp
+++ b/src/mongo/s/commands/vector_clock_mongos.cpp
@@ -50,6 +50,9 @@ protected:
     void _gossipOutExternal(BSONObjBuilder* out) const override;
     LogicalTimeArray _gossipInInternal(const BSONObj& in) override;
     LogicalTimeArray _gossipInExternal(const BSONObj& in) override;
+    bool _somethingUnused() const override {
+        return true;
+    }
 };
 
 const auto vectorClockMongoSDecoration = ServiceContext::declareDecoration<VectorClockMongoS>();

After making this change, running ninja install-core with ninja next rebuilds only 40 targets, whereas stable rebuilds 567. The binaries produced by ninja stable run fine, but those built with ninja next exhibit segfaults and/or other random problems. As a canary test I used:

./buildscripts/resmoke.py run --suite=no_passthrough jstests/noPassthrough/client_disconnect_during_sign_logical_time.js

which reliably crashes (for me) with a MONGO_UNREACHABLE in the JournalFlusher on the first config server. Again, this crash does not happen if the procedure is repeated with ninja stable instead of next.

If the change applied is even further reduced to just:

diff --git a/src/mongo/db/vector_clock.h b/src/mongo/db/vector_clock.h
index b7381652e9c..51a7b075841 100644
--- a/src/mongo/db/vector_clock.h
+++ b/src/mongo/db/vector_clock.h
@@ -122,6 +122,7 @@ protected:
     virtual void _gossipOutExternal(BSONObjBuilder* out) const = 0;
     virtual LogicalTimeArray _gossipInInternal(const BSONObj& in) = 0;
     virtual LogicalTimeArray _gossipInExternal(const BSONObj& in) = 0;
+    bool _somethingUnused() const {}
 
     void _gossipOutComponent(BSONObjBuilder* out, VectorTime time, Component component) const;
     void _gossipInComponent(const BSONObj& in, LogicalTimeArray* newTime, Component component);

then running ninja reports:

ninja: no work to do.

which is clearly incorrect, since the cpp files that #include "mongo/db/vector_clock.h" need to be rebuilt, and everything that depends on them relinked and reinstalled.

Looking at the targets built in each case (for the first patch) confirms this hypothesis — ninja next is correctly rebuilding the changed cpp files, but not the cpp files (and associated libs) that include the changed header file, eg. rpc/metadata.cpp and the rpc/metadata lib.

Since the in-memory layout of the class has changed, the not-rebuilt object files are using the (now incorrect) original layout, which leads to incorrect memory usages, and so explains the segfaults and/or other weird problems.

cc acm



 Comments   
Comment by Githook User [ 26/May/20 ]

Author:

{'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}

Message: SERVER-48348 Do not sort dwo files in ahead of object files during Ninja generation

(cherry picked from commit 18cbf0d581162b2d15d66577b1fe08fe22006699)
Branch: v4.4
https://github.com/mongodb/mongo/commit/b79b53f55a5c148fd297b81a45c08d08e2cf8f94

Comment by Githook User [ 26/May/20 ]

Author:

{'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}

Message: SERVER-48348 Do not sort dwo files in ahead of object files during Ninja generation
Branch: master
https://github.com/mongodb/mongo/commit/18cbf0d581162b2d15d66577b1fe08fe22006699

Comment by Githook User [ 26/May/20 ]

Author:

{'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}

Message: SERVER-48348 Do not sort dwo files in ahead of object files during Ninja generation
Branch: master
https://github.com/mongodb/mongo/commit/18cbf0d581162b2d15d66577b1fe08fe22006699

Comment by Andrew Morrow (Inactive) [ 21/May/20 ]

I can reliably reproduce this only with CCFLAGS=-gsplit-dwarf, and I have identified a fix that I should have out for code review shortly.

Comment by Andrew Morrow (Inactive) [ 21/May/20 ]

kevin.pulo - Does this still reproduce if you remove -gsplit-dwarf from your CCFLAGS?

Generated at Thu Feb 08 05:16:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.