[CXX-584] Design C++11 driver release process Created: 22/Apr/15  Updated: 09/Jul/19  Resolved: 12/Nov/18

Status: Closed
Project: C++ Driver
Component/s: Release
Affects Version/s: None
Fix Version/s: 3.5.0

Type: Task Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Roberto Sanchez
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Link: Automate C++ driver release process

 Description   

We have never issued a release from the master branch. We need to figure out what our release process is, and how it integrates with the build system. If possible, I'd like to use a git-describe based workflow, rather than bump/post commits. If we do that, we need to figure out how to make source packages, and how that would work with GitHub's source tarball generation.



 Comments   
Comment by Githook User [ 12/Nov/18 ]

Author: Roberto C. Sánchez <roberto@connexer.com> (rcsanchez97)

Message: CXX-584 remove non-existent file from dist manifest
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/f96764be682cda389d066b44fc3a1caf20711683

Comment by Githook User [ 12/Nov/18 ]

Author: Roberto C. Sánchez <roberto@connexer.com> (rcsanchez97)

Message: CXX-584 implement dist/distcheck targets
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/22b9e90f6fa0c9529898e1c4efa0a3828b89e08d

Comment by Roberto Sanchez [ 17/Oct/18 ]

It looks like the last remaining discrete task in order to close out this ticket is the implementation of dist/distcheck targets that allow creation of the upstream tarball (similar to what is done in the C Driver).

acm, jesse, kevin.albertson, do you agree?

Comment by Githook User [ 17/Oct/18 ]

Author: Roberto C. Sánchez <roberto@connexer.com> (rcsanchez97)

Message: CXX-584 implement loading of release version from file
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/a1d3d2da4cef22641841746767421cbaaef4b9fd

Comment by Githook User [ 17/Oct/18 ]

Author: Roberto C. Sánchez <roberto@connexer.com> (rcsanchez97)

Message: CXX-584 implement script to calculate release version
Branch: master
https://github.com/mongodb/mongo-cxx-driver/commit/66c47d586c9ca4cedfb93617f67c32d41217caa7

Comment by Andrew Morrow (Inactive) [ 05/Oct/18 ]

roberto.sanchez - I agree the tarballs should know their version, but I expected that CMake would drive the source tarball creation. I think it sounds like we are going to meet in the middle on this one overall, and that is fine. It is definitely a step forward from where we are now.

Comment by Roberto Sanchez [ 05/Oct/18 ]

acm, after discussing this with jesse he has captured most of my thoughts on this. I am just going to add a few additional details.

  • The reasoning for reading VERSION_CURRENT if available (and also VERSION_RELEASED in the C driver) is that a release tarball should always know its own version number without the user being required to specify it; the release process for both drivers will require generating that file for distribution in the release tarball
  • I like your idea of invoking calc_release_version.py from within CMake if, as Jesse indicated, neither VERSION_CURRENT is available nor the version specified via environment/CMake variables

I will work on making the necessary changes and post them for review.

Jesse and I also discussed that once these changes are finalized I will work on integrating the improvements into the C driver.

Comment by A. Jesse Jiryu Davis [ 02/Oct/18 ]

From conversation w/ Roberto:

  • Agreed we'll add a "MONGOCXX_VERSION" CMake option (we'll just assume BSONCXX_VERSION is always equal)
  • If that option's not set, CMake will try to load the VERSION_CURRENT file
  • Agreed, we'll keep the Python script as a fallback if that option isn't set and the file is absent. CMake's execute_process() calls the script.
  • If Python or GitPython is absent, or some other error occurs, set the version to 0.0.0, with a warning.
  • Agreed that the LoadVersion script is renamed ParseVersion, it now reads the MONGOCXX_VERSION set from command line or Python script output
  • However, let's allow users to succeed building with version 0.0.0: Even if they don't set the option from the command line and they don't have Python, they should succeed. Update CMake's warning to advise they set MONGOCXX_VERSION option or else make Python and GitPython available.
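
The lookup order agreed in the bullets above can be sketched in Python. This is only an illustration of the precedence, not the CMake logic that was committed: the function name `resolve_version`, its parameters, and the warning text are assumptions, and the `run_script` callable stands in for CMake's execute_process() call to the Python script.

```python
import warnings
from pathlib import Path
from typing import Callable, Optional

FALLBACK_VERSION = "0.0.0"


def resolve_version(option: Optional[str],
                    version_file: Path,
                    run_script: Callable[[], str]) -> str:
    """Return the version using the precedence discussed in this comment.

    option       -- value of the MONGOCXX_VERSION option, or None if unset
    version_file -- path to the VERSION_CURRENT file (may not exist)
    run_script   -- hypothetical stand-in for running calc_release_version.py
    """
    # 1. An explicitly provided version always wins.
    if option:
        return option.strip()
    # 2. A release tarball ships VERSION_CURRENT, so it knows its own version.
    if version_file.is_file():
        return version_file.read_text().strip()
    # 3. Otherwise fall back to the script; any failure yields 0.0.0 plus a warning.
    try:
        return run_script().strip()
    except Exception as exc:
        warnings.warn(
            f"could not calculate version ({exc}); using {FALLBACK_VERSION}. "
            "Set MONGOCXX_VERSION or make Python and GitPython available."
        )
        return FALLBACK_VERSION
```

The build still succeeds in the last case, which is the behaviour the final bullet asks for.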

Comment by Andrew Morrow (Inactive) [ 02/Oct/18 ]

Replying to kevin.albertson -

If we prohibited building with version 0.0.0, then someone cloning the repo would be unable to build without first running calc_release_version.py or explicitly providing the version number.

That would be resolved by rewriting the logic of calc_release_version.py in CMake. But I'd prefer not having to rewrite that in CMake.

Not necessarily. You could keep calc_release_version.py, but simply teach CMake to invoke it to generate a target that was the result of running calc_release_version.py. You could keep most of the logic in python.

Plus, if someone cloned without the git history (e.g. with --depth=1) or downloads a zip of the source, they'd always have to provide an explicit version number.

If they cloned with --depth=1, then yes, they would need to provide an explicit version number. But that is what they asked for! I really think it is dangerous to allow people to produce builds with a version of 0.0.0. End users do build from source, and we need to be able to ask for the version that they are running and ensure we can always get a meaningful answer. Regarding a zip file, I'd assume that any source distributions we produce would also be managed through CMake (I believe the C driver already does this), and could leverage the same mechanisms.

Replying to jesse -

  • The Python is already written
  • The Python is maintainable: a random programmer is more likely to understand moderately complex logic in Python than in CMake

See my thoughts above. You could I think still keep most of the python, but still drive its invocation through CMake.

  • The 0.0.0 version number prevents mistakes: a dependent project that requires a version number (and later, an ABI number) won't accept 0.0.0
  • The CMake script prints instructions for how to run the Python script

I sort of disagree, because I'd consider any build that ended up with 0.0.0 to be a mistake. People build from source all the time.

So let's declare victory with the current setup.

Up to you of course.

Comment by A. Jesse Jiryu Davis [ 02/Oct/18 ]

Drew, your proposal is attractive and I almost agreed when I first read it. But I think that:

  • The Python is already written
  • The Python is maintainable: a random programmer is more likely to understand moderately complex logic in Python than in CMake
  • The 0.0.0 version number prevents mistakes: a dependent project that requires a version number (and later, an ABI number) won't accept 0.0.0
  • The CMake script prints instructions for how to run the Python script

So let's declare victory with the current setup.

Comment by Kevin Albertson [ 02/Oct/18 ]

If we prohibited building with version 0.0.0, then someone cloning the repo would be unable to build without first running calc_release_version.py or explicitly providing the version number.

That would be resolved by rewriting the logic of calc_release_version.py in CMake. But I'd prefer not having to rewrite that in CMake.

Plus, if someone cloned without the git history (e.g. with --depth=1) or downloads a zip of the source, they'd always have to provide an explicit version number.

Comment by Roberto Sanchez [ 01/Oct/18 ]

acm, thanks! I will try to digest this and provide a response later today.

Comment by Andrew Morrow (Inactive) [ 01/Oct/18 ]

Per request from jesse, I'm copying my overall thoughts on the approach taken so far, so that we can consider whether we want to adopt none, some, or all of my suggestions:

  • My first suggestion is that I would remove the file reading from LoadVersion, and rename it to ParseVersion. I would then define new CMake flags for the command line called MONGOCXX_VERSION and BSONCXX_VERSION. Then use ParseVersion to decompose those into the version subcomponents.
  • If you do the above, then do the build process as follows:

    python etc/calc_release_version.py > build/VERSION_CURRENT
    cd build
    cmake -DMONGOCXX_VERSION=$(cat VERSION_CURRENT) -DBSONCXX_VERSION=$(cat VERSION_CURRENT) ...
    

    I'd do this because that way people who are not us who are packaging the driver can set the version to whatever works for them, in the event that the calc_release_version.py script isn't appropriate for their needs.

  • I would also simply refuse to build if a version was not provided, or one was provided that didn't parse. We don't want people building and getting 0.0.0, and that definitely will happen if we allow builds to go forward without a version.
  • Next, and this is admittedly harder, I think I would try to eliminate the python script entirely, and instead do the git machinery and derivation inside CMake. In particular, I would do it in a way that would be respected inside the generated build system, such that if you were to edit, build, commit, tag, and re-build, say with ninja, you would get the updated version on the rebuild. The version computed this way would only be used if explicit values for {MONGO,BSON}CXX_VERSION were not provided. Overall, I think it is important that simply running

    git clone ...
    cd build
    cmake ..
    make all && make install
    

    gets you a valid version with no other steps or flags needed.

  • One additional thing to consider is that I think we are also going to want a tag based mechanism for bumping ABI.
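
The ParseVersion idea from the first bullet, combined with the suggestion to refuse to build on a missing or unparseable version, might look roughly like the following Python sketch. The real ParseVersion is a CMake script; the `Version` tuple, the `parse_version` name, and the assumed `MAJOR.MINOR.PATCH[-extra]` format are illustrative assumptions only.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    extra: str  # pre-release marker, empty for a plain release


# Assumed format: MAJOR.MINOR.PATCH with an optional "-extra" suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def parse_version(text: str) -> Version:
    """Decompose a version string, raising instead of guessing on bad input."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        # Mirrors "refuse to build": an empty or malformed version is an error.
        raise ValueError(f"unparseable version: {text!r}")
    major, minor, patch, extra = match.groups()
    return Version(int(major), int(minor), int(patch), extra or "")
```

For instance, `parse_version("3.3.2-pre-abc")` yields the components 3, 3, 2 and the marker `pre-abc`, while an empty string raises `ValueError`.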

Comment by A. Jesse Jiryu Davis [ 11/Sep/18 ]

Sounds good to me!

Comment by Roberto Sanchez [ 11/Sep/18 ]

So, here is what I am thinking. The algorithm would look like this:

  1. Is the current branch master?
  2. If yes (the current branch is master), then inspect the branches that fit the convention for a release branch and choose the latest; increment the minor version and append .0 to form the new version (e.g., releases/v3.3 becomes 3.4.0) and append a pre-release marker (I favor one based on the Git commit ID)
  3. If no (the current branch is not master), then use the git tag command you suggested to determine the most recent tag in history; strip any pre-release marker, increment the patch version and append a new pre-release marker
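
As a rough illustration, the three steps above could be written as a pure function in Python. This is a sketch under stated assumptions, not the contents of calc_release_version.py: the function name, its parameters, and the `pre-<short commit>` marker format are hypothetical, and the branch and tag lists are passed in rather than read from Git.

```python
import re
from typing import Iterable

# Naming conventions taken from this ticket: "releases/v3.3" branches, "r3.3.1" tags.
_BRANCH_RE = re.compile(r"^releases/v(\d+)\.(\d+)$")
_TAG_RE = re.compile(r"^r(\d+)\.(\d+)\.(\d+)(?:-.+)?$")


def next_version(branch: str,
                 release_branches: Iterable[str],
                 merged_tags: Iterable[str],
                 commit_id: str) -> str:
    """Compute a pre-release version for a build that is not a tagged release."""
    marker = f"pre-{commit_id[:7]}"  # hypothetical marker based on the commit ID
    if branch == "master":
        # Step 2: latest release branch, bump the minor version, patch becomes 0.
        found = [tuple(int(p) for p in m.groups())
                 for m in map(_BRANCH_RE.match, release_branches) if m]
        if not found:
            raise ValueError("no release branches found")
        major, minor = max(found)  # numeric comparison, so v3.10 > v3.9
        return f"{major}.{minor + 1}.0-{marker}"
    # Step 3: latest tag in history, pre-release marker dropped, bump the patch.
    found = [tuple(int(p) for p in m.groups())
             for m in map(_TAG_RE.match, merged_tags) if m]
    if not found:
        raise ValueError("no release tags found in history")
    major, minor, patch = max(found)
    return f"{major}.{minor}.{patch + 1}-{marker}"
```

With release branches up to releases/v3.3, a build from master gives a 3.4.0 pre-release; on releases/v3.3 with tags r3.3.0 and r3.3.1 in history, it gives a 3.3.2 pre-release.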

Once the version has been determined, place the version number in a file which will be distributed with the release tarball. The CMakeLists.txt will then be modified to load the version number from this file.

This makes it so that any build off of master will have a pre-release version later than the latest release branch. The only "bad" thing about this approach is that if you check out an old commit on master (that still has these changes implemented), then you could end up with a version number that seems "too new." However, the easy workaround is to check out the commit to a new local branch and work from there; then the logic of the algorithm will produce sensible results.

I thought about all sorts of other possibilities, including walking backward in history to find the most recent branch and using that to help determine the version number. However, it quickly became over-complicated. As I tried to simplify and limit the scope of the problem we are trying to solve, I came up with "builds from the HEAD of master and the HEAD of each release branch should produce sensible version numbers." Depending on the details of the tagging strategy and where topic branches get created, we may end up with some weird version numbers getting generated for topic branches, but that seems OK. If we decide we want sensible numbers on a topic branch, it is as simple as manually tagging some spot in its history with an appropriate version number that allows the algorithm to produce something sensible.

In any event, we can discuss the details in more depth if any of this seems unclear.

Comment by Roberto Sanchez [ 11/Sep/18 ]

So, I have looked at this and I think that your suggestion could be improved. I have some ideas that I am working on and I will write them up here in another comment or discuss them with you on the call (depending on which happens first).

Comment by A. Jesse Jiryu Davis [ 10/Sep/18 ]

Whatever the process has been for C++ so far, I expect we'll now follow for both projects the following process for minor releases:

  • branch from master
  • make a few commits with final preparations for a minor release
  • tag the release on the branch: for C, that's a "1.2.3" tag on "r1.2" branch, for C++ that's a "r1.2.3" tag on "releases/v1.2" branch

For patch releases:

  • make a few commits on the branch ("r1.2" or "releases/v1.2") to fix bugs
  • tag the release on the branch: "1.2.4" on "r1.2" for C, "r1.2.4" on "releases/v1.2" for C++

I'm fine with the idea of sensible version numbers. You could get the latest actual release on this branch like:

# C Driver
git tag --merged HEAD --list '1.*' --sort 'version:refname' | tail -n 1
# C++
git tag --merged HEAD --list 'r*' --sort 'version:refname' | tail -n 1

... and then generate the next version. I suggest a Python script for that logic if the Bash for it isn't legible.

Comment by Roberto Sanchez [ 10/Sep/18 ]

We can probably make it work with dummy version numbers, but we will need to come up with a way to programmatically distinguish release commits from non-release commits. As long as the workflow is "tag the .0 release on master and then branch" it should never be an issue. We would be able to apply a simple and consistent algorithm for determining the "next" version and we would not need dummy version numbers. The advantage to doing that over using a non-related dummy version is that it will allow looking back historically to make sense by comparing version numbers of builds along different branches. The advantage of dummy version numbers is that we would never accidentally mistake them for a release (though I think such a mistake would be highly unlikely). I favor sensible version numbers over dummy version numbers, but only if the release workflow supports the tag-then-branch strategy I described.

Comment by A. Jesse Jiryu Davis [ 10/Sep/18 ]

Thanks Roberto. Does it matter what version number we choose when building the driver from a non-release commit, or during a patch build? Could we say "if this commit is not tagged as a release, or if this is a patch build, the version number is 1.2.3"? I don't think version numbers need to be unique: one Evergreen task doesn't care if another Evergreen task used the same version number. I also don't think the version number must be greater than all actual releases.

If it is important to generate a release number greater than previous ones we could choose 100.0.0. Tell me if I'm missing something!

Comment by Roberto Sanchez [ 09/Sep/18 ]

jesse, I have read the ticket summary and Drew's proposal as well. The outline you provided is on target.

The only thing that I would point out as far as the versioning based on "git describe" is that it is sensitive to tag location. What I mean is that version numbering could be a bit wonky on master if the release tags end up on the branch (as is the case with the 1.12.0 tag for the C Driver). For example, git describe --tags on C Driver's master branch gives me this: 1.11.0-215-g6ea7737dc. That seems wrong to me since 1.12.0 is out and we currently have a 1.13.0-dev pre-release version on master.

The C++ driver is in slightly better shape. The r3.3.0 tag was made on a commit that is a linear ancestor of master, so the describe output there is more sensible: r3.3.0-42-gca7eee19c. However, the r3.3.1 tag was made on a branch.
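
To make the describe output above easier to reason about, here is a small Python sketch that splits it into its parts. It relies on the `<tag>-<commits since tag>-g<abbreviated commit>` shape seen in the two examples, and on git describe printing only the tag when HEAD is exactly on it; the `Describe` tuple and `parse_describe` name are illustrative, not part of either driver.

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str            # nearest tag reachable from the commit
    commits_since: int  # commits between that tag and the commit
    commit: str         # abbreviated commit ID, empty when exactly on the tag


_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<commit>[0-9a-f]+)$")


def parse_describe(text: str) -> Describe:
    """Split `git describe --tags` output into tag, distance and commit."""
    text = text.strip()
    match = _DESCRIBE_RE.match(text)
    if match is None:
        # No "-N-g<hash>" suffix: the commit itself carries the tag.
        return Describe(text, 0, "")
    return Describe(match["tag"], int(match["count"]), match["commit"])
```

Applied to the C Driver output quoted above, this gives tag 1.11.0 with a distance of 215 commits, which shows why the result looks stale on master: the distance is measured from the nearest tag that is an ancestor, not from the newest release.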

It is clear that for an actual release the describe output will work nicely. However, I am curious about what strategy we should use to handle version numbers for commits after a release (i.e., since version 3.3.0 of the C++ driver has been released, should builds from the current master use a "next" pre-release version of some sort?).

There is also the matter of how to handle versioning for Evergreen patch builds. For example, if you tag 3.4.0, then we can easily determine that the version is 3.4.0 and fill the variables appropriately. However, all the Evergreen patch builds with that commit as their base will also have version 3.4.0. Should our strategy account for that and use the "next pre-release" version for those builds as well?

Along with the "next pre-release" version concept, we need to decide if the "next" is different depending on whether we are on master or another branch. For example, I can see that on master "next" might be 3.4.0-pre-something, while on releases/v3.3 it might be 3.3.2-pre-something.

Once we have sorted these things out, I can start implementing a solution.

After that is done, it should be straightforward to adapt the MakeDist components from the C driver to work with the C++ Driver repository.

Comment by A. Jesse Jiryu Davis [ 07/Sep/18 ]

roberto.sanchez when you're ready to start on this, I think the first steps are:

  • Read Drew's proposal, above.
  • Write CMake script to set the version number in BSONCXX_VERSION_MAJOR etc. from "git describe". I declare that the bsoncxx and mongocxx version numbers will always be the same for a given commit, so we can just take the version number for both from the git tag.
  • Generate a release archive for the C++ Driver using CMake, similar to how we do it for the C Driver.

I realize we're adding the same features to the C and C++ driver release scripts but adding them in a different order. We can rationalize this as we go along.

Comment by Andrew Morrow (Inactive) [ 14/Jan/16 ]

We are using the BUMP/post release style for the 3.0 GA. For 3.1.0, I'd like us to move to 'bumpless' releases where we use git-describe to generate the version number. However, this isn't required for 3.0.0.

Generated at Wed Feb 07 21:59:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.