Discussion:
Proposal for changing Mahout's Git branching rules
Pat Ferrel
2017-04-22 17:06:45 UTC
I’ve been introduced to what is now being called git-flow, which at its simplest is just a branching strategy with several key benefits. The most important part of it is that the master branch is rock solid all the time, because we use the “develop” branch for integrating Jiras, PRs, features, etc. Any “rock solid” bit can be cherry-picked and put into master or into hot-fixes that fix a release but still require a source build.

Key features of git-flow:
The master becomes stable and can be relied on to be stable. It is generally equal to the last release with only stable or required exceptions.
Develop is where all the integration and potentially risky work happens. It is where most PRs are targeted.
A release causes develop to be merged with master and so it maintains the stability of master.
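
The three points above can be sketched with plain git commands. This is a minimal illustration in a throwaway repository, not Mahout's actual release procedure; the committer identity, commit messages, and the tag name are placeholders.

```shell
set -e
# Scratch repository so the sketch cannot touch a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # use the branch name from this thread
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"

git commit -q --allow-empty -m "last release"

# Integration and risky work lands on develop, not on master.
git checkout -q -b develop
git commit -q --allow-empty -m "feature A"
git commit -q --allow-empty -m "feature B"

# At release time develop is merged into master and the result is tagged.
git checkout -q master
git merge -q --no-ff -m "Release v0.0.2" develop
git tag v0.0.2                               # hypothetical version number

git --no-pager log --oneline --graph master
```

Between releases master only moves when something is deliberately merged or cherry-picked into it, which is what keeps it stable.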

The benefits of git-flow are more numerous but also seem scary because the explanation can be complex. I’ve switched all my projects and Apache PredictionIO is where I was introduced to this, and it is actually quite easy to manage and collaborate with this model. We just need to take the plunge by creating a persistent branch in the Apache git repo called “develop”. From then on all commits will go to “develop” and all PRs should be created against it. Just after a release is a good time for this.
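
Creating the persistent branch is a small operation. The sketch below assumes a local bare repository standing in for the Apache git repo (the paths and identity are placeholders) and shows develop being cut from master and published.

```shell
set -e
work=$(mktemp -d)
# A local bare repository stands in for the shared remote (assumption).
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"
git remote add origin "$work/origin.git"

git commit -q --allow-empty -m "state of master just after a release"
git push -q origin master

# Cut the long-lived develop branch from master and publish it.
git checkout -q -b develop master
git push -q -u origin develop

git ls-remote --heads origin
```

After this, contributors would open PRs with develop as the base branch rather than master.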

https://datasift.github.io/gitflow/IntroducingGitFlow.html

What say you all?
Andrew Musselman
2017-04-22 17:25:28 UTC
I've worked in shops where that was the standard flow, in hg or git, and it
worked great. I'm in favor of it especially as we add contributors and make
it easier for people to submit new work.

Have we had that many times when master got messed up? I don't recall more
than a few, but in any case the master/dev branch approach is solid.
Pat Ferrel
2017-04-22 17:30:44 UTC
It hasn't been often, but I’ve been bitten by it and had to ask users of a dependent project to check out a specific commit. Nasty.

The main effect would be on automation efforts that are currently WIP.

Andrew Musselman
2017-04-22 17:33:18 UTC
Cool, I'll make a new dev branch now.

Dev, develop, any preference?
Pat Ferrel
2017-04-22 17:36:09 UTC
There are tools that implement git-flow, which I haven’t used and which may have some standardization built in, but I think “develop” is typical and safe.


Andrew Musselman
2017-04-22 17:42:57 UTC
Okay develop it is; I'll cut a develop branch from master right now.

As we go, if people forget and push to master, we can merge those changes
into develop.

In addition, I'm making a 'website' branch for all work on the new version
of the site.
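
A minimal sketch of the "merge those changes into develop" step, assuming a throwaway repository with placeholder commits: a commit lands on master by mistake and is then brought over so develop stays a superset of master.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"

git commit -q --allow-empty -m "release"
git branch develop                           # cut develop from master

# Normal work happens on develop.
git checkout -q develop
git commit -q --allow-empty -m "work on develop"

# Someone forgets and commits to master.
git checkout -q master
git commit -q --allow-empty -m "stray commit on master"

# Bring the stray commit over to develop.
git checkout -q develop
git merge -q -m "Merge stray master commits into develop" master

# master is now fully contained in develop.
git merge-base --is-ancestor master develop && echo "develop contains master"
```
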
Pat Ferrel
2017-06-20 00:52:42 UTC
I just heard we are not using git flow (the process, not the tool). Are we checking unclean changes (untested in any significant way) into master? What is the develop branch used for?

Master is unstable almost all the time with the old method; in fact there is *no stable bundle of source ever* without git flow. With git flow you can peel off a bug fix and merge it into master, and users can pull it expecting that everything else is stable and like the last build. This has bitten me with Mahout in the past, as I’m sure it has everyone. This doesn’t fix that but it does limit the pain to committers.
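
"Peeling off a bug fix" can be sketched as a git-flow style hotfix branch. The example below uses a scratch repository; the file names and the branch name hotfix/example are hypothetical. The fix reaches master without dragging along unfinished work from develop.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"

echo "v1" > lib.txt
git add lib.txt
git commit -q -m "release"

# Unfinished work lives only on develop.
git checkout -q -b develop
echo "unfinished" > risky.txt
git add risky.txt
git commit -q -m "risky work in progress"

# The bug fix is branched from master, not from develop.
git checkout -q -b hotfix/example master
echo "v1 fixed" > lib.txt
git commit -q -am "fix bug in lib"

# Merge the fix into master, then into develop, then delete the branch.
git checkout -q master
git merge -q --no-ff -m "Merge hotfix into master" hotfix/example
git checkout -q develop
git merge -q --no-ff -m "Merge hotfix into develop" hotfix/example
git branch -q -d hotfix/example

git --no-pager log --oneline --graph --all
```

Users who pull master get the last release plus the fix and nothing else.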

If we aren’t going to use it, fine, but let’s not agree to it and then do something else. If it’s a matter of timing, OK; I understood from Andrew’s earlier mail that there was no timing issue, but I expect there will be Jenkins or Travis issues to iron out.

For reference: http://nvie.com/posts/a-successful-git-branching-model/ I have never heard of anyone who tried it and didn’t like it, but it takes a leap of faith unless you have git in your bones.


Pat Ferrel
2017-06-20 01:02:03 UTC
Perhaps there is a misunderstanding about where a release comes from: master. So any release tools we have should work fine. It’s just that until you are ready to pull the trigger, development happens in develop or, more strictly, in a “getting a release ready” branch called a release branch. This sounds like a lot of branches but in practice it’s trivial to merge and purge. Everything stays clean, and rapid-fire last-minute fixes are isolated to the release branch before going into master.
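
The "merge and purge" cycle for a release branch might look like the sketch below. It runs in a scratch repository; the branch name release/next and the tag next-release are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"

git commit -q --allow-empty -m "previous release"
git checkout -q -b develop
git commit -q --allow-empty -m "feature work"

# Cut a release branch from develop; only last-minute fixes go here.
git checkout -q -b release/next develop
git commit -q --allow-empty -m "last-minute release fix"

# Ship: merge the release branch into master and tag the result.
git checkout -q master
git merge -q --no-ff -m "Release next" release/next
git tag next-release

# Carry the fixes back to develop, then purge the ephemeral branch.
git checkout -q develop
git merge -q --no-ff -m "Merge release fixes back into develop" release/next
git branch -q -d release/next

git --no-pager branch --list
```

Only master and develop remain afterwards, and the release itself is still cut from master.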

The original reason I brought this up is that our Git tools now allow committers to delete old, cruft-laden branches, such as the ephemeral branches this method creates.


Trevor Grant
2017-06-20 04:04:10 UTC
First issue, one does not simply just start using a develop branch. CI
only triggers off the 'main' branch, which is master by default. If we
move to the way you propose, then we need to file a ticket with INFRA, I
believe. That can be done, but it's not as if we can just start doing it
one day.

The current method is: when we cut a release, we make a new branch for that
release. Master is treated like dev. If you want the latest stable, you
would check out branch-0.13.0. This is the way most major projects
(citing Spark, Flink, Zeppelin), including Mahout up to version 0.10.x,
worked. To your point about the lack of a recent stable: that's fair,
but partly that's because no one created branches with the release for
0.10.? - 0.12.2.
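
For comparison, the release-branch convention described here can be sketched as follows. The repository is a throwaway and the commits are placeholders; only the branch name branch-0.13.0 comes from this thread.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.invalid"

git commit -q --allow-empty -m "0.13.0 release commit"

# At release time a branch is cut and left at the released state.
git branch branch-0.13.0

# Master keeps moving and plays the role of dev.
git commit -q --allow-empty -m "post-release work on master"

# Someone who wants the latest stable checks out the release branch.
git checkout -q branch-0.13.0
git --no-pager log --oneline -1
```

Here the stable line is the most recent release branch rather than master, so users need to know which branch name is current.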

For all intents and purposes, we are (now once again) following what you
propose; the only difference is we are treating master as dev, and
"branch-0.13.0" as master (i.e. last stable). Larger features go on their
own branch until they are ready to merge, e.g. ATM there is just one
feature branch, CUDA. That was the big takeaway from this discussion last
time: there needed to be feature branches, as opposed to everyone running
around either working off WIP PRs or half-baked merges, etc. To that end,
"website" was a feature branch, and IIRC there has been one other feature
branch that has merged in the last couple of months, but I forget what it
was at the moment.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Dmitriy Lyubimov
2017-06-21 21:06:25 UTC
So people need to make sure their PR merges to develop instead of master?
Do they need to PR against the develop branch, and if not, who is responsible
for the conflict resolution that will arise from diffing and merging into
different targets?
Post by Pat Ferrel
As I said I was sure there would be Jenkins issues but they must be small
since it’s just renaming of target branches. Releases are still made from
master so I don’t see the issue there at all. Only intermediate CI tasks
are triggered on other branches. But they would have to be in your examples
too so I don’t see the benefit of using an ad hoc method in terms of CI.
We’ve used this method for years with Apache PredictionIO with minimal CI
issues.
No the process below is not equivalent, treating master as develop removes
the primary (in my mind) benefit. In git flow the master is always stable
and the reflection of the last primary/core/default release with only
critical inter-release fixes. If someone wants to work with stable
up-to-date source, where do they go with the current process? I would claim
that there actually may be no place to find such a thing except by tracking
down some working commit number. It would depend on what stage the project
is in. In git flow there is never a question: master is always stable. Git
flow also accounts for all the process exceptions and complexities you
mention below but in a standardized way that is documented so anyone can
read the rules and follow them. We/Mahout doesn’t even have to write them,
they can just be referenced.
But we are re-arguing something I thought was already voted on and that is
another issue. If we need to re-debate this let’s make it stick one way or
the other.
I really appreciate you being release master and the thought and work
you’ve put into this and if we decide to stick with it, fine. But it should
be a project decision that release masters follow, not up to each release
master. We are now embarking on a much more complex release than before
with multiple combinations of dependencies for binaries and so multiple
artifacts. We need to make the effort to tame the complexity somehow or it
will just multiply.
Given the short nature of the current point release I’d even suggest that
we target putting our decision into practice after the release, which is a
better time to make a change if we are to do so.
First issue, one does not simply just start using a develop branch. CI
only triggers off the 'main' branch, which is master by default. If we
move to the way you propose, then we need to file a ticket with INFRA I
believe. That can be done, but its not like we just start doing it one
day.
The current method is, when we cut a release- we make a new branch of that
release. Master is treated like dev. If you want the latest stable, you
would check out branch-0.13.0 . This is the way most major projects
(citing Spark, Flink, Zeppelin), including Mahout up to version 0.10.x
worked. To your point, there being a lack of a recent stable- that's fair,
but partly that's because no one created branches with the release for
0.10.? - 0.12.2.
For all intents and purposes, we are (now once again) following what you
propose, the only difference is we are treating master as dev, and
"branch-0.13.0" as master (e.g. last stable). Larger features go on their
own branch until they are ready to merge- e.g. ATM there is just one
feature branch CUDA. That was the big take away from this discussion last
time- there needed to be feature branches, as opposed to everyone running
around either working off WIP PRs or half baked merges, etc. To that end-
"website" was a feature branch, and iirc there has been one other feature
branch that has merged in the last couple of months but I forget what it
was at the moment.
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org
*"Fortunate is he, who is able to know the causes of things." -Virgil*
Post by Pat Ferrel
Perhaps there is a misunderstanding about where a release comes from:
master. So any release tools we have should work fine. It’s just that
until you are ready to pull the trigger, development is in develop or more
strictly a “getting a release ready” branch called a release branch. This
sounds like a lot of branches but in practice it’s trivial to merge and
purge. Everything stays clean and rapid fire last minute fixes are isolated
to the release branch before going into master.
The original reason I brought this up is that our Git tools now allow
committers to delete old cruft laden branches that are created and
ephemeral with this method.
I just heard we are not using git flow (the process not the tool), we are
checking unclean (untested in any significant way) changes to master? What
is the develop branch used for?
The master is unstable most all the time with the old method, in fact
there is *no stable bundle of source ever* without git flow. With git flow
you can peel off a bug fix and merge with master and users can pull it
expecting that everything else is stable and like the last build. This has
bit me with Mahout in the past as I’m sure it has for everyone. This
doesn’t fix that but it does limit the pain to committers.
If we aren’t going to use it, fine but let’s not agree to it then do
something else. If it’s a matter of timing ok, I understood from Andrew’s
mail below there was no timing issue but I expect there will be Jenkins or
Travis issues to iron out.
For reference: http://nvie.com/posts/a-successful-git-branching-model/ <
http://nvie.com/posts/a-successful-git-branching-model/> I have never
heard of someone who has tried it that didn’t like it but it takes a leap
of faith unless you have git in your bones.
On Apr 22, 2017, at 10:42 AM, Andrew Musselman <
Okay develop it is; I'll cut a develop branch from master right now.
As we go, if people forget and push to master, we can merge those changes
into develop.
In addition, I'm making a 'website' branch for all work on the new version
of the site.
There are tools to implement git-flow that I haven’t used and may have
some standardization built in but I think “develop” is typical and safe.
On Apr 22, 2017, at 10:33 AM, Andrew Musselman <
Cool, I'll make a new dev branch now.
Dev, develop, any preference?
It hasn't been often but I’ve been bit by it and had to ask users of a
dependent project to checkout a specific commit, nasty.
The main effect would be on automation efforts that are currently WIP.
Pat Ferrel
2017-06-21 21:17:45 UTC
Permalink
Since merges are done by committers, it’s easy to retarget a contributor’s PRs but committers would PR against develop, and some projects like PredictionIO make develop the default branch on github so it's the one contributors get by default.

In fact this is the primary difference: master is left stable and ignored until a release, or until a bug fix is needed before the next release.

We already have various branches and now that we can clean them up without involving Infra, the rest of your question is resolved by the originator of the change just like today.

I see the key benefits as:
1) as I’ve already overstated, master is stable
2) we have a documented process that is IMO a “best practice”. Even if we stick with the process of today we need to document it as release artifacts and branches proliferate.


On Jun 21, 2017, at 2:06 PM, Dmitriy Lyubimov <***@gmail.com> wrote:

so people need to make sure their PR merges to develop instead of master?
Do they need to PR against the develop branch, and if not, who is
responsible for the conflict resolution that will then arise from diffing
and merging into different targets?
As I said I was sure there would be Jenkins issues but they must be small
since it’s just renaming of target branches. Releases are still made from
master so I don’t see the issue there at all. Only intermediate CI tasks
are triggered on other branches. But they would have to be in your examples
too so I don’t see the benefit of using an ad hoc method in terms of CI.
We’ve used this method for years with Apache PredictionIO with minimal CI
issues.
No the process below is not equivalent, treating master as develop removes
the primary (in my mind) benefit. In git flow the master is always stable
and the reflection of the last primary/core/default release with only
critical inter-release fixes. If someone wants to work with stable
up-to-date source, where do they go with the current process? I would claim
that there actually may be no place to find such a thing except by tracking
down some working commit number. It would depend on what stage the project
is in. In git flow there is never a question: master is always stable. Git
flow also accounts for all the process exceptions and complexities you
mention below but in a standardized way that is documented so anyone can
read the rules and follow them. We/Mahout doesn’t even have to write them,
they can just be referenced.
But we are re-arguing something I thought was already voted on and that is
another issue. If we need to re-debate this let’s make it stick one way or
the other.
I really appreciate you being release master and the thought and work
you’ve put into this and if we decide to stick with it, fine. But it should
be a project decision that release masters follow, not up to each release
master. We are now embarking on a much more complex release than before
with multiple combinations of dependencies for binaries and so multiple
artifacts. We need to make the effort to tame the complexity somehow or it
will just multiply.
Given the short nature of the current point release I’d even suggest that
we target putting our decision in practice after the release, which is a
better time to make a change if we are to do so.
First issue: one does not simply start using a develop branch. CI
only triggers off the 'main' branch, which is master by default. If we
move to the way you propose, then we need to file a ticket with INFRA, I
believe. That can be done, but it's not like we just start doing it one
day.
The current method is: when we cut a release, we make a new branch of that
release. Master is treated like dev. If you want the latest stable, you
would check out branch-0.13.0. This is the way most major projects
(citing Spark, Flink, Zeppelin) work, and the way Mahout worked up to
version 0.10.x. To your point about the lack of a recent stable branch:
that's fair, but partly that's because no one created branches with the
releases for 0.10.? - 0.12.2.
For all intents and purposes, we are (now once again) following what you
propose; the only difference is that we are treating master as dev, and
"branch-0.13.0" as master (i.e. last stable). Larger features go on their
own branch until they are ready to merge, e.g. ATM there is just one
feature branch, CUDA. That was the big takeaway from this discussion last
time: there needed to be feature branches, as opposed to everyone running
around working off either WIP PRs or half-baked merges, etc. To that end,
"website" was a feature branch, and iirc there has been one other feature
branch that has merged in the last couple of months but I forget what it
was at the moment.
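
To make the "check out the latest release branch" rule concrete, here is a
minimal sketch of how a script might pick the latest stable branch under
the release-branch model described above. It assumes every release branch
is named like "branch-0.13.0"; the function name and the "branch-0.9.2"
entry are made up for illustration and are not taken from the Mahout repo.
Under git flow the same question would simply be answered by "master".

```python
import re

# Assumed naming convention, taken from the "branch-0.13.0" example above.
RELEASE_BRANCH = re.compile(r"^branch-(\d+)\.(\d+)\.(\d+)$")


def latest_stable_branch(branch_names):
    """Return the release branch with the highest version, or None."""
    versioned = []
    for name in branch_names:
        match = RELEASE_BRANCH.match(name)
        if match:
            version = tuple(int(part) for part in match.groups())
            versioned.append((version, name))
    if not versioned:
        return None
    # Tuples of ints compare numerically, so (0, 13, 0) ranks above (0, 9, 2).
    return max(versioned)[1]


if __name__ == "__main__":
    # Hypothetical branch list; "branch-0.9.2" is invented for the example.
    branches = ["master", "CUDA", "website", "branch-0.9.2", "branch-0.13.0"]
    print(latest_stable_branch(branches))
```

Comparing parsed integers rather than raw strings matters here, because a
plain string sort would rank "branch-0.9.2" above "branch-0.13.0".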
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org
*"Fortunate is he, who is able to know the causes of things." -Virgil*
Dmitriy Lyubimov
2017-06-21 21:25:03 UTC
Permalink
On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel <***@occamsmachete.com> wrote:

Post by Pat Ferrel
Since merges are done by committers, it’s easy to retarget a contributor’s
PRs but committers would PR against develop,
IMO it is anything but easy to resolve conflicts, let alone somebody
else's. Spark just asks me to resolve them myself. But if you don't have
a proper target, you can't ask the contributor.

Post by Pat Ferrel
and some projects like PredictionIO make develop the default branch on
github so it's the one contributors get by default.
That would fix it, but I am not sure if we have access to HEAD on the
github mirror. It might involve INFRA to do it. And in that case it would
amount to little more than renaming. It would seem it is much easier to
create a branch, "stable master" or something, and consider master to be
the ongoing PR base.

-1 on the former, -0 on the latter. Judging from the point of view of both
contributor and committer (of which I am both), it will not make my life
easy on either end.
Dmitriy Lyubimov
2017-06-21 21:26:30 UTC
Permalink
PS. But I see the rationale: to have stable fixes get into a release.
Perhaps named release branches are still a way to go if one cuts them
early enough.
Pat Ferrel
2017-06-21 22:00:01 UTC
Permalink
Which is an optional part of git flow, but maybe take a look at a better explanation than mine: http://nvie.com/posts/a-successful-git-branching-model/ <http://nvie.com/posts/a-successful-git-branching-model/>

I still don’t see how this complicates resolving conflicts. It just removes the resolution from being a blocker. If some conflict is pushed to master, the project is dead until it is resolved (how often have we seen this?). With git flow that is not so, because this never (or very rarely) happens. You could see the same problem occur in develop, but wouldn't best practice be to resolve known conflicts in a separate branch where stakeholders collaborate, then once resolved merge with develop and purge the ephemeral branch? I’ve seen this work well. The work is not and never will be easy, but at least it doesn’t get in other people’s way.
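
As a rough sketch of the "resolve in a separate branch, merge with develop, purge" sequence just described, the helper below only builds the git command lines in order and does not run them. The function name and the branch names in the usage example are hypothetical, and the final remote delete assumes the ephemeral branch was pushed to a remote called "origin" so that several people could work on it.

```python
def conflict_resolution_plan(topic, contribution, target="develop", remote="origin"):
    """Return, in order, the git commands for resolving on a throwaway branch.

    Nothing is executed here; the caller decides whether to print or run them.
    """
    return [
        ["git", "checkout", "-b", topic, target],    # branch off the integration branch
        ["git", "merge", contribution],              # conflicts surface here, not on develop
        # ... stakeholders resolve the conflicts and commit on `topic` ...
        ["git", "checkout", target],
        ["git", "merge", "--no-ff", topic],          # bring the resolved work into develop
        ["git", "branch", "-d", topic],              # purge the local ephemeral branch
        ["git", "push", remote, "--delete", topic],  # purge the shared copy (assumed pushed)
    ]


if __name__ == "__main__":
    # Hypothetical branch names, for illustration only.
    for command in conflict_resolution_plan("resolve-conflicts", "contributor-feature"):
        print(" ".join(command))
```

Keeping the plan as data rather than executing it makes it easy to review the steps before anything touches a shared branch.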


Dmitriy Lyubimov
2017-06-22 21:48:18 UTC
Permalink
Post by Pat Ferrel
Which is an optional part of git flow, but maybe take a look at a better
explanation than mine:
http://nvie.com/posts/a-successful-git-branching-model/
<http://nvie.com/posts/a-successful-git-branching-model/>
I still don’t see how this complicates resolving conflicts. It just
removes the resolution from being a blocker. If some conflict is pushed to
master the project is dead until it is resolved (how often have we seen
this?)
This is completely detached from github reality.

In this model, all contributors actually work on the same branch. In
github, every contributor will fork off their own dev branch.

In this model, people start with a fork off the dev branch and push to the
dev branch. In github, a contributor will fork off the master branch and
will PR against the master branch. This is the default behavior, and my gut
feeling is that no amount of forewarning is going to change that w.r.t.
contributors. And if one starts off his/her work on one branch with the
intent to commit to another, then a conflict is guaranteed every time he or
she changes a file that has been changed on the branch to be merged to.

For example:
Master is at A
Dev branch is at A - B - C ... F.

If I start working at master (A) then I will generate conflicts if I have
changed the same files (lines) as in B, C, ... or F.

If I start working at dev (F) then I will not have a chance to generate
conflicts with B, C, ... F but only with commits that happened after I had
started.
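
The A ... F example can be put into a small model. This is only a sketch:
the file names attached to each commit are invented, and it compares whole
files, which is coarser than the line-level comparison git really does, so
it overstates conflicts rather than predicting them.

```python
# Toy history from the example: master is at A, dev is at A - B - C ... F.
# The file sets are invented purely for illustration.
DEV_HISTORY = [
    ("A", set()),
    ("B", {"pom.xml"}),
    ("C", {"README.md"}),
    ("D", {"core/Matrix.scala"}),
    ("E", {"pom.xml", "core/Vector.scala"}),
    ("F", {"docs/index.md"}),
]


def possible_conflicts(base, my_files, history=DEV_HISTORY):
    """Files touched both by my change and by dev commits made after `base`."""
    names = [name for name, _ in history]
    later = history[names.index(base) + 1:]
    overlap = set()
    for _, files in later:
        overlap |= files & my_files
    return overlap


if __name__ == "__main__":
    my_change = {"pom.xml", "core/Matrix.scala"}
    print(sorted(possible_conflicts("A", my_change)))  # work started at master
    print(sorted(possible_conflicts("F", my_change)))  # work started at dev
```

Starting at A exposes the change to everything in B through F, while
starting at F leaves only later commits to collide with.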

Also, if I start working at master (A) then the github flow will suggest
that I merge into master during the PR. I guarantee 100% of first-time PRs
will trip on that in github, even if you put "start your work off dev not
master" 20 times into the project readme.

And then you will face the dilemma of whether to ask people to resolve
merge issues w.r.t. master and resubmit, which will result in high
contributor attrition, or resolve them yourself without deep knowledge of
the author's intent, which will result in delays and plain errors.

-d
Dmitriy Lyubimov
2017-06-22 23:13:07 UTC
Permalink
should read

And then you will face the dilemma of whether to ask people to resolve
merge issues w.r.t. *dev* and resubmit against *dev*, which will result in
high contributor attrition, or resolve them yourself without deep knowledge
of the author's intent, which will result in delays and plain errors.
Dmitriy Lyubimov
2017-06-22 23:21:32 UTC
Permalink
And contributors' convenience should be golden IMO. I remember experiencing
a mild irritation when I was asked to resolve the conflicts on Spark PRs
because I felt they arose solely because the committer was taking too long
to review my PR and OK it. But if it were resulting from the project not
following the simple KISS github PR workflow, it would probably be a bigger
turn-off.

and then imagine the overhead of explaining to every newcomer that they
should and why they should be PRing not against the master but something
else when every other ASF project accepts PRs against master...

I dunno... when working on github, any deviation from github commonly
accepted PR flows imo would be a fatal wound to the process.
Pat Ferrel
2017-06-23 15:23:08 UTC
Permalink
I don’t know where to start here. Git flow does not address the merge conflict problems you talk about. They have nothing to do with the process and are made no easier or harder by following it.

The only thing I can comment on is that PredictionIO sets “develop” as the default branch so PRs are always against that, making absolutely no difference in convenience to contributors. And since we should soon be able to use the shiny green merge button on github, the process will be quite smooth and far less dangerous since master is not affected.

Note that this is from experience, not hypotheticals. PIO has a mess of dependency combinations, even worse than Mahout, and we’ve found that following this makes a hard job at least contained. Merging will always be hard but that's why we get the big bucks ;-)

Contributors voted to use the process on PIO just like committers and something like 6 have since graduated to committer status over the last 6 months.

I’d be happy to put anyone in touch with them if you want to see what they think.



Dmitriy Lyubimov
2017-07-20 16:06:05 UTC
Permalink
Post by Pat Ferrel
I don’t know where to start here. Git flow does not address the merge
conflict problems you talk about. They have nothing to do with the process
and are made no easier or harder by following it.
I thought I did demonstrate that it does make conflicts much more probable,
per below. The point where you start your work and the point where you
merge it do matter. This process does increase the gap between those (which
implies a higher chance of conflicts and deeper divergence from the start).
This is the same reason why people try to merge the most recent commit
stack back as often as possible.
Post by Dmitriy Lyubimov
Master is at A
Dev branch is at A - B -C ... F.
if I start working at master (A) then i wil generate conflicts if i have
changed same files (lines) as in B, C, .. or F.
If I start working at dev (F) then i will not have a chance to generate
conflicts with B,C,..F but only with commits that happened after i had
started.
Also, if I start working at master (A) then github flow will suggest me
to merge into master during PR. I guarantee 100% of first time PRs will
trip on that in github. even if you put "start your work off dev not
master" 20 times into project readme.
And then you will face the dilemma of whether to ask people to resolve
merge issues w.r.t. master and resubmit, which will result in high
contributor attrition, or resolve them yourself without deep knowledge of
the author's intent, which will result in delays and plain errors.
-d
Dmitriy Lyubimov
2017-07-20 23:46:38 UTC
Permalink
Guys,

as you know, my ability to contribute is very limited lately, so i don't
feel like my opinion is worth as much as that of a regular committer or
contributor. In the end people who contribute things should decide what
works for them.

I just put forward a warning that while normally this workflow would not be
a problem IF people are aware of the flow and start their work off the dev
branch, based on my git/github experience, a newbie WILL fork from master
to a private PR branch of her/his own to commence contribution work.

Which, according to proposed scheme, WILL be quite behind the dev branch
that she will then be asked to merge to.

Which WILL catch the unsuspecting contributor unawares. They will find
they'd have a significant divergence to overcome in order to attain the
mergeability of their work.
Post by Dmitriy Lyubimov
I don’t know where to start here. Git flow does not address the merge
conflict problems you talk about. They have nothing to do with the process
and are made no easier or harder by following it.
I thought i did demonstrate that it does make conflicts much more probable
per below. The point where you start your work and point where you merge it
do matter. This process does increase the gap between those (which implies
higher chance of conflicts and deeper divergence from the start). This is
the same reason why people try to merge the most recent commit stack back as
often as possible.
-d
Trevor Grant
2017-07-25 14:37:46 UTC
Permalink
Fwiw I agree with D, I just don't have enough experience to state it so
eloquently.

Pat is really in favor; I've got a bad feeling about it, and you expressed my
'bad feeling' perfectly.

Even though you aren't contributing as much (code) these days, you're still
a very valued member of the community, and I think I speak for most when I
say your guidance and involvement on the mailing lists is still very
appreciated. Please always feel encouraged to chime in.

my .02

#communityovercode
Post by Dmitriy Lyubimov
Guys,
as you know, my ability to contribute is very limited lately, so i don't
feel like my opinion is worth as much as that of a regular committer or
contributor. In the end people who contribute things should decide what
works for them.
I just put forward a warning that while normally this workflow would not be
a problem IF people are aware of the flow and start their work off the dev
branch, based on my git/github experience, a newbie WILL fork from master
to a private PR branch of her/his own to commence contribution work.
Which, according to proposed scheme, WILL be quite behind the dev branch
that she will then be asked to merge to.
Which WILL catch the unsuspecting contributor unawares. They will find
they'd have a significant divergence to overcome in order to attain the
mergeability of their work.
-d
Trevor Grant
2017-06-21 22:25:21 UTC
Permalink
So right now, if there was a bug in 0.13.0 that needed an important patch,
why not just merge it into master and git branch "branch-0.13.0"?
Post by Dmitriy Lyubimov
PS. but I see the rationale: to have stable fixes to get into a release.
Perhaps named release branches are still a way to go if one cuts them early
enough.
Post by Dmitriy Lyubimov
Post by Pat Ferrel
Since merges are done by committers, it’s easy to retarget a contributor’s
PRs but committers would PR against develop,
IMO it is anything but easy to resolve conflicts, let alone somebody
else's. Spark just asks me to resolve them myself. But if you don't have
a proper target, you can't ask the contributor.
Post by Pat Ferrel
and some projects like PredictionIO make develop the default branch on
github so it's the one contributors get by default.
That would fix it but I am not sure if we have access to HEAD on the github
mirror. It might involve INFRA to do it. And in that case it would amount to
little more than renaming. It would seem it is much easier to create a
branch, "stable master" or something, and consider master to be the ongoing
PR base.
-1 on the former, -0 on the latter. Judging from the point of view of both
contributor and committer (of which I am both), it will not make my life easy
on either end.
Pat Ferrel
2017-06-22 17:45:57 UTC
Permalink
Actually I think git flow would merge it into master and tag it with an annotated tag like “0.13.0.jira-123” (or some other naming scheme) to reference the bug fix. Since the bug is “important” it is treated like what the blog post calls a “hotfix”, so the head of master is still stable with hotfixes applied even if the merge does not warrant a binary release.

Master branch hygiene is maintained by checking WIP into develop or a feature branch; hotfixes and releases go into master. There is also a mechanism to maintain release branches if the project warrants, which may be true of Mahout.
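
As a rough sketch of the hotfix step described above, the helper below only builds the git invocations and does not run them. The function name `hotfix_commands`, the branch name `hotfix/jira-123`, and the tag message are assumptions for illustration; the tag name follows the “0.13.0.jira-123” pattern from the message, and `--no-ff` is the merge style used in the git-flow blog post rather than anything Mahout has decided.

```python
from typing import List


def hotfix_commands(release: str, jira_id: str, hotfix_branch: str) -> List[List[str]]:
    """Return git invocations that merge a hotfix into master and tag it.

    Nothing is executed here; each entry is an argv list that a caller
    could inspect or pass to subprocess.run.
    """
    tag = f"{release}.jira-{jira_id}"  # naming pattern from the message above
    return [
        ["git", "checkout", "master"],
        ["git", "merge", "--no-ff", hotfix_branch],
        # -a creates an annotated tag, -m supplies its message
        ["git", "tag", "-a", tag, "-m", f"Hotfix for JIRA {jira_id} on top of {release}"],
    ]


# Hypothetical release, issue number, and branch name.
for cmd in hotfix_commands("0.13.0", "123", "hotfix/jira-123"):
    print(cmd)
```

In git-flow the same hotfix would normally also be merged back into develop so the two branches do not drift apart; that step is left out here to keep the sketch short.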


On Jun 21, 2017, at 3:25 PM, Trevor Grant <***@gmail.com> wrote:

So right now, if there was a bug in 0.13.0 that needed an important patch-
why not just merge it into master and git branch "branch-0.13.0"
Pat Ferrel
2017-06-22 17:47:05 UTC
Permalink
Which translates into exactly what you suggest if we are maintaining release branches.


On Jun 22, 2017, at 10:45 AM, Pat Ferrel <***@occamsmachete.com> wrote:

Actually I think git flow would merge it into master and tag it with an annotated tag like “0.13.0.jira-123” to reference the bug fix or some other naming scheme. Since the bug is “important” it is treated like what the blog post calls a “hotfix” so the head of master is still stable with hotfixes applied even if the merge does not warrant a binary release.

The master branch hygiene is maintained by checking WIP into develop or a feature branch, hotfixes and releases go into master. There is also a mechanism to maintain release branches if the project warrants, which may be true of Mahout.


Pat Ferrel
2017-06-22 17:50:23 UTC
Permalink
And all this leads me to think that the concerns/worries may not really be warranted. This process just codifies best practices and adds one new thing, which is “develop” as the default WIP branch.


On Jun 22, 2017, at 10:47 AM, Pat Ferrel <***@occamsmachete.com> wrote:

Which translates into exactly what you suggest if we are maintaining release branches.


Pat Ferrel
2017-06-21 22:04:33 UTC
Permalink
Agreed


Given the short nature of the current point release I’d even suggest that
we target putting our decision in practice after the release, which is a
better time to make a change if we are to do so.


I bring it up again because the release artifacts listed are more than we have ever done before and the current process does not seem to support this complexity.


On Jun 21, 2017, at 2:37 PM, Andrew Palumbo <***@outlook.com> wrote:

Pat - I just want to clear one point up: Trevor volunteering to head up this release and the git-flow plans are independent of each other. The 0.13.1 release was originally planned as a quick follow-up to 0.13.0 for each Scala/Spark conf combo; I think this will be 6 artifacts: Spark 1.6.x - 2.1.x for Scala 2.11 and Scala 2.10. We'd hoped it would be straightforward, and it was something almost automatable.

The git-flow change idea was floated by you, I believe, around the same time (correct me if I'm wrong; this was all happening while I was sick). I agree that it should be a team decision, but it also might take some time to transition.





-------- Original message --------
From: Dmitriy Lyubimov <***@gmail.com>
Date: 06/21/2017 2:06 PM (GMT-08:00)
To: ***@mahout.apache.org
Cc: Mahout Dev List <***@mahout.apache.org>
Subject: Re: Proposal for changing Mahout's Git branching rules

So people need to make sure their PR merges to develop instead of master?
Do they need to PR against the develop branch, and if not, who is responsible
for resolving the conflicts that will arise from diffing and merging into
different targets?
As I said I was sure there would be Jenkins issues but they must be small
since it’s just renaming of target branches. Releases are still made from
master so I don’t see the issue there at all. Only intermediate CI tasks
are triggered on other branches. But they would have to be in your examples
too so I don’t see the benefit of using an ad hoc method in terms of CI.
We’ve used this method for years with Apache PredictionIO with minimal CI
issues.
No the process below is not equivalent, treating master as develop removes
the primary (in my mind) benefit. In git flow the master is always stable
and the reflection of the last primary/core/default release with only
critical inter-release fixes. If someone wants to work with stable
up-to-date source, where do they go with the current process? I would claim
that there actually may be no place to find such a thing except by tracking
down some working commit number. It would depend on what stage the project
is in, in git flow there is never a question—master is always stable. Git
flow also accounts for all the process exceptions and complexities you
mention below but in a standardized way that is documented so anyone can
read the rules and follow them. We/Mahout doesn’t even have to write them,
they can just be referenced.
But we are re-arguing something I thought was already voted on and that is
another issue. If we need to re-debate this let’s make it stick one way or
the other.
I really appreciate you being release master and the thought and work
you’ve put into this and if we decide to stick with it, fine. But it should
be a project decision that release masters follow, not up to each release
master. We are now embarking on a much more complex release than before
with multiple combinations of dependencies for binaries and so multiple
artifacts. We need to make the effort to tame the complexity somehow or it
will just multiply.
Given the short nature of the current point release I’d even suggest that
we target putting our decision in practice after the release, which is a
better time to make a change if we are to do so.
First issue, one does not simply just start using a develop branch. CI
only triggers off the 'main' branch, which is master by default. If we
move to the way you propose, then we need to file a ticket with INFRA I
believe. That can be done, but it's not like we just start doing it one
day.
The current method is, when we cut a release- we make a new branch of that
release. Master is treated like dev. If you want the latest stable, you
would check out branch-0.13.0 . This is the way most major projects
(citing Spark, Flink, Zeppelin), including Mahout up to version 0.10.x
worked. To your point, there being a lack of a recent stable- that's fair,
but partly that's because no one created branches with the release for
0.10.? - 0.12.2.
For all intents and purposes, we are (now once again) following what you
propose, the only difference is we are treating master as dev, and
"branch-0.13.0" as master (e.g. last stable). Larger features go on their
own branch until they are ready to merge- e.g. ATM there is just one
feature branch CUDA. That was the big take away from this discussion last
time- there needed to be feature branches, as opposed to everyone running
around either working off WIP PRs or half baked merges, etc. To that end-
"website" was a feature branch, and iirc there has been one other feature
branch that has merged in the last couple of months but I forget what it
was at the moment.
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org
*"Fortunate is he, who is able to know the causes of things." -Virgil*
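The point that the two schemes mostly differ in which branch plays which role can be written down as a small lookup. This is only a sketch: the role labels and the helper `branch_for` are invented for illustration, while the branch names "develop", "master" and "branch-0.13.0" are the ones used in the thread.

```python
from typing import Dict

# Role -> branch name under each scheme. The role labels are made up;
# "branch-0.13.0" is the release branch named in the message above.
SCHEMES: Dict[str, Dict[str, str]] = {
    "git-flow": {"integration": "develop", "latest_stable": "master"},
    "release-branches": {"integration": "master", "latest_stable": "branch-0.13.0"},
}


def branch_for(scheme: str, role: str) -> str:
    """Look up which branch plays `role` under `scheme`."""
    return SCHEMES[scheme][role]


print(branch_for("git-flow", "integration"))            # develop
print(branch_for("release-branches", "latest_stable"))  # branch-0.13.0
```

Seen this way, the disagreement in the thread is less about the number of branches and more about which name a newcomer lands on by default.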
Post by Pat Ferrel
Perhaps there is a misunderstanding about where a release comes
from: master. So any release tools we have should work fine. It’s just that
until you are ready to pull the trigger, development is in develop or more
strictly a “getting a release ready” branch called a release branch. This
sounds like a lot of branches but in practice it’s trivial to merge and
purge. Everything stays clean and rapid fire last minute fixes are isolated
to the release branch before going into master.
The original reason I brought this up is that our Git tools now allow
committers to delete old cruft laden branches that are created and
ephemeral with this method.
I just heard we are not using git flow (the process not the tool), we are
checking unclean (untested in any significant way) changes to master? What
is the develop branch used for?
The master is unstable most all the time with the old method, in fact
there is *no stable bundle of source ever* without git flow. With git flow
you can peel off a bug fix and merge with master and users can pull it
expecting that everything else is stable and like the last build. This has
bit me with Mahout in the past as I’m sure it has for everyone. This
doesn’t fix that but it does limit the pain to committers.
If we aren’t going to use it, fine but let’s not agree to it then do
something else. If it’s a matter of timing ok, I understood from Andrew’s
mail below there was no timing issue but I expect there will be Jenkins or
Travis issues to iron out.
For reference: http://nvie.com/posts/a-successful-git-branching-model/
I have never heard of someone who has tried it that didn’t like it but it
takes a leap of faith unless you have git in your bones.
On Apr 22, 2017, at 10:42 AM, Andrew Musselman <
Okay develop it is; I'll cut a develop branch from master right now.
As we go, if people forget and push to master, we can merge those changes
into develop.
In addition, I'm making a 'website' branch for all work on the new version
of the site.
Post by Pat Ferrel
There are tools to implement git-flow that I haven’t used and may have
some standardization built in but I think “develop” is typical and safe.
On Apr 22, 2017, at 10:33 AM, Andrew Musselman <
Cool, I'll make a new dev branch now.
Dev, develop, any preference?
Post by Pat Ferrel
It hasn't been often but I’ve been bit by it and had to ask users of a
dependent project to checkout a specific commit, nasty.
The main effect would be to automation efforts that are currently wip.