Skip to content

[Bioc-devel] Duplicate commit error on and off

8 messages · Martin Morgan, Turaga, Nitesh, Thomas Girke

#
Dear Thomas,

Can you please send me the output of the two `git show` commands?
I?ll need to take a look at those to advice accordingly.


Best,

Nitesh
This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
Sure, it?s pretty long but here it is:

git show 8210e1e04e8dc6819b84820077293d8d61914cf5
commit 8210e1e04e8dc6819b84820077293d8d61914cf5
Author: Kevin Horan <khoran at cs.ucr.edu>
Date:   Fri Jun 30 21:37:15 2017 +0000

    fix for RSQLite 2

    git-svn-id:
file:///home/git/hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_5/madman/Rpacks/ChemmineR at 130824
bc3139a8-67e5-0310-9ffc-ced21a209358

diff --git a/DESCRIPTION b/DESCRIPTION
index 4b4d514..c96da6e 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: ChemmineR
 Type: Package
 Title: Cheminformatics Toolkit for R
-Version: 2.28.0
-Date: 2016-11-30
+Version: 2.28.1
+Date: 2017-6-30
 Author: Y. Eddie Cao, Kevin Horan, Tyler Backman, Thomas Girke
 Maintainer: Thomas Girke <thomas.girke at ucr.edu>
 Description: ChemmineR is a cheminformatics package for analyzing
drug-like small molecule data in R. Its latest version contains
functions for efficient processing of large numbers of molecules,
physicochemical
diff --git a/R/compound_db.R b/R/compound_db.R
index 76d1f6c..970c4a2 100644
--- a/R/compound_db.R
+++ b/R/compound_db.R
@@ -38,7 +38,7 @@ initDb <- function(handle){

collapse=""),";",fixed=TRUE))
                #print(statements)

-               Map(function(sql) dbOp(dbGetQuery(conn,sql)),statements)
+               Map(function(sql) dbOp(dbExecute(conn,sql)),statements)
        }
        conn
 }
@@ -54,7 +54,7 @@ dbTransaction <- function(conn,expr){
                # be paranoid about setting this as bad things will
happen if its not set
                enableForeignKeys(conn)

-               dbGetQuery(conn,"BEGIN TRANSACTION")
+               dbExecute(conn,"BEGIN TRANSACTION")
                ret=expr
                dbCommit(conn)
                ret
@@ -64,8 +64,11 @@ dbTransaction <- function(conn,expr){
                stop(paste("db error inside transaction: ",e$message))
        })
 }

-dbGetQueryChecked <- function(conn,statement,...){
-       ret=dbGetQuery(conn,statement)
+dbGetQueryChecked <- function(conn,statement,execute=FALSE,...){
+       if(execute == TRUE)
+               ret=dbExecute(conn,statement)
+       else

+               ret=dbGetQuery(conn,statement)
        err=dbGetException(conn)
        if(err$errorMsg[1] != "OK")
                stop("error in dbGetQuery: ",err$errorMsg,"  ",traceback())
@@ -895,10 +898,10 @@ createFeature <- function(conn,name, isNumeric){
        dbGetQueryChecked(conn,
                paste("CREATE TABLE feature_",name," (
                        compound_id INTEGER PRIMARY KEY REFERENCES
compounds(compound_id) ON DELETE CASCADE ON UPDATE CASCADE, ",
-                       "",name," ",sqlType," )",sep=""))
+                       "",name," ",sqlType," )",sep=""),execute=TRUE)

        #print("made table")
-       dbGetQuery(conn,paste("CREATE INDEX feature_",name,"_index ON
+       dbExecute(conn,paste("CREATE INDEX feature_",name,"_index ON

feature_",name,"(\"",name,"\")",sep=""))
        #print("made index")

@@ -913,7 +916,7 @@ insertDef <- function(conn,data)  {

"VALUES(:definition,:definition_checksum,:format)",sep=""),
bind.data=data[fields])
        }else if(inherits(conn,"PostgreSQLConnection")){
                if(debug) print(data[,"definition_checksum"])
-               apply(data[,fields],1,function(row) dbOp(dbGetQuery(conn,
+               apply(data[,fields],1,function(row) dbOp(dbExecute(conn,
                                                 "INSERT INTO
compounds(definition,definition_checksum,format) VALUES($1,$2,$3)",
                                                 row)))
        }else{
@@ -990,7 +993,7 @@ insertDescriptor <- function(conn,data){
                        })
                apply(data[,fields2],1,function(row) {
                        row[2] = descTypes[row[2]] #translate
descriptor_type to descriptor_type_id
-                       dbTransaction(conn,dbGetQuery(conn,
paste("INSERT INTO compound_descriptors(compound_id,
+                       dbTransaction(conn,dbExecute(conn,
paste("INSERT INTO compound_descriptors(compound_id,

           descriptor_id) ",
                                        "VALUES( (SELECT compound_id
FROM compounds WHERE definition_checksum = $1),

(SELECT descriptor_id FROM descriptors
@@ -1006,7 +1009,7 @@ insertDescriptorType <- function(conn,data){
                                                                 bind.data=data)
        }else if(inherits(conn,"PostgreSQLConnection")){
                apply(data,1,function(row)
-                               dbGetQuery(conn,paste("INSERT INTO
descriptor_types(descriptor_type) VALUES($1)"),row))
+                               dbExecute(conn,paste("INSERT INTO
descriptor_types(descriptor_type) VALUES($1)"),row))
        }else{
                stop("database ",class(conn)," unsupported")
        }
@@ -1019,7 +1022,7 @@ updatePriorities <- function(conn,data){

descriptor_id=:descriptor_id", bind.data=data)
        }else if(inherits(conn,"PostgreSQLConnection")){

apply(data[,c("compound_id","descriptor_id","priority")],1,function(row)
-                               dbGetQuery(conn,paste("UPDATE
compound_descriptors SET priority = $3 WHERE compound_id=$1 AND
+                               dbExecute(conn,paste("UPDATE
compound_descriptors SET priority = $3 WHERE compound_id=$1 AND

descriptor_id=$2"),row))
        }else{

                stop("database ",class(conn)," unsupported")
@@ -1036,13 +1039,12 @@ getPreparedQuery <- function(conn,statement,bind.data){
        #dbSendPreparedQuery(conn,statement,bind.data)

        #print("sending query")
-       res <- dbSendQuery(conn,statement)
+       res <- dbSendStatement(conn,statement)
        #print("after sendQuery")
        on.exit(dbClearResult(res)) #clear result set when this function exits
        #print("after exit callback registered")
-       dbBind(res,bind.data)
+       suppressWarnings(dbBind(res,bind.data))
        #print("after dbBind")
-       dbFetch(res)
 }

git show f514d35b793e1d9462b899bf3c76cc06ab4dcc91
commit f514d35b793e1d9462b899bf3c76cc06ab4dcc91
Author: Kevin Horan <khoran at cs.ucr.edu>
Date:   Fri Jun 30 21:37:15 2017 +0000

    fix for RSQLite 2

    git-svn-id:
file:///home/git/hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_5/madman/Rpacks/ChemmineR at 130824
bc3139a8-67e5-0310-9ffc-ced21a209358

diff --git a/DESCRIPTION b/DESCRIPTION
index 4b4d514..c96da6e 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: ChemmineR
 Type: Package
 Title: Cheminformatics Toolkit for R
-Version: 2.28.0
-Date: 2016-11-30
+Version: 2.28.1
+Date: 2017-6-30
 Author: Y. Eddie Cao, Kevin Horan, Tyler Backman, Thomas Girke
 Maintainer: Thomas Girke <thomas.girke at ucr.edu>
 Description: ChemmineR is a cheminformatics package for analyzing
drug-like small molecule data in R. Its latest version contains
functions for efficient processing of large numbers of molecules,
physicochemical
diff --git a/R/compound_db.R b/R/compound_db.R
index 76d1f6c..970c4a2 100644
--- a/R/compound_db.R
+++ b/R/compound_db.R
@@ -38,7 +38,7 @@ initDb <- function(handle){

collapse=""),";",fixed=TRUE))
                #print(statements)

-               Map(function(sql) dbOp(dbGetQuery(conn,sql)),statements)
+               Map(function(sql) dbOp(dbExecute(conn,sql)),statements)
        }
        conn
 }
@@ -54,7 +54,7 @@ dbTransaction <- function(conn,expr){
                # be paranoid about setting this as bad things will
happen if its not set
                enableForeignKeys(conn)

-               dbGetQuery(conn,"BEGIN TRANSACTION")
+               dbExecute(conn,"BEGIN TRANSACTION")
                ret=expr
                dbCommit(conn)
                ret
@@ -64,8 +64,11 @@ dbTransaction <- function(conn,expr){
                stop(paste("db error inside transaction: ",e$message))
        })
 }
-dbGetQueryChecked <- function(conn,statement,...){
-       ret=dbGetQuery(conn,statement)
+dbGetQueryChecked <- function(conn,statement,execute=FALSE,...){
+       if(execute == TRUE)
+               ret=dbExecute(conn,statement)
+       else

+               ret=dbGetQuery(conn,statement)
        err=dbGetException(conn)
        if(err$errorMsg[1] != "OK")
                stop("error in dbGetQuery: ",err$errorMsg,"  ",traceback())
@@ -895,10 +898,10 @@ createFeature <- function(conn,name, isNumeric){
        dbGetQueryChecked(conn,
                paste("CREATE TABLE feature_",name," (
                        compound_id INTEGER PRIMARY KEY REFERENCES
compounds(compound_id) ON DELETE CASCADE ON UPDATE CASCADE, ",
-                       "",name," ",sqlType," )",sep=""))
+                       "",name," ",sqlType," )",sep=""),execute=TRUE)

        #print("made table")
-       dbGetQuery(conn,paste("CREATE INDEX feature_",name,"_index ON
+       dbExecute(conn,paste("CREATE INDEX feature_",name,"_index ON

feature_",name,"(\"",name,"\")",sep=""))
        #print("made index")

@@ -913,7 +916,7 @@ insertDef <- function(conn,data)  {

"VALUES(:definition,:definition_checksum,:format)",sep=""),
bind.data=data[fields])
        }else if(inherits(conn,"PostgreSQLConnection")){
                if(debug) print(data[,"definition_checksum"])
-               apply(data[,fields],1,function(row) dbOp(dbGetQuery(conn,
+               apply(data[,fields],1,function(row) dbOp(dbExecute(conn,
                                                 "INSERT INTO
compounds(definition,definition_checksum,format) VALUES($1,$2,$3)",
                                                 row)))
        }else{
@@ -990,7 +993,7 @@ insertDescriptor <- function(conn,data){
                        })
                apply(data[,fields2],1,function(row) {
                        row[2] = descTypes[row[2]] #translate
descriptor_type to descriptor_type_id
-                       dbTransaction(conn,dbGetQuery(conn,
paste("INSERT INTO compound_descriptors(compound_id,
+                       dbTransaction(conn,dbExecute(conn,
paste("INSERT INTO compound_descriptors(compound_id,

           descriptor_id) ",
                                        "VALUES( (SELECT compound_id
FROM compounds WHERE definition_checksum = $1),

(SELECT descriptor_id FROM descriptors
@@ -1006,7 +1009,7 @@ insertDescriptorType <- function(conn,data){
                                                                 bind.data=data)
        }else if(inherits(conn,"PostgreSQLConnection")){
                apply(data,1,function(row)
-                               dbGetQuery(conn,paste("INSERT INTO
descriptor_types(descriptor_type) VALUES($1)"),row))
+                               dbExecute(conn,paste("INSERT INTO
descriptor_types(descriptor_type) VALUES($1)"),row))
        }else{
                stop("database ",class(conn)," unsupported")
        }
@@ -1019,7 +1022,7 @@ updatePriorities <- function(conn,data){

descriptor_id=:descriptor_id", bind.data=data)
        }else if(inherits(conn,"PostgreSQLConnection")){

apply(data[,c("compound_id","descriptor_id","priority")],1,function(row)
-                               dbGetQuery(conn,paste("UPDATE
compound_descriptors SET priority = $3 WHERE compound_id=$1 AND
+                               dbExecute(conn,paste("UPDATE
compound_descriptors SET priority = $3 WHERE compound_id=$1 AND

descriptor_id=$2"),row))
        }else{
                stop("database ",class(conn)," unsupported")
@@ -1036,13 +1039,12 @@ getPreparedQuery <- function(conn,statement,bind.data){

        #dbSendPreparedQuery(conn,statement,bind.data)

        #print("sending query")
-       res <- dbSendQuery(conn,statement)
+       res <- dbSendStatement(conn,statement)
        #print("after sendQuery")
        on.exit(dbClearResult(res)) #clear result set when this function exits
        #print("after exit callback registered")
-       dbBind(res,bind.data)
+       suppressWarnings(dbBind(res,bind.data))
        #print("after dbBind")
-       dbFetch(res)
 }

Thomas
?

On Thu, Sep 7, 2017 at 7:39 PM Turaga, Nitesh <Nitesh.Turaga at roswellpark.org>
wrote:

  
  
#
Hi Thomas,

So, you do actually have a ?duplicate? commit and you should NOT be pushing this to the bioc-git server. Notice that the body of both those commits is the same. 

If you want to check the rest of your commit history, please try `git log - -oneline` . And you will see that you will have multiple duplicate commits. This compromises your commit history on both your Github repo, and on the bioc-git server.

As suggested in the error message,

remote: Take a look at the documentation to fix this,
remote: https://bioconductor.org/developers/how-to/git/sync-existing-repositories/,
	     particularly, point  #8 (force Bioconductor master to Github master).

I would advice you to take a look at that. If you have other thoughts on how to solve this problem, I?d be happy to listen.

Best,

Nitesh
This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
On 09/08/2017 09:34 AM, Turaga, Nitesh wrote:
Seemed like Thomas wasn't eager to replace his current master with 
git.bicoonductor.org's master (swap branches). Maybe one thing that 
makes it more palatable is that the current master can be retained, 
e.g., as a branch master_archive.

What are the alternatives to forcing master branch to be 
git.bioconductor.org's version?

Selectively removing duplicate commits (how does one  do that?)

Cherry-picking (into the official Bioconductor branch) rather than merging?

I know it'll get hairy here, and maybe no easy or complete or simple 
solution... maybe seeing the steps required for alternative solutions 
would make the branch swap solution more palatable :)

Martin
This email message may contain legally privileged and/or...{{dropped:2}}
#
Dear Martin and Nitesh,

If swapping branches is the recommended solution then I will do so. The
on/off situation with the duplicate commit error misled me to believe it is
a temporary problem on the Bioc end. I am sorry for the extra noise my
message may have caused.

As a group that maintains GitHub versions of all our Bioc packages, we are
extremely excited about the recent git transition. Thanks Nitesh (and your
colleagues) for doing all of this. It must have been a massive effort for
so many packages.

Best,

Thomas


On Fri, Sep 8, 2017 at 7:11 AM Martin Morgan <martin.morgan at roswellpark.org>
wrote:

  
  
#
Hi Thomas,

Thanks for understanding. Swapping branches is the best solution I could come up with, and would advice you to go ahead with that. 

The catching of false positives was a couple of bugs in our pre-receive hooks code, which we have since fixed. I?m glad to have made the shift to Git, and hope the community enjoys it.

Best Regards,

Nitesh
This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
#
Since our duplicate commits slipped through to Bioc (probably during
pre-receive hook downtime), the only option I see, to actually get to the
actual swap branch step, is to first fix the current duplicate problem in
the upstream Bioc remote by resetting the affected branches back to the
commit right before the duplicates occurred. Manual fixes seem unrealistic
given the number of instances. However, when following (see below) the
reset instructions here (https://goo.gl/SeNCPk) I am getting a ?hook
declined? error. Not sure how to get beyond this? Or are there alternative
solutions I should try?

Interestingly, according to the instructions on the top of the Bioc
instructions of this very topic (https://goo.gl/SeNCPk) I am supposedly now
someone who "went nuclear". Wasn't this term introduced (or perhaps just
abused) by the last presidential campaign? How can it get anymore
frustrating than being affiliated with this outcome :).

Thomas

git reset --hard e3a8d2122a65305c478276e538faace277ea9ed6
git push -f origin master
git push -f upstream master

Total 0 (delta 0), reused 0 (delta 0)
remote: FATAL: + refs/heads/master packages/ChemmineR t.girke DENIED by fallthru
remote: error: hook declined to update refs/heads/master
To git at git.bioconductor.org:packages/ChemmineR.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to
'git at git.bioconductor.org:packages/ChemmineR.git'

?

On Fri, Sep 8, 2017 at 12:21 PM Turaga, Nitesh <
Nitesh.Turaga at roswellpark.org> wrote:

            

  
  
2 days later
#
The following allowed me to eliminate the duplicated commits in one step
via git merge --squash and then successfully push back to the bioc-git
server. After this I was able to switch to the swap branch approach to
avoid similar problems in the future.

Example here for master branch:

git checkout master
git pull upstream master # just in case
git reset --hard <commit_id> # Reset the current branch to the commit
right before dups started
git merge --squash HEAD@{1} # Squashes duplicated commits from chosen
<commit_id> to HEAD@{1} (state right before previous reset step)
git commit -am "some_message" # Commit squashed changes
git push upstream master # Push to bioc-git server

I am not sure if the above is the best solution but I thought I report it
here in case others experience similar problems. BTW: in my case the
duplicates were all generated in the upstream merge (step 6) of the
instructions here: https://goo.gl/wWVEeT. None of the parent branches (on
github or bioc) used in this merge step contained duplicated commits at
least as far as I have checked so far. Perhaps some of this relates back to
the git svn/rebase steps we used under the old git mirror?

Just in case, the following command is very helpful to identify duplicate
commits based on patch-id. Commits with identical patch-ids are very likely
to have identical content.

git rev-list master | xargs -r -L1 git diff-tree -m -p | git patch-id
| sort | uniq -w40 -D | cut -c42-80  | xargs -r git log --no-walk
--pretty=format:"%h %ad %an (%cn) %s" --date-order --date=iso

After duplicated commit pairs have been identified, one can check with diff
or vimdiff whether their content is identical:

git --no-pager show <commit_id1> > zzz1
git --no-pager show <commit_id2> > zzz2
vimdiff zzz1 zzz2

Thomas
On Fri, Sep 8, 2017 at 9:16 PM Thomas Girke <thomas.girke at ucr.edu> wrote:
Since our duplicate commits slipped through to Bioc (probably during