
RSiteSearch, sos, rdocumentation.org, ...?

9 messages · Jonathan Baron, Michael Dewey, Spencer Graves +1 more

#
Hello, All:


       Jonathan Baron is "giving up" maintaining the RSiteSearch database.


       This breaks three things:  (1) The R Site Search web service that 
Baron has maintained.  (2) The RSiteSearch function in the utils 
package.  (3) The sos package, for which I'm the maintainer and lead 
author.


       Might someone else be willing to take these over?


       For me, the "findFn" capability with "writeFindFn2xls" is the 
fastest literature search for anything statistical.  However, I don't 
have the resources to take over the management of Baron's R Site Search 
database.


       He's provided a great service for the R community for many 
years.  I hope we can find a way to keep the system maintained. Failing 
that, I could use help in adapting the sos package to another database.


       Thanks,
       Spencer Graves


-------- Forwarded Message --------
Subject: 	Re: RSiteSearch, sos, rdocumentation.org, ...?
Date: 	Wed, 7 Sep 2016 16:15:22 -0400
From: 	Jonathan Baron <baron at psych.upenn.edu>
To: 	Spencer Graves <spencer.graves at prodsyse.com>
CC: 	Jonathan Baron <baron at psych.upenn.edu>, chris.is.fun at gmail.com, 
info at datacamp.com <info at datacamp.com>, Sundar Dorai-Raj 
<sdorairaj at gmail.com>, webmaster at www.r-project.org



R site search has stopped working. The indexing script, mknmz, failed
to complete. It has been producing more and more errors and warnings,
since it has not been updated for 5 years.

I am giving up on this site. I have too many other things to do besides
finding bugs in programs written in a language I don't know (Perl),
or setting up an alternative search engine.

Please inform anyone else who needs to be informed.

I cannot find the email of the www.r-project.org webmaster, so I'm
taking a stab. There are several links to this site in those pages.

Jon
#
Spencer,

Thanks for the quick reply.

I am open to someone who knows Perl getting an account on my site and
trying to get it working. It will probably involve fixing more than
one thing, as mknmz depends on some Perl modules that also generate
errors.

My main contribution is figuring out how to extract the html help
files and vignettes only, with some help from R developers and Fedora
maintainers. Here is the trick, for someone who wants to do it:

## needed.packages is a character vector of the package names you use.
m0 <- rownames(installed.packages())
m1 <- m0[which(m0 %in% needed.packages)]
source("http://bioconductor.org/biocLite.R")
update.packages(oldPkgs = m1, repos = biocinstallRepos())
update.packages(dependencies = FALSE,
                INSTALL_opts = c("--no-configure", "--no-test-load", "--no-R",
                                 "--no-clean-on-error", "--no-libs",
                                 "--no-data", "--no-demo", "--no-exec",
                                 "--html"),
                repos = biocinstallRepos(), ask = FALSE)
m3 <- new.packages()
install.packages(m3, dependencies = FALSE,
                 INSTALL_opts = c("--no-configure", "--no-test-load", "--no-R",
                                  "--no-clean-on-error", "--no-libs",
                                  "--no-data", "--no-demo", "--no-exec",
                                  "--html"),
                 repos = biocinstallRepos())

Note 1: The first four statements deal with the list of packages that
you actually use. These can be eliminated if you don't use R on the
same machine; the last three calls are all you need.

Note 2: This works on Fedora, but I think that the Fedora maintainers
of R have set some defaults that are helpful.

Jon
On 09/07/16 15:41, Spencer Graves wrote:

#
Don't do anything yet. I may have found the problem by accident.

I tried to use the computer for something else, and it was being
drastically slowed down by some leftover processes, which turned out
to be xlhtml, a tool that converts Excel files. Apparently, some Excel
files got into the libraries, and they were causing the indexing to
hang completely.

I am now running everything again, starting from scratch, and it might
work. (I'm doing it wrong, but it is 3/4 done. I will do it right
tomorrow, if it works overnight.)

Jon
On 09/07/16 16:53, Jonathan Baron wrote:

#
OK.  It is sort of fixed and sort of works.

We'll keep it for now, but this is not going to work forever. When
namazu fails completely I will not have the time to install a new
search engine.

One option is to use Google. For a site like this, I think they will
want some money, but I'm not sure, and I do not have the time to deal
with it.

We have over 10,000 packages now. I wonder if searching all help files
is really helpful anymore.

Jon
On 09/07/16 22:06, Jonathan Baron wrote:

#
I have mixed feelings about this. I found the sos package very useful 
when I first started using it, but as the number of packages has grown 
it now gives me a huge list which takes a lot of time to digest. This 
may of course reflect my rudimentary search-term selection skills.

Michael
On 08/09/2016 11:01, Jonathan Baron wrote:

#
On 9/8/2016 5:01 AM, Jonathan Baron wrote:
The fastest way I know to do a literature search for anything 
statistical uses the sos package as follows:


     1.  docPages <- findFn('search string') or findFn('{search string}')

     2.  installPackages(docPages) # this installs packages to enable a 
more complete package summary

     3.  writeFindFn2xls(docPages) # this creates an Excel file with 3 
sheets: a package summary, the findFn table, and the call.

     4.  Then I open the Excel file and review the package summary 
sheet.  I prioritize my search from there based on the number and 
strength of matches, how close it sounds to what I want, the date of 
the last update, whether it has a vignette, and the authors and 
maintainers.


       There may be a better way to do this using Google or something 
else.  I'd be pleased if someone else could enlighten me.  I admit to 
being biased:  I'm the lead author and maintainer of "sos". However, I 
don't want to perpetuate a tool that has outlived its usefulness, and 
I'm too blind to see that!


       Spencer
#
I looked at rdocumentation.org. At first I thought it was a superior
replacement for namazu, but after I tried a few things I decided that
it wasn't. I could not find any documentation about how to search, and
the various things I tried seemed to yield very strange responses,
e.g., a search for "Hayes mediation bootstrap" gave me mostly
functions that had nothing to do with the search except for the word
"bootstrap".

So I managed to fix the major Perl module errors (one of which was
quite bothersome although not fatal ... yet). And I figured out a new
way to create the indices that namazu uses; the new way is more
selective. And things seem to work now. Aside from the problems I just
fixed, this is not hard to maintain, so I will continue.

It also seems that someone IS sort of maintaining namazu,
sporadically. There is a Fedora rpm for it. That was how I found out
how to fix the Perl module.

But I did end up spending a few hours on this on a day when I am
behind writing action letters, etc. etc. And ultimately I cannot do
this forever and would love it if someone else took it over, or at
least helped, with an account on my server.

Jon
On 09/08/16 06:36, Dirk Eddelbuettel wrote:

#
Jonathan,

FWIW I mentored a Google Summer of Code student (who was highly
self-sufficient and needed next to no help, apart from some small R
packaging tricks) as part of the Xapian project, in order to write RXapian:

   https://github.com/amandaJayanetti/RXapian

which is an R interface to the Xapian index engine.

I don't know much about these index generators, but Xapian [1] appears to be
free, open-source, current, maintained, powerful, and used.  From what I
gather you are still betting on an older (and, as I seem to recall,
deprecated) technology. There may be more tears ahead.

The other tip would be to get in touch with Gabor who, as part of r-hub, has
indices for just about anything, and (as he is a generation younger than
Spencer, you, or me) also provides current (i.e. JSON over REST) interfaces.

Dirk

[1] https://xapian.org/