Merging two columns of unequal length

3 messages · William Michels, Jeff Newmiller, Bailey Hewitt

#
You should review "The Recycling Rule" in the R manual before
applying functions to two or more vectors of unequal length:

https://cran.r-project.org/doc/manuals/R-intro.html#The-recycling-rule
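As a quick illustration of the rule linked above, here is a minimal sketch with made-up vectors (the names x, y and z are only for illustration). The shorter vector is repeated until it matches the longer one, and R warns only when the longer length is not a multiple of the shorter:

```r
x <- 1:6
y <- c(10, 20)

# 6 is a multiple of 2, so y is recycled three times without a warning
even <- x + y
even
# [1] 11 22 13 24 15 26

z <- 1:5
# 5 is not a multiple of 2: R still recycles y, but issues a warning
# ("longer object length is not a multiple of shorter object length"),
# which is suppressed here only to keep the example quiet
uneven <- suppressWarnings(z + y)
uneven
# [1] 11 22 13 24 15
```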

Most often, the "Recycling Rule" automatically does exactly what the
researcher intends. In many cases, applying functions to data of
unequal (or not evenly divisible) lengths is either 1) an indication
of problems with the input data, or 2) an indication that the
researcher is unnecessarily 'forcing' data into a rectangular data
structure, when another approach might be better (e.g. the use of the
tapply function).
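To make the tapply suggestion concrete, here is a rough sketch on invented data (the data frame and its column names are assumptions, not from the original question). Groups of unequal size are kept in one long column with a grouping column beside it, so nothing has to be padded or recycled:

```r
# Hypothetical long-format data: group "a" has 3 values, group "b" has 2
measurements <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(1, 2, 3, 10, 20)
)

# Apply mean() to the values within each group
group_means <- tapply(measurements$value, measurements$group, mean)
group_means
#  a  b 
#  2 15 
```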

However, if you see no other way, the functions "cbind.na" and/or
"rbind.na" available from Andrej-Nikolai Spiess perform binding of
vectors without recycling:

http://www.dr-spiess.de/Rscripts.html

All you have to do is download and source the correct R-script, and
call the function:
> cbind(1:5, 1:2)
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    1
[4,]    4    2
[5,]    5    1

Warning message:
In cbind(1:5, 1:2) :
  number of rows of result is not a multiple of vector length (arg 2)

> cbind.na(1:5, 1:2)
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3   NA
[4,]    4   NA
[5,]    5   NA

This issue arises so often that Dr. Spiess's two scripts "rbind.na" and
"cbind.na" have my vote for inclusion in the base R distribution.

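For anyone who would rather not source an external script, the same NA padding for plain vectors can be approximated in base R. This is only a sketch: pad_cbind is a hypothetical helper name, it is not Dr. Spiess's cbind.na, and it handles vectors only (not matrices or data frames).

```r
# pad_cbind: hypothetical helper, NOT the cbind.na script mentioned above.
# Pads each vector with NA up to the longest length, then column-binds them.
pad_cbind <- function(...) {
  cols <- list(...)
  n <- max(lengths(cols))
  # Assigning a larger length to a vector fills the new positions with NA
  padded <- lapply(cols, function(v) {
    length(v) <- n
    v
  })
  do.call(cbind, padded)
}

m <- pad_cbind(1:5, 1:2)
m
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3   NA
# [4,]    4   NA
# [5,]    5   NA
```
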
Best of luck,

W Michels, Ph.D.
On Mon, Dec 12, 2016 at 3:41 PM, Bailey Hewitt <bailster at hotmail.com> wrote:
#
I frequently work with mismatched-length data, but I think I would rarely want this behaviour because there is no compelling reason to believe that all of the NA values should wind up at the end of the data as you suggest. Normally there is a second column that controls where things should line up, and the merge function handles that reliably. If merge is not appropriate then I usually regard that as a warning that those data should perhaps be rbinded or stacked rather than cbinded.
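A small sketch of what I mean, using made-up data frames (the names left, right, id, x and y are assumptions for illustration only). With a key column, merge puts the NA values where the key says they belong rather than at the bottom, and stacking avoids the padding altogether:

```r
# Hypothetical data: an id column records which row each value belongs to
left  <- data.frame(id = 1:5, x = c(1.1, 2.2, 3.3, 4.4, 5.5))
right <- data.frame(id = c(2L, 5L), y = c("b", "e"))

# all = TRUE keeps every id from both tables; unmatched cells become NA
merged <- merge(left, right, by = "id", all = TRUE)
merged
# y is NA for ids 1, 3 and 4, and filled for ids 2 and 5

# Alternative: stack the two vectors in long form instead of binding columns
stacked <- rbind(
  data.frame(source = "a", value = 1:5),
  data.frame(source = "b", value = 1:2)
)
nrow(stacked)
# [1] 7
```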

I think Hadley Wickham's paper on tidy data [1] describes this philosophy well. 

[1] https://www.jstatsoft.org/article/view/v059i10
1 day later
#
Sorry for the delay! Thank you very much! I think I am getting a better understanding of my options from what you have said. Thanks again for the quick replies and the information, I really appreciate it!


Bailey