Skip to content
Prev 385454 / 398506 Next

readxl question

Hi Thomas,

I am not familiar with the use of the range argument, but it seems to me 
that the cell value becomes the column name. This might be fine, but you 
might get into trouble if you have repeated cell values since 
as.data.frame() will fix these.

I am also not sure about what you want, but this seems to capture your 
example (reading the same cells in a number of files):

```
library(readxl)

# Create test set
path <- readxl_example("geometry.xls")

read_xls(path) # See the content

example_file1 <- tempfile(fileext = ".xls")
example_file2 <- tempfile(fileext = ".xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)

# Solve the problem using loops
files <- c(example_file1, example_file2)
ranges <- c("B4", "C5", "D6")

fr <- lapply(ranges, function(cur_range, files){
   x <- lapply(files, read_xls, sheet = 1, range = cur_range)
   t(as.data.frame(x))
}, files = files)

# Loop over fr and save content if needed
```

A couple of variations over the theme, where the cell content is 
accessed after reading the file. This will not work well if the data in 
the excel files does not start at A1, but if you can adjust for this it 
should work just fine

```
# Solution #2

# Read the whole excel file, and access just the column - row
# This will give really unexpected results if the data does not start in 
the
# cell A1 as is the case for geometry.xls. Also, it does not work with 
ranges
# spaning more than a single cell
files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

# Loop over the files to avoid re-reading
fr <- lapply(files, function(cur_file, ranges){
   df <- read_excel(cur_file, sheet = 1)
   x <- lapply(ranges, function(cur_range, df){
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     df[cr$row, cr$col][[1]]
   }, df = df)
   as.data.frame(setNames(x, ranges))

}, ranges = ranges)

# Solution 3
# Like solution 2 but using purr

library(purrr)

files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     df <- read_excel(cur_file, sheet = 1)
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     setNames(df[cr$row, cr$col], cur_range)
   }, df = df)

}, ranges = ranges)

# Solution 4
# Like solution 3, but with the addition of the file name and producing 
a single
# data.frame at the end

library(purrr)

path <- readxl_example("datasets.xls")
example_file1 <- tempfile(fileext = "_1.xls")
example_file2 <- tempfile(fileext = "_2.xls")
example_file3 <- tempfile(fileext = "_3.xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)
file.copy(path, example_file3, overwrite = TRUE)

files <- c(example_file1, example_file2, example_file3)

# Name the file paths with the file names. We can them make use of the 
.id
# argument to map_dfr()
files <- setNames(files, basename(files))
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     df <- read_excel(cur_file, sheet = 1)
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     setNames(df[cr$row, cr$col], cur_range)
   }, df = df)
}, ranges = ranges, .id = "filename")
```

HTH
Ulrik
On 2020-08-26 15:38, PIKAL Petr wrote: