Dear R-SIG-MAC,
I bought a new MacBook Air with the M3 chip, which has an 8-core CPU, a 10-core GPU, and 16GB of unified memory. My R `torch` apps are crashing on it. I have assembled an MWE that works on other Macs, including a MacBook Air M1 and a Mac mini. The OS is the same (Sonoma 14.5). The MWE follows:
```{r}
# ==== MWE
# Download the training samples
rds_file <- "https://raw.githubusercontent.com/e-sensing/sitsdata/master/inst/extdata/torch/train_samples.rds?raw=true"
dest_file <- file.path(tempdir(), "train_samples.rds")
download.file(rds_file,
              destfile = dest_file,
              method = "curl")
train_samples <- readRDS(dest_file)
# Sample labels
labels <- c("Cerrado", "Forest", "Pasture", "Soy_Corn")
# Create numeric labels vector
code_labels <- seq_along(labels)
names(code_labels) <- labels
# Split the data into training and validation data sets
# Create partitions for different splits of the input data
frac <- 0.2
train_samples <- dplyr::group_by(train_samples, .data[["label"]])
test_samples <- train_samples |>
    dplyr::slice_sample(prop = frac) |>
    dplyr::ungroup()
# Remove the lines used for validation
sel <- !train_samples[["sample_id"]] %in% test_samples[["sample_id"]]
train_samples <- train_samples[sel, ]
# Shuffle the data
train_samples <- train_samples[sample(nrow(train_samples), nrow(train_samples)), ]
test_samples <- test_samples[sample(nrow(test_samples), nrow(test_samples)), ]
# Organize data for model training
# -(1:2) drops the first two columns (same effect as the original -2:0)
train_x <- as.matrix(train_samples[, -(1:2)])
train_y <- unname(code_labels[train_samples[["label"]]])
# Create the test data
test_x <- as.matrix(test_samples[, -(1:2)])
test_y <- unname(code_labels[test_samples[["label"]]])
# Set torch seed
torch::torch_manual_seed(sample.int(10^5, 1))
# Avoid a global variable for 'self'
self <- NULL
# function to create a simple sequential NN module
.torch_linear_relu_dropout <- torch::nn_module(
    classname = "torch_linear_batch_norm_relu_dropout",
    initialize = function(input_dim,
                          output_dim,
                          dropout_rate) {
        self$block <- torch::nn_sequential(
            torch::nn_linear(input_dim, output_dim),
            torch::nn_relu(),
            torch::nn_dropout(dropout_rate)
        )
    },
    forward = function(x) {
        self$block(x)
    }
)
# Define the MLP architecture
mlp_model <- torch::nn_module(
    initialize = function(num_pred, layers, dropout_rates, y_dim) {
        tensors <- list()
        # input layer
        tensors[[1]] <- .torch_linear_relu_dropout(
            input_dim = num_pred,
            output_dim = 512,
            dropout_rate = 0.40
        )
        # output layer
        tensors[[length(tensors) + 1]] <-
            torch::nn_linear(layers[length(layers)], y_dim)
        # add softmax tensor
        tensors[[length(tensors) + 1]] <- torch::nn_softmax(dim = 2)
        # create a sequential module that calls the layers in the same
        # order.
        self$model <- torch::nn_sequential(!!!tensors)
    },
    forward = function(x) {
        self$model(x)
    }
)
# Train the model using luz
torch_model <- luz::setup(
    module = mlp_model,
    loss = torch::nn_cross_entropy_loss(),
    metrics = list(luz::luz_metric_accuracy()),
    optimizer = torch::optim_adamw
)
torch_model <- luz::set_hparams(
    torch_model,
    num_pred = ncol(train_x),
    layers = 512,
    dropout_rates = 0.3,
    y_dim = length(code_labels)
)
torch_model <- luz::set_opt_hparams(
    torch_model,
    lr = 0.001,
    eps = 1e-08,
    weight_decay = 1.0e-06
)
torch_model <- luz::fit(
    torch_model,
    data = list(train_x, train_y),
    epochs = 100,
    valid_data = list(test_x, test_y),
    callbacks = list(luz::luz_callback_early_stopping(
        patience = 20,
        min_delta = 0.01
    )),
    verbose = TRUE
)
```
The error occurs in the `luz::fit` function. Inside RStudio, the code gets stuck, and then RStudio asks to restart R. When running R from the terminal, the output is:
```
*** caught bus error ***
address 0x16daa0000, cause 'invalid alignment'
*** caught segfault ***
address 0x9, cause 'invalid permissions'
zsh: segmentation fault R
```
The `sessionInfo()` output is as follows:
```
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Sao_Paulo
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] crayon_1.5.2 vctrs_0.6.5 cli_3.6.2 zeallot_0.1.0
[5] rlang_1.1.3 processx_3.8.4 generics_0.1.3 torch_0.12.0.9000
[9] coro_1.0.4 glue_1.7.0 bit_4.0.5 prettyunits_1.2.0
[13] luz_0.4.0 ps_1.7.6 hms_1.1.3 fansi_1.0.6
[17] tibble_3.2.1 progress_1.2.3 lifecycle_1.0.4 compiler_4.4.0
[21] dplyr_1.1.4 fs_1.6.4 Rcpp_1.0.12 pkgconfig_2.0.3
[25] rstudioapi_0.16.0 R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4
[29] pillar_1.9.0 callr_3.7.6 magrittr_2.0.3 tools_4.4.0
[33] bit64_4.0.5
```
Any clues will be most appreciated.
Thanks
Gilberto
============================
Prof Dr Gilberto Camara
Senior Researcher
National Institute for Space Research (INPE), Brazil
https://gilbertocamara.org/
=============================
M3 not working with torch
3 messages · Simon Urbanek, Gilberto Camara
Gilberto,
since luz is a contributed package, you should probably start by asking the authors. Given that the torch ecosystem is quite complex and has several layers that need to work together, even if you talk to them, you will probably need to add details such as the exact versions used (including the torch and Metal layers) and how you installed the pieces (I know you helpfully supplied `sessionInfo()`, but I suspect that info such as the exact torch run-time is pertinent as well).
The next step would be to trace the error: check the system crash reporter or run `R -d lldb` to find out the exact library the crash happens in, which may give you more clues. I don't have any M3 machines so I can't check myself, unfortunately.
Cheers,
Simon
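As a rough illustration of the version details mentioned above, the following sketch gathers them into one list that could be pasted into a bug report. It is not from the original thread. `pkg_version` and `report` are hypothetical names introduced here, and the sketch assumes an installed torch that exports `torch_is_installed()` and `backends_mps_is_available()`, which recent releases do to my knowledge. It does not report the libtorch run-time version itself.

```r
# Hypothetical helper (not part of torch or luz): returns the installed
# version of a package as a string, or NA if the package is not installed.
# system.file() is used so that the package is not loaded just to be checked.
pkg_version <- function(pkg) {
    if (nzchar(system.file(package = pkg))) {
        as.character(utils::packageVersion(pkg))
    } else {
        NA_character_
    }
}

# Basic facts about R, the platform, and the two packages in the MWE
report <- list(
    r_version = R.version.string,
    platform  = R.version$platform,
    os        = paste(Sys.info()[["sysname"]], Sys.info()[["release"]]),
    torch     = pkg_version("torch"),
    luz       = pkg_version("luz")
)

# Only query torch itself when the R package is present
if (!is.na(report$torch)) {
    # TRUE when the libtorch/lantern binaries are installed
    report$libtorch_installed <- torch::torch_is_installed()
    if (isTRUE(report$libtorch_installed)) {
        # Whether the Metal (MPS) backend is reported as usable
        report$mps_available <- torch::backends_mps_is_available()
    }
}

str(report)
```

How the binaries were installed (for example, through `torch::install_torch()` or a manual download) still has to be described by hand, since the sketch above cannot recover it.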
On 21/05/2024, at 12:48 AM, Gilberto Camara <gilberto.camara at inpe.br> wrote:
[...]
Dear Simon,
Many thanks for your response and suggestions. I have raised an issue in the R torch repository: https://github.com/mlverse/torch/issues/1167
All the best,
Gilberto
On 20 May 2024, at 17:48, Simon Urbanek <simon.urbanek at R-project.org> wrote:
[...]
_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac