On 5/17/23 16:07, Tomas Kalibera wrote:
On 4/18/23 14:16, Fredrik Skoog wrote:
Hi,
If you run:
library(microbenchmark)
m <- matrix(rnorm(28000000), nrow=7000, byrow=TRUE)
rownames(m) <- rownames(m, do.NULL = FALSE, prefix = "this is a row name")
colnames(m) <- colnames(m, do.NULL = FALSE, prefix = "this is a column name")
microbenchmark(df <- as.data.frame(m, keep.rownames=TRUE), times=10)
The results show worse performance (and larger variation) in R 4.2.3
compared to 4.1.3. Version 4.2.0 is also slower, so the issue seems to
affect 4.2.0 and later. On Linux performance is fine, so it appears to be
a Windows-only issue.
Version 4.2.3
==============
Run 1
------
Unit: seconds
                                         expr      min       lq     mean   median       uq      max neval
 df <- as.data.frame(m, keep.rownames = TRUE) 1.324839 2.411304 2.760553 2.593452 3.290228 4.263175    10
Run 2
------
Unit: milliseconds
                                         expr      min     lq     mean   median       uq     max neval
 dt <- as.data.frame(m, keep.rownames = TRUE) 967.5651 1054.8 1155.453 1149.767 1194.742 1451.14    10
Version 4.1.3
===============
Run 1:
------
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 df <- as.data.frame(m, keep.rownames = TRUE) 274.5478 298.2477 320.3988 320.9164 342.8119 375.6841    10
Run 2:
-------
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 df <- as.data.frame(m, keep.rownames = TRUE) 278.5369 310.0312 313.0745 313.3275 320.0294 343.7539    10
I have tried it on two different machines, with the same result.
-----
The example above is just a simple way to expose the issue, but
as.data.table behaves similarly. Timings also vary widely between runs.
We had a script that ran in 12 minutes with v3.6.3; it takes 18 minutes
with v4.2.3 and around 9 minutes with v4.1.3.
Has anyone else noticed this? I saw in the release notes that Doug Lea's
malloc was replaced in v4.2.0, and that is a Windows-only change.
Thanks for the report. I confirm the slowdown with this example and I
confirm it is due to the change in memory allocator: I've switched my
working copy of R-devel back to the original version of dlmalloc,
which removed the slowdown.
Windows 10 (build 19041 and later) allows choosing the more recent
SegmentHeap allocator instead of the default Low Fragmentation Heap
allocator. It gives almost the same performance with this example as
the original version of dlmalloc, without the maintenance overhead of
using a custom allocator, so this might be one possible solution.
Hi Fredrik,
We made R-devel use Segment Heap on recent Windows as an experiment.
Could you please check the performance implications on some real
application, on which you based the example micro-benchmark? Did it
improve performance for you?
Also, if you have access to other memory-intensive real applications
with real data, it would be useful to check with those as well.
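One possible way to collect comparable numbers is sketched below. It is
only an illustration in base R: the helper name time_script and the
script path in the usage comment are hypothetical, not part of R or of
the original report. The same file would be run under each R build being
compared, and the resulting tables put side by side.

```r
# Minimal sketch (assumption: the real workload lives in a single script file).
# time_script() is a hypothetical helper, not an existing R function.
time_script <- function(script, times = 3) {
  elapsed <- vapply(seq_len(times), function(i) {
    gc()  # start each run from a comparable heap state
    # Evaluate the script in a fresh environment so runs do not share objects.
    system.time(source(script, local = new.env()))[["elapsed"]]
  }, numeric(1))

  # Record which R build produced the numbers, so results from
  # different versions can be compared later.
  data.frame(
    r_version = as.character(getRversion()),
    platform  = R.version$platform,
    run       = seq_len(times),
    elapsed_s = elapsed
  )
}

# Usage (hypothetical file name):
# print(time_script("my_analysis.R", times = 5))
```

Several runs per build are worth keeping, given the run-to-run variation
reported above.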
Microbenchmarks are tricky. While yours works much better with Segment
Heap, my colleague found another one which works much better with Low
Fragmentation Heap.
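To illustrate why a single micro-benchmark can mislead, here is a rough
base-R sketch that times two different allocation patterns. Both
workloads and all function names are assumptions made for illustration;
the second pattern is not the benchmark my colleague used, and the sizes
are scaled down so the sketch runs quickly (set ncol = 4000 to match the
example above).

```r
# Hypothetical helper: median elapsed time of f() over several runs.
time_median <- function(f, times = 3) {
  elapsed <- vapply(seq_len(times), function(i) {
    gc()  # start each run from a comparable heap state
    system.time(f())[["elapsed"]]
  }, numeric(1))
  median(elapsed)
}

# Pattern A: one medium-sized vector per column, as in the reported example.
medium_blocks <- function(nrow = 7000, ncol = 400) {
  m <- matrix(rnorm(nrow * ncol), nrow = nrow)
  as.data.frame(m)
}

# Pattern B (assumed, for contrast): a large number of very small vectors.
small_blocks <- function(n = 2e5) {
  lapply(seq_len(n), function(i) c(i, i + 1))
}

results <- c(
  medium = time_median(function() medium_blocks()),
  small  = time_median(function() small_blocks())
)
print(results)
```

Whether either pattern favours a particular allocator would have to be
measured on each build; the sketch makes no claim about that.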
Thanks
Tomas
Best regards,
Fredrik