Working with big.matrix Objects

Overview

bigalgebra is designed to interoperate with the bigmemory ecosystem. This vignette demonstrates how to create in-memory and file-backed big.matrix objects, interact with them via the package’s wrappers, and manage the underlying resources safely.

Creating in-memory big.matrix objects

In-memory matrices behave much like ordinary R matrices, but by default they are allocated in shared memory, so multiple R sessions on the same machine can access the same data.

library(bigmemory)
library(bigalgebra)

X <- big.matrix(3, 3, type = "double", init = 0)
X[,] <- matrix(1:9, nrow = 3)
X[]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
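
The shared-memory behaviour described above can be sketched within a single session. The example assumes bigmemory's default of allocating shared matrices and uses describe() and attach.big.matrix() to obtain a second handle on the same data; the names A, A_desc, and A_view are illustrative only.

```r
library(bigmemory)

# Allocate a small in-memory matrix (shared by default in bigmemory).
A <- big.matrix(2, 2, type = "double", init = 0)

# A descriptor carries the information another handle needs to attach.
A_desc <- describe(A)
A_view <- attach.big.matrix(A_desc)

# Writing through the second handle is visible through the first,
# because both refer to the same underlying memory.
A_view[1, 1] <- 42
A[1, 1]
#> [1] 42
```

In a separate R process, the descriptor would typically be handed over (for example, to a parallel worker) and attached there in the same way.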

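Once created, the objects can be passed directly to Level 1 helpers. Here dvcal() updates Y in place as ALPHA * X + BETA * Y; because X is used as both input and output, 2 * X - X leaves the values unchanged: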

dvcal(ALPHA = 2, X = X, BETA = -1, Y = X)
X[]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
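
The sketch below writes into a separate output matrix so that the update is visible in the result. It assumes that dvcal() overwrites Y with ALPHA * X + BETA * Y, which is consistent with the outputs shown in this vignette; the names X_in and W are illustrative.

```r
library(bigmemory)
library(bigalgebra)

X_in <- big.matrix(3, 3, type = "double", init = 0)
X_in[,] <- matrix(1:9, nrow = 3)

# Separate output matrix, initialised to 1 everywhere.
W <- big.matrix(3, 3, type = "double", init = 1)

# Assumed semantics: W := 2 * X_in + (-1) * W
dvcal(ALPHA = 2, X = X_in, BETA = -1, Y = W)

# Compare with the same arithmetic carried out in base R.
matches_base_r <- isTRUE(all.equal(W[], 2 * matrix(1:9, nrow = 3) - 1))
matches_base_r
```

Keeping the input and output separate also preserves the original data, which is useful when the same input feeds several computations.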

Working with file-backed matrices

File-backed matrices persist their contents on disk, making them suitable for data sets that exceed available RAM.

tmp_fb <- tempfile()
dir.create(tmp_fb)
Y <- filebacked.big.matrix(4, 2, type = "double",
                           backingpath = tmp_fb,
                           backingfile = "fb.bin",
                           descriptorfile = "fb.desc",
                           init = 0)
Y[,] <- matrix(runif(8), nrow = 4)
Y[]
#>           [,1]      [,2]
#> [1,] 0.6493070 0.3190592
#> [2,] 0.5984925 0.3586293
#> [3,] 0.4148640 0.8447862
#> [4,] 0.4998140 0.9298028
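
As a quick check that the data really live on disk, the sketch below creates a small file-backed matrix in a temporary directory and confirms that the backing and descriptor files exist. The directory and file names (demo_dir, demo.bin, demo.desc) are illustrative.

```r
library(bigmemory)

demo_dir <- tempfile()
dir.create(demo_dir)

M <- filebacked.big.matrix(4, 2, type = "double",
                           backingpath = demo_dir,
                           backingfile = "demo.bin",
                           descriptorfile = "demo.desc",
                           init = 0)

# The object reports itself as file-backed ...
fb_flag <- is.filebacked(M)

# ... and both files are present in the backing directory.
files_present <- file.exists(file.path(demo_dir, c("demo.bin", "demo.desc")))

# Release the handle before removing the temporary directory.
rm(M)
invisible(gc())
unlink(demo_dir, recursive = TRUE)
```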

File-backed matrices can be passed to the same helpers as in-memory ones. The data are accessed through the backing file rather than being copied into an ordinary R matrix first.

Z <- filebacked.big.matrix(4, 2, type = "double",
                           backingpath = tmp_fb,
                           backingfile = "res.bin",
                           descriptorfile = "res.desc",
                           init = 0)
dvcal(ALPHA = 1.5, X = Y, BETA = 0, Y = Z)
Z[]
#>           [,1]      [,2]
#> [1,] 0.9739606 0.4785888
#> [2,] 0.8977387 0.5379439
#> [3,] 0.6222960 1.2671794
#> [4,] 0.7497210 1.3947043

Sharing matrices between sessions

The descriptor file records the metadata needed to reopen a file-backed matrix in a new R session. The attach.big.matrix() helper reconstructs the object:

Y_desc <- dget(file.path(tmp_fb, "fb.desc"))
Y_again <- attach.big.matrix(Y_desc)
identical(Y[,], Y_again[,])
#> [1] TRUE

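Operations performed through bigalgebra modify the shared backing file in place, so every attached reference, including the original Y, observes the change. Below, dsub() subtracts Z from Y_again; since Z holds 1.5 times the values of Y, the result is -0.5 times the original values.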

dsub(X = Z, Y = Y_again)
Y_again[]
#>            [,1]       [,2]
#> [1,] -0.3246535 -0.1595296
#> [2,] -0.2992462 -0.1793146
#> [3,] -0.2074320 -0.4223931
#> [4,] -0.2499070 -0.4649014
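
The same propagation can be checked from the other direction: a change made through an attached handle should be visible through the original one. The sketch below assumes, in line with the output above, that dsub() overwrites Y with Y - X; the names share_dir, P, P_view, and Q are illustrative.

```r
library(bigmemory)
library(bigalgebra)

share_dir <- tempfile()
dir.create(share_dir)

# Original handle on a file-backed matrix filled with 1.
P <- filebacked.big.matrix(2, 2, type = "double",
                           backingpath = share_dir,
                           backingfile = "p.bin",
                           descriptorfile = "p.desc",
                           init = 1)

# Second handle, attached through the descriptor file.
P_view <- attach.big.matrix(dget(file.path(share_dir, "p.desc")))

# Matrix to subtract, filled with 0.25.
Q <- big.matrix(2, 2, type = "double", init = 0.25)

# Assumed semantics: P_view := P_view - Q
dsub(X = Q, Y = P_view)

# Read the values back through the original handle.
p_vals <- P[]

# Release the handles before deleting the backing files.
rm(P, P_view)
invisible(gc())
unlink(share_dir, recursive = TRUE)
```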

Cleaning up backing files

File-backed matrices allocate resources on disk. Deleting the backing and descriptor files once they are no longer needed keeps the file system tidy.

unlink(file.path(tmp_fb, c("fb.bin", "fb.desc", "res.bin", "res.desc")))
unlink(tmp_fb, recursive = TRUE)
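
On some platforms, notably Windows, a memory-mapped backing file may not be deletable while an R object still references it. A cautious cleanup, sketched below with illustrative names (scratch, m.bin, m.desc), drops the handles and triggers garbage collection before unlinking.

```r
library(bigmemory)

scratch <- tempfile()
dir.create(scratch)

M <- filebacked.big.matrix(2, 2, type = "double",
                           backingpath = scratch,
                           backingfile = "m.bin",
                           descriptorfile = "m.desc",
                           init = 0)

# Drop every reference to the matrix, then let R release the mapping.
rm(M)
invisible(gc())

# With no handles left, remove the directory and its contents.
unlink(scratch, recursive = TRUE)
removed <- !dir.exists(scratch)
```

If other sessions are still attached to the same backing file, they would need to release their handles as well before the files can be removed safely.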