| Title: | Exact Search and Graph Construction for 'bigmemory' Matrices |
|---|---|
| Description: | Exact nearest-neighbour and radius-search routines that operate directly on 'bigmemory::big.matrix' objects. The package streams row blocks through 'BLAS' kernels, supports self-search and external-query search, exposes prepared references for repeated queries, and can build exact k-nearest-neighbour, radius, mutual k-nearest-neighbour, and shared-nearest-neighbour graphs. Version 0.3.0 adds execution plans, serializable prepared caches, resumable streamed graph jobs, coercion helpers, exact candidate reranking, and recall summaries for evaluating approximate neighbours. |
| Authors: | Frederic Bertrand [aut, cre] |
| Maintainer: | Frederic Bertrand <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.3.0 |
| Built: | 2026-06-01 11:06:39 UTC |
| Source: | https://github.com/fbertran/bigknn |
Coerce bigKNN outputs to edge-list form
as_edge_list(x, include_distance = TRUE)
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
A data frame with columns from, to, and either distance or
weight.
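The edge-list shape described above can be sketched in base R. This is only an illustration, assuming a kNN result stored as an n-by-k index matrix with a matching distance matrix; the helper name `edge_list_from_knn` is hypothetical and is not part of bigKNN.

```r
# Hypothetical helper (not a bigKNN function): flatten n-by-k neighbour
# matrices into a data frame with columns from, to, and distance or weight.
edge_list_from_knn <- function(index, distance = NULL) {
  n <- nrow(index)
  k <- ncol(index)
  # Row-major flattening: all neighbours of row 1, then row 2, and so on.
  edges <- data.frame(
    from = rep(seq_len(n), each = k),
    to   = as.vector(t(index))
  )
  if (is.null(distance)) {
    edges$weight <- 1  # unit weights when distances are dropped
  } else {
    edges$distance <- as.vector(t(distance))
  }
  edges
}

index <- matrix(c(2L, 3L,
                  1L, 3L,
                  2L, 1L), nrow = 3, byrow = TRUE)
distance <- matrix(c(1.0, 2.0,
                     1.0, 1.5,
                     1.5, 2.0), nrow = 3, byrow = TRUE)
el <- edge_list_from_knn(index, distance)
el_unit <- edge_list_from_knn(index)
```

Passing no distance matrix mimics `include_distance = FALSE` in spirit: the third column becomes a unit `weight` instead of `distance`.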
Coerce bigKNN outputs to a sparse matrix
as_sparse_matrix(x, include_distance = TRUE)
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
A Matrix::dgCMatrix.
Coerce bigKNN outputs to sparse-triplet form
as_triplet(x, include_distance = TRUE)
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
A triplet list with components i, j, x, and Dim.
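As a rough illustration of the triplet form, the base-R sketch below builds a list with components `i`, `j`, `x`, and `Dim` from a small edge list and expands it to a dense matrix. It assumes 1-based row and column indices, which the text above does not state; the data are made up for the example.

```r
# Toy directed edge list (made-up values).
edges <- data.frame(
  from     = c(1L, 1L, 2L, 3L),
  to       = c(2L, 3L, 1L, 2L),
  distance = c(0.5, 1.2, 0.5, 0.9)
)
n <- 3L

# Triplet list with the component names given above (1-based indices assumed).
triplet <- list(i = edges$from, j = edges$to, x = edges$distance, Dim = c(n, n))

# Expand to a dense matrix to check the layout: entry (i, j) holds x.
dense <- matrix(0, triplet$Dim[1], triplet$Dim[2])
dense[cbind(triplet$i, triplet$j)] <- triplet$x
```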
Count neighbours within a fixed radius
count_within_radius_bigmatrix(x, query = NULL, radius, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query))
x |
A |
query |
Optional query source. Supply |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
An integer vector with one count per query row.
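The counting semantics can be sketched on an ordinary dense matrix. The helper `count_within_radius_dense` below is hypothetical, uses Euclidean distance only, and assumes the radius boundary is inclusive; it is not the package's implementation.

```r
# Hypothetical dense reference version (Euclidean only, inclusive boundary assumed).
count_within_radius_dense <- function(ref, query = NULL, radius,
                                      exclude_self = is.null(query)) {
  self_search <- is.null(query)
  if (self_search) query <- ref
  # Squared distances from the identity |q - r|^2 = |q|^2 + |r|^2 - 2 q.r
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * tcrossprod(query, ref)
  d2 <- pmax(d2, 0)  # clamp tiny negatives caused by rounding
  within <- d2 <= radius^2
  # In a self-search, optionally drop each row's match with itself.
  if (self_search && exclude_self) diag(within) <- FALSE
  as.integer(rowSums(within))
}

ref <- rbind(c(0, 0), c(1, 0), c(0, 3))
counts_no_self   <- count_within_radius_dense(ref, radius = 1.5)
counts_with_self <- count_within_radius_dense(ref, radius = 1.5, exclude_self = FALSE)
```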
Exact k-nearest neighbours for bigmemory::big.matrix
knn_bigmatrix(x, query = NULL, k = 10L, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query))
x |
A |
query |
Optional query source. Supply |
k |
Number of neighbours to return. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend.
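To make the block-wise idea concrete, here is a minimal base-R sketch of exact Euclidean self-search that processes query rows in blocks and computes each block's distances with a single matrix product. The function `knn_blocked` is hypothetical; it blocks only the query side and works on an in-memory matrix, so it illustrates the technique rather than bigKNN's actual code path.

```r
# Hypothetical blocked exact kNN on a dense matrix (self-search, self excluded).
knn_blocked <- function(ref, k, block_size = 2L) {
  n <- nrow(ref)
  ref_norm <- rowSums(ref^2)
  index <- matrix(NA_integer_, n, k)
  distance <- matrix(NA_real_, n, k)
  for (start in seq(1L, n, by = block_size)) {
    rows <- start:min(start + block_size - 1L, n)
    q <- ref[rows, , drop = FALSE]
    # One matrix product per block: |q - r|^2 = |q|^2 + |r|^2 - 2 q.r
    d2 <- outer(rowSums(q^2), ref_norm, "+") - 2 * tcrossprod(q, ref)
    d2 <- pmax(d2, 0)
    d2[cbind(seq_along(rows), rows)] <- Inf  # a row never returns itself
    for (r in seq_along(rows)) {
      nearest <- order(d2[r, ])[seq_len(k)]
      index[rows[r], ] <- nearest
      distance[rows[r], ] <- sqrt(d2[r, nearest])
    }
  }
  list(index = index, distance = distance, k = k)
}

ref <- rbind(c(0, 0), c(1, 0), c(0, 3), c(5, 5))
res <- knn_blocked(ref, k = 2L)
```

Smaller blocks lower peak memory because only a `block_size`-by-`n` distance slab exists at a time, at the cost of more matrix products.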
Build an exact kNN graph from a bigmemory::big.matrix
knn_graph_bigmatrix(x, k = 10L, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, include_distance = TRUE, format = c("edge_list", "triplet", "dgCMatrix"), symmetrize = c("none", "union", "mutual"), exclude_self = TRUE)
x |
A |
k |
Number of neighbours per row. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
include_distance |
Logical flag controlling whether kNN graph edges store distances or unit weights. |
format |
Output format. One of |
symmetrize |
How directed kNN edges should be combined. One of
|
exclude_self |
Logical flag controlling whether self loops are suppressed in the directed kNN graph. |
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.
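The difference between the `"union"` and `"mutual"` options can be illustrated on a toy directed kNN result. The sketch assumes the usual meanings (union keeps an edge if either endpoint selected the other, mutual only if both did) and looks at edge sets only; how the package combines distances is not covered here.

```r
# Toy directed kNN result: row i lists the neighbours selected by point i.
index <- matrix(c(2L, 3L,
                  1L, 3L,
                  2L, 4L,
                  3L, 2L), nrow = 4, byrow = TRUE)
n <- nrow(index)

# Directed adjacency: A[i, j] is TRUE when j is among the neighbours of i.
A <- matrix(FALSE, n, n)
A[cbind(rep(seq_len(n), each = ncol(index)), as.vector(t(index)))] <- TRUE

union_graph  <- A | t(A)  # edge kept if either direction exists
mutual_graph <- A & t(A)  # edge kept only if both directions exist

n_union_edges  <- sum(union_graph) / 2
n_mutual_edges <- sum(mutual_graph) / 2
```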
Stream a directed exact kNN graph into destination big.matrix objects
knn_graph_stream_bigmatrix(x, k, xpFrom, xpTo, xpValue = NULL, metric = "euclidean", plan = NULL, block_size = knn_default_block_size(), include_distance = TRUE, checkpoint_path = NULL)
x |
A |
k |
Number of neighbours per row. |
xpFrom |
Writable single-column |
xpTo |
Writable single-column |
xpValue |
Optional writable single-column |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of rows to process per query and reference block. |
include_distance |
Logical flag controlling whether |
checkpoint_path |
Optional path for a resumable job checkpoint. |
An object of class "bigknn_job".
Load a serialized prepared reference
knn_load_prepared(cache_path)
cache_path |
Path previously written by |
An object of class "bigknn_prepared".
Build an execution plan for exact search
knn_plan_bigmatrix(x, metric = "euclidean", memory_budget = "2GB", num_threads = getOption("bigKNN.num_threads", 1L), progress = getOption("bigKNN.progress", interactive()))
x |
A |
metric |
Distance metric. Supported values are |
memory_budget |
Memory budget expressed in bytes or a compact size string
such as |
num_threads |
Requested thread count forwarded to common BLAS/OpenMP environment variables during execution. |
progress |
Logical flag controlling progress reporting for plan-aware calls. |
An object of class "bigknn_plan".
Prepare a bigmemory::big.matrix reference for repeated exact search
knn_prepare_bigmatrix(x, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, validate = TRUE, cache_path = NULL)
x |
A |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
validate |
Logical flag controlling whether the preparation pass checks for finite, metric-compatible rows while building the cache. |
cache_path |
Optional path where a serializable prepared-reference cache
should be written with |
An object of class "bigknn_prepared" containing the reference
pointer, metric-specific row cache, and metadata reused by later exact
search calls.
Search a prepared exact reference
knn_search_prepared(ref, query = NULL, k = 10L, block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query))
ref |
A prepared reference returned by |
query |
Optional query source. Supply |
k |
Number of neighbours to return. |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend.
Stream prepared exact search results into destination big.matrix objects
knn_search_stream_prepared(ref, query = NULL, xpIndex, xpDistance = NULL, k = 10L, block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query))
ref |
A prepared reference returned by |
query |
Optional query source. Supply |
xpIndex |
A writable |
xpDistance |
Optional writable |
k |
Number of neighbours to return. |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend. The index and distance entries
reference the supplied destination objects.
Stream exact k-nearest neighbours into destination big.matrix objects
knn_stream_bigmatrix(x, query = NULL, xpIndex, xpDistance = NULL, k = 10L, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query))
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable |
xpDistance |
Optional writable |
k |
Number of neighbours to return. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend. The index and distance entries
reference the supplied destination objects.
Validate a prepared reference
knn_validate_prepared(ref)
ref |
A prepared reference returned by |
Invisibly returns TRUE when the prepared reference is valid.
Build an exact mutual kNN graph from a bigmemory::big.matrix
mutual_knn_graph_bigmatrix(x, k = 10L, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, include_distance = TRUE, format = c("edge_list", "triplet", "dgCMatrix"))
x |
A |
k |
Number of neighbours per row. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
include_distance |
Logical flag controlling whether graph edges store distances or unit weights. |
format |
Output format. One of |
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.
Exact radius search for bigmemory::big.matrix
radius_bigmatrix(x, query = NULL, radius, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query), sort = TRUE)
x |
A |
query |
Optional query source. Supply |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
A list with components index, distance, offset, n_match,
radius, metric, n_ref, n_query, exact, and backend.
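Because radius search returns a variable number of matches per query, the results are naturally stored as flat vectors plus per-query offsets. The base-R sketch below shows one such layout, assuming 1-based offsets of length `n_query + 1`; the exact offset convention used by bigKNN is not stated above, and `radius_search_dense` is a hypothetical helper.

```r
# Hypothetical dense radius self-search with a flat, offset-indexed result.
radius_search_dense <- function(ref, radius) {
  d <- unname(as.matrix(dist(ref)))  # full Euclidean distance matrix
  n <- nrow(ref)
  index <- integer(0)
  distance <- numeric(0)
  n_match <- integer(n)
  for (i in seq_len(n)) {
    hits <- setdiff(which(d[i, ] <= radius), i)  # drop the self match
    hits <- hits[order(d[i, hits], hits)]        # sort by distance, then index
    index <- c(index, hits)
    distance <- c(distance, d[i, hits])
    n_match[i] <- length(hits)
  }
  # Assumed convention: matches of query i start at position offset[i].
  offset <- c(1L, 1L + cumsum(n_match))
  list(index = index, distance = distance, offset = offset, n_match = n_match)
}

# Pull out the matches of one query; safe when that query has no matches.
matches_for <- function(res, i) {
  res$index[res$offset[i] - 1L + seq_len(res$n_match[i])]
}

ref <- rbind(c(0, 0), c(1, 0), c(0, 3))
res <- radius_search_dense(ref, radius = 3.1)
first_query_matches <- matches_for(res, 1L)
```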
Build an exact radius graph from a bigmemory::big.matrix
radius_graph_bigmatrix(x, radius, metric = "euclidean", plan = NULL, block_size = knn_default_block_size(), include_distance = TRUE, format = c("edge_list", "triplet", "dgCMatrix"), symmetrize = c("none", "union", "mutual"), exclude_self = TRUE, sort = TRUE)
x |
A |
radius |
Distance threshold for including an edge. |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of rows to process per query and reference block. |
include_distance |
Logical flag controlling whether graph edges store distances or unit weights. |
format |
Output format. One of |
symmetrize |
How directed radius edges should be combined. One of
|
exclude_self |
Logical flag controlling whether self loops are suppressed. |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
A graph representation in the requested format.
Stream exact radius-search results into destination big.matrix objects
radius_stream_bigmatrix(x, query = NULL, xpIndex, xpDistance = NULL, xpOffset, radius, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, exclude_self = is.null(query), sort = TRUE)
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable single-column |
xpDistance |
Optional writable single-column |
xpOffset |
A writable single-column |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
A list with components index, distance, offset, n_match,
radius, metric, n_ref, n_query, exact, and backend. The
index, distance, and offset entries reference the supplied
destination objects.
Stream exact radius-search results into destination big.matrix objects with checkpoints
radius_stream_job_bigmatrix(x, query = NULL, xpIndex, xpDistance = NULL, xpOffset, radius, metric = "euclidean", plan = NULL, block_size = knn_default_block_size(), exclude_self = is.null(query), sort = TRUE, checkpoint_path = NULL)
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable single-column |
xpDistance |
Optional writable single-column |
xpOffset |
A writable single-column |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of query rows to process per block. |
exclude_self |
Logical flag controlling whether self matches are removed
when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
checkpoint_path |
Optional path for a resumable job checkpoint. |
An object of class "bigknn_job".
Compare approximate neighbours to exact truth
recall_against_exact(exact, approx_index, k = NULL)
exact |
Exact kNN output or index matrix. |
approx_index |
Approximate neighbour index matrix or result object. |
k |
Optional number of neighbours to compare. |
An object of class "bigknn_recall".
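Recall at k is commonly defined as the fraction of the exact top-k neighbours that also appear among the approximate top-k. The sketch below computes that quantity in base R under this assumed definition; `recall_at_k` is a hypothetical helper, and the contents of a `"bigknn_recall"` object are not described above.

```r
# Hypothetical helper: per-query and mean recall at k for two index matrices.
recall_at_k <- function(exact_index, approx_index, k = ncol(exact_index)) {
  per_query <- vapply(seq_len(nrow(exact_index)), function(i) {
    truth <- exact_index[i, seq_len(k)]
    found <- approx_index[i, seq_len(k)]
    # Order within the top-k does not matter, only set membership.
    length(intersect(truth, found)) / k
  }, numeric(1))
  list(per_query = per_query, mean = mean(per_query))
}

exact <- matrix(c(2L, 3L, 4L,
                  1L, 3L, 4L), nrow = 2, byrow = TRUE)
approx <- matrix(c(2L, 4L, 5L,
                   1L, 3L, 4L), nrow = 2, byrow = TRUE)
rec <- recall_at_k(exact, approx)
```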
Rerank candidate neighbours exactly against a bigmemory::big.matrix
rerank_candidates_bigmatrix(x, query, candidate_index, metric = "euclidean", top_k = NULL, plan = NULL, block_size = knn_default_block_size(), exclude_self = is.null(query))
x |
A |
query |
Query source. Supply |
candidate_index |
Candidate neighbour indices supplied as a matrix,
|
metric |
Distance metric. Supported values are |
top_k |
Number of reranked neighbours to return. Defaults to all supplied candidate columns. |
plan |
Optional execution plan returned by |
block_size |
Number of query rows to process at a time. |
exclude_self |
Logical flag controlling whether self ids are removed
when |
An object of class "bigknn_knn_result".
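Exact reranking means recomputing true distances only for the candidate ids proposed by some other (possibly approximate) method, then keeping the best `top_k`. A minimal dense sketch, assuming Euclidean distance and ties broken by index; `rerank_dense` is a hypothetical helper, not the package function.

```r
# Hypothetical dense reranker: exact distances for candidate ids only.
rerank_dense <- function(ref, query, candidate_index,
                         top_k = ncol(candidate_index)) {
  n_query <- nrow(query)
  index <- matrix(NA_integer_, n_query, top_k)
  distance <- matrix(NA_real_, n_query, top_k)
  for (i in seq_len(n_query)) {
    cand <- candidate_index[i, ]
    # Difference between each candidate row and the current query row.
    diffs <- ref[cand, , drop = FALSE] -
      matrix(query[i, ], length(cand), ncol(ref), byrow = TRUE)
    d <- sqrt(rowSums(diffs^2))
    keep <- order(d, cand)[seq_len(top_k)]  # nearest first, ties by index
    index[i, ] <- cand[keep]
    distance[i, ] <- d[keep]
  }
  list(index = index, distance = distance)
}

ref <- rbind(c(0, 0), c(1, 0), c(0, 3), c(5, 5))
query <- rbind(c(0.9, 0), c(4, 4))
candidates <- matrix(c(3L, 1L, 2L,
                       1L, 4L, 3L), nrow = 2, byrow = TRUE)
reranked <- rerank_dense(ref, query, candidates, top_k = 2L)
```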
Resume a checkpointed bigKNN job
resume_knn_job(checkpoint_path)
checkpoint_path |
Path previously created by
|
An object of class "bigknn_job".
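The general pattern behind a resumable job is to persist progress after every completed block and, on restart, continue from the first unfinished block. The toy sketch below shows that pattern with `saveRDS()` and `readRDS()`; the function `run_blocks` and its state fields are hypothetical, and the checkpoint format bigKNN actually writes is not described above.

```r
# Hypothetical resumable loop: progress is saved after each block.
run_blocks <- function(n_blocks, checkpoint_path, stop_after = Inf) {
  state <- if (file.exists(checkpoint_path)) {
    readRDS(checkpoint_path)                  # resume from saved progress
  } else {
    list(next_block = 1L, done = integer(0))  # fresh job
  }
  processed <- 0
  while (state$next_block <= n_blocks && processed < stop_after) {
    state$done <- c(state$done, state$next_block)  # stand-in for real work
    state$next_block <- state$next_block + 1L
    saveRDS(state, checkpoint_path)                # checkpoint after each block
    processed <- processed + 1
  }
  state
}

path <- tempfile(fileext = ".rds")
partial <- run_blocks(5L, path, stop_after = 2)  # simulate an interruption
resumed <- run_blocks(5L, path)                  # picks up at block 3
```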
Build an exact shared-nearest-neighbour graph from a bigmemory::big.matrix
snn_graph_bigmatrix(x, k = 10L, metric = "euclidean", block_size = knn_default_block_size(), plan = NULL, weight = c("count", "jaccard"), format = c("edge_list", "triplet", "dgCMatrix"))
x |
A |
k |
Number of neighbours per row in the underlying exact kNN search. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
weight |
Shared-nearest-neighbour weight definition. One of |
format |
Output format. One of |
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.
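The two weight options can be illustrated on a toy kNN index matrix. The sketch assumes the common definitions, where `"count"` is the number of neighbours two points share and `"jaccard"` divides that by the size of the union of their neighbour sets; which pairs the package keeps as edges is not specified above, and `snn_weights` is a hypothetical helper.

```r
# Hypothetical helper: dense shared-nearest-neighbour weights from a kNN index.
snn_weights <- function(index, weight = c("count", "jaccard")) {
  weight <- match.arg(weight)
  n <- nrow(index)
  k <- ncol(index)
  # Membership matrix: M[i, j] = 1 when j is one of the k neighbours of i.
  M <- matrix(0, n, n)
  M[cbind(rep(seq_len(n), each = k), as.vector(t(index)))] <- 1
  shared <- tcrossprod(M)  # shared[i, j] = size of the neighbour-set overlap
  diag(shared) <- 0
  if (weight == "jaccard") {
    shared / (2 * k - shared)  # union size = k + k - overlap
  } else {
    shared
  }
}

index <- matrix(c(2L, 3L,
                  1L, 3L,
                  2L, 4L,
                  3L, 2L), nrow = 4, byrow = TRUE)
snn_count   <- snn_weights(index, "count")
snn_jaccard <- snn_weights(index, "jaccard")
```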