Package: bigANNOY 0.3.0

bigANNOY: Approximate k-Nearest Neighbour Search for 'bigmemory' Matrices with Annoy

Approximate Euclidean k-nearest neighbour search routines that operate on 'bigmemory::big.matrix' data through Annoy indexes created with 'RcppAnnoy'. The package builds persistent on-disk indexes plus sidecar metadata from streamed 'big.matrix' rows, supports the Euclidean, angular, Manhattan, and dot-product Annoy metrics, and can either return in-memory results or stream neighbour indices and distances into destination 'bigmemory' matrices. Explicit index lifecycle helpers, stronger metadata validation, descriptor-aware file-backed workflows, and benchmark helpers are also included.

Authors: Frederic Bertrand [aut, cre]

bigANNOY_0.3.0.tar.gz
bigANNOY_0.3.0.zip (r-4.7) | bigANNOY_0.3.0.zip (r-4.6) | bigANNOY_0.3.0.zip (r-4.5)
bigANNOY_0.3.0.tgz (r-4.6-x86_64) | bigANNOY_0.3.0.tgz (r-4.6-arm64) | bigANNOY_0.3.0.tgz (r-4.5-x86_64) | bigANNOY_0.3.0.tgz (r-4.5-arm64)
bigANNOY_0.3.0.tar.gz (r-4.7-arm64) | bigANNOY_0.3.0.tar.gz (r-4.7-x86_64) | bigANNOY_0.3.0.tar.gz (r-4.6-arm64) | bigANNOY_0.3.0.tar.gz (r-4.6-x86_64)
bigANNOY_0.3.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
bigANNOY/json (API)

# Install 'bigANNOY' in R:
install.packages('bigANNOY', repos = c('https://fbertran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/fbertran/bigannoy/issues

Pkgdown/docs site: https://fbertran.github.io

Uses libs:
  • c++: GNU Standard C++ Library v3

cpp

4.98 score | 1 star | 27 scripts | 472 downloads | 11 exports | 6 dependencies

Last updated from: 9597b60592. Checks: 12 OK. Indexed: yes.

Target                 Result   Time
linux-devel-x86_64     OK       138
source / vignettes     OK       225
linux-release-arm64    OK       128
linux-release-x86_64   OK       132
macos-release-arm64    OK       92
macos-release-x86_64   OK       233
macos-oldrel-arm64     OK       129
macos-oldrel-x86_64    OK       206
windows-devel          OK       144
windows-release        OK       127
windows-oldrel         OK       139
wasm-release           OK       121

Exports: annoy_build_bigmatrix, annoy_close_index, annoy_is_loaded, annoy_load_bigmatrix, annoy_open_index, annoy_search_bigmatrix, annoy_validate_index, benchmark_annoy_bigmatrix, benchmark_annoy_recall_suite, benchmark_annoy_volume_suite, benchmark_annoy_vs_rcppannoy

Dependencies: BH, bigmemory, bigmemory.sri, Rcpp, RcppAnnoy, uuid

Benchmarking Recall and Latency
What the Benchmark Helpers Do | Load the Package | Create a Benchmark Workspace | A Single Synthetic Benchmark Run | Validation Is Part of the Benchmark Workflow | External-Query Versus Self-Search Benchmarks | Benchmark a Recall Suite Across Parameter Grids | Optional Exact Recall Against bigKNN | Benchmark User-Supplied Data | Compare bigANNOY with Direct RcppAnnoy | Benchmark Scaling by Data Volume | Interpreting the Main Summary Columns | Installed Benchmark Runner | Recommended Workflow | Recap

Last update: 2026-03-27
Started: 2026-03-26

bigANNOY Versus bigKNN
The Core Difference | When To Use Which Package | Shared Result Shape | Load the Packages You Need | A Small Comparison Dataset | Approximate Search with bigANNOY | Exact Search with bigKNN When Available | What Does "Aligned Result Shape" Buy You? | Why bigANNOY Still Matters When bigKNN Exists | Benchmark Integration | A Practical Decision Framework | Important Boundaries | Recap

Last update: 2026-03-27
Started: 2026-03-26

File-Backed bigmemory Workflows
Load the Packages | Create a Small File-Backed Workspace | Build a File-Backed Reference Matrix | Build an Annoy Index from a Descriptor Path | Accepted File-Oriented Input Forms | Query with a File-Backed big.matrix | Query with a Descriptor Object and a Descriptor Path | Stream Results into File-Backed Destination Matrices | Reattach the Output Files Later | Separated-Column Query Matrices | Persisted Reference, Persisted Index, Persisted Outputs | Practical Tips | Recap

Last update: 2026-03-27
Started: 2026-03-26

Getting Started with bigANNOY
Load the Packages | Create a Small Reference Matrix | Build the First Annoy Index | Run a Self-Search | Search with an External Query Matrix | Tune the Main Search Controls | Stream Results into big.matrix Outputs | Reopen and Validate a Persisted Index | What Inputs Are Accepted? | Recap

Last update: 2026-03-27
Started: 2026-03-26

Metrics and Tuning
Load the Packages | A Small Dataset for Metric Comparisons | Supported Metrics | Compare Metrics on the Same Queries | Build-Time Controls | n_trees | seed | build_threads | block_size | load_mode | Query-Time Controls | k | search_k | prefault | Use the Benchmark Helpers to Tune n_trees and search_k | Package-Level Defaults | A Practical Tuning Pattern | Recap

Last update: 2026-03-27
Started: 2026-03-26

Persistent Indexes and Lifecycle
Why Lifecycle Management Matters | Load the Packages | Build an Index in Lazy Mode | Inspect the Sidecar Metadata | Lazy Loading Versus Eager Loading | Validate Without Loading | Validate and Load Explicitly | Close a Loaded Handle Explicitly | Reopen the Same Index in a New Object | Lifecycle State Lives in the Session Object | What Happens If Validation Fails? | Recommended Workflow | Recap

Last update: 2026-03-27
Started: 2026-03-26

Validation and Sharing Indexes
Load the Packages | Create a Small Persisted Example | What the Metadata Records | Validate Before You Use a Persisted Index | What Counts as an Error Versus a Warning | Reopen the Index as a Separate Session Object | Sharing Checklist | Simulate Sharing by Copying the Persisted Files | Non-Strict Validation for Diagnostics | Strict Validation as a Gate | A Common Sharing Pitfall: Renaming Only the .ann File | Recommended Sharing Pattern | Recap

Last update: 2026-03-27
Started: 2026-03-26