Approximate Nearest Neighbor Search
Approximate nearest neighbor (ANN) search is of great importance in various fields including databases, data mining, and machine learning. ANN is derived from k-nearest-neighbor search. In contrast to kNN methods, ANN methods deliver approximate results, but typically allow for much faster queries.
Search Graph Parameters
In order to build a search graph for ANN search, GGNN requires two parameters:
k_build: intThe number of outgoing edges per point in the dataset, typically \(k_{build} \in [20,96]\). The maximum number of edges per point is 512 and the minimum is 2. Higher values increase construction time and memory consumption but typically improve query performance.
tau_build: floatA cost factor, typically \(\tau_{build} \in [0.3,1.0]\) Higher values increase construction time but typically improve query performance.
Query Parameters
k_query: intThe number of nearest neighbors to search for. Typically, \(10-100\). Technically, up to \(6000\) neighbors can be searched for before reaching the shared memory limit.
Note
Due to the design of the stopping criterion, it is advisable to always search for at least 10 nearest neighbors, even when fewer results are required.
tau_query: floatA cost factor, typically \(\tau_{query} \in [0.7,2.0]\) Higher values increase query time but produce more accurate results. There are diminishing returns in query accuracy when increasing this value.
max_iterations: intA hard limit of search iterations to perform per query. Each iteration visits one point in the search graph. Typically, \(200-2000\) iterations are approximate.
Note
If increasing tau_query and max_iterations does not yield sufficient accuracy,
try increasing k_build and tau_build during search graph construction.
Caution
Increasing k_query and max_iterations increases the shared memory consumption
and may limit on-GPU parallelism (SM occupancy).
Increasing the query parameters too much can slow down the query significantly.
Distance Measures
GGNN supports Euclidean (L2) and Cosine distance measures.
The measure can be specified during search graph construction and query and will default to Euclidean.