A Low-Power Artificial Intelligence Framework based on Vector Symbolic Architectures

A large part of this website was developed by Denis Kleyko during the Marie Skłodowska-Curie Global Fellowship action funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 839179. This page therefore provides an overview of the action's goal and points to the theoretical and experimental findings made during the fellowship.

Goal

Artificial Neural Networks (ANNs) form the main approach in Artificial Intelligence (AI). They have two major drawbacks, however: (1) ANNs require significant computational resources; (2) they lack transparency. These challenges restrict the widespread application of AI in daily life. The required resources prevent the use of ANNs on resource-constrained devices, and the lack of transparency limits their adoption in many areas where transparency is critical.

This project aims to address these challenges by developing Hyperdimensional Computing, also known as Vector Symbolic Architectures (HD/VSA): a transparent, bio-inspired framework for AI. With respect to the first challenge, HD/VSA can serve as a computational paradigm for emerging low-power computing hardware that holds great promise for implementing AI algorithms. With respect to the second challenge, HD/VSA are a promising framework for opening the black box of ANNs due to their predictable statistical properties; in particular, it is expected that HD/VSA will allow an analytical characterization of a class of Recurrent ANNs. The overall research aim of this action is to improve the understanding of computing principles in high-dimensional spaces with HD/VSA, and to advance the theory and design principles of simple AI algorithms implementable on emerging low-power computing hardware. This project was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 839179.

The publications below present the theoretical and experimental findings made during the action.

Authors: Kleyko D., Davies M., Frady E. P., Kanerva P., Kent S., Olshausen B. A., Rabaey J. M., Osipov E., Rachkovskij D. A., Rahimi A., Sommer F. T.

Venue: Proceedings of the IEEE, vol. 110, no. 10, pp. 1538-1571, 2022

Abstract: This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging hardware, and it naturally expresses the types of cognitive operations required for artificial intelligence (AI). We demonstrate in this article that the field-like algebraic structure of VSA offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant to modern computing. In addition, we illustrate the distinguishing feature of VSA, “computing in superposition,” which sets it apart from conventional computing. It also opens the door to efficient solutions to the difficult combinatorial search problems inherent in AI applications. We sketch ways of demonstrating that VSA are computationally universal. We see them acting as a framework for computing with distributed representations that can play the role of an abstraction layer for emerging computing hardware. This article serves as a reference for computer architects by illustrating the philosophy behind VSA, techniques of distributed computing with them, and their relevance to emerging computing hardware, such as neuromorphic computing.
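
The sketch below illustrates, with plain NumPy, the core operations the abstract refers to: binding by component-wise multiplication, bundling (superposition) by addition, and similarity search over random bipolar hypervectors. The dimensionality, tie-breaking rule, and variable names are illustrative choices, not taken from the article.

```python
# A minimal sketch of Multiply-Add-Permute-style HD/VSA operations on random
# bipolar hypervectors; all parameter choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality of the hypervectors

def random_hv():
    """Random bipolar hypervector with i.i.d. +1/-1 components."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding: component-wise multiplication; the result is dissimilar to both inputs."""
    return a * b

def bundle(*vs):
    """Bundling (superposition): component-wise addition followed by the sign function."""
    s = np.sum(vs, axis=0)
    return np.sign(s) + (s == 0)  # break ties toward +1

def sim(a, b):
    """Cosine similarity; close to 0 for unrelated random hypervectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "Computing in superposition": a single vector holds several bound pairs at once.
k1, v1, k2, v2 = (random_hv() for _ in range(4))
memory = bundle(bind(k1, v1), bind(k2, v2))
# Unbinding with k1 (its own inverse for bipolar vectors) recovers a noisy copy of v1.
print(sim(bind(memory, k1), v1))  # high similarity (around 0.5)
print(sim(bind(memory, k1), v2))  # near zero
```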

Authors: Kleyko D., Rachkovskij D. A., Osipov E., Rahimi A.

Venue: ACM Computing Surveys, vol. 55, no. 6, pp. 1-40, 2022

Abstract: This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HD/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and distributed vector representations. Notable models in the HD/VSA family are Tensor Product Representations, Holographic Reduced Representations, Multiply-Add-Permute, Binary Spatter Codes, and Sparse Binary Distributed Representations, but there are other models too. HD/VSA is a highly interdisciplinary field with connections to computer science, electrical engineering, artificial intelligence, mathematics, and cognitive science. This fact makes it challenging to create a thorough overview of the field. However, due to the surge of new researchers joining the field in recent years, the need for a comprehensive survey has become pressing. Therefore, this Part I covers important aspects of the field, such as known computational models of HD/VSA and the transformations of various input data types into high-dimensional distributed representations. Part II of this survey [84] is devoted to applications, cognitive computing and architectures, as well as directions for future work. The survey is written to be useful for both newcomers and practitioners.
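
As an example of the data transformations covered in Part I, the sketch below encodes a short symbol sequence into a single hypervector by permuting (cyclically shifting) each symbol's codebook vector according to its position and bundling the results. The alphabet, codebook, and dimensionality are illustrative assumptions, not taken from the survey.

```python
# A minimal sketch of sequence encoding with permutations (cyclic shifts) and
# bundling of random bipolar hypervectors; parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
D = 10_000
codebook = {ch: rng.choice([-1, 1], size=D) for ch in "abcde"}

def encode_sequence(seq):
    """Shift the hypervector of the symbol at position i by i and bundle the results."""
    shifted = [np.roll(codebook[ch], i) for i, ch in enumerate(seq)]
    return np.sign(np.sum(shifted, axis=0))

def decode_position(seq_hv, i):
    """Return the codebook symbol most similar to position i of the encoded sequence."""
    probe = np.roll(seq_hv, -i)  # undo the shift used for position i
    sims = {ch: probe @ hv for ch, hv in codebook.items()}
    return max(sims, key=sims.get)

hv = encode_sequence("cadb")
print([decode_position(hv, i) for i in range(4)])  # expected: ['c', 'a', 'd', 'b']
```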

Authors: Kleyko D., Rachkovskij D. A., Osipov E., Rahimi A.

Venue: ACM Computing Surveys, vol. 55, no. 9, pp. 1-52, 2023

Abstract: This is Part II of the two-part comprehensive survey devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HD/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Holographic Reduced Representations [321, 326] is an influential HD/VSA model that is well known in the machine learning domain and often used to refer to the whole family. However, for the sake of consistency, we use HD/VSA to refer to the field. Part I of this survey [222] covered foundational aspects of the field, such as the historical context leading to the development of HD/VSA, key elements of any HD/VSA model, known HD/VSA models, and the transformation of input data of various types into high-dimensional vectors suitable for HD/VSA. This second part surveys existing applications, the role of HD/VSA in cognitive computing and architectures, as well as directions for future work. Most of the applications lie within the Machine Learning/Artificial Intelligence domain; however, we also cover other applications to provide a complete picture. The survey is written to be useful for both newcomers and practitioners.

Authors: Heddes M., Nunes I., Verges P., Kleyko D., Abraham D., Givargis T., Nicolau A., Veidenbaum A.

Venue: Journal of Machine Learning Research, vol. 24, pp. 1-10, 2023

Abstract: Hyperdimensional computing (HD), also known as vector symbolic architectures (VSA), is a framework for computing with distributed representations by exploiting properties of random high-dimensional vector spaces. The commitment of the scientific community to aggregate and disseminate research in this particularly multidisciplinary area has been fundamental for its advancement. Joining these efforts, we present Torchhd, a high-performance open source Python library for HD/VSA. Torchhd seeks to make HD/VSA more accessible and serves as an efficient foundation for further research and application development. The easy-to-use library builds on top of PyTorch and features state-of-the-art HD/VSA functionality, clear documentation, and implementation examples from well-known publications. Comparing publicly available code with their corresponding Torchhd implementation shows that experiments can run up to 100x faster.
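
The snippet below sketches the kind of workflow the library supports: forming a record of bound key-value hypervectors and cleaning up a query against a small codebook. The function names (torchhd.random, torchhd.bind, torchhd.bundle, torchhd.cosine_similarity) follow the library's documentation as I recall it and may differ between versions, so treat them as assumptions and check the documentation of the installed release.

```python
# A small usage sketch in the spirit of the Torchhd documentation; the function
# names below are assumptions and should be checked against the installed version.
import torchhd

d = 10_000
keys = torchhd.random(2, d)    # two random role hypervectors (MAP model by default)
values = torchhd.random(2, d)  # two random filler hypervectors

# Build a record holding both key-value pairs in superposition.
record = torchhd.bundle(torchhd.bind(keys[0], values[0]),
                        torchhd.bind(keys[1], values[1]))

# Query the record for the value bound to keys[0] (binding is self-inverse for MAP)
# and clean it up against the codebook of values.
query = torchhd.bind(record, keys[0])
print(torchhd.cosine_similarity(query, values))  # highest score at index 0
```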

Authors: Kleyko D., Frady E. P., Sommer F. T.

Venue: IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 6, pp. 2701-2713, 2022

Abstract: Various nonclassical approaches to distributed information processing, such as neural networks, reservoir computing (RC), vector symbolic architectures (HD/VSA), and others, employ the principle of collective-state computing. In this type of computing, the variables relevant in computation are superimposed into a single high-dimensional state vector, the collective state. The variable encoding uses a fixed set of random patterns, which has to be stored and kept available during the computation. In this article, we show that an elementary cellular automaton with rule 90 (CA90) enables the space–time tradeoff for collective-state computing models that use random dense binary representations, i.e., memory requirements can be traded off against computation by running CA90. We investigate the randomization behavior of CA90, in particular, the relation between the length of the randomization period and the size of the grid, and how CA90 preserves similarity in the presence of the initialization noise. Based on these analyses, we discuss how to optimize a collective-state computing model, in which CA90 expands representations on the fly from short seed patterns—rather than storing the full set of random patterns. The CA90 expansion is applied and tested in concrete scenarios using RC and HD/VSA. Our experimental results show that collective-state computing with CA90 expansion performs similarly to traditional collective-state models, in which random patterns are generated initially by a pseudorandom number generator and then stored in a large memory.
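
The rule-90 expansion idea in the abstract can be sketched in a few lines: a short random binary seed is evolved by the cellular automaton, and the concatenated states serve as a long pseudorandom vector, so only the seed has to be stored. The grid size and number of steps below are illustrative choices.

```python
# A minimal sketch of on-the-fly expansion with elementary cellular automaton rule 90.
import numpy as np

def ca90_step(state):
    """One update of rule 90 with cyclic boundary: each cell becomes the XOR of
    its left and right neighbors."""
    return np.roll(state, 1) ^ np.roll(state, -1)

def expand(seed, steps):
    """Expand a binary seed into a (steps + 1) * len(seed) binary vector."""
    states, s = [seed], seed
    for _ in range(steps):
        s = ca90_step(s)
        states.append(s)
    return np.concatenate(states)

rng = np.random.default_rng(2)
seed = rng.integers(0, 2, size=64, dtype=np.uint8)  # short seed that is actually stored
hv = expand(seed, steps=15)                          # 1024-bit expanded representation
print(hv.shape, hv.mean())                           # roughly balanced 0/1 content
```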

Authors: Kleyko D., Karunaratne G., Rabaey J. M., Sebastian A., Rahimi A.

Venue: IEEE Transactions on Neural Networks and Learning Systems, 2022

Abstract: Memory-augmented neural networks enhance a neural network with an external key-value (KV) memory whose complexity is typically dominated by the number of support vectors in the key memory. We propose a generalized KV memory that decouples its dimension from the number of support vectors by introducing a free parameter that can arbitrarily add or remove redundancy to the key memory representation. In effect, it provides an additional degree of freedom to flexibly control the tradeoff between robustness and the resources required to store and compute the generalized KV memory. This is particularly useful for realizing the key memory on in-memory computing hardware where it exploits nonideal, but extremely efficient nonvolatile memory devices for dense storage and computation. Experimental results show that adapting this parameter on demand effectively mitigates up to 44% nonidealities, at equal accuracy and number of devices, without any need for neural network retraining.
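
For context, the sketch below shows a baseline key-value memory readout of the kind the paper generalizes: a query is scored against the stored key (support) vectors and the resulting attention weights gate the corresponding values. The paper's free parameter, which redistributes the key memory over a dimension chosen independently of the number of support vectors, is not reproduced here; all sizes are illustrative.

```python
# A minimal sketch of a key-value memory readout (not the paper's generalized version).
import numpy as np

rng = np.random.default_rng(3)
num_support, d_key, d_val = 32, 512, 10

K = rng.standard_normal((num_support, d_key))  # key memory (one row per support vector)
V = rng.standard_normal((num_support, d_val))  # value memory

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read(query, beta=1.0):
    """Score the query against every key, form attention weights, and mix the values."""
    scores = K @ query / np.sqrt(d_key)
    return softmax(beta * scores) @ V

q = K[5] + 0.1 * rng.standard_normal(d_key)  # noisy version of one stored key
out = read(q)
cos = out @ V[5] / (np.linalg.norm(out) * np.linalg.norm(V[5]))
print(cos)  # close to 1: the memory returns the value of the matching key
```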

Authors: Frady E. P., Kleyko D., Sommer F. T.

Venue: IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 5, pp. 2191-2204, 2023

Abstract: Variable binding is a cornerstone of symbolic reasoning and cognition. But how binding can be implemented in connectionist models has puzzled neuroscientists, cognitive psychologists, and neural network researchers for many decades. One type of connectionist model that naturally includes a binding operation is vector symbolic architectures (HD/VSA). In contrast to other proposals for variable binding, the binding operation in HD/VSA is dimensionality-preserving, which enables representing complex hierarchical data structures, such as trees, while avoiding a combinatoric expansion of dimensionality. Classical HD/VSA encode symbols by dense randomized vectors, in which information is distributed throughout the entire neuron population. By contrast, in the brain, features are encoded more locally, by the activity of single neurons or small groups of neurons, often forming sparse vectors of neural activation. Following Laiho et al. (2015), we explore symbolic reasoning with a special case of sparse distributed representations. Using techniques from compressed sensing, we first show that variable binding in classical HD/VSA is mathematically equivalent to tensor product binding between sparse feature vectors, another well-known binding operation which increases dimensionality. This theoretical result motivates us to study two dimensionality-preserving binding methods that include a reduction of the tensor matrix into a single sparse vector. One binding method for general sparse vectors uses random projections, the other, block-local circular convolution, is defined for sparse vectors with block structure, sparse block-codes. Our experiments reveal that block-local circular convolution binding has ideal properties, whereas random projection based binding also works, but is lossy. We demonstrate in example applications that a HD/VSA with block-local circular convolution and sparse block-codes reaches similar performance as classical HD/VSA. Finally, we discuss our results in the context of neuroscience and neural networks.
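
The block-local circular convolution discussed in the abstract can be sketched compactly for sparse block-codes: with a single active unit per block, convolving two one-hot blocks reduces to adding their active indices modulo the block size, so binding stays dimensionality-preserving and is exactly invertible. The block count and block size below are illustrative.

```python
# A minimal sketch of binding sparse block-codes with block-local circular convolution,
# representing each hypervector by the active index of every block.
import numpy as np

rng = np.random.default_rng(4)
num_blocks, block_size = 100, 64  # dimensionality = num_blocks * block_size

def random_sbc():
    """Random sparse block-code given as the active index of each block."""
    return rng.integers(0, block_size, size=num_blocks)

def bind(a_idx, b_idx):
    """Circular convolution of one-hot blocks = addition of active indices mod block size."""
    return (a_idx + b_idx) % block_size

def unbind(c_idx, a_idx):
    """Inverse of bind (block-local circular correlation)."""
    return (c_idx - a_idx) % block_size

a, b = random_sbc(), random_sbc()
c = bind(a, b)                           # still a sparse block-code of the same dimension
print(np.array_equal(unbind(c, a), b))   # True: binding is exactly invertible
print(np.mean(c == a), np.mean(c == b))  # about 1/64: the result is dissimilar to both inputs
```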

Authors: Kleyko D., Rosato A., Frady E. P., Panella M., Sommer F. T.

Venue: IEEE Transactions on Neural Networks and Learning Systems, 2023

Abstract: Multilayer neural networks set the current state of the art for many technical classification problems. But these networks are still, essentially, black boxes in terms of analyzing them and predicting their performance. Here, we develop a statistical theory for the one-layer perceptron and show that it can predict the performance of a surprisingly large variety of neural networks with different architectures. A general theory of classification with perceptrons is developed by generalizing an existing theory for analyzing reservoir computing models and connectionist models for symbolic reasoning known as vector symbolic architectures. Our statistical theory offers three formulas that leverage the signal statistics with increasing detail. The formulas are analytically intractable, but they can be evaluated numerically. The description level that captures maximum detail requires stochastic sampling methods. Depending on the network model, the simpler formulas already yield high prediction accuracy. The quality of the theory's predictions is assessed in three experimental settings: a memorization task for echo state networks (ESNs) from the reservoir computing literature, a collection of classification datasets for shallow randomly connected networks, and the ImageNet dataset for deep convolutional neural networks. We find that the second description level of the perceptron theory can predict the performance of types of ESNs that could not be described previously. Furthermore, the theory can predict the performance of deep multilayer neural networks when applied to their output layer. While other methods for predicting neural network performance commonly require training an estimator model, the proposed theory requires only the first two moments of the distribution of the postsynaptic sums in the output neurons. Moreover, the perceptron theory compares favorably to other methods that do not rely on training an estimator model.
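
The sketch below conveys the flavour of such a theory rather than the paper's actual formulas: the postsynaptic sums of the output units are modeled as Gaussians specified only by their first two moments, and accuracy is estimated as the probability that the correct unit has the largest sum. The moment values in the example are made up.

```python
# A minimal sketch of predicting classification accuracy from the first two moments
# of the postsynaptic sums, using Monte Carlo sampling; not the paper's formulas.
import numpy as np

rng = np.random.default_rng(5)

def predicted_accuracy(mu_hit, var_hit, mu_miss, var_miss, num_classes, samples=100_000):
    """Probability that the correct ('hit') unit exceeds all other ('miss') units."""
    hit = rng.normal(mu_hit, np.sqrt(var_hit), size=samples)
    miss = rng.normal(mu_miss, np.sqrt(var_miss), size=(samples, num_classes - 1))
    return np.mean(hit > miss.max(axis=1))

# Example: the correct unit's postsynaptic sum is on average one unit larger than the rest.
print(predicted_accuracy(mu_hit=1.0, var_hit=1.0, mu_miss=0.0, var_miss=1.0, num_classes=10))
```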

Authors: Teeters J. L., Kleyko D., Kanerva P., Olshausen B. A.

Venue: Frontiers in Neuroscience, vol. 16, pp. 1-19, 2023

Abstract: Operations on high-dimensional, fixed-width vectors can be used to distribute information from several vectors over a single vector of the same width. For example, a set of key-value pairs can be encoded into a single vector with multiplication and addition of the corresponding key and value vectors: the keys are bound to their values with component-wise multiplication, and the key-value pairs are combined into a single superposition vector with component-wise addition. The superposition vector is, thus, a memory which can then be queried for the value of any of the keys, but the result of the query is approximate. The exact vector is retrieved from a codebook (a.k.a. item memory), which contains vectors defined in the system. To perform these operations, the item memory vectors and the superposition vector must be the same width. Increasing the capacity of the memory requires increasing the width of the superposition and item memory vectors. In this article, we demonstrate that in a regime where many (e.g., 1,000 or more) key-value pairs are stored, an associative memory which maps key vectors to value vectors requires less memory and less computing to obtain the same reliability of storage as a superposition vector. These advantages are obtained because the number of storage locations in an associative memory can be increased without increasing the width of the vectors in the item memory. An associative memory would not replace a superposition vector as a medium of storage, but could augment it, because data recalled from an associative memory could be used in algorithms that use a superposition vector. This would be analogous to how human working memory (which stores about seven items) uses information recalled from long-term memory (which is much larger than the working memory). We demonstrate the advantages of an associative memory experimentally using the storage of large finite-state automata, which could model the storage and recall of state-dependent behavior by brains.
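
The superposition memory described in the abstract can be sketched in a few lines: keys are bound to values by component-wise multiplication, the pairs are summed into a single vector, and a query is cleaned up against the item memory of values. The vector width and number of pairs below are illustrative; the associative-memory alternative analyzed in the article is not shown.

```python
# A minimal sketch of storing key-value pairs in a single superposition vector
# and reading them back with cleanup against the item memory.
import numpy as np

rng = np.random.default_rng(6)
D, num_pairs = 10_000, 50

keys = rng.choice([-1, 1], size=(num_pairs, D))
values = rng.choice([-1, 1], size=(num_pairs, D))

memory = np.sum(keys * values, axis=0)  # all pairs superimposed in one vector of width D

def query(key):
    noisy = memory * key        # approximate value bound to this key
    scores = values @ noisy     # compare against the item memory of values
    return np.argmax(scores)    # index of the cleaned-up value

print(all(query(keys[i]) == i for i in range(num_pairs)))  # True with high probability
```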

Authors: Diao C., Kleyko D., Rabaey J. M., Olshausen B. A.

Venue: International Joint Conference on Neural Networks (IJCNN), pp. 1-9, 2021

Abstract: Machine learning algorithms deployed on edge devices must meet certain resource constraints and efficiency requirements. Random Vector Functional Link (RVFL) networks are favored for such applications due to their simple design and training efficiency. We propose a modified RVFL network that avoids computationally expensive matrix operations during training, thus expanding the network’s range of potential applications. Our modification replaces the least-squares classifier with the Generalized Learning Vector Quantization (GLVQ) classifier, which only employs simple vector and distance calculations. The GLVQ classifier can also be considered an improvement upon certain classification algorithms popularly used in the area of Hyperdimensional Computing. The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository—higher than previously proposed RVFL networks. We further demonstrate that our approach still achieves high accuracy while severely limited in training iterations (using on average only 21% of the least-squares classifier computational costs).
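
A minimal sketch of the GLVQ classifier used in place of the least-squares readout is given below: training relies only on vector differences and distances, with each update attracting the nearest correct-class prototype and repelling the nearest wrong-class one. One prototype per class, an identity cost function, and a fixed learning rate are simplifying assumptions made for illustration.

```python
# A minimal sketch of Generalized Learning Vector Quantization (GLVQ) training and
# prediction; one prototype per class and squared Euclidean distances are assumed.
import numpy as np

def glvq_train(X, y, num_classes, epochs=30, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    protos = np.stack([X[y == c].mean(axis=0) for c in range(num_classes)])  # init at class means
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, c = X[i], y[i]
            d = np.sum((protos - x) ** 2, axis=1)
            c_wrong = np.argmin(np.where(np.arange(num_classes) == c, np.inf, d))
            d_plus, d_minus = d[c], d[c_wrong]
            denom = (d_plus + d_minus) ** 2
            protos[c] += lr * (d_minus / denom) * (x - protos[c])             # attract
            protos[c_wrong] -= lr * (d_plus / denom) * (x - protos[c_wrong])  # repel
    return protos

def glvq_predict(protos, X):
    """Assign each sample to the class of its nearest prototype."""
    return np.argmin(((X[:, None, :] - protos[None]) ** 2).sum(-1), axis=1)
```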

Authors: Kleyko D., Bybee C., Huang P.-C., Kymn C. J., Olshausen B. A., Frady E. P., Sommer F. T.

Venue: Neural Computation, vol. 35, no. 7, pp. 1159-1186, 2023

Abstract: We investigate the task of retrieving information from compositional distributed representations formed by Hyperdimensional Computing/Vector Symbolic Architectures and present novel techniques which achieve new information rate bounds. First, we provide an overview of the decoding techniques that can be used to approach the retrieval task. The techniques are categorized into four groups. We then evaluate the considered techniques in several settings that involve, e.g., inclusion of external noise and storage elements with reduced precision. In particular, we find that the decoding techniques from the sparse coding and compressed sensing literature (rarely used for Hyperdimensional Computing/Vector Symbolic Architectures) are also well-suited for decoding information from the compositional distributed representations. Combining these decoding techniques with interference cancellation ideas from communications improves previously reported bounds (Hersche et al., 2021) of the information rate of the distributed representations from 1.20 to 1.40 bits per dimension for smaller codebooks and from 0.60 to 1.26 bits per dimension for larger codebooks.
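
The sketch below contrasts two of the decoding regimes discussed in the abstract on a toy setting: plain matched-filter readout from a superposition of codevectors versus the same readout refined by interference cancellation, i.e., subtracting the already decoded codevectors and deciding again. Codebook sizes and dimensionality are illustrative, and the setup is simplified relative to the paper.

```python
# A minimal sketch of decoding a superposition of codevectors with and without
# interference cancellation (explaining away); parameters are illustrative.
import numpy as np

rng = np.random.default_rng(7)
D, slots, codebook_size = 256, 20, 32

codebooks = rng.choice([-1, 1], size=(slots, codebook_size, D))  # one codebook per slot
truth = rng.integers(0, codebook_size, size=slots)               # encoded symbols
composite = sum(codebooks[i, truth[i]] for i in range(slots))    # superposition vector

def matched_filter(x):
    """Decode each slot independently by picking the most similar codevector."""
    return np.array([np.argmax(codebooks[i] @ x) for i in range(slots)])

def with_cancellation(x, rounds=5):
    """Iteratively re-decode each slot after subtracting the other decoded codevectors."""
    guess = matched_filter(x)
    for _ in range(rounds):
        for i in range(slots):
            residual = x - sum(codebooks[j, guess[j]] for j in range(slots) if j != i)
            guess[i] = np.argmax(codebooks[i] @ residual)
    return guess

print(np.mean(matched_filter(composite) == truth))     # usually a few symbol errors
print(np.mean(with_cancellation(composite) == truth))  # fewer (often zero) errors
```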

Authors: Kleyko D., Frady E. P., Kheffache M., Osipov E.

Venue: IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 4, pp. 1688-1701, 2022

Abstract: We propose an approximation of Echo State Networks (ESN) that can be efficiently implemented on digital hardware based on the mathematics of hyperdimensional computing. The reservoir of the proposed integer Echo State Network (intESN) is a vector containing only n-bit integers (where n<8 is normally sufficient for satisfactory performance). The recurrent matrix multiplication is replaced with an efficient cyclic shift operation. The proposed intESN approach is verified with typical tasks in reservoir computing: memorizing a sequence of inputs, classifying time series, and learning dynamic processes. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss. Experiments on a field-programmable gate array confirm that the proposed intESN approach is much more energy efficient than the conventional ESN.
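
The intESN update described in the abstract can be sketched as follows: the recurrent matrix multiplication is replaced by a cyclic shift of the integer state, the bipolar-encoded input is added, and the state is clipped to a small integer range. The reservoir size, clipping limit, and input encoding below are illustrative choices.

```python
# A minimal sketch of the integer Echo State Network (intESN) state update.
import numpy as np

rng = np.random.default_rng(8)
D, kappa = 1_000, 7  # reservoir size and clipping limit (activations stay in [-kappa, kappa])

item_memory = {s: rng.choice([-1, 1], size=D) for s in range(10)}  # bipolar input symbols

def intesn_step(state, symbol):
    state = np.roll(state, 1)             # recurrence replaced by a cyclic shift
    state = state + item_memory[symbol]   # add the encoded input
    return np.clip(state, -kappa, kappa)  # keep small-integer activations

state = np.zeros(D, dtype=np.int32)
for symbol in [3, 1, 4, 1, 5]:
    state = intesn_step(state, symbol)

# Recent inputs remain decodable from the state; e.g., the most recent symbol:
scores = {s: state @ hv for s, hv in item_memory.items()}
print(max(scores, key=scores.get))  # 5
```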

Authors: Kleyko D., Kheffache M., Frady E. P., Wiklund U., Osipov E.

Venue: IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3777-3783, 2021

Abstract: The deployment of machine learning algorithms on resource-constrained edge devices is an important challenge from both theoretical and applied points of view. In this article, we focus on resource-efficient randomly connected neural networks known as Random Vector Functional Link (RVFL) networks, since their simple design and extremely fast training time make them very attractive for solving many applied classification tasks. We propose to represent input features via the density-based encoding known in the area of stochastic computing and use the operations of binding and bundling from the area of hyperdimensional computing for obtaining the activations of the hidden neurons. Using a collection of 121 real-world datasets from the UCI Machine Learning Repository, we empirically show that the proposed approach demonstrates higher average accuracy than the conventional RVFL. We also demonstrate that it is possible to represent the readout matrix using only integers in a limited range with minimal loss in accuracy. In this case, the proposed approach operates only on small n-bit integers, which results in a computationally efficient architecture. Finally, through hardware Field-Programmable Gate Array (FPGA) implementations, we show that such an approach consumes approximately eleven times less energy than the conventional RVFL.
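
The sketch below illustrates the encoding pipeline described in the abstract under simplifying assumptions: each scalar feature (assumed normalized to [0, 1]) is turned into a binary vector whose density of ones is proportional to its value, that vector is bound (XOR) with a random key hypervector assigned to the feature, and the bound vectors are bundled by thresholding their sum to give the hidden activations. The dimensionality and threshold are illustrative.

```python
# A minimal sketch of density-based encoding of features combined with binding (XOR)
# and thresholded bundling to form binary hidden-layer activations.
import numpy as np

rng = np.random.default_rng(9)
N, num_features = 1_000, 8

keys = rng.integers(0, 2, size=(num_features, N))  # random binary key per feature

def density_encode(value):
    """Binary vector with round(value * N) ones; value is assumed to lie in [0, 1]."""
    vec = np.zeros(N, dtype=int)
    vec[: int(round(value * N))] = 1
    return vec

def hidden_activations(features, threshold=None):
    """Bind each density-encoded feature with its key and bundle by thresholding the sum."""
    bound = np.array([keys[i] ^ density_encode(v) for i, v in enumerate(features)])
    if threshold is None:
        threshold = num_features // 2  # majority-style bundling
    return (bound.sum(axis=0) > threshold).astype(int)

x = rng.random(num_features)  # a random feature vector in [0, 1]
h = hidden_activations(x)     # binary hidden activations
print(h[:20], h.mean())
```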