Abstract
The main aim of this paper is to develop a suitable regression analysis model for describing the relationship between the index efficiency and the parameters of the Rival Penalized Competitive Learning Binary Tree (RPCL-b-tree). RPCL-b-tree is a hierarchical indexing structure built with a hierarchical RPCL clustering implementation, which transforms the feature space into a sequence of nested clusters. Based on the RPCL-b-tree, the efficient Nearest-Neighbor search for a query can be performed with the branch-and-bound algorithm. The index efficiency of an RPCL-b-tree relates to a set of parameters: leaf node size of the tree, number of retrieved objects per search, feature dimensionality and database size. To formulate this relationship, we develop a nonlinear regression model in this paper. This regression model includes two components. One is used to describe the relationship between index efficiency and the number of retrieved objects per search; another is to describe the relationship between index efficiency and leaf node size of an RPCL-b-tree. In both of these two components, we consider the influence from the database size and the feature dimensionality. Our experimental results show that the proposed regression model has a high convergibility and high generalization ability. Moreover, this regression model is explainable on its coefficients, whose values directly reflect the index efficiency of an RPCL-b-tree. Depending on the regression model and its estimated coefficients, we can easily analyze the index efficiency of an RPCL-b-tree to be built. Because the parameters of the different kinds of indexing structures are very similar, this model also is suitable to analyze the other kinds of indexing structures. Thus, it is a powerful tool to construct the optimal RPCL-b-tree or other kinds of indexing structures for a database.