River is an open-source Python library for training machine learning models in continuous (online) mode. It provides data transformation methods as well as training and optimization algorithms, and it is suited to deploying models that are trained on streaming data.
Conventional machine learning algorithms, such as linear regression and XGBoost, usually have to be retrained from scratch on both the old and the new data whenever new data arrives. In many cases this is not feasible: for example, the full dataset may not fit in memory, or retraining may take too long. In continuous mode, data is passed to the model sequentially, and the model is updated incrementally as each sample arrives. Continuous learning is suitable for applications such as training models on large datasets, spam filtering, recommendation systems, and the Internet of Things.
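The idea of updating a model one sample at a time can be illustrated with a minimal sketch in plain Python. This is not River's implementation: the class `OnlineLinearRegression`, the generator `stream`, and the learning rate are hypothetical and chosen only for illustration. The `learn_one` / `predict_one` method names and the dictionary-based features mirror the convention River uses.

```python
import random


class OnlineLinearRegression:
    """Hypothetical linear regression updated one sample at a time with plain SGD."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.weights = {}  # one weight per feature name, created lazily
        self.bias = 0.0

    def predict_one(self, x):
        # x is a dict mapping feature names to numeric values
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # One gradient step on the squared error for this single sample
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


def stream(n, seed=0):
    """Hypothetical data stream: y = 3*a - 2*b + 1, generated lazily."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.uniform(-1, 1), "b": rng.uniform(-1, 1)}
        yield x, 3.0 * x["a"] - 2.0 * x["b"] + 1.0


model = OnlineLinearRegression(lr=0.05)
for x, y in stream(5000):
    # Each sample is seen once and then discarded; nothing is stored
    model.learn_one(x, y)
```

Memory use stays constant regardless of the length of the stream, because the model keeps only its weights and never the past samples.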
River is the result of merging the creme and scikit-multiflow libraries and combines their most used functionality. The library contains a variety of continuous learning algorithms for regression, classification, and clustering problems, including naive Bayes classifiers, tree ensembles, factorization machines, linear models, and others. River also includes drift detection algorithms, methods for handling imbalanced datasets, and anomaly detection. For continuous learning tasks, River is an order of magnitude faster than PyTorch, TensorFlow, and scikit-learn.