# mlpack

**Repository Path**: imbit_mathmhb/mlpack

## Basic Information

- **Project Name**: mlpack
- **Description**: mlpack: a scalable C++ machine learning library
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-02-03
- **Last Updated**: 2024-06-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

mlpack: a fast, flexible machine learning library


Download: current stable version (3.0.4)

**mlpack** is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs and Python bindings.

### 0. Contents

  1. [Introduction](#1-introduction)
  2. [Citation details](#2-citation-details)
  3. [Dependencies](#3-dependencies)
  4. [Building mlpack from source](#4-building-mlpack-from-source)
  5. [Running mlpack programs](#5-running-mlpack-programs)
  6. [Using mlpack from Python](#6-using-mlpack-from-python)
  7. [Further documentation](#7-further-documentation)
  8. [Bug reporting](#8-bug-reporting)

### 1. Introduction

The mlpack website can be found at http://www.mlpack.org and it contains numerous tutorials and extensive documentation. This README serves as a guide for what mlpack is, how to install it, how to run it, and where to find more documentation. The website should be consulted for further information:

  - [mlpack homepage](http://www.mlpack.org/)
  - [Tutorials](http://www.mlpack.org/docs/mlpack-git/doxygen/tutorials.html)
  - [Development Site (Github)](http://www.github.com/mlpack/mlpack/)
  - [API documentation](http://www.mlpack.org/docs/mlpack-git/doxygen/index.html)

### 2. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

    @article{mlpack2018,
        title     = {mlpack 3: a fast, flexible machine learning library},
        author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                     Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                     Shangtong},
        journal   = {Journal of Open Source Software},
        volume    = {3},
        issue     = {26},
        pages     = {726},
        year      = {2018},
        doi       = {10.21105/joss.00726},
        url       = {https://doi.org/10.21105/joss.00726}
    }

Citations are beneficial for the growth and improvement of mlpack.

### 3. Dependencies

mlpack has the following dependencies:

      Armadillo     >= 6.500.0
      Boost (program_options, math_c99, unit_test_framework, serialization,
             spirit)
      CMake         >= 3.3.2

All of those should be available in your distribution's package manager. If not, you will have to compile each of them by hand. See the documentation for each of those packages for more information.

If you would like to use or build the mlpack Python bindings, make sure that the following Python packages are installed:

      setuptools
      cython >= 0.24
      numpy
      pandas >= 0.15.0

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

### 4. Building mlpack from source

This section discusses how to build mlpack from source. However, mlpack is in the repositories of many Linux distributions, so it may be easier to use the package manager for your system. For example, on Ubuntu, you can install mlpack with the following command:

    $ sudo apt-get install libmlpack-dev

Note: Older Ubuntu versions may not have the most recent version of mlpack available; for instance, at the time of this writing, Ubuntu 16.04 only has mlpack 2.0.1 available. Options include upgrading your Ubuntu version, finding a PPA or other non-official sources, or installing with a manual build.

There are some other useful pages to consult in addition to this section:

  - [Building mlpack From Source](http://www.mlpack.org/docs/mlpack-git/doxygen/build.html)
  - [Building mlpack From Source on Windows](http://www.mlpack.org/docs/mlpack-git/doxygen/build_windows.html)

mlpack uses CMake as a build system and allows several flexible build configuration options. One can consult any of numerous CMake tutorials for further documentation, but this tutorial should be enough to get mlpack built and installed.

First, unpack the mlpack source and change into the unpacked directory. Here we use mlpack-x.y.z where x.y.z is the version.

    $ tar -xzf mlpack-x.y.z.tar.gz
    $ cd mlpack-x.y.z

Then, make a build directory. The directory can have any name, not just 'build', but 'build' is sufficient.

    $ mkdir build
    $ cd build

The next step is to run CMake to configure the project. Running CMake is equivalent to running `./configure` with autotools. If you run CMake with no options, it will configure the project to build with no debugging symbols and no profiling information:

    $ cmake ../

You can specify options to compile with debugging information and profiling information:

    $ cmake -D DEBUG=ON -D PROFILE=ON ../

Options are specified with the -D flag. The allowed options are:

    DEBUG=(ON/OFF): compile with debugging symbols
    PROFILE=(ON/OFF): compile with profiling symbols
    ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
    BOOST_ROOT=(/path/to/boost/): path to root of boost installation
    ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
    ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
    BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
    BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
    BUILD_TESTS=(ON/OFF): whether or not to build tests
    USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available

Other tools can also be used to configure CMake, but those are not documented here. See [this section of the build guide](http://www.mlpack.org/docs/mlpack-git/doxygen/build.html#build_config) for more details.

By default, command-line programs will be built, and if the Python dependencies (Cython, setuptools, numpy, pandas) are available, then Python bindings will also be built. OpenMP will be used for parallelization when possible by default.

Once CMake is configured, building the library is as simple as typing 'make'. This will build all library components as well as 'mlpack_test'.

    $ make

If you do not want to build everything in the library, you can specify the individual components you want to build:

    $ make mlpack_pca mlpack_knn mlpack_kfn

If the build fails and you cannot figure out why, register an account on Github and submit an issue; the mlpack developers will quickly help you figure it out:

[mlpack on Github](https://www.github.com/mlpack/mlpack/)

Alternately, mlpack help can be found in IRC at `#mlpack` on irc.freenode.net.

If you wish to install mlpack to `/usr/local/include/mlpack/`, `/usr/local/lib/`, and `/usr/local/bin/` once it has built, make sure you have root privileges (or write permissions to those three directories), and simply type

    $ make install

You can now run the executables by name. You can link against mlpack with `-lmlpack`, and the mlpack headers are found in `/usr/local/include/mlpack/`. If Python bindings were built, they will be accessible with the `mlpack` package in Python.

If running the programs (e.g. `$ mlpack_knn -h`) gives an error of the form

    error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory

then be sure that the runtime linker is searching the directory where `libmlpack.so` was installed (probably `/usr/local/lib/` unless you set it manually). One way to do this, on Linux, is to ensure that the `LD_LIBRARY_PATH` environment variable has the directory that contains `libmlpack.so`. Using bash, this can be set easily:

    export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

(or whatever directory `libmlpack.so` is installed in.)

### 5. Running mlpack programs

After building mlpack, the executables will reside in `build/bin/`. You can call them from there, or you can install the library and (depending on system settings) they should be added to your PATH and you can call them directly. The documentation below assumes the executables are in your PATH.
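
Before running the programs, it can help to confirm that they are reachable. The following is a minimal sketch, not part of mlpack: the helper names `find_mlpack_program` and `library_dir_is_listed` are hypothetical, and the default library directory `/usr/local/lib` is only the likely install location mentioned above. It relies solely on Python's standard library.

```python
import os
import shutil


def find_mlpack_program(name="mlpack_knn"):
    """Return the full path of a program found on PATH, or None if absent.

    Hypothetical helper for illustration; it is not part of mlpack.
    """
    return shutil.which(name)


def library_dir_is_listed(lib_dir="/usr/local/lib"):
    """Report whether lib_dir appears in the LD_LIBRARY_PATH variable.

    A False result does not prove the library cannot be loaded: the
    runtime linker may also find it through its system configuration.
    """
    entries = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    wanted = os.path.normpath(lib_dir)
    # Skip empty entries and compare normalized paths so that a trailing
    # slash ("/usr/local/lib/") still matches.
    return any(e and os.path.normpath(e) == wanted for e in entries)


if __name__ == "__main__":
    location = find_mlpack_program()
    if location is None:
        print("mlpack_knn is not on PATH; call it from build/bin/ or install it")
    else:
        print("mlpack_knn found at", location)
    print("/usr/local/lib listed in LD_LIBRARY_PATH:", library_dir_is_listed())
```

If the program is not found, calling it by its full path under `build/bin/` is an alternative to changing PATH.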

Consider the 'mlpack_knn' program, which finds the k nearest neighbors in a reference dataset of all the points in a query set. That is, we have a query and a reference dataset. For each point in the query dataset, we wish to know the k points in the reference dataset which are closest to the given query point.

Alternately, if the query and reference datasets are the same, the problem can be stated more simply: for each point in the dataset, we wish to know the k nearest points to that point.

Each mlpack program has extensive help documentation which details what the method does, what each of the parameters is, and how to use them:

```shell
$ mlpack_knn --help
```

Running `mlpack_knn` on one dataset (that is, the query and reference datasets are the same) and finding the 5 nearest neighbors is very simple:

```shell
$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v
```

The `-v (--verbose)` flag is optional; it gives informational output. It is not unique to `mlpack_knn` but is available in all mlpack programs. Verbose output also gives timing output at the end of the program, which can be very useful.

### 6. Using mlpack from Python

If mlpack is installed to the system, then the mlpack Python bindings should be automatically in your PYTHONPATH, and importing mlpack functionality into Python should be very simple:

```python
>>> from mlpack import knn
```

Accessing help is easy:

```python
>>> help(knn)
```

The API is similar to the command-line programs. So, running `knn()` (k-nearest-neighbor search) on the numpy matrix `dataset` and finding the 5 nearest neighbors is very simple:

```python
>>> output = knn(reference=dataset, k=5, verbose=True)
```

This will store the output neighbors in `output['neighbors']` and the output distances in `output['distances']`. Other mlpack bindings function similarly, and the input/output parameters exactly match those of the command-line programs.

### 7. Further documentation

The documentation given here is only a fraction of the available documentation for mlpack. If doxygen is installed, you can type `make doc` to build the documentation locally. Alternately, up-to-date documentation, including documentation for older versions of mlpack, is available online:

  - [mlpack homepage](http://www.mlpack.org/)
  - [Tutorials](http://www.mlpack.org/docs/mlpack-git/doxygen/tutorials.html)
  - [Development Site (Github)](https://www.github.com/mlpack/mlpack/)
  - [API documentation](http://www.mlpack.org/docs/mlpack-git/doxygen/index.html)

### 8. Bug reporting

(see also [mlpack help](http://www.mlpack.org/help.html))

If you find a bug in mlpack or have any problems, numerous routes are available for help.

Github is used for bug tracking, and can be found at https://github.com/mlpack/mlpack/. It is easy to register an account and file a bug there, and the mlpack development team will try to quickly resolve your issue.

In addition, mailing lists are available. The mlpack discussion list is available at

  [mlpack discussion list](http://lists.mlpack.org/mailman/listinfo/mlpack)

and the git commit list is available at

  [commit list](http://lists.mlpack.org/mailman/listinfo/mlpack-git)

Lastly, the IRC channel `#mlpack` on Freenode can be used to get help.