hypper
Hypper is a data-mining Python library for binary classification. It uses hypergraph-based methods to explore datasets for the purpose of undersampling, feature selection and binary classification.
Hypper provides an easy-to-use interface that will be familiar to users of the well-known scikit-learn API.
The primary goal of this library is to provide a tool for handling datasets that consist mainly of categorical features. The hypergraph-based methods introduced in Hypper were benchmarked against alternative solutions and achieved satisfactory results. More details can be found in the paper listed in the Citation section below.
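To make the hypergraph idea concrete, here is a minimal sketch of one way a categorical table can be viewed as a hypergraph: each sample is a node, and each (feature, value) pair is a hyperedge holding the samples that share that value. This is an illustrative assumption, not Hypper's internal construction or API. The names `build_hyperedges` and `positive_share` and the toy data are hypothetical, and the per-edge positive share is only a naive stand-in for the importance ratings described in the paper.

```python
from collections import defaultdict

# Toy categorical dataset (made-up values, for illustration only).
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "round"},
    {"color": "blue", "shape": "round"},
]
labels = [1, 1, 0, 1]  # binary class label per row


def build_hyperedges(rows):
    """Map each (feature, value) pair to the set of row indices containing it."""
    edges = defaultdict(set)
    for idx, row in enumerate(rows):
        for feature, value in row.items():
            edges[(feature, value)].add(idx)
    return dict(edges)


def positive_share(edges, labels):
    """Fraction of positive-label samples inside each hyperedge (naive score)."""
    return {
        key: sum(labels[i] for i in members) / len(members)
        for key, members in edges.items()
    }


hyperedges = build_hyperedges(rows)
shares = positive_share(hyperedges, labels)

for key in sorted(hyperedges):
    print(key, sorted(hyperedges[key]), round(shares[key], 3))
```

Because a hyperedge can contain any number of samples, this representation keeps the full co-occurrence information of a categorical value, which an ordinary pairwise graph would have to approximate.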
Installation
pip install hypper
Local installations
pip install -e ".[documentation]" # documentation
pip install -e ".[develop]" # development (with testing)
pip install -e ".[benchmarking]" # benchmarking scripts
pip install -e ".[all]" # install everything
Tutorials:
1. Introduction to data mining with Hypper
Testing
pytest
Important links
- Source code - https://github.com/hypper-team/hypper
- Documentation - https://hypper-team.github.io/hypper.html
Citation
@ARTICLE{Misiorek2022-ru,
title = "Hypergraph-based importance assessment for binary classification
data",
author = "Misiorek, Pawel and Janowski, Szymon",
abstract = "We present a novel hypergraph-based framework enabling
an assessment of the importance of binary classification data
elements. Specifically, we apply the hypergraph model to rate
data samples' and categorical feature values' relevance to
classification labels. The proposed Hypergraph-based Importance
ratings are theoretically grounded on the hypergraph cut
conductance minimization concept. As a result of using
hypergraph representation, which is a lossless representation
from the perspective of higher-order relationships in data, our
approach allows for more precise exploitation of the information
on feature and sample coincidences. The solution was tested
using two scenarios: undersampling for imbalanced classification
data and feature selection. The experimentation results have
proven the good quality of the new approach when compared with
other state-of-the-art and baseline methods for both scenarios
measured using the average precision evaluation metric.",
journal = "Knowl. Inf. Syst.",
publisher = "Springer Science and Business Media LLC",
month = dec,
year = 2022,
copyright = "https://creativecommons.org/licenses/by/4.0",
language = "en"
}
"""
.. include:: ../README.md
"""
__docformat__ = "restructuredtext"

import logging
import logging.config
from pathlib import Path

logging.config.fileConfig(
    fname=Path(__file__).parent / "logger.conf",
    disable_existing_loggers=False,
)

from hypper.classification import *
from hypper.data import *
from hypper.feature_selection import *
from hypper.plotting import *
from hypper.undersampling import *

__all__ = [
    "hypper.classification",
    "hypper.data",
    "hypper.feature_selection",
    "hypper.plotting",
    "hypper.undersampling",
]
classification
data
feature_selection
plotting
undersampling