panelsplit
panelsplit: A tool for panel data analysis
panelsplit is a Python package designed to facilitate time series cross-validation for panel (multi-entity) data. Whether you're doing feature engineering, hyper-parameter tuning, or model estimation, panelsplit provides robust modules that make working with panel data both easier and more efficient.
Key Features:
- Panel data cross-validation: Split up your panel dataset, respecting its temporal structure.
- Data compatibility: Provides a DataFrame-agnostic approach via narwhals, supporting libraries like polars, pandas, and numpy.
- Flexible pipelines: Easily build pipelines that integrate with popular libraries such as scikit-learn and feature-engine.
- Parallel processing: Leverages parallel computing to speed up fitting and prediction tasks.
Why choose panelsplit?
panelsplit is built with the practical needs of data scientists working with panel data in mind:
- Robust & flexible: Whether experimenting with models or deploying production pipelines, its modular design lets you focus on analysis rather than plumbing.
- User-friendly: Clear API design and comprehensive documentation make it easy to integrate into your existing workflows.
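- Efficient: Parallel processing speeds up fitting and prediction across splits, while panel-aware cross-validation keeps future periods out of each training set, so evaluation reflects genuine out-of-sample performance.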
Explore the modules in detail by clicking on the links below to see full documentation and examples.
Modules
panelsplit.cross_validation
- PanelSplit class: Automatically generates train/test splits while preserving temporal order and handling edge cases.
- Label generation: Provides helper functions to create training and testing labels.
- Snapshot generation: Generates snapshots of data in cases where transformations aren't comparable across time.
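The core splitting idea can be sketched with NumPy alone. This is a minimal illustration, not panelsplit's PanelSplit implementation: the function name panel_time_splits and its n_splits, test_size and gap parameters are hypothetical. It assumes an expanding-window scheme over the sorted unique periods, so every entity observed in a test period lands in the same test fold and all training rows come from earlier periods.

```python
import numpy as np


def panel_time_splits(periods, n_splits=3, test_size=1, gap=0):
    """Yield (train_idx, test_idx) row indices for a panel dataset.

    Hypothetical sketch: splits are made over the sorted *unique* periods,
    then mapped back to rows, so all entities sharing a period stay together.
    """
    periods = np.asarray(periods)
    unique = np.unique(periods)  # sorted unique time periods
    n_periods = len(unique)
    # Position (in `unique`) of the first test period of the earliest split.
    first_test = n_periods - n_splits * test_size
    if first_test - gap < 1:
        raise ValueError("Not enough periods for the requested splits.")
    for k in range(n_splits):
        test_start = first_test + k * test_size
        test_periods = unique[test_start:test_start + test_size]
        # Expanding window; `gap` periods are skipped between train and test.
        train_periods = unique[:test_start - gap]
        train_idx = np.flatnonzero(np.isin(periods, train_periods))
        test_idx = np.flatnonzero(np.isin(periods, test_periods))
        yield train_idx, test_idx
```

For two entities observed from 2000 to 2003, calling list(panel_time_splits([2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003], n_splits=2)) trains on rows 0 to 3 and tests on rows 4 and 5, then trains on rows 0 to 5 and tests on rows 6 and 7. Splitting on unique periods rather than on rows is what distinguishes this from a plain row-wise time series split, which could put one entity's observation for a period in training and another entity's observation for the same period in testing.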
panelsplit.application
- Model fitting & prediction: Fits models on each training split using cloned estimators and supports multiple prediction methods (e.g., predict, predict_proba).
- Parallel execution: Leverages parallel processing for efficient handling of cross-validation splits.
- Data integrity: Restores predictions to the original data order for consistency.
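The fit, predict and reorder steps above can be sketched as follows. This is a hypothetical illustration, not panelsplit.application's API: the names cross_val_fit_predict and MeanRegressor are invented here, copy.deepcopy stands in for scikit-learn's sklearn.base.clone, and a standard-library thread pool stands in for whatever parallel backend panelsplit actually uses.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class MeanRegressor:
    """Hypothetical toy estimator: predicts the mean of the training target."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def cross_val_fit_predict(estimator, X, y, splits, method="predict", n_jobs=2):
    """Fit a fresh copy of `estimator` per split and predict its test rows.

    Returns predictions aligned with the original row order.
    """
    X, y = np.asarray(X), np.asarray(y)

    def run(split):
        train_idx, test_idx = split
        model = copy.deepcopy(estimator)  # stand-in for sklearn.base.clone
        model.fit(X[train_idx], y[train_idx])
        return test_idx, getattr(model, method)(X[test_idx])

    # Fit the splits concurrently; pool.map keeps results in split order.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(run, splits))

    # Allocate an output aligned with the original rows (NaN where a row was
    # never in a test fold); extra dims cover 2-D outputs like predict_proba.
    extra_dims = np.shape(results[0][1])[1:]
    out = np.full((len(X), *extra_dims), np.nan)
    for test_idx, preds in results:
        out[test_idx] = preds  # scatter back to the original positions
    return out
```

Because each fold's predictions are written to out at their own row indices, the result stays in the input order even if a split lists its test rows out of order or the workers finish in a different sequence. Passing method="predict_proba" works the same way for an estimator that implements it.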
panelsplit.pipeline
- Sequential processing: Chains multiple transformers and estimators into a streamlined pipeline, each with its own cross-validation approach.
- Dynamic method injection: Automatically creates methods (like predict and score) based on the final estimator’s capabilities.
- Out-of-fold predictions: Supports cross-validation-based predictions, reassembling the per-fold outputs into a single result.
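Dynamic method injection can be illustrated with a small, hypothetical pipeline class. SketchPipeline, CenterTransformer and LinearModel are invented for this example, and the sketch deliberately omits the per-step cross-validation that panelsplit.pipeline adds. It only shows the idea of exposing predict, predict_proba or score on the pipeline when, and only when, the final step implements them. scikit-learn's own Pipeline achieves a similar effect with its available_if decorator.

```python
import numpy as np


class CenterTransformer:
    """Hypothetical toy transformer: subtracts the training-column means."""

    def fit(self, X, y=None):
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.means_


class LinearModel:
    """Hypothetical least-squares regressor with predict and score, no predict_proba."""

    def fit(self, X, y):
        X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

    def score(self, X, y):
        # Coefficient of determination (R^2).
        y = np.asarray(y, dtype=float)
        resid = y - self.predict(X)
        centered = y - y.mean()
        return 1.0 - (resid @ resid) / (centered @ centered)


class SketchPipeline:
    """Chains transformers and a final estimator; injects the final step's methods."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs
        final = steps[-1][1]
        # Only create the methods the final estimator actually supports.
        for name in ("predict", "predict_proba", "score"):
            if hasattr(final, name):
                setattr(self, name, self._make_method(name))

    def _transform(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return X

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def _make_method(self, name):
        def method(X, *args, **kwargs):
            # Transform X through the earlier steps, then delegate.
            final = self.steps[-1][1]
            return getattr(final, name)(self._transform(X), *args, **kwargs)

        method.__name__ = name
        return method
```

With SketchPipeline([("center", CenterTransformer()), ("model", LinearModel())]), the pipeline gains predict and score, while hasattr(pipe, "predict_proba") is False because the final model does not define it. Checking capabilities at construction time means an unsupported call fails with an ordinary AttributeError instead of an error from deep inside the final estimator.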
panelsplit.plot
- Visualize time series splits easily.
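What a split plot conveys can be sketched as a text grid: one row per split, one column per period, with each cell marked as training, testing or unused. The function split_diagram is hypothetical and is not panelsplit.plot's interface; a plotting library would typically draw the same grid as colored bars.

```python
def split_diagram(periods, splits):
    """Hypothetical text stand-in for a split plot.

    periods: sorted unique period labels.
    splits: list of (train_periods, test_periods) collections.
    Returns one line per split: '#' = train, 'o' = test, '.' = unused.
    """
    lines = []
    for i, (train, test) in enumerate(splits):
        train, test = set(train), set(test)
        cells = ["#" if p in train else "o" if p in test else "." for p in periods]
        lines.append(f"split {i}: " + "".join(cells))
    return "\n".join(lines)


periods = [2000, 2001, 2002, 2003, 2004]
splits = [
    ([2000, 2001], [2002]),
    ([2000, 2001, 2002], [2003]),
    ([2000, 2001, 2002, 2003], [2004]),
]
print(split_diagram(periods, splits))
# split 0: ##o..
# split 1: ###o.
# split 2: ####o
```

Reading down the columns makes the expanding window visible: each split's training block grows by one period, and its test period always sits immediately to the right of the training block.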