panelsplit
panelsplit: A tool for panel data analysis
panelsplit is a Python package designed to facilitate time series cross-validation for panel (multi-entity) data. Whether you're in feature engineering, hyper-parameter tuning, or model estimation, panelsplit provides robust modules that make working with panel data both easier and more efficient.
Key Features:
- Panel data cross-validation: Split up your panel dataset, respecting its temporal structure.
- Data compatibility: Works effortlessly with both numpy arrays and pandas DataFrames.
- Flexible pipelines: Easily build pipelines that integrate with popular libraries such as scikit-learn and feature-engine.
- Parallel Processing: Leverages parallel computing to speed up cross-validation and prediction tasks.
Why choose panelsplit?
panelsplit is built with the practical needs of data scientists working with panel data in mind:
- Robust & flexible: Whether experimenting with models or deploying production pipelines, its modular design lets you focus on analysis rather than plumbing.
- User-friendly: Clear API design and comprehensive documentation make it easy to integrate into your existing workflows.
- Efficient: Parallel processing and tailored cross-validation ensure that your computations are both fast and accurate.
Explore the modules in detail by clicking on the links below to see full documentation and examples.
Modules
panelsplit.cross_validation
- PanelSplit class: Automatically generates train/test splits while preserving temporal order and handling edge cases.
- Label generation: Provides helper functions to create training and testing labels.
- Snapshot generation: Generates snapshots of data in cases where transformations aren't comparable across time.
panelsplit.application
- Model fitting & prediction: Fits models on each training split using cloned estimators and supports multiple prediction methods (e.g.,
predict
,predict_proba
). - Parallel execution: Leverages parallel processing for efficient handling of cross-validation splits.
- Data integrity: Restores predictions to the original data order for consistency.
panelsplit.pipeline
- Sequential processing: Chains multiple transformers and estimators into a streamlined workflow.
- Dynamic method injection: Automatically creates methods (like
predict
andscore
) based on the final estimator’s capabilities. - Out-of-fold predictions: Supports cross-validation based predictions with reassembled outputs.
panelsplit.plot
- Visualize time series splits easily.
1""" 2 3# panelsplit: A tool for panel data analysis 4 5panelsplit is a Python package designed to facilitate time series cross-validation for panel (multi-entity) data. Whether you're in feature engineering, hyper-parameter tuning, or model estimation, panelsplit provides robust modules that make working with panel data both easier and more efficient. 6 7**Key Features:** 8- **Panel data cross-validation:** Split up your panel dataset, respecting its temporal structure. 9- **Data compatibility:** Works effortlessly with both numpy arrays and pandas DataFrames. 10- **Flexible pipelines:** Easily build pipelines that integrate with popular libraries such as scikit-learn and feature-engine. 11- **Parallel Processing:** Leverages parallel computing to speed up cross-validation and prediction tasks. 12 13 14--- 15 16## Why choose panelsplit? 17 18panelsplit is built with the practical needs of data scientists working with panel data in mind: 19- **Robust & flexible:** Whether experimenting with models or deploying production pipelines, its modular design lets you focus on analysis rather than plumbing. 20- **User-friendly:** Clear API design and comprehensive documentation make it easy to integrate into your existing workflows. 21- **Efficient:** Parallel processing and tailored cross-validation ensure that your computations are both fast and accurate. 22 23Explore the modules in detail by clicking on the links below to see full documentation and examples. 24 25--- 26## Modules 27 28 29### `panelsplit.cross_validation` 30- **PanelSplit class:** Automatically generates train/test splits while preserving temporal order and handling edge cases. 31- **Label generation:** Provides helper functions to create training and testing labels. 32- **Snapshot generation:** Generates snapshots of data in cases where transformations aren't comparable across time. 33 34### `panelsplit.application` 35- **Model fitting & prediction:** Fits models on each training split using cloned estimators and supports multiple prediction methods (e.g., `predict`, `predict_proba`). 36- **Parallel execution:** Leverages parallel processing for efficient handling of cross-validation splits. 37- **Data integrity:** Restores predictions to the original data order for consistency. 38 39### `panelsplit.pipeline` 40- **Sequential processing:** Chains multiple transformers and estimators into a streamlined workflow. 41- **Dynamic method injection:** Automatically creates methods (like `predict` and `score`) based on the final estimator’s capabilities. 42- **Out-of-fold predictions:** Supports cross-validation based predictions with reassembled outputs. 43 44### `panelsplit.plot` 45- Visualize time series splits easily. 46""" 47 48__all__ = ["application", "cross_validation", "pipeline", "plot"]