Projects
- Boosters — A high-performance gradient boosting library for Python and Rust
MLOps engineer passionate about machine learning, energy forecasting, and building high-performance systems. I write about the things I learn.
The story of building a gradient boosting library in Rust, from dissecting XGBoost models to achieving performance parity and beyond.
A practical guide to configuring gradient boosting: learning rate, tree depth, regularization, and systematic tuning strategies.
How decision trees fit into gradient boosting: from second-order Taylor expansion to the split gain formula that powers XGBoost and LightGBM.
When to choose XGBoost vs LightGBM: speed, accuracy, features, and practical recommendations based on your use case.
How LightGBM bundles sparse features to reduce memory 25×, and how both libraries handle categorical data without one-hot encoding.
How LightGBM speeds up training by keeping high-gradient samples and subsampling the rest, plus row and column sampling strategies.
XGBoost grows trees level-by-level, while LightGBM splits the best leaf. Learn when each strategy wins and how num_leaves relates to max_depth.
The optimization that makes gradient boosting fast: how histogram-based training reduces split finding from O(n log n) to O(bins) with the…
How gradient boosting optimizes in function space: from pseudo-residuals to the complete boosting algorithm, with derivations for common loss…
An intuitive introduction to gradient boosting: learn how combining weak models creates powerful predictions, from ensemble intuition to the boosting…
The blog introduces SageMaker as a versatile AWS service for tasks like building data pipelines and deploying machine learning models, addressing…
This series of blog posts aims to demystify the associated terminology and concepts, providing a comprehensive guide for individuals looking to…
This article explores the importance of data lineage, which tracks the flow and transformations of data from source to destination, playing a vital…
In this blog, we explore how to ensure data quality in a Spark Scala ETL (Extract, Transform, Load) job. To achieve this, we leverage Deequ, an…
This blog delves into the importance of data quality and provides insight into how Data and MLOps Engineers can ensure that quality is maintained…