Blog
09 February 2024

This blog introduces SageMaker, a versatile AWS service for tasks such as building data pipelines and deploying machine learning models. It addresses common points of confusion by showing how to write pipeline definitions and deploy them into your SageMaker domain using the AWS CDK.

16 December 2023

This series of blog posts demystifies the surrounding terminology and concepts, providing a comprehensive guide for anyone looking to understand these powerful models and apply them in their own projects.

13 August 2023

This article explores the importance of data lineage, which tracks the flow and transformation of data from source to destination and plays a vital role in ensuring the integrity and transparency of data processes.

20 June 2023

In this blog, we explore how to ensure data quality in a Spark Scala ETL (Extract, Transform, Load) job. To achieve this, we leverage Deequ, an open-source data quality library, to define and enforce various data quality checks.
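As a taste of that approach, here is a minimal sketch (not the post's actual code): the dataset path, column names, and thresholds below are illustrative assumptions.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

object DataQualityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("deequ-checks").getOrCreate()

    // Load the dataset produced by the transform step (path is illustrative)
    val orders = spark.read.parquet("s3://my-bucket/curated/orders/")

    // Define and run the checks; columns and assertions are assumptions
    val result = VerificationSuite()
      .onData(orders)
      .addCheck(
        Check(CheckLevel.Error, "orders quality checks")
          .isComplete("order_id")   // no null order ids
          .isUnique("order_id")     // no duplicate orders
          .isNonNegative("amount")  // amounts must be >= 0
          .hasSize(_ > 0)           // dataset must not be empty
      )
      .run()

    // Fail the job if any check did not pass, so bad data never reaches the load step
    if (result.status != CheckStatus.Success) {
      sys.error("Data quality checks failed")
    }

    spark.stop()
  }
}
```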

12 May 2023

This blog delves into the importance of data quality and provides insight into how Data and MLOps Engineers can maintain it throughout the system lifecycle.