
My Learning Space

Space to take notes, learn and share.

Creating a DataFrame in PySpark: A Visual Guide

For any data operation in Apache Spark, one of the first tasks is creating a DataFrame. This is done using the createDataFrame() API. There are two main things to consider: Data Source – Data can come from a list of dictionaries, a list of lists or tuples, or even...


Writing data with PySpark: A Visual Guide

PySpark in Apache Spark uses the DataFrameWriter to manage how data is saved. To write data in PySpark, you start with the .write attribute of a DataFrame, which gives you a DataFrameWriter to manage the save process. Basic Approach to...


Reading Data in PySpark: A Visual Guide

PySpark makes it easy to load data from different sources into DataFrames. At first, the process can seem a little overwhelming, but this guide is designed as a visual walkthrough to simplify the concepts. The data...


Apache Spark Execution Flow: A Visual Guide

When a Spark application is submitted, it does not execute statements sequentially. Instead, Spark constructs a logical execution plan, represented as a Directed Acyclic Graph (DAG), which captures the computation flow and dependencies before...


Spark Optimizations: A Technical Guide to .persist()

The .persist() method in Apache Spark is used to store intermediate data so that Spark doesn’t have to recompute it every time. This can make your jobs run much faster when the same data is used in multiple actions. Without .persist() Every action (e.g., count(),...
