
My Learning Space

Space to take notes, learn and share.

Introduction to Azure Databricks

Databricks is a cloud-based data engineering tool used for data transformation and for data exploration through machine learning models. Azure Databricks is Microsoft Azure's implementation of Databricks. Evolution of Databricks: A short timeline of evolution...


Data Modeling in Power BI

Power BI provides a handful of features for building robust data models. Here are a few concepts for getting started with data modeling in Power BI. Fact tables and dimension tables: in its simplest form, a data model design consists of the following. Fact table: Also...
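
To make the fact/dimension split concrete, here is a small sketch in plain Python rather than Power BI itself. In Power BI the link between the two tables is a relationship defined in the model; the dictionary lookup below only mimics that idea. The table names (`fact_sales`, `dim_product`), the columns, and the numbers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per product, keyed by ProductKey.
dim_product = {
    1: {"ProductName": "Laptop", "Category": "Electronics"},
    2: {"ProductName": "Phone", "Category": "Electronics"},
    3: {"ProductName": "Desk", "Category": "Furniture"},
}

# Hypothetical fact table: one row per sale, holding a foreign key and a numeric measure.
fact_sales = [
    {"ProductKey": 1, "SalesAmount": 1200.0},
    {"ProductKey": 2, "SalesAmount": 800.0},
    {"ProductKey": 3, "SalesAmount": 300.0},
    {"ProductKey": 1, "SalesAmount": 1100.0},
]


def total_sales_by(attribute: str) -> dict[str, float]:
    """Sum the fact measure grouped by a dimension attribute, like slicing a visual by Category."""
    totals: dict[str, float] = defaultdict(float)
    for row in fact_sales:
        # Follow the many-to-one relationship from the fact row to its dimension row.
        product = dim_product[row["ProductKey"]]
        totals[product[attribute]] += row["SalesAmount"]
    return dict(totals)


print(total_sales_by("Category"))  # {'Electronics': 3100.0, 'Furniture': 300.0}
```

The fact table stays narrow (keys plus measures), while descriptive attributes live once in the dimension table; grouping by a different attribute only changes which dimension column is read.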


Dictionary in Python

Python Dictionary: A dictionary in Python is a data structure that stores data in key: value format. Python lists and dictionaries have several similarities; however, they differ in how their elements are accessed. List elements are ordered and are accessed...
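
A short sketch of the key: value idea, contrasting key-based access in a dictionary with position-based access in a list. The variable names and values are only illustrative.

```python
# Build a dictionary of key: value pairs.
ages = {"alice": 30, "bob": 25}

# Access a value by its key (a list would use an integer position instead).
print(ages["alice"])  # 30

# Add a new entry or update an existing one by assigning to a key.
ages["carol"] = 35
ages["bob"] = 26

# .get() returns a default instead of raising KeyError for a missing key.
print(ages.get("dave", 0))  # 0

# Iterate over key/value pairs.
for name, age in ages.items():
    print(name, age)

# Membership tests check keys, not values.
print("alice" in ages)  # True
print(30 in ages)  # False

# For comparison, list elements are accessed by position.
names = ["alice", "bob"]
print(names[0])  # alice
```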


Loading Data from Kaggle to Amazon S3 Bucket

Loading data from Kaggle directly into S3 is a two-step process. In the first step, we configure Kaggle so that we can download datasets. In the second step, we extract the data from Kaggle into an S3 bucket. Get data from Kaggle: To get data from Kaggle, we set up the Kaggle command line tool...
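
The two steps can be sketched in Python as follows. This is a minimal sketch, not the post's exact method: it uses the Kaggle Python API (`KaggleApi` from the `kaggle` package, which also provides the CLI) instead of calling the command line tool, and `boto3` for the upload. It assumes both packages are installed, Kaggle credentials are in `~/.kaggle/kaggle.json`, and AWS credentials are already configured for `boto3`. The helper names `s3_keys_for` and `kaggle_to_s3` are hypothetical.

```python
from pathlib import Path


def s3_keys_for(local_dir: str, prefix: str) -> dict[str, str]:
    """Map every file under local_dir to an S3 object key under prefix (hypothetical helper)."""
    base = Path(local_dir)
    keys = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # Use forward slashes in object keys regardless of the local OS.
            relative = path.relative_to(base).as_posix()
            keys[str(path)] = f"{prefix.rstrip('/')}/{relative}"
    return keys


def kaggle_to_s3(dataset: str, bucket: str, prefix: str, local_dir: str = "kaggle_data") -> None:
    """Download a Kaggle dataset locally, then upload its files to S3 (hypothetical helper)."""
    # Imported here so the key-mapping helper above works without these packages.
    import boto3
    from kaggle.api.kaggle_api_extended import KaggleApi

    # Step 1: authenticate with Kaggle, then download and unzip the dataset files.
    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=local_dir, unzip=True)

    # Step 2: upload each downloaded file to the S3 bucket.
    s3 = boto3.client("s3")
    for filename, key in s3_keys_for(local_dir, prefix).items():
        s3.upload_file(filename, bucket, key)
```

A call would look like `kaggle_to_s3("owner/dataset-name", "my-example-bucket", "raw/kaggle")`, where the dataset slug and bucket name are placeholders. Downloading to local disk first keeps the two steps separate; for very large datasets, a machine close to the bucket (for example, an EC2 instance) avoids moving the data twice over a slow link.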

Correlation Coefficient – Simplistic explanation

What is a correlation coefficient? Correlation means a mutual relationship or connection between two or more things (variables). The correlation coefficient is a numeric measure that quantifies this relationship. The coefficient describes two aspects of a relationship...
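
Assuming the Pearson (linear) correlation coefficient is meant, it can be computed as r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which always lies between −1 and 1. A minimal sketch with toy numbers; on Python 3.10+, `statistics.correlation` gives the same result and can serve as a cross-check.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its variable's mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_variation / spread


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive linear relationship)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative linear relationship)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (positive but weaker)
```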


Loading data in Snowflake

Traditional relational databases such as MySQL and PostgreSQL are built around structured data in the form of rows and columns. Snowflake, however, supports multiple types of data loads. Concept of stages: Internal stages. Snowflake has the following...
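
As a rough illustration of loading through an internal stage, the sketch below only builds the Snowflake SQL statements as strings; it does not connect to Snowflake. It assumes a local CSV file with one header row, and the file path, table, and stage names are hypothetical. Internal stages are referenced as `@~` (user stage), `@%table` (table stage), or `@name` (named stage). The statements could be run one at a time with SnowSQL or with `cursor.execute` from `snowflake-connector-python`; note that `PUT` cannot be run from the web worksheet.

```python
def stage_reference(kind: str, name: str = "") -> str:
    """Return how an internal stage is referenced in Snowflake SQL."""
    if kind == "user":
        return "@~"  # every user has a personal stage
    if kind == "table":
        return f"@%{name}"  # every table has its own stage
    if kind == "named":
        return f"@{name}"  # created explicitly with CREATE STAGE
    raise ValueError(f"unknown stage kind: {kind}")


def load_csv_statements(local_file: str, table: str, stage: str) -> list[str]:
    """Build SQL to upload a local CSV into a named internal stage and copy it into a table (hypothetical helper)."""
    stage_ref = stage_reference("named", stage)
    return [
        f"CREATE STAGE IF NOT EXISTS {stage}",
        # PUT uploads the local file into the stage (gzip-compressed by default).
        f"PUT file://{local_file} {stage_ref}",
        # COPY INTO loads the staged file(s) into the target table, skipping the header row.
        f"COPY INTO {table} FROM {stage_ref} FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)",
    ]


for statement in load_csv_statements("/tmp/orders.csv", "orders", "my_csv_stage"):
    print(statement)
```

Separating the upload (`PUT`) from the load (`COPY INTO`) is what the stage concept buys: files can be staged once, inspected, and then loaded or reloaded into tables independently.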
