My Learning Space
Space to take notes, learn and share.
Introduction to Azure Databricks
Databricks is a cloud-based data engineering tool used for data transformation and data exploration through machine learning models. Azure Databricks is Microsoft Azure Platform’s implementation of Databricks. Evolution of Databricks: A short timeline of evolution...
Data Modeling in Power BI
Power BI provides a handful of features for building robust data models. Here are a few concepts to begin modeling data in Power BI: Fact tables & dimension tables: In its simplest form, a data model design will consist of the following: Fact table: Also...
Dictionary in Python
Python Dictionary: A dictionary in Python is a data structure that stores data in key: value format. There are several similarities between Python lists and dictionaries; however, they differ in how their elements are accessed. List elements are ordered and are accessed...
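The difference in access described above can be shown with a minimal sketch. The names `fruits_list` and `prices` and their values are hypothetical, chosen only for illustration:

```python
# Hypothetical sample data, used only to illustrate the two access styles.
fruits_list = ["apple", "banana", "cherry"]
prices = {"apple": 0.5, "banana": 0.25, "cherry": 3.0}

# A list element is retrieved by its integer position (index).
first_fruit = fruits_list[0]

# A dictionary value is retrieved by its key.
banana_price = prices["banana"]

# dict.get returns a default instead of raising KeyError for a missing key.
mango_price = prices.get("mango", None)

print(first_fruit, banana_price, mango_price)
```

Indexing a dictionary with a key that is not present raises `KeyError`, which is why `get` with a default is often the safer choice for optional keys.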
Loading Data from Kaggle to Amazon S3 Bucket
Loading data from Kaggle directly into S3 is a two-step process. In the first step, we configure Kaggle so that data can be downloaded. In the second step, we extract data from Kaggle into an S3 bucket. Get data from Kaggle: To get data from Kaggle, we set up the Kaggle command line tool...
Correlation Coefficient – Simplistic explanation
What is Correlation Coefficient: Correlation means a mutual relationship or connection between two or more things (variables). The correlation coefficient is a numeric measure to quantify this relationship. The coefficient describes two aspects of a relationship....
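As a rough illustration of how such a coefficient can be computed, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The excerpt does not say which coefficient the post uses, so Pearson is an assumption, and the function name `pearson_correlation` is hypothetical:

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Assumption: the post refers to Pearson's r; the excerpt does not confirm this.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample data: one perfectly increasing and one perfectly decreasing pair.
r_up = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The result always lies between -1 and 1. For the sample data, the first pair gives a value at or very near 1 and the second a value at or very near -1.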
Loading data in Snowflake
A relational database (for example, MySQL or PostgreSQL) can only support structured data in the form of rows and columns. Snowflake, however, can support multiple types of data loads. Concept of stages: Internal Stages: Snowflake has the following...