MODULE 1 : BIG DATA — THE BIG PICTURE | Comparison Between Monolithic and Distributed Systems
Monolithic Systems
Jul 31, 2024
PySpark StructType and StructField
In PySpark, StructType and StructField are classes used for defining the schema of a DataFrame. They allow you to specify the…
Nov 11, 2023
Databricks Cost Optimization Methods
Databricks cost optimization involves managing resources effectively to reduce unnecessary costs while maximizing performance. Below are…
Dec 17, 2024
How to read an .xlsx file in PySpark?
To read an .xlsx file in PySpark, you can use libraries like pyspark-excel or the openpyxl library in combination with PySpark's DataFrame…
Dec 17, 2024
Creating PySpark and SQL tables dynamically without hardcoding
Creating SQL tables dynamically without hardcoding involves using scripts or templating mechanisms to generate the SQL code based on inputs…
Dec 15, 2024
Databricks Workflows in Real-Time Use with Spark
Databricks Workflows provides a robust orchestration tool to manage ETL pipelines, machine learning (ML) tasks, and data engineering…
Dec 15, 2024
What is Unity Catalog in Databricks?
Unity Catalog is a unified governance solution introduced by Databricks for data and AI on its Lakehouse Platform. It is designed to…
Dec 15, 2024
SQL Project Overview: Sales and Operations Data Warehouse
This project involves the design and implementation of a data warehouse for a retail company. The data warehouse is structured to support…
Aug 12, 2024
RANK vs DENSE_RANK in SQL
In SQL, both RANK() and DENSE_RANK() are window functions used to assign rankings to rows within a partition of data. However, they…
Aug 12, 2024
MODULE 1 : BIG DATA — THE BIG PICTURE | Database vs Data Warehouse vs Data Lake
Database vs Data Warehouse vs Data Lake
Aug 2, 2024