How to Fill and Drop Null Values in PySpark, SQL, and Scala

Pinjari Akbar
Apr 13, 2024 · 1 min read

In PySpark, SQL, and Scala, handling null values typically involves two main operations: filling nulls with a specified value and dropping rows (or columns) that contain nulls. Here's how to perform these operations in each environment:
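All of the PySpark snippets below assume a DataFrame named df with a nullable column called column_name. A minimal setup sketch (the app name and sample data are illustrative, not from the original examples) might look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("null-handling").getOrCreate()

# Small example DataFrame with a null in "column_name"
df = spark.createDataFrame([("a",), (None,), ("c",)], ["column_name"])
df.show()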

Fill Null Values:

PySpark:

from pyspark.sql.functions import when

# Fill null values in a specific column with a specified value
df_filled = df.withColumn(
    "column_name",
    when(df["column_name"].isNull(), "default_value").otherwise(df["column_name"]),
)
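If you only need a straight replacement rather than conditional logic, DataFrame.fillna (equivalently df.na.fill) is more concise. The column name and default below simply mirror the example above; other_column is a hypothetical second column for illustration.

# Fill nulls in a single column with a default value
df_filled = df.fillna({"column_name": "default_value"})

# Fill nulls in several columns at once (other_column is illustrative)
df_filled_multi = df.fillna({"column_name": "default_value", "other_column": 0})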

SQL:

-- Fill null values in a specific column with a specified value
SELECT COALESCE(column_name, 'default_value') AS column_name_filled
FROM table_name;

Scala:

// Fill null values in a specific column with a specified value
val df_filled = df.na.fill("default_value", Seq("column_name"))

Drop Null Values:

PySpark:

# Drop rows containing any null values
df_dropna = df.dropna()

# Drop rows containing null values in a specific column
df_dropna_specific = df.dropna(subset=["column_name"])
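dropna also accepts how and thresh parameters for finer-grained control; the snippets below are a sketch of those options using the same df.

# Drop rows only when every column is null
df_dropna_all = df.dropna(how="all")

# Keep rows that have at least 2 non-null values
df_dropna_thresh = df.dropna(thresh=2)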

SQL:

-- Drop rows containing any null values (list each column explicitly)
SELECT * FROM table_name
WHERE column1 IS NOT NULL AND column2 IS NOT NULL;

-- Drop rows containing null values in a specific column
SELECT * FROM table_name WHERE column_name IS NOT NULL;

Scala:

// Drop rows containing any null values
val df_dropna = df.na.drop()

// Drop rows containing null values in a specific column
val df_dropna_specific = df.na.drop(Seq("column_name"))
