How to Fill and Drop Null Values in PySpark, SQL, and Scala

Pinjari Akbar
Apr 13, 2024


In PySpark, SQL, and Scala, handling null values typically involves two main operations: filling nulls with a specified value and dropping rows (or columns) that contain them. Here's how to perform these operations in each environment.
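The snippets below assume a DataFrame named df with a column called column_name. If you want a self-contained playground to run them against, here is a minimal PySpark sketch (the app name, column names, and sample data are illustrative, not from the original examples):

from pyspark.sql import SparkSession

# Build a small DataFrame containing nulls to test the examples against
spark = SparkSession.builder.appName("null-handling-demo").getOrCreate()
df = spark.createDataFrame(
    [("a", 1), (None, 2), ("c", None)],
    ["column_name", "other_column"],
)
df.show()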

Null Fill:

PySpark:

from pyspark.sql.functions import when

# Fill null values in a specific column with a specified value
df_filled = df.withColumn("column_name", when(df["column_name"].isNull(), "default_value").otherwise(df["column_name"]))
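If you only need a constant default, PySpark's built-in fillna (an alias for df.na.fill) is a more concise alternative to the when/otherwise pattern above. Note that a replacement value is only applied to columns whose data type matches it:

# Fill nulls in one column via a dict of column -> replacement
df_filled = df.fillna({"column_name": "default_value"})

# Or fill every string column at once; numeric columns are left untouched
df_filled_all = df.na.fill("default_value")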

SQL:

-- Fill null values in a specific column with a specified value
SELECT COALESCE(column_name, 'default_value') AS column_name_filled
FROM table_name;
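The same query can also be run from PySpark against a temporary view; a minimal sketch, assuming the df from earlier (the view name table_name is just a placeholder):

# Register the DataFrame as a temp view, then fill nulls with SQL
df.createOrReplaceTempView("table_name")
df_filled_sql = spark.sql(
    "SELECT COALESCE(column_name, 'default_value') AS column_name_filled "
    "FROM table_name"
)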

Scala:

// Fill null values in a specific column with a specified value
val df_filled = df.na.fill("default_value", Seq("column_name"))
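As in PySpark, Scala's na.fill applies the replacement only to columns whose data type matches the supplied value, so a string default like this will not touch numeric columns.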

Null Drop:

PySpark:

# Drop rows containing any null values
df_dropna = df.dropna()

# Drop rows containing null values in a specific column
df_dropna_specific = df.dropna(subset=["column_name"])
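dropna also accepts how and thresh arguments to control how aggressively rows are dropped; a quick sketch using the same df:

# Drop only rows where every column is null
df_dropna_all = df.dropna(how="all")

# Keep rows that have at least 2 non-null values
df_dropna_thresh = df.dropna(thresh=2)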

SQL:

-- SQL has no direct dropna equivalent: to drop rows containing any null
-- values, every column must be checked explicitly (other_column stands in
-- for the table's remaining columns)
SELECT * FROM table_name
WHERE column_name IS NOT NULL
  AND other_column IS NOT NULL;

-- Drop rows containing null values in a specific column
SELECT * FROM table_name WHERE column_name IS NOT NULL;

Scala:

// Drop rows containing any null values
val df_dropna = df.na.drop()

// Drop rows containing null values in a specific column
val df_dropna_specific = df.na.drop(Seq("column_name"))
