Spark Narrow Transformation and Wide Transformation?

Pinjari Akbar
2 min read · Nov 11, 2023


In Spark, transformations can be categorized as narrow or wide based on how each output partition depends on the input partitions. Understanding this distinction is crucial for optimizing the performance of your Spark jobs.

Narrow Transformation:

Narrow transformations are the ones where each input partition contributes to at most one output partition. These transformations are performed independently on each partition, and they do not require shuffling or redistributing data across partitions.

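# Narrow transformation example: map
from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuse the active context if one exists
rdd = sc.parallelize([1, 2, 3, 4, 5], 2)  # 2 partitions

def square(x):
    return x * x

result_rdd = rdd.map(square)

print(result_rdd.collect())  # [1, 4, 9, 16, 25]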

In this example, the map operation applies the square function to each element in the RDD independently within its partition. There's no need to shuffle or exchange data between partitions.
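To see why no data movement is needed, the behaviour can be modelled in plain Python without a cluster. This is only a toy sketch, not how Spark is implemented: `narrow_map` and the `partitions` layout are hypothetical names chosen for illustration, and each inner list stands in for one partition.

```python
# Toy model of a narrow transformation (plain Python, not the Spark API).
# Each inner list stands in for one partition of an RDD.

def narrow_map(partitions, func):
    """Apply func to every element, one partition at a time.

    Output partition i is built only from input partition i,
    so nothing has to move between partitions.
    """
    return [[func(x) for x in part] for part in partitions]


def square(x):
    return x * x


# Hypothetical layout: how [1, 2, 3, 4, 5] might be split into 2 partitions.
partitions = [[1, 2], [3, 4, 5]]
squared = narrow_map(partitions, square)
print(squared)
```

The number of partitions and the position of each element's partition stay the same, which is what lets Spark run such operations on every partition in parallel without coordination.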

Wide Transformation:

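Wide transformations are the ones where each input partition can contribute to multiple output partitions. These transformations require shuffling and redistribution of data across partitions. Because records that belong together (for example, records with the same key) may start out in different partitions, Spark has to move them between executors, which makes wide transformations more expensive than narrow ones.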

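# Wide transformation example: groupByKey
# `sc` is the same SparkContext used in the map example
pair_rdd = sc.parallelize([(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd')], 2)  # 2 partitions

result_rdd = pair_rdd.groupByKey()

# groupByKey yields an iterable per key; convert to lists for readable output
print(result_rdd.mapValues(list).collect())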
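In this example, groupByKey has to bring all values for a key together, even when they start in different partitions. The plain-Python sketch below models that routing step. It is a simplification under stated assumptions, not Spark's shuffle: `group_by_key` is a hypothetical helper, the input layout is made up, and keys are routed with a simple modulo on integer keys.

```python
# Toy model of a wide transformation (plain Python, not the Spark API).
from collections import defaultdict


def group_by_key(partitions, num_output_partitions):
    """Group (key, value) pairs by key across all partitions.

    Every pair is routed to an output partition chosen from its key,
    so one input partition may feed several output partitions.
    This routing step plays the role of the shuffle.
    """
    outputs = [defaultdict(list) for _ in range(num_output_partitions)]
    for part in partitions:
        for key, value in part:
            # Assumption: simple modulo routing on integer keys.
            target = key % num_output_partitions
            outputs[target][key].append(value)
    return [dict(out) for out in outputs]


# Hypothetical layout of the four pairs over 2 input partitions.
partitions = [[(1, 'a'), (2, 'b')], [(1, 'c'), (2, 'd')]]
grouped = group_by_key(partitions, 2)
print(grouped)
```

Here both input partitions send records to both output partitions: the values for key 1 arrive from two different inputs and end up together in one output. That cross-partition dependency is what makes the transformation wide.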
