How to create a Delta table in Spark?

Pinjari Akbar
3 min read · Nov 12, 2023

--

In Spark, you can create a Delta table using the CREATE TABLE statement with the USING DELTA clause. The examples below can be run from a Spark SQL session, the Scala spark-shell, or a PySpark shell.

CREATE TABLE IF NOT EXISTS Statement:

This statement is used to create a Delta table named student in the default database if it does not already exist. If the table already exists, the statement has no effect. The table has columns for id, firstName, middleName, lastName, gender, birthDate, ssn, and salary. The data will be stored in Delta format, which provides features like ACID transactions and time travel.
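A sketch of that statement — the column types here are assumptions inferred from the field names, so adjust them to match your data:

```sql
CREATE TABLE IF NOT EXISTS default.student (
  id INT,
  firstName STRING,
  middleName STRING,
  lastName STRING,
  gender STRING,
  birthDate TIMESTAMP,
  ssn STRING,
  salary INT
) USING DELTA;
```

Running this a second time is a no-op: the existing table and its data are left untouched.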

CREATE OR REPLACE TABLE Statement:

This statement is used to create a Delta table named student in the default database. If the table already exists, it will be replaced with the new definition provided. This means that if there is an existing student table, it will be dropped and a new one will be created with the specified columns.
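The replacing variant looks almost identical — only the opening keywords change (again, the column types are assumptions; note that CREATE OR REPLACE TABLE requires a Delta-enabled Spark catalog):

```sql
CREATE OR REPLACE TABLE default.student (
  id INT,
  firstName STRING,
  middleName STRING,
  lastName STRING,
  gender STRING,
  birthDate TIMESTAMP,
  ssn STRING,
  salary INT
) USING DELTA;
```

Unlike a plain DROP followed by CREATE, the replace happens as a single transaction, and the table's earlier versions remain reachable through Delta's history.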

In summary:

  • The IF NOT EXISTS clause in the first statement ensures that the table is only created if it doesn't already exist.
  • The second statement with CREATE OR REPLACE TABLE is more forceful; it creates the table anew, replacing it if it already exists.

Both statements specify that the data will be stored using the Delta format, which is an important distinction for big data workloads and analytics on platforms like Apache Spark. Delta format provides features like atomic commits, schema evolution, and efficient data management.
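For example, time travel lets you inspect the table's commit history and query an earlier snapshot by version — a sketch, assuming the table has at least one committed version:

```sql
-- List the table's commit history (version, timestamp, operation, ...)
DESCRIBE HISTORY default.student;

-- Query the snapshot as of version 0 (the initial commit)
SELECT * FROM default.student VERSION AS OF 0;
```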

In PySpark

In PySpark, you can create or replace a Delta table and define columns with specific data types and properties programmatically, using the DeltaTable builder API from the delta-spark package.

Create or replace a table in PySpark

In this code:

  • The DeltaTable class is imported from the delta.tables module.
  • The tableName method specifies the table name when creating or replacing the table.
  • The builder chain must end with execute(), which is what actually creates (or replaces) the table.

Make sure to adjust the paths and configurations based on your specific requirements and environment.
