Creating PySpark and SQL tables dynamically without hardcoding
Creating tables dynamically without hardcoding means generating the DDL from inputs such as configuration files, schema definitions, or metadata instead of writing each CREATE TABLE statement by hand. Here are several approaches to creating tables dynamically in SQL and PySpark:
1. Dynamic Table Creation Using Variables
Use variables to dynamically substitute table names, column definitions, or schema details.
Example in SQL
DECLARE @TableName NVARCHAR(50) = 'DynamicTable';
DECLARE @ColumnDefinitions NVARCHAR(MAX) = 'ID INT, Name NVARCHAR(100), Age INT';
DECLARE @CreateTableSQL NVARCHAR(MAX);
SET @CreateTableSQL = 'CREATE TABLE ' + @TableName + ' (' + @ColumnDefinitions + ')';
-- Execute the dynamic SQL
EXEC sp_executesql @CreateTableSQL;
- Here, the table name and columns are passed dynamically using variables.
- You can replace variables with dynamic inputs (e.g., from a config file or stored procedure parameters).
2. Python with SQL (Dynamic Table Creation)
Use Python to dynamically generate and execute SQL commands based on configurations.
Example with Python and SQLite
import sqlite3
# Define dynamic table schema
table_name = "DynamicTable"
columns = [
    {"name": "ID", "type": "INTEGER"},
    {"name": "Name", "type": "TEXT"},
    {"name": "Age", "type": "INTEGER"}
]
# Build the CREATE TABLE SQL statement
columns_sql = ", ".join([f"{col['name']} {col['type']}" for col in columns])
create_table_sql = f"CREATE TABLE {table_name} ({columns_sql});"
# Connect to SQLite and execute
conn = sqlite3.connect("example.db")
cursor = conn.cursor()
cursor.execute(create_table_sql)
conn.commit()
conn.close()
print(f"Table {table_name} created!")
- You can replace columns with a dynamically loaded JSON schema or user input.
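For instance, the column list could come from a JSON file and be checked against an allow-list of types before any SQL is built. A minimal sketch, assuming a hypothetical schema.json file that holds the same list of column dictionaries:
import json
import sqlite3
ALLOWED_TYPES = {"INTEGER", "TEXT", "REAL", "BLOB"}  # assumed allow-list of SQLite types
# Load the column definitions from a hypothetical JSON file
with open("schema.json") as f:
    columns = json.load(f)
# Reject unexpected types before building any SQL
for col in columns:
    if col["type"].upper() not in ALLOWED_TYPES:
        raise ValueError(f"Unsupported column type: {col['type']}")
columns_sql = ", ".join(f"{col['name']} {col['type']}" for col in columns)
create_table_sql = f"CREATE TABLE IF NOT EXISTS DynamicTable ({columns_sql});"
conn = sqlite3.connect("example.db")
conn.execute(create_table_sql)
conn.commit()
conn.close()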
3. Dynamic Table Creation in PySpark
In Spark, schemas can be built programmatically at runtime, which makes dynamic table creation straightforward.
Example: PySpark with Hive Table Creation
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# Initialize Spark session
spark = SparkSession.builder.appName("DynamicTableCreation").enableHiveSupport().getOrCreate()
# Dynamic schema
schema = StructType([
    StructField("ID", IntegerType(), True),
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True)
])
# Empty DataFrame used only to create the table with the desired schema
df = spark.createDataFrame([], schema)
# Create a Hive table dynamically
table_name = "dynamic_table"
df.write.saveAsTable(table_name)
print(f"Table {table_name} created in Hive!")
- Replace schema with metadata dynamically retrieved from a catalog or configuration.
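For instance, a list of column dictionaries could be translated into a StructType at runtime. A minimal sketch that reuses the Spark session from above; the columns_meta list and the TYPE_MAP mapping are illustrative assumptions:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# Hypothetical column metadata, e.g. loaded from a catalog or config file
columns_meta = [
    {"name": "ID", "type": "INTEGER"},
    {"name": "Name", "type": "TEXT"},
    {"name": "Age", "type": "INTEGER"}
]
# Minimal mapping from metadata type names to Spark types (extend as needed)
TYPE_MAP = {"INTEGER": IntegerType(), "TEXT": StringType()}
schema = StructType([
    StructField(col["name"], TYPE_MAP[col["type"]], True) for col in columns_meta
])
df = spark.createDataFrame([], schema)
df.write.saveAsTable("dynamic_table_from_metadata")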
4. Use Metadata for Table Schema
Metadata-driven table creation involves storing table schema details in a configuration file or metadata repository.
Example: Generate SQL from Metadata
Metadata (JSON) Example:
{
  "table_name": "DynamicTable",
  "columns": [
    {"name": "ID", "type": "INTEGER"},
    {"name": "Name", "type": "TEXT"},
    {"name": "Age", "type": "INTEGER"}
  ]
}
Python Script:
import json
# Load metadata
metadata = json.loads("""
{
  "table_name": "DynamicTable",
  "columns": [
    {"name": "ID", "type": "INTEGER"},
    {"name": "Name", "type": "TEXT"},
    {"name": "Age", "type": "INTEGER"}
  ]
}
""")
# Build SQL statement
table_name = metadata['table_name']
columns_sql = ", ".join([f"{col['name']} {col['type']}" for col in metadata['columns']])
create_table_sql = f"CREATE TABLE {table_name} ({columns_sql});"
print(create_table_sql)
This approach ensures you can define multiple tables in a single configuration and generate them dynamically.
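For example, a single config with several table definitions could be looped over to generate each statement. A minimal sketch; the Customers and Orders definitions are illustrative:
import json
config = json.loads("""
{
  "tables": [
    {"table_name": "Customers",
     "columns": [{"name": "ID", "type": "INTEGER"}, {"name": "Name", "type": "TEXT"}]},
    {"table_name": "Orders",
     "columns": [{"name": "OrderID", "type": "INTEGER"}, {"name": "Amount", "type": "REAL"}]}
  ]
}
""")
# Generate one CREATE TABLE statement per table definition
for table in config["tables"]:
    columns_sql = ", ".join(f"{c['name']} {c['type']}" for c in table["columns"])
    print(f"CREATE TABLE {table['table_name']} ({columns_sql});")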
5. Automate with Stored Procedures
Stored procedures allow dynamic table creation in SQL by passing parameters.
Example: Dynamic SQL in a Stored Procedure
CREATE PROCEDURE CreateTableDynamic
    @TableName NVARCHAR(50),
    @ColumnDefinitions NVARCHAR(MAX)
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);
    SET @SQL = 'CREATE TABLE ' + @TableName + ' (' + @ColumnDefinitions + ')';
    EXEC sp_executesql @SQL;
END;
Execute Stored Procedure:
EXEC CreateTableDynamic 'DynamicTable', 'ID INT, Name NVARCHAR(100), Age INT';
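The procedure can also be called from application code so that the table name and column definitions come from a config rather than being typed by hand. A minimal sketch, assuming the pyodbc driver and a placeholder connection string that you would replace with your own:
import pyodbc  # assumed SQL Server driver
# Placeholder connection string -- replace with your own server and database
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Demo;Trusted_Connection=yes"
)
cursor = conn.cursor()
# Pass the table name and column definitions as parameters to the procedure
cursor.execute(
    "EXEC CreateTableDynamic ?, ?",
    ("DynamicTable", "ID INT, Name NVARCHAR(100), Age INT")
)
conn.commit()
conn.close()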
6. Use Spark SQL for Large-Scale Automation
For large-scale automation in distributed environments:
- Combine Spark with Delta tables or Hive.
- Use schema inference to generate tables dynamically.
Example: Infer Schema and Create Table
# Dynamically read JSON and infer schema
df = spark.read.json("s3://path-to-data/")
df.printSchema()
# Dynamically create a managed table (requires Delta Lake to be available in the Spark session)
table_name = "dynamic_table"
df.write.format("delta").saveAsTable(table_name)
print(f"Table {table_name} created with inferred schema!")
Best Practices
- Centralized Schema Definition: Use metadata repositories or JSON/YAML files for consistent schema management.
- Validation: Validate inputs to avoid SQL injection or schema mismatches; a small validation and logging sketch follows this list.
- Logging: Log dynamically executed SQL statements for debugging and traceability.
- Automation Frameworks: Use frameworks like Airflow, dbt, or Databricks Workflows for orchestrating dynamic table creation at scale.
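A minimal sketch illustrating the validation and logging points above; the identifier pattern and logging setup are assumptions, not a complete safeguard:
import logging
import re
logging.basicConfig(level=logging.INFO)
IDENTIFIER_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")  # assumed naming rule
def build_create_table(table_name, columns):
    # Reject table and column names that do not look like plain identifiers
    if not IDENTIFIER_PATTERN.match(table_name):
        raise ValueError(f"Invalid table name: {table_name}")
    for col in columns:
        if not IDENTIFIER_PATTERN.match(col["name"]):
            raise ValueError(f"Invalid column name: {col['name']}")
    columns_sql = ", ".join(f"{c['name']} {c['type']}" for c in columns)
    sql = f"CREATE TABLE {table_name} ({columns_sql});"
    logging.info("Generated SQL: %s", sql)  # log what will be executed
    return sql
print(build_create_table("DynamicTable", [{"name": "ID", "type": "INT"}]))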