Common Methods for Converting Spark RDD to DataFrame

This approach leverages Spark's implicit conversinos to infer column names from case class attributes.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RDDConversionExample")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

case class User(id: Int, username: String, score: Double)

val userRDD = spark.sparkContext.parallelize(Seq(
  User(1, "Alex", 85.5),
  User(2, "Beth", 92.3),
  User(3, "Chris", 78.9)
))

val userDF = userRDD.toDF()
userDF.show()

Method 2: Using toDF with Explicit Column Names

When working with tuple-based RDDs, you can specify custom column names during conversion.

val productRDD = spark.sparkContext.parallelize(Seq(
  ("Laptop", 1200.00, 15),
  ("Phone", 899.99, 32),
  ("Tablet", 450.50, 27)
))

val productDF = productRDD.toDF("product_name", "price", "stock")
productDF.show()

Method 3: Using createDataFrame with Defined Schema

For complete control over column names and data types, create a schema and convert Row-based RDDs.

import org.apache.spark.sql.{Row, types}

val transactionRDD = spark.sparkContext.parallelize(Seq(
  Row("2023-01-01", 1001, 149.99),
  Row("2023-01-02", 1002, 299.50),
  Row("2023-01-03", 1003, 99.95)
))

val transactionSchema = types.StructType(Array(
  types.StructField("date", types.StringType, nullable = false),
  types.StructField("order_id", types.IntegerType, nullable = false),
  types.StructField("amount", types.DoubleType, nullable = false)
))

val transactionDF = spark.createDataFrame(transactionRDD, transactionSchema)
transactionDF.show()

Tags: Spark RDD DataFrame Scala toDF

Posted on Mon, 18 May 2026 18:57:46 +0000 by ThunderAI