How to Convert JavaRDD<String> of JSON to Dataset<Row> in Spark Java

Published Jun 29, 2022  ∙  Updated Jul 8, 2022

Suppose we have an instance of SparkSession in Java.

SparkSession spark = SparkSession.builder().getOrCreate();

We also have a JavaRDD<String> of JSON strings that we want to convert into a Dataset<Row>.

JavaRDD<String> jsonStrings = ...;

First, we can convert our RDD to a Dataset<String> using spark.createDataset().

Dataset<String> tempDs = spark.createDataset(jsonStrings.rdd(), Encoders.STRING());

1. Using spark.read().json()

Then, we can parse each JSON string using spark.read().json().

In this operation, Spark SQL infers the schema of a JSON dataset and loads it as a Dataset<Row>.

Dataset<Row> finalDs = spark.read().json(tempDs);
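The steps so far can be put together in one small program. This is a minimal sketch rather than a definitive implementation. The class name JsonRddToDataset, the helper toRows(), the local[*] master, the app name, and the two sample records are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonRddToDataset {

    // Hypothetical helper: wraps the RDD in a Dataset<String>, then lets
    // spark.read().json() infer the schema and parse every record.
    public static Dataset<Row> toRows(SparkSession spark, JavaRDD<String> jsonStrings) {
        Dataset<String> tempDs = spark.createDataset(jsonStrings.rdd(), Encoders.STRING());
        return spark.read().json(tempDs);
    }

    public static void main(String[] args) {
        // Assumption: a local master and app name, for a standalone run.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("json-rdd-to-dataset")
            .getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Hypothetical sample data: one JSON object per string.
        List<String> sample = Arrays.asList(
            "{\"name\":\"Ann\",\"age\":31}",
            "{\"name\":\"Bob\",\"age\":27}");
        JavaRDD<String> jsonStrings = jsc.parallelize(sample);

        Dataset<Row> finalDs = toRows(spark, jsonStrings);
        finalDs.printSchema(); // shows the inferred column names and types
        finalDs.show();

        spark.stop();
    }
}
```

Because the schema is inferred, Spark has to read the JSON strings once to work out the column types before it can load them, which is worth keeping in mind for large inputs.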

2. Using from_json()

We can also get the schema from the JSON string dataset as a StructType, then pass it to from_json() to parse the value column of the Dataset<String>.

StructType schema = spark.read().json(tempDs).schema();
Dataset<Row> finalDs = tempDs
  .withColumn("json", from_json(col("value"), schema))
  // flatten the parsed struct into top-level columns
  .select("json.*");
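The from_json() approach might look like the following when written out as a complete program. This is a sketch under stated assumptions: the class name FromJsonExample, the helper parse(), the local[*] master, the sample records, and the hand-written schema with name and age fields are all hypothetical. It shows both an inferred schema and an explicit one, since from_json() accepts any StructType.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FromJsonExample {

    // Hypothetical helper: parses the single "value" column of a Dataset<String>
    // into a struct, then expands the struct fields into top-level columns.
    public static Dataset<Row> parse(Dataset<String> tempDs, StructType schema) {
        return tempDs
            .withColumn("json", from_json(col("value"), schema))
            .select("json.*");
    }

    public static void main(String[] args) {
        // Assumption: a local master and app name, for a standalone run.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("from-json-example")
            .getOrCreate();

        // Hypothetical sample data: one JSON object per string.
        Dataset<String> tempDs = spark.createDataset(
            Arrays.asList(
                "{\"name\":\"Ann\",\"age\":31}",
                "{\"name\":\"Bob\",\"age\":27}"),
            Encoders.STRING());

        // Option A: infer the schema from the data itself.
        StructType inferred = spark.read().json(tempDs).schema();
        parse(tempDs, inferred).show();

        // Option B: declare the schema by hand (field names and types assumed here).
        StructType explicit = new StructType()
            .add("name", DataTypes.StringType)
            .add("age", DataTypes.LongType);
        parse(tempDs, explicit).show();

        spark.stop();
    }
}
```

Declaring the schema by hand skips the inference step, at the cost of having to know the field names and types in advance.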