To delete documents from MongoDB using Spark Scala, you can follow these steps:
- Start by creating a new SparkSession:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MongoDB Spark Connector")
  .config("spark.mongodb.input.uri", "mongodb://<host>/<database>.<collection>")
  .config("spark.mongodb.output.uri", "mongodb://<host>/<database>.<collection>")
  .getOrCreate()
```
Replace `<host>`, `<database>`, and `<collection>` with the appropriate values for your MongoDB connection.
- Load the MongoDB collection into a DataFrame:
```scala
val df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
```
- Apply a filter condition to identify the documents you want to delete:
```scala
val filtered = df.filter(<filter condition>)
```
Replace `<filter condition>` with the condition that identifies the documents you want to delete. For example, to delete documents where the field "status" equals "inactive", the filter condition would be `col("status") === "inactive"`.
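For that example, the complete filter looks like this (note that `col` comes from `org.apache.spark.sql.functions`):

```scala
import org.apache.spark.sql.functions.col

// Select the documents that should be deleted: those whose "status" field is "inactive"
val filtered = df.filter(col("status") === "inactive")
```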
- Convert the filtered DataFrame to an RDD of the `_id` values of the documents to delete:
```scala
// The connector represents MongoDB ObjectIds as a struct with a single "oid" string field
val rdd = filtered.select("_id.oid").rdd
```
- Delete the matching documents. The MongoDB Spark Connector itself does not expose a remove/delete helper, so a common approach is to call the MongoDB Java driver directly inside `foreachPartition`:
```scala
rdd.foreachPartition { partition =>
  val client = com.mongodb.client.MongoClients.create("mongodb://<host>") // same placeholders as above
  val collection = client.getDatabase("<database>").getCollection("<collection>")
  partition.foreach(row =>
    collection.deleteOne(new org.bson.Document("_id", new org.bson.types.ObjectId(row.getString(0)))))
  client.close()
}
```
- Close the Spark session:
```scala
spark.close()
```
Before running this code, make sure the MongoDB Spark Connector (and the MongoDB Java driver it depends on) is available on your Spark classpath, for example by passing the connector package to `spark-submit` with `--packages`.
Note: Deleting documents from a database is a critical and irreversible action. Ensure that you have taken proper backups or have a backup plan in place before executing deletion operations.
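If the documents to delete can be described by a single MongoDB query, as in the "status" example above, a simpler variant is to skip the RDD entirely and issue one `deleteMany` from the driver program. The sketch below uses only the MongoDB Java driver and the same placeholder host, database, and collection names:

```scala
import com.mongodb.client.MongoClients
import org.bson.Document

// Placeholder connection values; reuse the same <host>, <database>, and <collection> as above.
val client = MongoClients.create("mongodb://<host>")
try {
  val collection = client.getDatabase("<database>").getCollection("<collection>")
  // Delete every document whose "status" field is "inactive"
  val result = collection.deleteMany(new Document("status", "inactive"))
  println(s"Deleted ${result.getDeletedCount} documents")
} finally {
  client.close()
}
```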
What is the role of the MongoDB connection string in Spark Scala document deletion?
The MongoDB connection string tells Spark where the database lives and how to reach it.
To delete documents from a MongoDB database using Spark Scala, you first need to establish a connection to that database. The connection string carries all the information required to do so: the host address, port number, database name, and authentication credentials if required.
Once the connection is established, Spark can access the MongoDB collection and perform deletion operations. In other words, the connection string is what enables the interaction between the Spark engine and the MongoDB server; with a correct connection string, Spark Scala can communicate with the database and delete documents as desired.
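For illustration, here is what a session configured with a full connection string might look like. The host, credentials, database, and collection names below are placeholders, and the option keys follow the connector naming used earlier in this article:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder values: substitute your own user, password, host, port, database, and collection.
val uri = "mongodb://appUser:appPassword@mongo-host:27017/mydb.customers?authSource=admin"

val spark = SparkSession.builder()
  .appName("MongoDeleteExample")
  .config("spark.mongodb.input.uri", uri)   // where documents are read from
  .config("spark.mongodb.output.uri", uri)  // where writes are sent
  .getOrCreate()
```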
How to delete documents while preserving a backup in Spark Scala?
To delete documents in Spark Scala while preserving a backup, you can follow these steps:
- Load the documents into a Spark DataFrame.
- Write the DataFrame to a separate backup location, so an untouched copy exists on disk before anything is modified.
- Apply the deletion (filter) operation, reading from the backup so the original path can be overwritten safely.
- Save the filtered DataFrame over the original document dataset.
Here is an example code snippet demonstrating the steps mentioned above:
```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Step 1: Load documents into a DataFrame
val spark = SparkSession.builder().appName("DeleteDocuments").getOrCreate()
val originalDF: DataFrame = spark.read.format("format").load("path/to/documents")

// Step 2: Materialize the backup on disk before modifying anything.
// Simply assigning the DataFrame to another val is not a copy: Spark is lazy,
// so such a "backup" would later be recomputed from already-modified files.
originalDF.write.format("format").save("path/to/backup")

// Step 3: Keep only the documents that should survive the deletion.
// Re-read from the backup so the original path can be overwritten safely.
// Replace <condition> with your deletion logic.
val filteredDF = spark.read.format("format").load("path/to/backup").filter(<condition>)

// Step 4: Overwrite the original dataset with the filtered result
filteredDF.write.format("format").mode("overwrite").save("path/to/documents")
```
Replace "format"
with the actual format of your documents, such as "parquet"
, "json"
, "csv"
, etc. Replace "path/to/documents"
with the actual path to your document dataset and "path/to/backup"
with the desired location to save the backup.
Make sure to adapt the `<condition>` in Step 3 to your specific deletion logic. Note that `filter()` keeps the rows that match the condition, so pass the condition the surviving documents must satisfy (or negate the deletion condition); everything filtered out is effectively deleted.
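For example, assuming Parquet files and a hypothetical "status" field, Steps 3 and 4 could look like this (deleting the inactive documents means keeping everything else):

```scala
import org.apache.spark.sql.functions.col

// Keep every document that is NOT inactive; the inactive ones are the ones being "deleted"
val filteredDF = spark.read.format("parquet").load("path/to/backup")
  .filter(col("status") =!= "inactive")

filteredDF.write.format("parquet").mode("overwrite").save("path/to/documents")
```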
How to delete documents that match a specific condition in Spark Scala?
To delete documents that match a specific condition in Spark Scala, you can follow these steps:
- Import the necessary Spark libraries and create a SparkSession:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DeleteDocuments")
  .master("local")
  .getOrCreate()
```
- Read the documents from a data source into a DataFrame:
```scala
val df = spark.read.format("your_data_source_format")
  .option("your_option_key", "your_option_value") // option() takes a key/value pair
  .load("your_data_source_location")
```
- Filter the DataFrame based on the condition you want to match:
```scala
val filteredDF = df.filter("your_condition")
```
- Store the filtered DataFrame as a temporary table:
```scala
filteredDF.createOrReplaceTempView("tempTable")
```
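Registering the view lets you inspect the matching documents with SQL before deleting anything, for example:

```scala
// Count how many documents the condition matches before running the delete
spark.sql("SELECT COUNT(*) AS to_delete FROM tempTable").show()
```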
- Delete the documents that match the condition from the original data source using SQL:
```scala
spark.sql("DELETE FROM your_data_source_table WHERE your_condition")
```
Note: The above approach only works if your data source supports SQL DELETE statements, which is the case for table formats such as Delta Lake or Apache Iceberg but not for plain file-based sources like CSV, JSON, or Parquet. If your data source does not support DELETE, filter out the unwanted rows and rewrite the dataset instead, as described in the previous section.
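As a sketch of the happy path, here is what that final step could look like against a table format that does support SQL deletes; the table name `documents` and the "status" condition are assumptions for illustration:

```scala
// Assumes "documents" is registered in the catalog as a Delta Lake (or other
// DML-capable) table and that the session was started with the required extensions.
spark.sql("DELETE FROM documents WHERE status = 'inactive'")

// Optional sanity check: the matching documents should now be gone
spark.sql("SELECT COUNT(*) AS remaining FROM documents WHERE status = 'inactive'").show()
```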
How to import the necessary Spark libraries in Scala?
In Scala, you can import the necessary Spark libraries by using the `import` keyword. Here is an example of how to import the commonly used Spark libraries:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SparkSession, DataFrame}

// Configure a new Spark application
val conf = new SparkConf()
  .setAppName("My Spark Application")
  .setMaster("local[*]") // Run Spark locally with all available cores

// Create a SparkSession
val spark = SparkSession
  .builder()
  .config(conf)
  .getOrCreate()

// Import additional Spark helpers as needed
import spark.implicits._ // Enables conversions such as toDF() on Scala collections and the $"column" syntax
import org.apache.spark.sql.functions._ // Built-in functions like col(), count(), etc.
```
By doing this, you can now use Spark's functionality within your Scala code.
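As a quick usage sketch continuing from the session and imports defined above (the column names here are made up), the imports let you build and query a DataFrame directly from Scala collections:

```scala
// toDF() comes from spark.implicits._; col() comes from org.apache.spark.sql.functions._
val people = Seq(("alice", "active"), ("bob", "inactive")).toDF("name", "status")

people.filter(col("status") === "active").show()
```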