DeepDiff is a Python library designed to compare objects and identify differences between them. It supports various data types including dictionaries, iterables, strings, and more.
Installation
Install the library using pip:
pip install deepdiff==4.3.2
Comparing Text Files
To compare text files, read their contents and pass them to DeepDiff:
from deepdiff import DeepDiff
# Context managers close each file as soon as it has been read
with open('file1.txt', 'r', encoding='utf-8') as f1:
    text1 = f1.read()
with open('file2.txt', 'r', encoding='utf-8') as f2:
    text2 = f2.read()
result = DeepDiff(text1, text2)
print(result)
# Output: {'values_changed': {'root': {'new_value': 'abcd', 'old_value': 'abc'}}}
Comparing JSON Data
For JSON comparison, load the data into Python objects first:
import json
from deepdiff import DeepDiff
with open('data1.json', 'r') as f1, open('data2.json', 'r') as f2:
    # Parse each file into Python dicts and lists before diffing
    json1 = json.load(f1)
    json2 = json.load(f2)
result = DeepDiff(json1, json2)
print(result)
Example output:
{
    'dictionary_item_removed': ["root['data'][0]['author_id']"],
    'values_changed': {
        "root['data'][1]['tab']": {
            'new_value': 'share_we',
            'old_value': 'share'
        },
        "root['title']": {
            'new_value': '2VEX',
            'old_value': 'V2EX'
        }
    }
}
Dictionary Comparison
When comparing dictionaries, DeepDiff highlights added and removed keys:
from deepdiff import DeepDiff
dict1 = {1: 1, 3: 3, 4: 4}
dict2 = {1: 1, 3: 3, 5: 5, 6: 6}
print(DeepDiff(dict1, dict2))
# Output: {'dictionary_item_added': [root[5], root[6]], 'dictionary_item_removed': [root[4]]}
print(DeepDiff(dict1, dict2, view='tree'))
# Output: {'dictionary_item_removed': [<root[4] t1:4, t2:not present>], 'dictionary_item_added': [<root[5] t1:not present, t2:5>, <root[6] t1:not present, t2:6>]}
Searching Within Objects
The grep utility helps locate specific values within nested structures:
from deepdiff import grep
nested_obj = {
    "long": "somewhere",
    "string": 2,
    0: 0,
    "somewhere": "around",
    "a": {"b": {"c": {"d": "somewhere"}}}
}
search_result = nested_obj | grep("somewhere")
print(search_result)
# Output: {'matched_paths': {"root['somewhere']"}, 'matched_values': {"root['long']", "root['a']['b']['c']['d']"}}
Hashing Complex Objects
DeepHash generates consistent hashes for complex objects that are normally unhashable:
from deepdiff import DeepHash
obj = {1: 2, 'a': 'b'}
hashes = DeepHash(obj)
print(hashes[obj]) # Prints the hash of the dictionary itself
By default, DeepHash uses Murmur3 hashing. To use it, install the optional dependency:
pip install 'deepdiff[murmur]'
If it is not installed, DeepHash falls back to SHA256, which is slower.
Unit Testing Integration
In unit tests, DeepDiff can validate API responses or configuration data:
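import unittest
from deepdiff import DeepDiff

class ResponseTest(unittest.TestCase):
    def test_response_matches_expected(self):
        response = {'title': 'V2EX', 'slogan': 'way to explore'}
        expected = {'title': 'V2EX', 'slogan': 'way to explore'}
        # An empty diff means the two objects are identical
        self.assertEqual(DeepDiff(response, expected), {})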
If there's a mismatch, DeepDiff returns a detailed report:
# This assertion fails because 'V2EX' and 'v2ex' differ in case
self.assertEqual(DeepDiff(response['title'], 'v2ex'), {})
The same pattern applies to pytest:
assert not DeepDiff(response, expected)
assert not DeepDiff(response['title'], 'v2ex')  # fails: the titles differ in case