Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
484 views
in Technique[技术] by (71.8m points)

python - Compare Dictionaries with unhashable or uncomparable values? (e.g. Lists or Dataframes)

TL;DR: How can you compare two python dictionaries if some of them have values which are unhashable/mutable (e.g. lists or pandas Dataframes)?


I have to compare dictionary pairs for equality. In that sense, this question is similar to these two, but their solutions only seem to work for immutable objects...

My problem, is that I'm dealing with pairs of highly nested dictionaries where the unhashable objects could be found in different places depending on which pair of dictionaries I'm comparing. My thinking is that I'll need to iterate across the deapest values contained in the dictionary and can't just rely on the dict.iteritems() which only unrolls the highest key-value pairs. I'm not sure how iterate across all the possible key-value pairs contained in the dictionary and compare them either using sets/== for the hashable objects and in the cases of pandas dataframes, running df1.equals(df2). (Note for pandas dataframe, just running df1==df2 does a piecewise comparison and NA's are poorly handled. df1.equals(df2) gets around that does the trick.)

So for example:

a = {'x': 1, 'y': {'z': "George", 'w': df1}}
b = {'x': 1, 'y': {'z': "George", 'w': df1}}
c = {'x': 1, 'y': {'z': "George", 'w': df2}}

At a minimum, and this would be pretty awesome already, the solution would yield TRUE/FALSE as to whether their values are the same and would work for pandas dataframes.

def dict_compare(d1, d2):
   if ...
      return True
   elif ...
      return False

dict_compare(a,b)
>>> True
dict_compare(a,c)
>>> False

Moderately better: the solution would point out what key/values would be different across the dictionaries.

In the ideal case: the solution could separate the values into 4 groupings:

  • added,
  • removed,
  • modified
  • same
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Well, there's a way to make any type comparable: Simply wrap it in a class that compares like you need it:

class DataFrameWrapper():
    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        return self.df.equals(other.df)

So when you wrap your "uncomparable" values you can now simply use ==:

>>> import pandas as pd

>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})

>>> a = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> b = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> c = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df2)}}
>>> a == b
True
>>> a == c
False

Of course wrapping your values has it's disadvantages but if you only need to compare them that would be a very easy approach. All that may be needed is a recursive wrapping before doing the comparison and a recursive unwrapping afterwards:

def recursivewrap(dict_):
    for key, value in dict_.items():
        wrapper = wrappers.get(type(value), lambda x: x)  # for other types don't wrap
        dict_[key] = wrapper(value)
    return dict_  # return dict_ so this function can be used for recursion

def recursiveunwrap(dict_):
    for key, value in dict_.items():
        unwrapper = unwrappers.get(type(value), lambda x: x)
        dict_[key] = unwrapper(value)
    return dict_

wrappers = {pd.DataFrame: DataFrameWrapper,
            dict: recursivewrap}
unwrappers = {DataFrameWrapper: lambda x: x.df,
              dict: recursiveunwrap}

Sample case:

>>> recursivewrap(a)
{'x': 1,
 'y': {'w': <__main__.DataFrameWrapper at 0x2affddcc048>, 'z': 'George'}}
>>> recursiveunwrap(recursivewrap(a))
{'x': 1, 'y': {'w':    a
  0  1
  1  2
  2  3, 'z': 'George'}}

If you feel really adventurous you could use wrapper classes that depending on the comparison result modify some variable that holds the information what wasn't equal.


This part of the answer was based on the original question that didn't include nestings:

You can seperate the unhashable values from the hashable values and do a set-comparison for the hashable values and a "order-independant" list-comparison for the unhashables:

def split_hashable_unhashable(vals):
    """Seperate hashable values from unhashable ones and returns a set (hashables) 
    and list (unhashable ones)"""
    set_ = set()
    list_ = []
    for val in vals:
        try:
            set_.add(val)
        except TypeError:  # unhashable
            list_.append(val)
    return set_, list_


def compare_lists_arbitary_order(l1, l2, cmp=pd.DataFrame.equals):
    """Compare two lists using a custom comparison function, the order of the
    elements is ignored."""
    # need to have equal lengths otherwise they can't be equal
    if len(l1) != len(l2):  
        return False

    remaining_indices = set(range(len(l2)))
    for item in l1:
        for cmpidx in remaining_indices:
            if cmp(item, l2[cmpidx]):
                remaining_indices.remove(cmpidx)
                break
        else:
            # Run through the loop without finding a match
            return False
    return True

def dict_compare(d1, d2):
    if set(d1) != set(d2):  # compare the dictionary keys
        return False
    set1, list1 = split_hashable_unhashable(d1.values())
    set2, list2 = split_hashable_unhashable(d2.values())
    if set1 != set2:  # set comparison is easy
        return False

    return compare_lists_arbitary_order(list1, list2)

It got a bit longer than expected. For your test-cases it definetly works:

>>> import pandas as pd

>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})

>>> a = {'x': 1, 'y': df1}
>>> b = {'y': 1, 'x': df1}
>>> c = {'y': 1, 'x': df2}
>>> dict_compare(a, b)
True
>>> dict_compare(a, c)
False
>>> dict_compare(b, c)
False

The set-operations can also be used to find differences (see set.difference). It's a bit more complicated with the lists, but not really impossible. One could add the items where no match was found to a seperate list instead of instantly returning False.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...