Sunday, 15 September 2013

Find duplicate lists where element order is insignificant but repeat list elements are significant

Find duplicate lists where element order is insignificant but repeat list
elements are significant

I've got an odd problem where I need to find duplicate collections of
items where the order doesn't matter but the presence of duplicate values
within a collection does matter. For example, say I have the following
list of lists:
lol = [
['red'],
['blue', 'orange'],
['orange', 'red'],
['red', 'orange'],
['red', 'red'],
['blue', 'orange', 'red'],
['red', 'orange', 'blue']
]
In my case, the unique collection would be:
unique_lol = [
['red'],
['blue', 'orange'],
['orange', 'red'],
['red', 'red'],
['blue', 'orange', 'red']
]
And the information I'm looking to obtain are the duplicate lists:
dup_lol = [
['orange', 'red'],
['blue', 'orange', 'red']
]
I don't care which duplicate is reported as the duplicate, i.e. ['orange',
'red'] vs ['red', 'orange'], just that the duplicate combination is
reported. I first tried to use a set of frozensets:
sofs = {frozenset(x) for x in lol}
However, this approach gets tripped up by the ['red', 'red'] list, which
gets converted to ['red']:
set([frozenset(['red']),
frozenset(['orange', 'red']),
frozenset(['blue', 'orange', 'red']),
frozenset(['blue', 'orange'])])
Plus, this doesn't give me the duplicates, just the unique ones, and I
cannot run difference against the list of lists anyway.
I'm sure I can iterate over the parent lists brute force style, but I feel
like I'm missing something simple. I almost need a dictionary where the
key is the ordered list, and the value is the number of times that
combination appears, but that lists cannot be dictionary keys and that
just sounds odd anyway.

No comments:

Post a Comment