Tuesday, March 3, 2015

SPT #4: There is no try

"Stupid Python Tricks" explores sexy ways to whip Python features until they beg for mercy. Stupid Python Tricks may well be fifty different shades of gray. Reader discretion is advised.

I've recently been wishing that Python's set type had a method like add() but which returned a Boolean indicating whether the item being added was already in the set. We could add this behavior to the add() method without fear of breaking much code, since most uses will ignore the return value, but I'd rather keep add() as fast as possible. So let's call this new method added() and have it return True if the item needed to be added, and False otherwise. You can derive a new class from set, so let's go ahead and do that:

class Set(set):
    def added(self, item):
        result = item not in self   # True if item needs to be added
        self.add(item)
        return result

Note our self.add() here is not conditional in any way; it doesn't need to be. set.add() is idempotent: adding the same item multiple times doesn't hurt anything, and it's actually faster to do the add() even if it's not necessary (since that stays in the fast C implementation of the set type) than to try to avoid the unnecessary add() with an if statement.
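If you'd like to check the speed claim on your own machine, here is a rough sketch using timeit. CheckedSet and time_added() are names made up for this comparison, not anything from the standard library, and the numbers will vary with your interpreter and with how often the item is already in the set.

```python
import timeit


class Set(set):
    def added(self, item):
        result = item not in self   # True if item needs to be added
        self.add(item)              # unconditional add
        return result


class CheckedSet(set):
    # Hypothetical variant, for comparison only: skips add() when it isn't needed.
    def added(self, item):
        result = item not in self
        if result:
            self.add(item)
        return result


def time_added(cls, repeats=100_000):
    # Time repeated added() calls for an item that is already in the set,
    # which is the case where the if statement skips the add().
    s = cls([1])
    return timeit.timeit(lambda: s.added(1), number=repeats)


if __name__ == "__main__":
    print("unconditional add:", time_added(Set))
    print("conditional add:  ", time_added(CheckedSet))
```

Both classes return the same answers; only the timing differs, so trust whatever your own measurements say.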

Our new method is convenient for deduplicating lists while retaining the order of their items:

pies = ["apple", "banana cream", "apple", "boysenberry",
        "apple", "pumpkin", "banana cream"]
seen = Set()  # keeps track of pies we have already seen
pies[:] = (pie for pie in pies if seen.added(pie))
print(pies)

Result: ['apple', 'banana cream', 'boysenberry', 'pumpkin']
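One detail worth a quick look: assigning to pies[:] replaces the contents of the existing list rather than binding the name to a new one, so any other reference to the list sees the deduplicated result too. A small sketch of the difference, where alias, other and other_alias are names invented for the illustration:

```python
class Set(set):
    def added(self, item):
        result = item not in self   # True if item needs to be added
        self.add(item)
        return result


pies = ["apple", "banana cream", "apple"]
alias = pies                        # a second reference to the same list

seen = Set()
pies[:] = (pie for pie in pies if seen.added(pie))   # replaces contents in place
print(alias)                        # ['apple', 'banana cream']
print(alias is pies)                # True

other = ["apple", "apple"]
other_alias = other
seen = Set()
other = [pie for pie in other if seen.added(pie)]    # rebinds the name to a new list
print(other)                        # ['apple']
print(other_alias)                  # ['apple', 'apple'] (the old list is untouched)
```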

Be right back; I'm hungry for pie now.

OK. Our added() method works fine. There's nothing wrong with it. But doesn't it seem a little... inelegant... to have to store the result of the set membership test in a local variable, add the new item, and finally return the value we previously squirreled away? Why can't we simply return the result, and then do the add? Because the return ends the function's execution? Don't be silly; we won't let that stop us!

class Set(set):
    def added(self, item):
        try:     return item not in self
        finally: self.add(item)

Not only is this an unconventional use of try, it's also a wee bit slower than our earlier version. And that's why we call it "Stupid Python Tricks."
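Why does this work at all? The return expression is evaluated first and its value is held while the finally clause runs; only then does the function actually hand the value back. Here's a minimal sketch of that ordering, with demo() and events as throwaway names for the illustration:

```python
events = []


def demo():
    try:
        events.append("return expression evaluated")
        return len(events)              # value is computed here: 1
    finally:
        events.append("finally ran")    # runs after the value is computed,
                                        # but before the caller receives it


value = demo()
print(value)    # 1
print(events)   # ['return expression evaluated', 'finally ran']
```

The returned value is 1, not 2, because it was already computed before the finally clause appended its entry. That's exactly why added() reports membership as it was before the add().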

3 comments:

  1. Confused now from the stackoverflow link https://stackoverflow.com/questions/11551996/why-do-we-need-the-finally-clause-in-python

    Because your abusive implementation is not intuitive at all. Probably because added() is itself an abuse, and is one of 2 forms for me. safeadd(item), which always succeeds, makes more sense; the function doing 2 things is another reason it's not pretty. Storing the result in a temporary boolean is inelegant? It's just a boolean, so why fuss about the cost of RAM?

  2. I say 2 forms, I would have written a safeadd(item) or written added() by abusing the way add() would throw, but since I'm new to python, I assumed wrongly that the language had an add() method, it does not, so I have learned a good thing from this in the end. I really expected adding an element to throw with a "Key already exists", so Python by design has lexically prevented duplicate key errors, which has nothing to do with SPT #4, but your intention to teach has worked.

  3. Your critique is completely valid. Nobody should ever write code that way. That's why it's called Stupid Python Tricks. It's meant to be a fun bit of "look what you can do, but don't."
