Thursday, April 9, 2015

SJST #1: Multiline string literals

I've recently been doing some JavaScript programming for the Welcome Screen feature included with some of our products. The Welcome Screen is a QWebView: basically a complete Web browser embedded in the application. The Welcome Screen itself is a mini Web application implemented using HTML, CSS, and JavaScript. So here's Stupid JavaScript Tricks #1, which presents an awful hack to simulate multi-line string literals in JavaScript. As usual, programmer beware.

The more I work with JavaScript, the more I appreciate Python, which, by comparison, is a quite thoughtfully-designed programming language. One of the things that's very convenient about Python is its ability to construct multiple-line string literals using triple quote marks:

haiku = """
Triple quoted strings
Programmer's able helper
When writing poems
"""

If you need to embed HTML, or CSS, or a shell script, or some test data, or really anything else that contains newlines, in your Python code, you can easily see how this would be useful. It's significantly more readable, and more maintainable, than cramming that all into a single line with \n escapes in it, or using a syntax-heavy workaround like defining a list and then joining it.

Python's designers were so thoughtful that they even allow the use of triple single quotes on the off chance that your string literal happens to contain three double quote marks in a row. (Who knows, maybe you've embedded some Python code with multi-line string literals of its own! Which is a bad idea, generally, but hey.)

Now, think of a programming language that is used quite frequently for Web applications, in which you can easily imagine you might find yourself with a need to represent decently-sized snippets of HTML and/or CSS, and try to guess what feature it doesn't have.

That's right. JavaScript does not have decent syntax for multi-line string literals. Well, that's not quite true; ECMAScript 6 (a language standard for JavaScript) allows the definition of template strings using backticks (`). However, the ECMAScript 6 standard is only a couple years old, which means you might not be able to use it depending on what browsers you need to support. In my use case, it isn't supported by the version of WebKit provided by the version of Qt we're using in our applications.

So... skanky hack to the rescue. It turns out that JavaScript can easily convert pretty much any object into some sort of string representation. The string representation of a function is the function's source code. That's pretty handy when debugging, and it'd be nice if we didn't have to use the inspect module with Python to get the same thing (if we're lucky and the function was defined in a source file). But I digress.

By making the body of the function a big-ass comment using the C-style /* ... */ syntax, we can put whatever text we want into it—with line breaks wherever we need them, since line breaks are perfectly legal inside block comments. We can then get that function's text representation, nibble off the JavaScript function boilerplate, and end up with the string we want.

Unfortunately, this will break if the text we want to represent contains */ (which might be a problem with CSS if you ever put comments into your stylesheets, as they use this syntax, and JavaScript doesn't support nesting block comments). Inventing further kludgey workarounds for this is left as an exercise.

Here's how such a "text definition" might look:

var haiku = function() {/*

A horrible hack!
Is this much better than not
Having it at all?

*/}.toString().slice(15, -3).trim();

alert(haiku);

Now, this works... unless you're stuck using the JavaScript version in our Welcome Screen, which doesn't allow method calls on function objects until they've been finalized. The upshot is we can't actually assign the string in a single statement; instead we must do this:

var haiku = function() {/*

A horrible hack!
Is this much better than not
Having it at all?

*/}

haiku = haiku.toString();
haiku = haiku.slice(haiku.indexOf("/*")+2,
        haiku.lastIndexOf("*/")).trim();

alert(haiku);

In this code we actually search for the comment delimiters rather than hard-coding their positions, which is something that bugged me about that initial version. Let's write a function to do it; then we can pass in an anonymous function defined right in the function call. (In Python we'd use it as a decorator using the @ syntax, but then, in Python we wouldn't have this problem to begin with.)

function multi(func) {
    t = func.toString();
    return t.slice(t.indexOf("/*")+2, t.lastIndexOf("*/")).trim();
}

var haiku = multi(function() {/*

So many haiku!
Or is it haikus? Don't care.
The cheese stands alone.

*/})

alert(haiku);

All right. So it's there, and it's stinky, and it works. Should you use it in your Web apps? Probably not. But let's not ever pretend we're all saints. Sometimes, you do what works.

Tuesday, March 3, 2015

SPT #4: There is no try

"Stupid Python Tricks" explores sexy ways to whip Python features until they beg for mercy. Stupid Python Tricks may well be fifty different shades of gray. Reader discretion is advised.

I've recently been wishing that Python`s set type had a method like add() but which returned a Boolean indicating whether the item being added was already in the set. We could add this behavior to the add() method without fear of breaking much code, since most uses will ignore the return value, but I'd rather keep add() as fast as possible. So let's call this new method added() and have it return True if the item needed to be added, and False otherwise. You can derive a new class from set, so let's go ahead and do that:

class Set(set):
   def added(self, item):
       result = item not in self   # True if item needs to be added
       self.add(item)
       return result

Note our self.added() here is not conditional in any way; it doesn't need to be. set.add() is idempotent: adding the same item multiple times doesn't hurt anything, and it's actually faster to do the add() even if it's not necessary (since that stays in the fast C implementation of the set type) than to try to avoid the unnecessary add() with an if statement.

Our new method is convenient for deduplicating lists while retaining the order of their items:

pies = ["apple", "banana cream", "apple", "boysenberry",
        "apple", "pumpkin", "banana cream"]
seen =  Set()  # keeps track of pies we have already seen
pies[:] = (pie for pie in pies if seen.added(pie))
print(pies)

Result: ["apple", "banana cream", "boysenberry", "pumpkin"]

Be right back; I'm hungry for pie now.

OK. Our added() method works fine. There's nothing wrong with it. But doesn't it seem a little... inelegant... to have to store the result of the set membership test in a local variable, add the new item, and finally return the value we previously squirreled away? Why can't we simply return the result, and then do the add? Because the return ends the function's execution? Don't be silly; we won't let that stop us!

class Set(set):
   def added(self, item):
       try:     return item not in self
       finally: self.add(item)

Not only is this an unconventional use of try, it's also a wee bit slower than our earlier version. And that's why we call it "Stupid Python Tricks."

Thursday, February 19, 2015

SPT #3: The constant singleton

"Stupid Python Tricks" explores ways to abuse Python features for the hell of it. Stupid Python Tricks may well be perpendicular to maintainability and sanity. Abandon all hope, ye who enter here.

Classes can be convenient namespaces for corralling related objects. You don't even need to instantiate such a class; you can just use the class itself, since attribute access works the same for classes and instances. (If you need methods in the namespaces, it is better to instantiate the class, since otherwise you need to decorate each method with @classmethod. But for data, it works great, and saves the step of instantiation.)

One category of related objects that can be useful to put in a namespace are constants. You can even override __setattr__ to slap the programmer's hand should he (or she) try to change those tempting attributes-what-should-be-constants. But disappointment lurks ahead... declaring __setattr__ on a class won't actually override the behavior of setting the attribute on the class! See, for example, this:

class Const(object):
    answer = 42
    def __setattr__(self, name, value):
        raise TypeError("variables may vary, but constants never do")

The intent here is to prevent the class attribute answer (or any other attribute, for that matter) from being set or changed. But it doesn't work:

    Const.answer = 13    # no problem!

It works only if we instantiate the class:

    const = Const()
    const.answer = 42    # TypeError

One way to solve this is to just go ahead and instantiate the class right after we define it. We can even give the instance the same name as the class, a cheap way of making it a little more difficult to make additional instances (not that it would matter too much if someone did, aside from a little bit of memory overhead, and not like it's difficult to get the class's type even if it's not in a convenient namespace):

     Const = Const()

But now suppose we want to write a base class that provides the behavior of denying attribute changes, so that we can write subclasses that hold our constant definitions. Now we've gotta instantiate each of them because the desired behavior isn't fully contained in the class itself. And if we forget, the constants aren't constant anymore! Ugh. What we need to do is somehow override the attribute-setting behavior of the class itself. How?

Well, we know that to override attribute-setting behavior on an instance, we define __setattr__ on its class. We also know that every Python class is itself an object, which implies that a class is an instance of some other class. So what we want to do is define __setattr__ on the class's class. 

A class's class is called its metaclass. Just as every ordinary instance is created from a class, so every class is an instance created from a metaclass. In Python, the default metaclass is a class called type, so to write our own metaclass, we must inherit from type.

To sum up, we can override the attribute-setting behavior of our Const base class by writing a __getattr__ method on a metaclass, then using that metaclass to define our base class. Metaclasses are inherited, so the behavior of our base class will carry over to any classes derived from it. Which is exactly what we want!

While we're at it, we can also define what happens if someone tries to instantiate the class, by overriding the __call__ method. The __call__ method of type is what calls __new__ and then __init__ to construct the class. Since we're writing our own metaclass, we can make it do something different.

What should trying to instantiate a class that's just being used as a namespace do? Maybe raise TypeError? Or, since someone who tries to instantiate the class is probably trying to get access to its attributes, just return the class itself (i.e., make the class a singleton, i.e., make the class its own instance, i.e., make instantiating the class a no-op)? This being Stupid Python Tricks, let's do that! (Also, it's less typing than raising a proper exception, although of course writing a sentence explaining that ruins the economy.)

    class ConstMeta(type):
        def __setattr__(cls, name, value):
            raise TypeError("variables may vary, but constants never do")
        def __call__(cls):
            return cls

Now we can define a base class. (Python 2 and Python 3 have slightly different syntax for specifying the metaclass.)

    # Python 2 version
    class Const(object):
        __metaclass__ = ConstMeta

    # Python 3 version
    class Const(metaclass=ConstMeta):
         pass

Aaand now we can use that base class to define namespaces in which the attributes can't be modified:

    class nums(Const):
        ANSWER = 42
        FUN = 69
        UNLUCKY = 13
        PI = 3.1415927  # close enough

Now let's try changing all circles to hexagons:

     nums.PI = 3   # nope!

You might next reasonably ask: how does the class get defined in the first place if its metaclass disallows putting attributes on it? And the answer is that defining an attribute in the body of a class definition doesn't call __setattr__! Instead, the attributes are placed in a dictionary, and when the class definition is complete, that dictionary becomes the class's __dict__ attribute, where all its attributes are stored. Try it:

     print (nums.__dict__.items())

Now you might think that knowing about __dict__ will let you modify these read-only attributes anyway, even though we've written a metaclass to prevent it. Right?

    nums.__dict__["FUN"] -= 1     # Right????

But as it happens, the __dict__ attribute of a class is not actually a Python dict object, but rather a proxy object that does not allow item assignment. Foiled! You might also think you could just use object.__setattr__ ... but that doesn't work either! Foiled again!

All right already... type.__setattr__ does work:

    # Deep Thought had an off-by-one error
    type.__setattr__(nums, "ANSWER", 41)

Of course, you could also just go with the nuclear option:

    del ConstMeta.__setattr__    # kaboom

Which goes to show you... even when you think you've put up a wall around something in Python, you really haven't. Ever. That's why Python's unofficial motto is "I'll show you mine if you show me yours." Whoops, I mean, "We're all consenting adults here."

And now you know... the rest of the story.