DRY (Don't Repeat Yourself) Design Patterns

One of the Pageforest applications I've been working on has prompted me to write a template generation language (modeled after Django's Template Language - DTL).  DTL has a number of features that allow you to re-purpose templates and combine them in interesting ways.  Yet, as a template language, it does not use the familiar concepts we, as programmers, are used to in our primary programming languages.

This got me thinking about the generalized patterns we use to help us design complex systems, by decomposing them into simpler parts.  I'll compare DTL with JavaScript to show how the facilities in one compare to those in the other.

  • Includes - DTL has a {% include "file.html" %} tag that can be used for basic composition of one template inside another.  This is much like functional decomposition in a programming language; you can call a function from within another function:
        function foo() {
            bar();
        }
     But note that DTL does not support parameter substitution - or function arguments.  This makes {% include %} a much less powerful and useful feature (in fact, it is not much used in most Django applications).
  • Inheritance - DTL allows one template to extend another via the {% extends "base.html" %} tag.  This turns out to be a very practical way to concisely design templates for different pages in an application, and yet enforce some regularity in the top-level design of the web site.  Developers would typically create a master template that describes the top level layout and navigation for their site, and then individual site pages can extend that template, and replace sections with page-specific content.

    This is accomplished with {% block name %} tags.  A child template can redefine any named block that appears in the template it extends.  This is similar to inheritance-based composition in class-based languages.  You can think of each block as a method in a base class, which can be selectively overridden by derived classes.
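
    The block-as-method analogy can be made concrete with a small sketch.  This is only an illustration in Python (not DTL, and not how Django implements templates); the class and method names are made up for the example.

```python
class BasePage:
    """Plays the role of base.html: fixes the layout, exposes blocks as methods."""

    def title(self):
        return "My Site"

    def content(self):
        return "Default content"

    def render(self):
        # The top-level layout lives only here; child pages never repeat it.
        return "<h1>%s</h1><div>%s</div>" % (self.title(), self.content())


class AboutPage(BasePage):
    """Plays the role of a child template that redefines a single block."""

    def content(self):
        return "About us"


print(AboutPage().render())
```

    The child overrides only content(); the title and the surrounding layout are inherited unchanged, which is what {% extends %} plus {% block %} gives you in a template.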

Yet, DTL is missing some important techniques for composition.

  • Parameterized templates - As mentioned above, there is no way to invoke a template while substituting formal parameters supplied by the calling template.  Variables in templates are basically dynamically scoped, so a calling template can define a variable that it knows will be used in an included template.
  • Template as datatype - Just as we frequently create functions dynamically in JavaScript, and pass them around as data, it would be similarly useful to allow templates to be stored as data, passed in variables, and then invoked dynamically.  There is no "eval" or "apply" function that would enable this type of template programming.
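
To show what these two missing features would buy, here is a rough sketch in Python that treats each "template" as an ordinary function.  The names badge and render_list are hypothetical, chosen only for this illustration.

```python
def badge(label, color="gray"):
    """A 'template' with formal parameters instead of dynamically scoped variables."""
    return '<span class="badge %s">%s</span>' % (color, label)


def render_list(items, item_template):
    """A 'template' that receives another template as data and invokes it."""
    rows = "".join("<li>%s</li>" % item_template(item) for item in items)
    return "<ul>%s</ul>" % rows


# The caller decides which template to pass in, and with what arguments.
html = render_list(["new", "hot"], lambda label: badge(label, color="red"))
print(html)
```

Because the inner template is just a value, the caller can swap it out or pre-bind some of its parameters, which is exactly what an include tag without arguments cannot express.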

DTL is a very nice, and constrained, template definition language.  And I like the fact that it does not expose the full complexity of a programming language to its users (which was its creators' design philosophy).  Yet it would be very useful to build language extensions via functional-style programming practices directly in the template language, without having to resort to escape functions that require writing custom tags and filters in Python.

A Mandelbrot Set Viewer - Pushing the Envelope with Blob Storage

As we've been developing the Pageforest service, we've been developing sample applications in parallel.  We think the best way to motivate the features we support in our service is to actually build applications that use them.  In fact, we explicitly stated that we won't implement any feature for which we don't have an imminent need from a JavaScript application developer (including ourselves!).

If you've seen our simple Scratch application sample, you'll see that a simple Pageforest application can be written in just a few lines of code.  And your application can save and load data from a cloud data-store on behalf of your users.

What you may not have realized is that documents can contain much more than a single JSON blob.  In fact, each "document" in our system can also have any number of child "Blobs" associated with it.  The permissions for reading and writing these blobs are controlled by their parent document.  Blobs can contain any Internet data type, including images, sounds, PDFs, HTML documents, JavaScript files, or JSON data.

With this feature in mind, we decided to push the envelope with a Mandelbrot set viewer application with the following goals:

  • Use the Canvas element to draw the Mandelbrot set using JavaScript only (no flash, no plug-ins).
  • Cache images rendered in the client, into Blobs in the data store - so that once any user has viewed a region of the set, it would be available to any other user without having to recompute it.
  • Use HTML 5's Web Workers to compute image tiles in the background - keeping the UI responsive even when the CPU is busy with intensive image processing.
  • Use the Google Maps (v3) API to provide a familiar navigation interface to the Mandelbrot set, making it as easy to pan and zoom over the Mandelbrot set as it is to view maps and satellite imagery of the Earth.
  • Use the spare CPU cycles of concurrent connected browsers, to create a peer-to-peer compute cloud to further speed up calculation of desired image tiles.

We started the project over the Memorial Day weekend, and today we have a working prototype that meets all but the last goal.

If you would like to play with the Mandelbrot Set Viewer, be aware that you must be signed in to Pageforest in order to generate tiles (you should be able to view existing image tiles without signing in).

The way the Mandelbrot Viewer works is that whenever the map UI generates a request for an image tile (every tile at every magnification has been assigned a name according to its position - even if the tile hasn't been generated yet), we simultaneously query the Blob store to see if the tile exists.  If it doesn't, we queue up a tile creation task and send it to a Worker.  Because Workers don't have direct access to Canvas elements for drawing, we compute the data for the bitmap in the Worker, and send that back to the parent window when the Worker is done.  The bitmap can then be quickly drawn into a Canvas element, converted to a PNG file, and uploaded to the server.

We had some difficulty getting raw binary uploads via AJAX to work compatibly across browsers, so we instead just send a base64 (text) encoded version of the file, and decode the data on the server before storing it.

Once the tile is generated, we update the url in the map image, so the browser attempts to download the tile again.
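
The flow described above (name the tile, check the store, compute on a miss, send as base64, decode before storing) can be sketched roughly as follows.  This is a simplified Python illustration, not the viewer's actual JavaScript: the tile naming scheme, the tile size, and the dict standing in for the Blob store are all assumptions, and the PNG encoding step is left out.

```python
import base64

TILE_SIZE = 8   # assumption: tiny tiles so the sketch runs quickly
MAX_ITER = 50   # iteration cap; small enough to fit in one byte per pixel


def tile_name(zoom, x, y):
    """Hypothetical naming scheme: every tile has a name before it exists."""
    return "%d/%d/%d.png" % (zoom, x, y)


def escape_count(c, max_iter=MAX_ITER):
    """Standard escape-time iteration: how long z = z*z + c stays bounded."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def compute_tile(zoom, x, y):
    """The part a Worker would do: raw pixel data only, no drawing."""
    span = 4.0 / (2 ** zoom)   # the square [-2, 2] x [-2, 2] split into 2^zoom tiles per axis
    data = bytearray()
    for row in range(TILE_SIZE):
        for col in range(TILE_SIZE):
            re = -2.0 + (x + (col + 0.5) / TILE_SIZE) * span
            im = -2.0 + (y + (row + 0.5) / TILE_SIZE) * span
            data.append(escape_count(complex(re, im)))
    return bytes(data)


def get_tile(store, zoom, x, y):
    """Return the cached tile, computing and 'uploading' it on a miss."""
    name = tile_name(zoom, x, y)
    if name not in store:
        raw = compute_tile(zoom, x, y)
        upload = base64.b64encode(raw)           # client: text-safe payload
        store[name] = base64.b64decode(upload)   # server: decode before storing
    return store[name]


store = {}
tile = get_tile(store, 0, 0, 0)
print(len(tile), sorted(store))
```

A second request for the same tile name is served straight from the store, which is what lets one user's rendering work benefit every later viewer.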

As with all of Pageforest's code, this example has been made open source.  You're welcome to make a copy and create your own variations.

Pageforest Version 0.6.0

Today we released version 0.6 of our client library for Pageforest.  We had to make some major changes to the low-level REST API to fix a security flaw we discovered in our first version.  As a result, any code built using the 0.5 versions of our library will no longer work (sorry!).

On the plus side, I think our "Scratch" sample application is pretty simple and very functional for those that would like to experiment with Pageforest.  See our QuickStart page to get the steps to create your own application starting from our sample.

We're also aware that we're pretty light on documentation now.  If you do take a stab at building a Pageforest app, please let us know what your experiences are (support@pageforest.com), and we'll be happy to help you with any questions you have.

Key-Value storage API demo

Here's a snapshot of our first official demo application. It's basically one JavaScript file with a few dependencies (jQuery and HMAC-SHA1) that uses the Pageforest API to authenticate an existing user, then store and retrieve data in the Key-Value store. The source code of this version is available here: http://code.google.com/p/pageforest/source/browse/?r=eeeeee#hg/examples/keyvalue (JavaScript application and Django project for the server side on Google App Engine).

Next, we are going to change the auth mechanism and require that username and password are entered only on a trusted page like https://auth.pageforest.com/ in a separate browser tab. Then the JavaScript app doesn't have to deal with the details of cryptographic authentication, and malicious applications don't have access to user credentials.

Memcache mixin for datastore models

So we made a datastore model mixin that will transparently use memcache. It is available on Google Code under the MIT license, just like the rest of pageforest.com: http://code.google.com/p/pageforest/source/browse/appengine/utils/cacheable.py

To use it, just inherit your model from it:

from google.appengine.ext import db
from utils.mixins import Cacheable

class App(Cacheable):
    name = db.StringProperty()

The interesting part is that it automatically reduces datastore write contention by skipping datastore put if the write rate is consistently high. The App Engine datastore only supports 5 updates per second. So if one entity gets 10 updates per second, the Cacheable mixin makes sure that it's always saved to memcache but the datastore is updated only once every 2 seconds.
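
The throttling idea can be sketched in a few lines.  This is not the code in cacheable.py; it is a simplified stand-in that uses plain dicts in place of memcache and the datastore, a fixed minimum interval in place of write-rate detection, and a hypothetical class name.

```python
class ThrottledWriter:
    """Always write to the cache; write to the datastore at most once per interval."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self.cache = {}       # stands in for memcache
        self.datastore = {}   # stands in for the App Engine datastore
        self.last_put = {}    # key -> time of the last datastore write

    def put(self, key, value, now):
        """Store value; return True if the datastore was written as well."""
        self.cache[key] = value
        last = self.last_put.get(key)
        if last is None or now - last >= self.min_interval:
            self.datastore[key] = value
            self.last_put[key] = now
            return True
        return False

    def get(self, key):
        """Prefer the cache, which always holds the latest value."""
        if key in self.cache:
            return self.cache[key]
        return self.datastore.get(key)


writer = ThrottledWriter(min_interval=2.0)
# Simulate 10 updates per second for 4 seconds.
datastore_writes = sum(writer.put("app", i, i / 10.0) for i in range(40))
print(datastore_writes, writer.get("app"), writer.datastore["app"])
```

The trade-off in a scheme like this is that the most recent updates exist only in the cache until the next datastore write, so they can be lost if the cache entry is evicted first.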

Google's new Closure JavaScript optimizer

I'm pretty excited to see the release, today, of Google's Closure JavaScript compiler.

Closure goes way beyond a simple JavaScript minifier. It can do things like unwind function calls, and replace them with the body of the function (inlining). It also changes local variable names to single characters.

You can either download the compiler locally, or use their web service (through the UI or via a REST API). Here is a sample of how aggressively Closure can reduce your code size:

function Foo(string)
{
 alert(string);
}

Foo("hello");

In Simple optimization mode this yields:

function Foo(a) {
 alert(a)
}
Foo("hello");

In Advanced mode this compresses to:

alert("hello");

I'm still learning how to use Closure optimally for some of my code. For example, in Advanced mode, my JavaScript Namespace code is pretty severely compressed. First, Simple optimization yields:

Advanced optimization saves a few hundred more bytes, but mangles some names that should be left alone as external method names:

There is a tutorial on how to annotate your code to make sure that Advanced optimization does not break your code by applying variable renaming too aggressively.

To fix my Namespace code I add these lines:

// Export names
var p = Namespace.prototype;
p['Extend'] = p.Extend;
p['Define'] = p.Define;
p['Import'] = p.Import;
p['SGlobalName'] = p.SGlobalName;

which then adds the following lines to the optimized output, restoring the "exports" from my class library:

d = f.prototype;
d.Extend = d.d;
d.Define = d.c;
d.Import = d.f;
d.SGlobalName = d.g 

With all these fixes, I'm able to get a clean compile of the Namespace library that compresses down to:

How Does Google Count Absolute Unique Visitors?

As a test of the Answer service, Mahalo, I posed the following question:


How does Google Analytics calculate Absolute Unique Visitors?

I know that Google claims that they can report on the number of Absolute Unique Visitors over any time period. What I can't figure out is how they can be calculating this without doing very expensive database queries. I feel they must be making an approximation of some sort.

Otherwise, they would have to query the unique set of users who visited the site across a large time span, and remove duplicates in real time. They could not afford to do this for a site with millions of unique users.

I will reward the tip to the person who best answers this question by providing a feasible solution to the technical problem or explaining how the reported value is approximated. Even better, if it is backed by an authoritative explanation from Google developers.

Note that the crux of the problem is to avoid double-counting Returning Visitors that are duplicately counted across the time span of a report.


Unfortunately, even the best answerer did not understand the question. Perhaps there were not enough users on the site, nor did they have people with the needed expertise to figure out what I was asking.

I rescinded my $5 "tip", and actually got some Mahalo users mad at me for doing so. After giving the problem some more thought, this is what I came up with:


Here's how I would calculate Absolute Unique Visitors:

Data Collection

On the first visit of a user each day, I record how many days have passed since their last visit (for the "returning" visitors - as opposed to the "new" visitors).

Data Aggregation

When Analytics is processing the raw data, they can keep a bucket of counters for each day, holding the total number of visitors who:

  • Are new (never visited before)
  • Last visited 1 day ago or more (aka all "returning visitors")
  • Last visited 2 days ago or more
  • Last visited 3 days ago or more
  • etc. (they may choose to cut off the number of buckets at some reasonable maximum - which would set a max on the reported ranges they could accurately display).

Note that the returning-visitor counts are cumulative - each bucket has no more users than the previous one.

Reporting

When the site owner asks for the Absolute Unique Users across a date range, the reporting engine can scan all the dates in the period and accumulate a sum as follows (pseudo-code):

Assumes a reporting period of N days (DAY = 1 is the first day), and
Data[DAY] containing values:
  NEW - Number of new users who arrived that day
  RETURNING[K] - Number of users who arrived that day with a hiatus of K days or more

UNIQUE = 0
for DAY from 1 to N:
  UNIQUE += Data[DAY].NEW
  UNIQUE += Data[DAY].RETURNING[DAY]

UNIQUE is thus the sum of all NEW users reported on each day (who are always unique), and then only those returning users who were not counted in a prior day (since they were last on the site before the beginning of the reporting period).
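
To check that the bucket scheme really avoids double counting, here is a small runnable Python version of the idea.  The visit log, the function names, and the data layout are assumptions made for this sketch; it only demonstrates the counting argument, not anything Google is known to do.

```python
from collections import defaultdict


def aggregate(visits, max_hiatus):
    """Build the per-day counters from (user, day) visit records.

    Returns {day: {"new": count, "returning": {k: count with hiatus >= k}}}.
    """
    days_by_user = defaultdict(set)
    for user, day in visits:
        days_by_user[user].add(day)

    data = defaultdict(lambda: {"new": 0, "returning": defaultdict(int)})
    for days in days_by_user.values():
        previous = None
        for day in sorted(days):
            if previous is None:
                data[day]["new"] += 1
            else:
                hiatus = day - previous
                # Cumulative buckets: a hiatus of 4 days counts toward 1, 2, 3 and 4.
                for k in range(1, min(hiatus, max_hiatus) + 1):
                    data[day]["returning"][k] += 1
            previous = day
    return data


def absolute_unique(data, first_day, last_day):
    """Sum NEW plus RETURNING[offset] for each day of the reporting period."""
    unique = 0
    for offset, day in enumerate(range(first_day, last_day + 1), start=1):
        unique += data[day]["new"]
        unique += data[day]["returning"][offset]
    return unique


# Hypothetical visit log: (user, day number).
visits = [("a", 1), ("a", 5), ("a", 6), ("b", 5),
          ("c", 3), ("c", 7), ("d", 6), ("d", 7)]
data = aggregate(visits, max_hiatus=30)

brute_force = len({user for user, day in visits if 5 <= day <= 7})
print(absolute_unique(data, 5, 7), brute_force)
```

The report only reads one NEW counter and one RETURNING counter per day of the period, so its cost grows with the length of the date range rather than with the number of visitors.  The answer is exact only while the period is no longer than max_hiatus days.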

JSComposer - A JavaScript Composition Utility for Google AppEngine (Python)

After developing a JavaScript namespace facility, I needed a simple way of merging and/or minifying my JavaScript source files from my AppEngine (Django) application. So, I developed a simple Python module that can be used to:

  1. Merge multiple JavaScript files into one (for faster download).
  2. Minify your JavaScript on the server (and store it in memcache for fast retrieval).
  3. Include JavaScript files individually on your test server for easy debugging.
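
The three behaviors can be sketched as follows.  This is not the jscomposer.py API; the function names, the dict used as a cache, and the pass-through minifier are all placeholders invented for this illustration.

```python
def script_tags(names, debug, base_url="/js/"):
    """Return the <script> tags a page template would emit."""
    if debug:
        # Test server: one tag per file, so errors point at real files and lines.
        return ['<script src="%s%s"></script>' % (base_url, name) for name in names]
    # Production: a single tag for the merged file.
    return ['<script src="%scombined.js"></script>' % base_url]


def combined_source(names, read_file, cache, minify=lambda source: source):
    """Merge the named files, minify the result, and cache it for later requests."""
    key = "js:" + ",".join(names)
    if key not in cache:
        merged = "\n".join(read_file(name) for name in names)
        cache[key] = minify(merged)
    return cache[key]


# A dict stands in for both the file system and memcache in this sketch.
files = {"namespace.js": "var a = 1;", "app.js": "var b = 2;"}
cache = {}
merged = combined_source(["namespace.js", "app.js"], files.get, cache)
print(merged)
print(script_tags(["namespace.js", "app.js"], debug=True))
```

A real minifier would be passed in through the minify argument; the identity function here just keeps the example self-contained.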

Both the namespace library and jscomposer have been placed in the public domain, so feel free to use as you see fit:

namespace.js
jscomposer.py

I would love to hear from you if you use either of these libraries.

-- Mike

JavaScript Namespaces

One thing that JavaScript programmers have to deal with is corruption of the global namespace. Every time you define a simple function, or other variable at the top level of a web page, the names you've chosen could potentially come in conflict with names used by other developers or libraries that you are using. In the browser, all global variables become properties of the window object.

I've been dealing with this in an ad-hoc manner until recently. I would create a single global variable for all my code, and then define all my functions and variables within it, like this:

var PF = {
  global: value,
  ...,

  MyFunc: function(args)
    {
    ...
    },

  ...
};

I tend to want to migrate code from one project to another quite frequently, so putting all the code in one namespace was becoming quite tedious as I was editing the code to move it into different namespaces for different projects. Inspired by Python, I've developed a more general method of defining and importing namespaces across different modules/namespaces of javascript code.

Here is a typical way to define a new namespace, and import another namespace into it so you can reference code from other libraries succinctly.

global_namespace.Define('startpad.base', function(ns) {
    var Other = ns.Import('startpad.other');

    ns.Extend(ns, {
        var1: value1,
        var2: value2,
        MyFunc: function(args)
            {
            ....Other.AFunction(args)...
            }
    });
       
    ns.ClassName = function(args)
    {
    };
       
    ns.ClassName.prototype = {
        constructor: ns.ClassName,
        var1: value1,

        Method1: function(args)
            {
            }
    };
});

The benefits of this approach are:

  • Isolation of code without polluting the global (window) namespace with multiple names. A single global name ('global_namespace') is added to the window object.
  • Easy to import code from another namespace, and assign it a short local name (e.g., 'Other', above).
  • JavaScript code can be loaded into the browser without regard for execution order. Forward references (to a namespace that hasn't been loaded yet) work fine, as the Import function will pre-create a namespace object when it is referenced, and then fill it in when the namespace is defined.
  • Long names can be assigned that are unique using a hierarchy similar to DNS names. E.g., since I own startpad.org, I claim the "startpad" name as a top level global namespace, and can use names like "startpad.base", or "startpad.timer", for libraries that I am building.
  • Namespaces can be versioned simply by naming convention. For example, I could load in the same browser, namespaces for "startpad.timer.v1" and "startpad.timer.v2".
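
The forward-reference trick is easy to see in a stripped-down model.  The sketch below is written in Python to mirror the idea only; it is not a port of namespace.js, and the Registry class and its method names are invented for the illustration.

```python
class Namespace:
    """An empty container; members are attached as attributes when defined."""

    def __init__(self, name):
        self.name = name


class Registry:
    """Plays the role of the single global namespace object."""

    def __init__(self):
        self._spaces = {}

    def import_(self, path):
        # Pre-create the namespace on first reference, defined or not.
        if path not in self._spaces:
            self._spaces[path] = Namespace(path)
        return self._spaces[path]

    def define(self, path, body):
        # Fill in the (possibly pre-created) namespace object in place.
        ns = self.import_(path)
        body(ns, self)
        return ns


registry = Registry()


def define_base(ns, reg):
    other = reg.import_("startpad.other")   # forward reference: not defined yet
    ns.greet = lambda who: other.shout("hello " + who)


def define_other(ns, reg):
    ns.shout = lambda text: text.upper() + "!"


# Deliberately define the dependent namespace first.
registry.define("startpad.base", define_base)
registry.define("startpad.other", define_other)
print(registry.import_("startpad.base").greet("world"))
```

Because import_ hands back the same object that define later fills in, the order in which the two definitions run does not matter, as long as nothing calls into a namespace before it has been defined.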

There still remains a problem of javascript composition. I don't like to include lots of different script files in the same web page. So you still have to combine the source code from multiple different independent script files into one file. This can be done as part of a build process (along with javascript minification), or through a composition service running on your web server (I hope to write one of these in Python for my AppEngine projects).

I am placing namespace.js into the public domain. Let me know if you end up using it, or have suggestions for improvements.

Beware Mutable default value initializers in Python

As a new Python programmer I was surprised by the behavior of default parameter expressions. I knew they were only executed once when the function is first defined, but I hadn't realized the ramifications for using a static dictionary object as a default expression. This little gotcha hit me yesterday and took quite a while for me to figure out what was happening. Here's some sample code:

def Bad(dict={}):
    print("Bad: %s" % repr(dict))
    dict['p'] = 1

Bad()
Bad()

Which results in:

Bad: {}
Bad: {'p': 1}

Since the value of dict is mutable, it can be changed as a side effect of the function. So all subsequent calls will use a default value that has been modified by previous calls to the function! This can be (and was) a nightmare to track down in a large program.
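
One way to see where the shared value lives is to inspect the function object itself.  The sketch below uses a made-up function, append_item, and the standard __defaults__ attribute, which holds the default values that were evaluated when the def statement ran.

```python
def append_item(item, bucket=[]):
    """The default list is created once, when the function is defined."""
    bucket.append(item)
    return bucket


first = append_item("a")
second = append_item("b")

# Both calls returned the very same list object...
print(first is second)
# ...and it is the object stored on the function as its default.
print(append_item.__defaults__[0] is first)
print(append_item.__defaults__)
```

Every call that omits the argument receives that one stored object, so any mutation made through it is visible to all later calls.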

I fixed this by changing to the following:

def Good(dict=None):
    if dict is None:
        dict = {}
    print("Good: %s" % repr(dict))
    dict['p'] = 1

Good()
Good()

Which results in:

Good: {}
Good: {}

The fix de-couples the effects of one call on subsequent calls (as was the intention of the original code).
