Carl Hewitt on cloud computing, scalable semantics, and Wikipedia.
It was my great privilege to interview Carl Hewitt for this week’s Innovators show. He is principally known for work dating back to the late 1960s and early 1970s, when he helped lay the foundations for a declarative, message-oriented model of computation. Then, and for decades thereafter, the virtues of that model were not widely appreciated because the problems it solves were not evident. Now, in an era of multi-core systems, cloud-based computation, and global interconnectivity, it makes all kinds of sense.
In this conversation, we review the themes Carl sounded in this recent talk at Stanford. (Video is here, and an audio-only version I made for myself is here.)
In one of the most striking moments in that talk, Carl says:
What can I change? Just me. For anything else, I send a message, I say please, and I hope for the best.
Then he laughs and adds:
Does this sound like some circumstances you are familiar with?
Having thought deeply, for 40 years, about the intersection of computation and human affairs, he has arrived at an elegant synthesis: The same organizational and communication patterns govern both realms. As well they should, since the two are now and forever intertwingled.
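Hewitt is best known for the Actor model, and the quote above is its core principle: an actor can change only its own state, and it influences everything else by sending messages. Here is a minimal Python sketch of that idea, not Hewitt's formalism: one thread per actor, with a queue serving as its mailbox. The class and message names are invented for illustration. The Actor model proper makes no promise about delivery order; this toy uses a FIFO queue, so messages from a single sender happen to be processed in order.

```python
import queue
import threading

class CounterActor:
    """Toy actor: private state, a mailbox, and one thread that drains the mailbox."""

    def __init__(self):
        self._count = 0                  # state that only this actor touches
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """The only way anyone else can affect this actor: put a message in its mailbox."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            kind, reply_to = self._mailbox.get()
            if kind == "increment":
                self._count += 1
            elif kind == "get":
                reply_to.put(self._count)  # answer by sending a message back
            elif kind == "stop":
                break

counter = CounterActor()
for _ in range(3):
    counter.send(("increment", None))

answer = queue.Queue()                     # a mailbox for the reply
counter.send(("get", answer))
result = answer.get(timeout=1)
print(result)  # 3, because this sketch's FIFO mailbox preserves one sender's order
counter.send(("stop", None))
```

The caller never reads `_count` directly; it asks, "says please," and waits for a reply message.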
At the end of our conversation, we turn to Carl's critique of Wikipedia. He raises important questions about how Wikipedia's cadre of mostly anonymous administrators, dedicated to the codification of conventional knowledge, comes into conflict with academics and researchers whose work pushes the boundaries and conflates the categories of conventional knowledge.

IronPython/Azure status report.
As I mentioned here, I’m exploring the viability of Python as a way of programming the newly-announced Microsoft cloud platform, Azure. Partly that’s because I love Python, but mainly it’s because I believe that the culture surrounding Python and other open source dynamic languages can fruitfully cross-pollinate with the culture that infuses Microsoft’s platforms.
One of the reasons these cultures face each other across a great divide is religious attachment to low-level operating systems. In the cloud, though, the differences among these low-level systems are increasingly hidden behind interfaces to higher-order constructs: compute nodes, storage objects. These, in turn, are building blocks for still-higher-order services that will be created — and consumed — both by platform vendors and by the developers who are their customers.
It becomes possible, in this new world, for platforms to support a continuum of access styles. You want object-oriented? Do it that way. RESTful? Go for it. You know the Python or Ruby libraries best? Use them. The .NET Framework? Use that. Or even mix and match according to convenience and taste.
Consider this Python module written by Sriram Krishnan, which wraps the RESTful interface to Azure blobs. It’s written in standard Python, using OpenSSL-based cryptography. When I tried it on my machine, though, I ran into an inconsistency in my local Python installation.
Normally a Python developer would debug and fix the installation. But I was planning to deploy this module in IronPython on Azure, and IronPython doesn’t run compiled modules such as OpenSSL. It can, of course, use equivalent .NET functionality — in this case, the method implementing the SHA-256 flavor of keyed-Hash Message Authentication Code. So I made that small change.
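For comparison, the same keyed hash is available in standard CPython without OpenSSL bindings of one's own, via the `hmac` and `hashlib` modules. The sketch below assumes the Shared Key scheme as Azure storage documents it today: the account key is distributed Base64-encoded, the signature is Base64(HMAC-SHA256(key, string-to-sign)), and the header reads `SharedKey <account>:<signature>`. How the string-to-sign is assembled from a request is deliberately left out, and the function names are hypothetical, not taken from Sriram's module.

```python
import base64
import hashlib
import hmac

def shared_key_signature(account_key_b64, string_to_sign):
    """Return Base64(HMAC-SHA256(decoded key, UTF-8 string_to_sign))."""
    key = base64.b64decode(account_key_b64)  # the account key arrives Base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

def authorization_header(account, account_key_b64, string_to_sign):
    """Format the Authorization header value for the Shared Key scheme."""
    signature = shared_key_signature(account_key_b64, string_to_sign)
    return "SharedKey %s:%s" % (account, signature)

# Placeholder key and string-to-sign, for illustration only.
example_key = base64.b64encode(b"not-a-real-key").decode("ascii")
print(authorization_header("myaccount", example_key, "example string to sign"))
```

In IronPython the `hmac.new(...)` line is the piece that would be swapped for the .NET HMAC-SHA256 class; the Base64 and header formatting stay the same.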
At this point, having eliminated my module’s only dependency on unmanaged code, I thought I could run it in the Azure development fabric, and then deploy it to the Azure cloud. But no. Azure’s security model currently won’t allow Python even to import pure-Python modules at runtime. A wacky solution might be to use Python’s custom import mechanism to load those modules over the network. More practically, the modules might be provisioned into Azure.
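The "wacky solution" relies on the fact that Python lets you plug your own finder into the import machinery. Here is a minimal sketch using the modern `importlib` hooks (which postdate the IronPython of this era), assuming module source arrives as text from somewhere. To keep it self-contained, a dict stands in for the network fetch; `InMemorySourceFinder` and `remote_greeting` are hypothetical names.

```python
import importlib.abc
import importlib.util
import sys

class InMemorySourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a dict of source strings (a stand-in for fetching them remotely)."""

    def __init__(self, sources):
        self.sources = sources  # module name -> Python source text

    def find_spec(self, fullname, path, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the normal import machinery for everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, "<remote:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)

sys.meta_path.insert(0, InMemorySourceFinder({
    "remote_greeting": "def hello(name):\n    return 'hello, ' + name\n",
}))

import remote_greeting
print(remote_greeting.hello("azure"))
```

Whether a sandbox like Azure's would permit `compile`/`exec` on fetched source is a separate question that this sketch doesn't answer.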
I don’t know how this will play out. Meanwhile, there’s another option: Eliminate all use of Python modules, and rely only on the .NET Framework. So as an experiment, I switched over from Python’s minidom, httplib, time, and base64 modules to their .NET equivalents.
The good news is that this works. I can deploy the module to Azure, and use it in the cloud. The bad news is that, in some cases, I’d rather use the standard Python modules. The .NET equivalent to Python’s httplib, for example, is the HttpWebRequest/HttpWebResponse pair. But these APIs differ from those provided by httplib in a couple of ways that annoy me.
First, there’s an inconsistency in the way headers are handled. You get and set most headers using the Headers collection. But you get and set a few special ones, like Content-Type and Content-Length, using special named properties.
Second, status codes are handled inconsistently. Most responses return status codes. But for codes in the 4xx series, an exception is thrown.
To me these behaviors are quirks that make it trickier to create RESTful interfaces. I’m sure there are reasons for them, and people who prefer them for those reasons, but I’d rather just use httplib. In any case, if both styles are available, there’s no need to argue. Everybody gets what they need.
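For contrast, here is a sketch of the httplib style (it lives on as `http.client` in Python 3). A throwaway local server stands in for a blob endpoint; the paths and the `X-Example` header are invented. Both points of comparison show up: a 404 comes back as an ordinary status rather than an exception, and `Content-Type`, `Content-Length`, and a custom header are all read through the same `getheader` call.

```python
import http.client
import http.server
import threading

class FakeBlobHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a blob service: one path exists, the rest 404."""

    def do_GET(self):
        found = self.path == "/container/exists"
        body = b"hello" if found else b"not found"
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("X-Example", "demo")  # invented custom header
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), FakeBlobHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

def fetch(path):
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    resp = conn.getresponse()
    # No exception for 4xx: the status is just data on the response.
    # Standard and custom headers come through the same accessor.
    result = (resp.status,
              resp.getheader("Content-Type"),
              resp.getheader("Content-Length"),
              resp.getheader("X-Example"),
              resp.read())
    conn.close()
    return result

ok = fetch("/container/exists")
missing = fetch("/container/missing")
print(ok)
print(missing)
server.shutdown()
server.server_close()
```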
We’re not there yet in the current Azure preview. Those of us chomping at the bit to run IronPython in the cloud will have to be inventive. I expect things will get easier as both Azure and IronPython mature, and as Python technologies like Django and NWSGI are — I hope — woven into the fabric.
Why might this matter? Again, I’m looking for cross-pollination. Python culture will be able to make really productive use of higher-order Azure services such as identity, access control, workflow, Live Services. And it will also exert a positive influence on the future evolution of the Azure platform.

Mind, hands, and heart: John Leeke on Internet video for sharing knowledge about historic home preservation.
This week’s ITConversations show suffered a tragic glitch that rendered the audio unusable, but I was able to transcribe it as text. My guest is John Leeke, a carpenter who takes care of old buildings and shares his knowledge of the tools and best practices involved in doing that. His methods of sharing have evolved over many years. He started in the early 1980s as a writer for magazines like Old House Journal and Fine Woodworking, transitioned to Internet publishing when that became possible, and more recently has become a leader in the use of Internet video to communicate knowledge that’s embodied, as he likes to say, in the mind, the hands, and the heart.
His approach to Internet video exemplifies and weaves together a number of themes that I’ve focused on in recent years, including narration of work, online apprenticeship, tacit knowledge, screencasting to document our work in the virtual world, and video to document our work in the physical world.
JU: We got introduced by way of the folks at the Open University, whom I met when I visited the UK in January 2007 to speak at the Technology, Knowledge, and Society conference. They were showing me their FlashMeeting videoconferencing system, and they cited you as an example of somebody who’s making very practical use of the medium in your work, which is historic home renovation.
JL: Right. I'd been using FlashMeeting for about a year and a half by then. They had singled me out because I wasn't doing education or developing the FlashMeeting system, as they were at the Knowledge Media Institute. I was out in the real world doing things with it, demonstrating the horizontal movement of knowledge.
JU: That absolutely grabbed me. Ever since I got involved in Internet video, I saw there was a huge opportunity for horizontal, or direct, or peer-to-peer transfer of knowledge. In particular, of knowledge that is embodied, literally — it’s in your hands…
JL: It’s in your mind, your hands, and your heart. I’ve been sharing what I know through print media since the early 1980s. I grew up working in my father’s shop, in the 1950s, and then was out in the field working on historic buildings as a preservation carpenter for fifteen years. Then I fell into writing about my work: Homebuilding Magazine, Fine Woodworking, Old House Journal. I got pretty practiced at that by the late 1990s.
JU: You’ve published books too, right?
JL: Yes, I've self-published a series on caring for older buildings. Through the 1990s I knew that video would be important for my work, but I never got around to publishing anything in video. I didn't have the time or dollars to put into it. But by 2003 and 2004, it was getting streamlined enough and easy enough to do over the Internet.
JU: As much use of online video as there is, I think we’ve barely scratched the surface when it comes to the sort of sharing of practical knowledge that you’ve been doing.
JL: It’s starting to happen. Just yesterday a colleague sent me a link to a YouTube video about how to draw and sketch the classical forms, like Ionic capitals. It was an architect showing how he sketched, and how he developed a balustrade for a fancy classical building. It showed him actually doing it. This wasn’t happening in the 1990s. You could do it, but it was a huge expensive production. Now you can do it for a couple of hundred dollars, and sometimes even less.
JU: Of course there’s still the question of why someone would do this. And in fact, the theme of the talk I gave at that conference was network-enabled apprenticeship. The idea was that throughout human history, people have learned trades and crafts by direct observation and imitation.
JL: Yeah, workers working side by side. And it’s more than observation. It’s the guiding of hands that makes that work. Internet video, even when it’s live, doesn’t get you all the way there. But it’s certainly a dramatic next level beyond print media, that’s for sure.
Expositional work online — presentation of words and pictures and even videos — it’s all presentational. Someone develops it, and as a separate event in time someone else comes and watches and learns. But when it’s live and interactive, that’s when you jump to the next level. Being there in person is best, of course, but this is a really valuable and powerful intermediate level because it opens up access to many more people than I can get together with personally, side by side.
JU: Can you give an example?
JL: In our work we're often restoring old windows. This is the time of year when you have to take care of them. One of the details of that work is reglazing, where the glass meets the sash, the wooden frame that slides up and down. There's a material called glazing compound, or putty, and it's easy enough to use that any handy person can do it, but it's hard to get it looking nice and smooth and even if you haven't done it before. Once you learn, it's a cinch. And it's easy to show someone how. I've taught eight-year-olds and eighty-year-olds how to run a perfect line. But you can't do that with even a detailed series of photos.
JU: And you’ve tried…
JL: Yes, I've written three or four articles over the years, and each one is better, and you can learn a certain kind of thing from print and photos. You can learn what kind of putty to use, and how to hold the putty knife. But you don't really get it until you see a putty knife in motion and can respond in real time: adjust the angle, a little more pressure. Then you can get it in thirty seconds if you're side by side, and in a few minutes over interactive video.
JU: So you’re talking about a couple of levels here. The first is direct observation and imitation. My first revelation on that front was when I had to fix an old HP laser printer. I found a parts kit online that came with a video on a CD, and it enabled me to successfully disassemble and reassemble that printer. Later I realized there was no other way I could have done the job successfully. No written instruction would have gotten me there.
JL: Right, that’s one level and it works well when the printer you’re repairing is just like the one in the video. And when the job involves mechanical parts that lock and fit together.
But with the window putty, it’s different. You’re working with a plastic material. It’s as if you had to make those printer parts yourself. It’s basic stuff, not manufactured stuff.
JU: The motor skills are subtler, and the nonverbal communication is more critical.
JL: Right, and with the nonverbal communication as well as the visual, you really need to be able to go back and forth between the learner and the teacher. If you can do that within seconds — or if you’re standing next to someone, microseconds — that feedback between the eyes, the mind, the hand, the muscles, the tool, the material the tool is shaping — that’s how they learn so fast in person. And it can happen in seconds when you’re doing interactive video over the Internet.
JU: What’s your setup for doing these interactive training sessions over the Internet?
JL: I take my notebook computer, plug in my Sony HandyCam, and shoot whatever it is we’re teaching or discussing. It’s getting to the point where it’s all plug and play, and if I can do it, many people can.
JU: So that’s the broadcast piece of it, what’s the setup for interacting with people who are following along?
JL: That happens on a page at my website, HistoricHomeworks.com. Other people log in there to the FlashMeeting system, and if they have camera and audio at their end, I can see and hear them. Typical numbers are two or three participants, up to eight or ten. The live sessions are also catalogued for later viewing.
JU: The FlashMeeting system has some interesting features, including a method of visualizing the conversation so you can see who spoke when and for how long.
JL: That helps support the Knowledge Media Institute’s principal mission, which is to study and understand how knowledge spreads from person to person around the world. The analytical features built into FlashMeeting serve that mission.
It fascinates me. For example, you can see displayed on a map of the world the locations of viewers of these recorded sessions showing how to restore historic windows, or painting and restoring exterior woodwork. I can see where the interest is, and it turns out that people everywhere care about this stuff, because there are wooden buildings all around the world. On six of the seven continents there are people using these videos streaming from my office in Portland, Maine. At KMI they joke that they’re waiting for someone to start watching in Antarctica.
JU: It’s an interesting point because in the world of online media there’s a lot of emphasis on what’s new, but you’re operating out on the long tail. Your piece on interior storm windows was very relevant to me because I just went through the exercise of doing the stretch-and-seal method, and your demonstration of how to build reusable interior storms really got my attention.
That's an idea a person might never encounter. But if you do, it doesn't matter when. The publishing world calls this evergreen content: it's valuable anytime.
JL: Right. There’s also a discussion on my website about this topic. It’s more expositional — words and pictures — and that goes hand in hand with the video. One of the limitations of the FlashMeeting system is that I can’t annotate the video, after the fact, with links to those materials.
JU: A lot of folks will look at this and say, OK, John Leeke is an unusual guy. He doesn’t just do the work, he also documents the work, and that’s great for him, but it’s not really relevant to most people who won’t have the time or inclination. For them, this process seems tangential.
But I think that’s often untrue. Here’s an example. I have a pellet stove, and there are a couple of maintenance procedures that I frankly screwed up the first time through because I didn’t absorb the understanding of how to do them from the manual. What struck me was that once I knew how to do it, I could have illustrated these procedures with a couple of five minute videos. And maybe I should just do that myself. But the thing is, if I’m the dealer, and I’m getting complaints from customers who are buying these things and then failing to understand the manual and screwing things up, it’s very much in my interest to do some of my own video documentation.
JL: Of course. And by the way, I'm not special. I'm just a carpenter up here in Maine, taking care of my own house. It just turns out that my work is also helping other people to take care of their houses. Well, yes, it may not be unique, but it is special that I have this compulsion to share what I'm learning and figuring out. But the ability to share it: no matter who you are, if your neighbor sees you fixing your windows, and comes over and knocks on your door and asks how to do it, you would show him. This is just an extension of that. Now we can have neighbors further afield.
JU: Yes. There was a time when the work people did was visible. You saw what they did.
JL: You saw what the people next to you did.
JU: That’s right. And you understood what the different kinds of work were, because you saw people doing that work. But then, in the industrial age, dad went off to work, he disappeared in the morning, and showed up again at the end of the day, and work was a black box. Who knew what dad did?
JL: That's the industrial disconnect. And there's a disconnect on the marketing side as well. Through the last half of the 20th century, as the industrial revolution geared up to grind itself into nothing, which is now happening, the way to sell stuff to more people than actually needed it was to disconnect people from each other, so that everybody needed their own, instead of sharing with their family or neighbors. Everybody needed their own lawnmower. But figure your lawnmower is sitting idle in your garage 99% of the time. One lawnmower could easily mow everybody's lawn on the block.
But that's the consumer culture that was developed by manufacturers. So very few people now know how to run that glazing compound to seal the glass to the wooden frame. This is purposeful. They don't want people to know how to run glazing because that limits the market for vinyl plastic imitation windows.
So I only have one person on the block I can teach locally, but I can connect with more people with interactive video. Because of the access to the long tail, I can be teaching lots of people who need to know that.
JU: Here’s another aspect I wanted to ask you about. When it’s hard to see how work is done, it’s hard to know what it’s like to be a person who does that kind of work. Unless it’s in the family, you won’t see it, and even then you probably won’t. You don’t have the family or community scope in which to see other kinds of work being done. And lacking that, you can get pretty far down an educational path before you realize that the path isn’t for you at all.
JL: Right. So, I’ve been focused on task-specific demonstration, but you’re talking about another thing that’s happening with video over the Internet — life blogging, or life broadcasting. I don’t think anybody’s doing that as a tradesperson. What is it like to wake up at 4:30 AM, so you can be on the site working on the windows, all day long, and then get in your pickup truck and drive back home? As you say, a lot of people could go all the way through school, and study building construction at the college level, and then take specialty courses in historic carpentry work, and by the time they’re in their early 20s they’re well-educated and have a good set of hands-on skills — and then realize that they don’t like to get up early in the morning.
JU: You’ve painted the downside, and that’s fair, people should understand that, but on the upside, the life blogging should also communicate how you feel when you drive by a house that you’ve restored, and how you know the people living there feel as a result of the work you’ve done.
JL: Absolutely. This is the heart side of the work that the industrial revolution leaves out. It boils everything down to mind and hand, and leaves out the heart. That is the heart side, when you drive by those buildings you helped restore, last month or last year or 20 years ago. It is the reason why we get up early in the morning to go to work. You know that you’re helping people who live in and use those buildings.
JU: Now there are certainly many people who will feel that these methods they get paid to practice are proprietary knowledge they wouldn’t want to reveal. My argument is that in a lot of cases, by demonstrating expertise you’ll attract more work than you lose, and that it’ll often be more interesting and rewarding work. What’s your experience?
JL: Both of those ideas do play strongly in the building trades. It’s a real tradition to keep secrets. Going back hundreds and hundreds of years, with the guild systems, there were ways to control the sharing of that kind of knowledge. And it’s still the case. Not every plasterer who can do those decorative Ionic capitals wants everybody to know exactly how they do it. But they do want everybody to know that it can be done.
You’re right, this is how artisans can do good marketing — by letting people know what is involved, by showing some of these methods, and they don’t have to give up all their secrets in order to do that. But you can help people to understand that it’s not just a machine spitting out product, it’s people making stuff with their minds and their hands and their hearts.
That’s another part of how I use Internet video. I go to some of my colleagues’ shops, as well as my own, and show what this is all about, because it is not well understood by the public. Video can get to the nuances of the heart side of this work.
JU: Also, if you can show me how to take care of some basic things for myself, maybe I can turn around and hire you to do something really special.
JL: Yeah. I'm hoping that we're now in a post-modern cultural movement, which is what I think you're talking about. Back in the 1970s I was already working in this realm of making fine things by hand, and there was a groundswell of interest. That's when Alex Haley's Roots phenomenon happened. It was important because it touched the hearts of people in America. That's really what our restoration work is about: it's the connection with the people who once lived in these buildings. It wasn't the National Trust and the President telling us to save buildings, it was people who wanted to save them because their grandfathers built them.
JU: So where do you fall along the continuum of trade secrets and knowledge sharing?
JL: I’m at the extreme end of sharing everything I know. I’m a one-person microbusiness and always have been. I grew up in the midwest where sharing what you knew, and helping people, was what life was about, for everybody. That was the culture. It was a natural for me. It didn’t seem like it was worth keeping secrets.
My dad said that if you want to do well in trades, you have to let people know what you do. This is what it’s been all about for me — letting enough people know.
JU: And you have found incredible marketing power in doing what you do?
JL: Oh yeah. As I was working as a tradesperson in the 70s, and a contractor in the 80s, I made a shift because I’d been doing a good job of documenting my work. That’s something else I learned from my father. I also had the documents he created for his work, going back to the 20s, this huge information resource that I had to share.
JU: Really? What did he document?
JL: He documented his work in the arts and trades. He was a commercial artist through the 20s, then shifted into furniture and buildings at the craftsman/artisan level.
JU: And he left behind detailed logs of his practice?
JL: Yeah, detailed files of every project he ever worked on. So I learned that as part of my carpentry and woodworking, growing up in his shop, and continued it when I left his shop and came east to work on old buildings. So by the early 1980s I had this whole backlog of my own work to share. And by sharing it, I created extraordinary interest in my work. Back then it was through the print media — Fine Woodworking, Old House Journal, Fine Homebuilding — and a lot of people learned about the work I was doing to restore columns on old porches, saving windows, doing woodwork repairs. When I learned something I thought was worth sharing, I’d write an article about it. The editors loved it, and their readers did too, it was the authentic stuff, what was really going on out there in the field.
With that body of knowledge, by the late 1980s I was consulting on projects, helping people solve problems with their buildings. That meant I could be on even more projects, helping more people, and if I was writing about what I was learning, then each project was an order of magnitude larger. If I'm doing hands-on work on buildings I might only be helping a few people. If I'm consulting, it might be tens of people. If I'm writing, we figured ten or fifteen thousand people were using my articles. Each is a jump in magnitude. Then of course the Internet, where I got an early start in 1994 and 1995.
JU: I’m sure a lot of folks will look at your example and feel that, since they’ll never become featured writers for magazines, there’s no point in doing this kind of sharing in a more modest way. But I think there’s benefit at any level of engagement. You’ve clearly thought through the dynamics of the communication pattern here: one-to-many, multi-level distribution. But for a lot of people, even with electronic media, that isn’t obvious. They’ll still spend a lot of time doing one-to-one communication. They’ll write something up, they’ll even take some pictures, but then they’ll just email that to somebody else.
JL: Two birds with one stone. I realized that if I wanted to accomplish the things I want to get done in my life, I have to get more than one result for every action or activity. The print — and now online — publications that I do are my marketing program, so I don’t have to spend money on advertising. And now you call, and want to talk with me, and if I was only getting one benefit from that, I wouldn’t be able to say yes. But I can already see two or three things that’ll come from talking to you, so I can say yes.
Say I’m thinking of taking on a project to help my neighbor rebuild her front steps. OK, I can earn some money. And I can take a series of photos for a print article, and that’ll bring some more income but it’ll also help with my personal goal of sharing more, and then I can easily shoot a little video that I can broadcast on the Internet and that will help an astonishing number of people. So I can’t say no, because I’m getting multiple benefits. But I would have to say no if the only benefit was getting paid to fix the steps.
JU: You’ve really thought it through.
JL: The key is that the video camera and the computer and the Internet are just tools, no different from my table saw and push stick, or my old wooden hand plane. They’re all just tools, and they’re all in the same kit for me, and I’m a tool user, and I help people with their old buildings.
How can people do this? I’ve found a balance. Instead of watching television, I make television.
JU: Well said. Thanks John!

Wiring the web (redux).
Information technologists often recite David Wheeler’s famous aphorism:
Any problem in computer science can be solved with another layer of indirection.
Often, though, they omit the corollary:
But that usually will create another problem.
Those problems used to plague only IT folk. But now we’re all involved. Effective social information management is quite severely constrained by the fact that regular folks are not (yet) taught the basics of computational thinking.
For example, when I explain my community calendar project to prospective contributors, they invariably assume that I’m asking them to enter their data into my database. It’s quite hard to convey: that the site isn’t a database of events, only a coordinator of event feeds; that I’m only asking them to create feeds and give me pointers to their feeds; that this arrangement empowers them to control their information and materialize it in contexts other than the one I’m creating.
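To make the "coordinator of feeds, not a database" distinction concrete, here is a minimal sketch, not the project's actual code. The aggregator holds only a mapping of feed URLs to feed text, merges the events at read time, and tags each event with the feed it came from. The iCalendar handling is deliberately tiny (unfolded lines, plain `DTSTART` values with no `TZID`), and the feed URLs and events are hypothetical.

```python
from datetime import datetime

def parse_ics_events(ics_text, source_url):
    """Read VEVENTs from iCalendar text, keeping only DTSTART and SUMMARY.

    Assumes unfolded lines and a plain local DTSTART such as 20081112T190000.
    """
    events, current = [], None
    for line in ics_text.splitlines():
        line = line.strip()
        if line == "BEGIN:VEVENT":
            current = {"source": source_url}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key == "DTSTART":
                current["start"] = datetime.strptime(value, "%Y%m%dT%H%M%S")
            elif key == "SUMMARY":
                current["summary"] = value
    return events

def aggregate(feeds):
    """Merge events from feeds (URL -> feed text), sorted by start time.

    Nothing is copied into a store of our own: each event points back to its
    feed, and the feed's owner stays in control of it there.
    """
    merged = []
    for url, text in feeds.items():
        merged.extend(parse_ics_events(text, url))
    return sorted(merged, key=lambda event: event["start"])

library_feed = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20081112T190000
SUMMARY:Book discussion
END:VEVENT
END:VCALENDAR"""

museum_feed = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20081110T140000
SUMMARY:Gallery talk
END:VEVENT
END:VCALENDAR"""

calendar = aggregate({
    "http://library.example.org/events.ics": library_feed,
    "http://museum.example.org/events.ics": museum_feed,
})
for event in calendar:
    print(event["start"], event["summary"], event["source"])
```

Because the same feeds can be handed to any other aggregator, the contributors' data can show up in contexts beyond this one calendar.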
I’m having some success explaining this model, but it’s slow going. People don’t take naturally to the indirection and abstraction.
Here’s another example. I know various folks who are trying to create online resource directories of one kind or another. I’ve identified a pattern, which I call collaborative list curation, that is an ideal way to solve this problem. Consider this directory of blogs for the Monadnock region. It looks like any other such directory, but it’s made differently. Again, there is no explicit database. Entries come from the del.icio.us tag delicious.com/judell/monadnockblog — a personal collection whose items are, currently, the same as those in the global collection delicious.com/tag/monadnockblog.
I’m subscribed to the global collection at feeds.delicious.com/v2/rss/tag/monadnockblog which means I can monitor it for new items, vet them, and transfer those I want to include to my personal collection. If I wanted to delegate that editorial control, I would point my directory-making service at the del.icio.us account of a trusted associate and have it camp on that account’s monadnockblog tag instead of (or in addition to) my own.
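The curation loop can be sketched in a few lines. This is an illustration, not del.icio.us's API: it assumes each collection is available as an RSS 2.0 feed of `<item>` elements carrying `<title>` and `<link>`, and the sample data and function names are hypothetical. `unvetted` is the monitoring step, appending to `mine` is the vetting step, and passing a trusted associate's collection to `directory` is the delegation step.

```python
import xml.etree.ElementTree as ET

def rss_items(rss_text):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

def unvetted(global_items, personal_items):
    """Items anyone has tagged that the curator hasn't yet copied into a personal collection."""
    vetted = {link for _, link in personal_items}
    return [(title, link) for title, link in global_items if link not in vetted]

def directory(*curated_collections):
    """Merge one or more curators' collections, dropping duplicate links."""
    by_link = {}
    for collection in curated_collections:
        for title, link in collection:
            by_link.setdefault(link, title)
    return sorted((title, link) for link, title in by_link.items())

global_rss = """<rss version="2.0"><channel>
<item><title>Blog A</title><link>http://a.example.com/</link></item>
<item><title>Blog B</title><link>http://b.example.com/</link></item>
<item><title>Blog C</title><link>http://c.example.com/</link></item>
</channel></rss>"""

mine = [("Blog A", "http://a.example.com/")]
associate = [("Blog C", "http://c.example.com/"), ("Blog A", "http://a.example.com/")]

to_review = unvetted(rss_items(global_rss), mine)  # Blog B and Blog C
mine.append(to_review[0])                          # approve Blog B
listing = directory(mine, associate)               # my picks plus a trusted associate's
print(listing)
```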
Of course this is all way too indirect for any normal person to grok, which is why nothing has been added to the global collection. Even many IT-savvy folks, I’m finding, don’t take naturally to this model.
That said, I’m finding that once I can get people to walk through one of these experiences, and see the connection — OK, I do this over here, and that happens over there, and it can also happen somewhere else, and I’m in control — the light bulb does go on.
Now we need to take forward-thinking evangelists like me out of the loop, and get people to discover for themselves how to wire the web. If Live Clipboard didn’t exist, we’d have to invent it. Oh wait. It doesn’t, and we do.

Two IronPythonic spreadsheets.
I should get a life, I know, but I can't help myself. One of my favorite pastimes is figuring out new ways to wrangle information. One of the reasons that IronPython had me at hello is that, my fondness for the Python programming language notwithstanding, IronPython sits in an interesting place: on Windows, side by side with Office, where a lot of information gets wrangled, particularly in spreadsheets.
There are now two interestingly different IronPython applications that marry Python and the spreadsheet. The first, Resolver One, I wrote about last year and featured in a screencast. In this case, IronPython runs the whole show. It drives the user interface, and it also drives the recalculation engine.
More recently Blue Reference, whose Inference suite integrates statistical and analytical tools like MATLAB and R into Office, has taken a different tack. Its Inference for .NET taps the general-purpose scripting capabilities of the dynamic .NET languages, including IronPython and IronRuby.
Now to be clear, I’m not in Blue Reference’s target market. Their customers are doing scientific and technical work that benefits from the ability to embed live R or MATLAB analysis into documents. I don’t know, but would be curious to find out, how those folks — or others — might also want to leverage more general-purpose glue languages like IronPython or IronRuby.
In any case, there are clear tradeoffs between the two approaches. With Inference, the IronPython engine is loosely coupled to the Office apps. That buys you the full fidelity of the applications, but at the cost of an impedance mismatch between Python and the applications.
With Resolver One there is no impedance. The application and your data are made of Pythonic stuff. You give up a ton of affordances in order to get that unification, but it enables some really interesting things.
Here’s one example: row- and column-level formulae. This is a pretty handy idea all by itself. Instead of putting a formula into the first row of a column and then copying it down, you put it into the column header where it applies to the whole column automatically.
Michael Foord has a nice example (screencast, article) that shows how to do some nifty data aggregation using Python list comprehensions.
He starts with a worksheet of People:
| Name | Age | Country | Job |
| Stan | 23 | USA | Blogger |
| Wendy | 66 | AUS | Analyst |
| Eric | 33 | UK | Developer |
In a second worksheet, he aggregates by Country, like so:
| Country | People | Number of People | Average Age |
| USA | [<Stan>,<Kenny>,<Craig>] | 3 | 30.7 |
| UK | [<Eric>,<Kyle>] | 3 | 41.3 |
Here’s the column-level formula that does that:
=[person for person in <People>.ContentRows if person['Country'] == #Name#_]
In other words, for each row, make a list of People whose Country attribute equals the value in the Name column of that row, and stick that list into the current cell. If you’re familiar with Python, you’ll notice that the syntax, [<Eric>,<Kyle>], looks like how Python prints out a list. That’s because it really is a Python list sitting in that cell.
Now the other columns can refer to that list. Here’s Number of People:
=len(#People#_)
Here’s Average Age:
=AVERAGE(person['Age'] for person in #People#_)
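For readers who don’t use Resolver One, here’s a rough plain-Python sketch of the same aggregation, using only the three People rows shown above. The list of dictionaries is a stand-in for Resolver’s worksheet objects, not Resolver’s API, and the function name is hypothetical.

```python
# Rows from the People worksheet above, modeled as plain dictionaries
# (a stand-in for Resolver's worksheet objects, not its API).
people = [
    {"Name": "Stan", "Age": 23, "Country": "USA", "Job": "Blogger"},
    {"Name": "Wendy", "Age": 66, "Country": "AUS", "Job": "Analyst"},
    {"Name": "Eric", "Age": 33, "Country": "UK", "Job": "Developer"},
]

def aggregate_by_country(rows):
    """Build one summary row per country, mirroring the three column formulas."""
    summary = []
    for country in sorted({row["Country"] for row in rows}):
        # Column formula 1: the list of people whose Country matches this row.
        members = [row for row in rows if row["Country"] == country]
        # Column formula 2: len() of that list.
        count = len(members)
        # Column formula 3: the average of the Age values in that list.
        average_age = sum(row["Age"] for row in members) / count
        summary.append({
            "Country": country,
            "People": [m["Name"] for m in members],
            "Number of People": count,
            "Average Age": round(average_age, 1),
        })
    return summary

for row in aggregate_by_country(people):
    print(row)
```

The difference from Resolver is that here the intermediate list lives in a local variable. In the spreadsheet it lives in a cell, where other formulas (and you) can see it.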
This idea of having live Python objects sitting in a spreadsheet is what really grabbed me the first time I saw Resolver, and it still does.
Here’s another little example of my own. Yesterday I was revisiting some of the code I used in my crime analysis project. These kinds of projects invariably turn into pipelines that transform data one stage at a time. Typically I store those intermediate results in files, which tends to be awkward.
This time around, I did the pipeline as a Resolver spreadsheet like so:

The column-level formula on D combines the fields in A, B, and C into a URL-encoded string in D.
The formula on E calls a geocoding service with a URL made from the string in D and puts the XML result in E.
The formula on F parses the XML in E, creates a Python dictionary, and dumps that into F.
The formulae on G and H extract the lat and lon values out of the object in F and stick them into G and H.
I dunno, maybe it’s just me, but I think that’s cool.
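Outside a spreadsheet, the same D-through-H pipeline might look like a chain of small functions, one per column. This is a sketch under stated assumptions: the geocoder URL, the address fields, and the XML shape (a `<result>` element with `<lat>` and `<lon>` children) are hypothetical. To keep it runnable offline, the network call is replaced by a stub.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical geocoder endpoint; the service used in the post isn't named.
GEOCODER = "http://geocoder.example.com/geocode"

def column_d(street, city, state):
    """D: combine the fields from A, B, and C into a URL-encoded query string."""
    return urllib.parse.urlencode({"q": f"{street}, {city}, {state}"})

def column_e(query, fetch):
    """E: call the geocoding service; `fetch` retrieves a URL's text."""
    return fetch(f"{GEOCODER}?{query}")

def column_f(xml_text):
    """F: parse the XML response into a Python dictionary of child elements."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

def columns_g_h(record):
    """G and H: pull the lat and lon values out of the dictionary in F."""
    return float(record["lat"]), float(record["lon"])

def fake_fetch(url):
    # Offline stand-in for urllib.request.urlopen(url).read().decode()
    return "<result><lat>42.93</lat><lon>-72.28</lon></result>"

query = column_d("12 Main St", "Keene", "NH")  # hypothetical address
lat, lon = columns_g_h(column_f(column_e(query, fake_fetch)))
print(query, lat, lon)
```

What the spreadsheet version adds is that each intermediate value stays visible in its own column. In code, those values have to be printed or written to files to be inspected.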

A recipe for industrial transformation.
When Tom Raftery pointed me to this gloomy assessment I had to go back and remind myself of what I found hopeful in Saul Griffith’s extraordinary energy talk at ETech.
Saul concedes a 2-degree-C rise in temperature by 2033. The question is what it will take to hold the line. He thinks we’ll need to build and deploy something like this mix of clean new energy production:
100 sq meters of solar photovoltaic cells per second for the next 25 years (2TW)
50 sq meters of solar thermal mirrors per second for the next 25 years (2TW)
One 100-megawatt wind turbine every 5 minutes for the next 25 years (2TW)
One 3-gigawatt nuclear plant every week for the next 25 years (3TW)
Three 100-megawatt geothermal steam turbines every day for the next 25 years (2TW)
1250 sq meters of bio-fuel-producing algae every second for the next 25 years (0.5TW)
Can we do it? The recipe calls for 11.5 terawatts of new (and carbon-free) power supply over the next 25 years, and we created 6 in the last 25 years. So, it’s “within the scale of what we know how to do.”
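The 11.5-terawatt figure is simply the sum of the six line items in the recipe. A few lines of Python make that total, and the comparison with the 6 terawatts added over the last 25 years, explicit.

```python
# Terawatts contributed by each line item in Griffith's recipe, as listed above.
recipe_tw = {
    "solar photovoltaic": 2.0,
    "solar thermal": 2.0,
    "wind": 2.0,
    "nuclear": 3.0,
    "geothermal": 2.0,
    "algae biofuel": 0.5,
}

total_tw = sum(recipe_tw.values())
built_last_25_years_tw = 6.0  # new capacity created over the past 25 years

print(f"Total new carbon-free supply: {total_tw} TW")
print(f"Ratio to the last 25 years: {total_tw / built_last_25_years_tw:.2f}x")
```

So the recipe asks for a bit less than twice what we managed over the previous 25 years.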
Now consider these existing capacities:
Cans. We produce 110 billion aluminum cans per year. Turned into thermal mirrors, that’s 200GW solar thermal/year. “If you make Coke and Pepsi into solar thermal companies, in 10 years you get to your 2 terawatts of solar thermal. It’s within our industrial capacity to do that.”
Phones. “Nokia makes 9 phones/second. Within Nokia + Intel + AMD there is roughly the capacity to make the needed photovoltaics.”
Cars. “GM makes 1 car every 2 minutes. GM + Ford = 1 wind turbine every 5 minutes.”
Of course it’s crazy to imagine retargeting our industrial capacity in such dramatic fashion, and turning it on a dime, isn’t it?
Not necessarily. For months I’ve been meaning to blog a segment from a Lester Brown podcast, which I can’t find now, but here’s the same point from his book Plan B 3.0: Mobilizing to Save Civilization:
In his State of the Union address on January 6, 1942, one month after the bombing of Pearl Harbor, President Roosevelt announced the country’s arms production goals. The United States, he said, was planning to produce 45,000 tanks, 60,000 planes, 20,000 anti-aircraft guns, and 6 million tons of merchant shipping. He added, “Let no man say it cannot be done.”
No one had ever seen such huge arms production numbers. But Roosevelt and his colleagues realized that the world’s largest concentration of industrial power at that time was in the U.S. automobile industry. Even during the Depression, the United States was producing 3 million or more cars a year. After his State of the Union address, Roosevelt met with automobile industry leaders and told them that the country would rely heavily on them to reach these arms production goals. Initially they wanted to continue making cars and simply add on the production of armaments. What they did not yet know was that the sale of new cars would soon be banned. From early 1942 through the end of 1944, nearly three years, there were essentially no cars produced in the United States.
In addition to a ban on the production and sale of cars for private use, residential and highway construction was halted, and driving for pleasure was banned. Strategic goods—including tires, gasoline, fuel oil, and sugar—were rationed beginning in 1942. Cutting back on private consumption of these goods freed up material resources that were vital to the war effort.
The year 1942 witnessed the greatest expansion of industrial output in the nation’s history—all for military use. Wartime aircraft needs were enormous. They included not only fighters, bombers, and reconnaissance planes, but also the troop and cargo transports needed to fight a war on distant fronts. From the beginning of 1942 through 1944, the United States far exceeded the initial goal of 60,000 planes, turning out a staggering 229,600 aircraft, a fleet so vast it is hard even today to visualize it. Equally impressive, by the end of the war more than 5,000 ships were added to the 1,000 or so that made up the American Merchant Fleet in 1939.
In her book No Ordinary Time, Doris Kearns Goodwin describes how various firms converted. A sparkplug factory was among the first to switch to the production of machine guns. Soon a manufacturer of stoves was producing lifeboats. A merry-go-round factory was making gun mounts; a toy company was turning out compasses; a corset manufacturer was producing grenade belts; and a pinball machine plant began to make armor-piercing shells.
In retrospect, the speed of this conversion from a peacetime to a wartime economy is stunning. The harnessing of U.S. industrial power tipped the scales decisively toward the Allied Forces, reversing the tide of war. Germany and Japan, already fully extended, could not counter this effort. Winston Churchill often quoted his foreign secretary, Sir Edward Grey: “The United States is like a giant boiler. Once the fire is lighted under it, there is no limit to the power it can generate.”
This mobilization of resources within a matter of months demonstrates that a country and, indeed, the world can restructure the economy quickly if convinced of the need to do so. Many people—although not yet the majority—are already convinced of the need for a wholesale economic restructuring. The purpose of this book is to convince more people of this need, helping to tip the balance toward the forces of change and hope.
And FDR engineered that transformation in less time than we’ve been occupying Iraq. So as Jan 20 approaches, I find myself wondering if maybe, just maybe, the new guy can galvanize a similar response.

My rationalization for buying a Wii Balance Board.
Azure calendar aggregator: Part 1.
For about a week now, I’ve been running a service in the Azure cloud that aggregates calendar events from Eventful.com and from a diverse set of iCalendar feeds. As I mentioned last month, my aim is to recreate and then extend my experimental elmcity.info community information hub, while exploring and documenting the evolution of Azure and the layered services emerging on top of it.
I haven’t written a whole lot about programming here for a while, because I’ve been trying to explain the whys and wherefores of syndication-oriented communication to a wider audience. But as I build out this service I’m learning a lot about cloud-based software development in general, and about Azure in particular, and I want to narrate this work. I’ll try to do it in a way that will inform developers who currently use Microsoft tools and technologies, as well as those who don’t. But I’ll also try to be accessible to folks who don’t write software, yet would like to learn something about the opportunities that cloud computing is creating as well as the challenges it poses.
The service, as it currently exists, is running as an Azure worker role. That means it does input, processing, and output, but presents no user interface. The inputs are Eventful.com, accessed by way of its API, and a growing set of public iCalendar feeds. The processing involves reading calendar events and normalizing them to a common intermediate format. The output is currently XML to the Azure blob store, one file for Eventful and another for the iCalendar feeds.
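Here’s a minimal sketch of the normalize-then-serialize step. It’s written in Python rather than the C# the service uses, and everything concrete in it is an assumption for illustration: the intermediate format (title, start, url), the input field names, and the sample records. It doesn’t model Eventful’s API, real iCalendar parsing, or the Azure blob store.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical raw records. Field names mimic an Eventful-style API result and a
# parsed iCalendar VEVENT, but they are assumptions for this sketch.
eventful_items = [{"title": "Sample concert", "start_time": "2009-03-07 20:00:00",
                   "url": "http://example.com/events/1"}]
ical_items = [{"SUMMARY": "Sample book sale", "DTSTART": "20090310T100000",
               "URL": "http://example.com/booksale"}]

def from_eventful(item):
    """Normalize an Eventful-style record to the common intermediate format."""
    return {"title": item["title"],
            "start": datetime.strptime(item["start_time"], "%Y-%m-%d %H:%M:%S"),
            "url": item["url"]}

def from_ical(item):
    """Normalize an iCalendar-style record to the same format."""
    return {"title": item["SUMMARY"],
            "start": datetime.strptime(item["DTSTART"], "%Y%m%dT%H%M%S"),
            "url": item["URL"]}

def to_xml(events):
    """Serialize normalized events, sorted by start time, as an XML string."""
    root = ET.Element("events")
    for ev in sorted(events, key=lambda e: e["start"]):
        node = ET.SubElement(root, "event")
        ET.SubElement(node, "title").text = ev["title"]
        ET.SubElement(node, "start").text = ev["start"].isoformat()
        ET.SubElement(node, "url").text = ev["url"]
    return ET.tostring(root, encoding="unicode")

# One output document per source (writing to blob storage is not modeled).
eventful_xml = to_xml([from_eventful(i) for i in eventful_items])
ical_xml = to_xml([from_ical(i) for i in ical_items])
print(eventful_xml)
```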
I’m only allocating one instance of this worker process, and that’s probably enough horsepower for any single community’s events. But I’d like to be able to scale out the aggregator to serve other communities as well, potentially many others. Turning up the dial to do that would be a nice illustration — and test — of the cloud computing fabric.
The existing aggregator at elmcity.info is written in Python, and my original plan was to port it with minimal change to IronPython on Azure. That didn’t work out because, although bare-bones IronPython code runs on Azure as I show here, you quickly run into restrictions imposed by Azure’s security sandbox. The trust policy, defined here, is based on a feature of the .NET platform known as code access security (CAS).
When you upload code to the Azure cloud, or run it in the local development fabric, the hosting environment only partly trusts your code, and also only partly trusts any components used by your code. This is part of a layered, defense-in-depth security strategy, prudent for the same reason that it’s prudent to run your own computer as a partly-trusted user instead of an all-powerful administrator. It is also problematic for the same reason. A lot of Windows applications used to require administrative privilege in order to run properly, and some, though fewer month by month, still do. Similarly, a lot of .NET components that run happily in the fully-trusted environment of your local computer won’t run in Azure’s medium-trust environment, or (what’s nearly equivalent) in Internet Information Services 7 (IIS 7) when its security mode is set to medium trust.
I am no expert on the subject of code access security, but here’s what I think:
- The medium-trust policy is probably a good thing.
- It does, however, impede instant gratification when you’re mixing components from various sources.
- But that impedance will diminish as more component builders adopt the good practice of not making their components unnecessarily require full trust.
I think that IronPython is likely to become such a component, once the dust settles from the recent 2.0 release. (If you care about this issue, you can vote up its priority.) Meanwhile I’ve been working in C#, which has been a fascinating experience. On the one hand, I believe that dynamic languages like Python are an excellent choice for agile development everywhere, and especially in the fluid environment of the cloud. On the other hand, I’m not a language bigot and have always appreciated the virtues of statically-typed languages.
My basic philosophy has always been to use a mix of best-of-breed tools in order to gain maximum leverage. The combination of IronPython and C#, on the .NET platform, is a really powerful one, for the same reason that the Jython/Java combo is. On this project, even though I am not yet deploying any code written in IronPython, I often use IronPython to test C# components that I’ve written or acquired.
Along the way, I’ve been recalling something IronPython’s creator, Jim Hugunin, said at the Professional Developers Conference back in October. Jim’s talk followed one by Anders Hejlsberg, the creator of C#. Anders showed an experimental future version of C# that makes use of the Dynamic Language Runtime, which supports IronPython and IronRuby on .NET. The effect was to create an island of dynamic typing within C#’s otherwise statically-typed world. We all appreciated the delicious irony of a static type called ‘dynamic’.
Jim might have sounded a bit wistful when he said: “I’m not sure what a dynamic language is any more.” But I think this blurring of boundaries is a wonderful thing. Many smart people I deeply respect value the static typing of C#. Some of the same smart people, and many different ones, value the dynamic typing in languages like Ruby and Python. If I can leverage the union of what all of those smart people find valuable, I’ll happily do so.
I’ll have more to say about this project, and of course code to share, as things evolve. Meanwhile, though, I want to acknowledge Doug Day at DDay Software. When I switched from Python to C#, the key component I needed was an iCalendar module equivalent to MaxM’s excellent Python iCalendar module, which I’m using at elmcity.info. Doug’s DDay.iCal met the need. It’s a solid, cleanly-built, open source .NET component that enables code written in any of the .NET family of languages to parse, and generate, iCalendar (RFC 2445) files.
And now back to the project, which reminds me of the era at BYTE during which I got to build stuff while writing about what I was building. It’s great fun. And as John Leeke so eloquently says, it engages the mind, the hands, and the heart.

Lightweight event syndication with trusted feeds.
If you check the elmcity.info events page for March 7, 2009 you’ll see that Beau Bristow is performing at Keene State College at 8PM. The Eventful item that has syndicated to the events page doesn’t say anything else. There’s no link to beaubristow.com, though it’s easy enough to find. And there’s no more precise venue than Keene State College, though that’ll be easy enough to find as well, when the time comes.
But the item carries enough information to participate in a (still mostly nascent) network of calendar events. Beau Bristow doesn’t know that his concert shows up at elmcity.info, or that on March 7 it’ll show up at citizenkeene.ning.org and cheshiretv.org. And he shouldn’t need to know. But he ought to be able to take it for granted that events he posts to some kind of syndication source — could be Eventful, could be another public service, could be a personal iCalendar feed — will propagate.
I am particularly fascinated by the lightweight, ad-hoc interaction between Eventful, Beau Bristow, and elmcity.info. This lightness is a powerful enabler. If you’re Beau, and you need to promote 18 events in 18 towns, some of which you may only visit once in your career, you don’t have time — and can’t pay for the help — to build relationships in all those places. But you can assert that you’ll be in those places, on specified dates, doing a specified thing. And under the right circumstances, that’s enough.
The question I’ve been exploring is how to create those circumstances. One aspect of the answer, and the one I want to focus on here, is trusted feeds.
Originally, at elmcity.info, any Flickr photo mentioning “Keene NH” showed up in the photo stream, and any Eventful event located within 15 miles of the center of Keene showed up in the event stream. That arrangement was clearly open to abuse. Even though Flickr and Eventful try to take responsibility for their stuff, my aggregator had to take more responsibility for the subsets of their stuff it manages. So I created two lists of trusted contributors. One is a list of Flickr account names, and the other is a list of Eventful account names.
When the aggregator runs, a couple of times a day, it puts previously-unseen account names into a holding tank and writes those names to RSS feeds which I monitor here and here.
Yesterday I found Dan York in the Flickr holding tank, and Beau Bristow in the Eventful holding tank. I happen to know Dan, but even if I didn’t, it only takes a minute to judge that his Flickr portfolio is legitimate. I don’t know Beau, but again it’s easy to determine that his Eventful presence is legitimate. So I marked both accounts as trusted, and today their contributions appear on the site.
If a trusted account ever abuses that trust, it’s easily revoked.
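The trusted-feed logic itself is small. Here’s a Python sketch of it. It assumes each incoming item carries an `account` field; the function name, data structures, and sample items are hypothetical, not the aggregator’s actual code.

```python
def triage(items, trusted, seen):
    """Split incoming items into publishable ones and newly-seen accounts.

    items   -- list of dicts, each with an 'account' key (assumed shape)
    trusted -- set of account names whose contributions are published
    seen    -- set of account names already reviewed or already in the holding tank
    """
    published = [item for item in items if item["account"] in trusted]
    holding_tank = sorted({item["account"] for item in items
                           if item["account"] not in trusted
                           and item["account"] not in seen})
    seen.update(holding_tank)  # each new name lands in the holding tank only once
    return published, holding_tank

trusted = {"judell"}
seen = {"judell"}
items = [{"account": "judell", "title": "Poetry reading"},
         {"account": "newcomer", "title": "Open mic"}]
published, new_accounts = triage(items, trusted, seen)
print([i["title"] for i in published], new_accounts)

# Revoking trust is just removing the name from the trusted set.
trusted.discard("judell")
```

The new names returned by `triage` are what would be written to the monitoring feed. Moderation effort scales with the number of sources, not the number of events.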
When I tell folks about this model of event syndication, they sooner or later realize that it’s an invitation to spam and ask about that. My answer is trusted feeds. It would be impossible to moderate every event flowing through your network. But it’s easy to moderate a much smaller number of event sources.

Databasing trusted feeds with del.icio.us.
In my last entry, I sketched a strategy for maintaining lists of the Eventful and Flickr accounts that I consider trusted sources for the elmcity.info event and photo streams. I didn’t spell out exactly how I plan to maintain those lists, in the Azure rewrite of the service that I’m now doing, but David Hochman read my mind:
It sure would be interesting to syndicate those lists from a trusted del.icio.us feed, leveraging tags as a public data store, and allowing others to trust your trusted lists.
It sure would. And that’s just what I’m doing.
Part One: The User’s View
Here’s the del.icio.us account:
delicious.com/elmcity
Here are the trusted ICS feeds:
elmcity/trusted+ics+feed
Here are the trusted Eventful contributors:
elmcity/trusted+eventful+contributor
Here are the new Eventful contributors — that is, ones I’ve not yet marked as trusted:
elmcity/new+eventful+contributor
This is wildly convenient in several ways. For starters, I get a feed of new Eventful contributors for free:
feeds.delicious.com/v2/rss/elmcity/eventful+new+contributor
Anyone who subscribes to that feed is alerted to the appearance of a previously-unseen contributor of events within 15 miles of Keene. Here’s one:
eventful.com/users/jheslin
Clicking that link reveals that jheslin has created one venue, but so far no events. That’s not enough evidence on which to base a trust/no-trust decision. So what I’d do, in that case, is just delete the del.icio.us bookmark. If the aggregator sees another event from jheslin, he (or she) will show up again in the feed. At that point, if jheslin has created events that look legitimate, I can decide to trust him (or her). How? Trivially, by editing the bookmark and changing the new tag to trusted.
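In data terms, promoting an account is nothing more than a tag edit. Here’s a minimal Python sketch that models a bookmark as a URL plus a set of tags. That representation, the function names, and the `exampleuser` account are hypothetical; this is not the del.icio.us API.

```python
def promote(bookmark):
    """Move a bookmark from the 'new' bucket to the 'trusted' bucket by swapping tags."""
    tags = set(bookmark["tags"])
    if "new" not in tags:
        raise ValueError("only bookmarks tagged 'new' can be promoted")
    tags.discard("new")
    tags.add("trusted")
    return {"url": bookmark["url"], "tags": tags}

def in_bucket(bookmarks, *tags):
    """Return bookmarks carrying all the given tags, like elmcity/trusted+eventful+contributor."""
    wanted = set(tags)
    return [b for b in bookmarks if wanted <= set(b["tags"])]

candidate = {"url": "http://eventful.com/users/exampleuser/created/events",
             "tags": {"new", "eventful", "contributor"}}
promoted = promote(candidate)
print(sorted(promoted["tags"]))  # ['contributor', 'eventful', 'trusted']
```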
That’s easy enough, but I don’t want to be forever responsible for monitoring this feed and making trust decisions. And thankfully I needn’t be. When I delegate that job to somebody else, I’ll just need to transfer the credentials to the del.icio.us/elmcity account, and explain what it means for an Eventful account to be bookmarked at del.icio.us/elmcity with a new or trusted tag, and how to decide when to promote an Eventful account from new to trusted.
The same technique can apply to other account-based event sources — for example, upcoming.org. It also applies to feed-based sources. I’ve been encouraging event publishers in Keene to create iCalendar feeds. Those feeds have URLs, and to include them in the aggregation, somebody just needs to bookmark them under the elmcity account with the tags trusted and ics and feed. Like this.
Same for new and trusted Flickr accounts that feed the photos page, for blogs that feed the blog directory, and for any other class of resource that might be contributed.
Part Two: The Developer’s View
Notice that I haven’t had to write any Web forms, any Ajax code, any database CRUD (create/read/update/delete) logic. Del.icio.us, a database with a Web user interface, takes care of all that. Which is fine by me, because life’s too short to write any more CRUD or Web UI than I have to. I’d rather do more interesting things.
By the same token, life’s too short to write more than a few lines of code to drive the CRUD apparatus. As I mentioned last time, I’m writing the core of the Azure event aggregator in C# rather than Python, because IronPython isn’t yet ready for prime time on Azure. I worried that a C# implementation would be too verbose, but I’ve been pleasantly surprised.
Here’s a C# method that reads a del.icio.us RSS feed and returns a dictionary (aka hashtable, aka associative array) of titles and links:
00 const string rssbase = "http://feeds.delicious.com/v2/rss/elmcity";
01 public static Dictionary<string,string> get_delicious_feed(string args)
02 ...
The Python equivalent is more concise, but not by much. I am, admittedly, deferring any discussion of the Utils class which I’m using to make the .NET Framework’s HttpWebRequest/HttpWebResponse classes feel more Pythonic to me.
Also noteworthy here is the use of the generic collection class, Dictionary (lines 3, 11, 12), instead of the more Pythonic (and Java-like) Hashtable. I’ll also defer discussion of tradeoffs between Dictionary and Hashtable until I’ve learned more about them.
Finally, I’ll defer discussion of the LINQ-to-XML idioms (lines 6-10) until I’ve learned more about the tradeoffs between LINQ-to-XML and the XPath style which I’m more familiar with, and which is more widely available.
For now, I’ll just observe that this C# method is readable, debuggable, and Azure-deployable.
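For comparison, here’s a rough Python counterpart. It uses the standard library’s urllib and ElementTree rather than the Utils helper, which isn’t shown. It assumes the feed is ordinary RSS 2.0 with a title and link under each item. Parsing is split from fetching so that the parsing half can be exercised without a network connection.

```python
import urllib.request
import xml.etree.ElementTree as ET

RSSBASE = "http://feeds.delicious.com/v2/rss/elmcity"

def parse_rss_titles_and_links(rss_text):
    """Map each RSS <item>'s title to its link (channel-level title is ignored)."""
    root = ET.fromstring(rss_text)
    return {item.findtext("title"): item.findtext("link") for item in root.iter("item")}

def get_delicious_feed(args):
    """Fetch the elmcity feed for a tag combination such as 'trusted+ics+feed'."""
    url = f"{RSSBASE}/{args}"
    with urllib.request.urlopen(url) as response:
        return parse_rss_titles_and_links(response.read())
```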
Here are some of the ways the above method will be used in the service:
get_delicious_feed("trusted+feed+ics")
get_delicious_feed("trusted+eventful+contributor")
get_delicious_feed("new+flickr+contributor")
For example, here’s the method that the aggregator uses to check whether or not to include an Eventful event contributed by a given Eventful account:
01 public static bool isTrustedEventfulContributor(string accountname)
02 ...
The regular expression at line 4 matches URLs like this:
eventful.com/users/judell/created/events
If you check the corresponding Eventful page you’ll see why the aggregator posts bookmarks with addresses in this format. That way, the human who’s monitoring the feed can easily click through to eyeball the events created by a new user whose legitimacy needs to be checked.
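The regular expression itself didn’t survive in the listing. Here’s a guess at a Python equivalent that matches that URL shape and captures the account name. The exact pattern, including the optional scheme and `www.` prefix, is an assumption.

```python
import re

# Assumed pattern: eventful.com/users/<account>/created/events,
# with an optional scheme and www. prefix.
CONTRIBUTOR_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?eventful\.com/users/([^/]+)/created/events/?$")

def contributor_url(account):
    """Build the bookmark address for an Eventful account."""
    return f"http://eventful.com/users/{account}/created/events"

def account_from_url(url):
    """Return the account name from a contributor URL, or None if it doesn't match."""
    match = CONTRIBUTOR_URL.match(url)
    return match.group(1) if match else None

print(account_from_url("eventful.com/users/judell/created/events"))  # judell
```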
To see how isTrustedEventfulContributor makes its yes/no determination, we need to unpack the match_url method. Here’s the first version I wrote:
private static bool match_url(Dictionary<string,string> dict, Regex re, string url)
  ...
This worked, but didn’t have the concise, functional, Pythonic feel that I like. So I went back to the drawing board and came up with another version:
private static bool match_url(Dictionary<string,string> dict, Regex re, string url)
  ...
This works identically, and it’s much closer to what I’d do in Python: Filter a list using a lambda expression.
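Neither C# body survived, so here’s a Python sketch of the two styles being compared. I’m assuming the semantics: `match_url` answers whether any link in the dictionary (title to link, as returned by get_delicious_feed) both matches the regex and equals the given URL. The function names are hypothetical.

```python
import re

def match_url_loop(links, pattern, url):
    """Loop version: True if some link matches the pattern and equals url."""
    for link in links.values():
        if pattern.match(link) and link == url:
            return True
    return False

def match_url_filter(links, pattern, url):
    """Filter version: keep the qualifying links with a lambda, then test for a hit."""
    hits = list(filter(lambda link: pattern.match(link) and link == url, links.values()))
    return len(hits) > 0
```

In everyday Python, `any()` over a generator expression would do the same job in one line and stop at the first hit.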
Part Three: Conclusion
If you’re not a programmer (and in particular, a programmer who would be interested in Azure, or in a comparison between C# and Python), your eyes probably glazed over when you got to part two. That’s fine. There’s still an important takeaway for you. Del.icio.us (and any del.icio.us-like service) is a database! You can use it, without doing any programming, to maintain lists of arbitrary sets of resources that can be queried and edited, with equal ease, by humans and by programs.
Whatever you can identify with a URL is fair game. You can invent your own simple business logic by defining rules for what tags to use, and when and how to change them. You can monitor RSS feeds, in any feedreader, in order to be alerted when monitored items change. You can share or delegate the work by sharing or delegating access to the del.icio.us account. And last but not least, when you need to get a programmer to make use of this database you and your collaborators have built, that person’s job will be drop-dead simple.

Visible Workings (redux).
For me, one of 2008’s most important (but least remarked-upon) ideas was spelled out in this post which details how Ward Cunningham implemented Brian Marick’s notion of Visible Workings. The idea, briefly, is that businesses can wear (non-confidential aspects of) their business logic on their sleeves, observable to all.
In a year of devastating consequences ensuing from the lack of transparency in business, you’d think Ward and Brian would be celebrated for this work. No such luck. Partly, I’m sure, because their insights flow from the realm of software development and software testing, and don’t generalize in an obvious way.
It struck me this morning that yesterday’s item on using del.icio.us to manage trusted feeds may help to broaden the appeal of the idea.
In that item I mainly talked about the logistical benefits of the approach. You write less code, and you get to leverage existing infrastructure for data management, web UI, collaboration, and syndicated alerts. That’s all good. But there’s also a transparency benefit which I neglected to point out.
At this moment, for example, del.icio.us/elmcity is a snapshot of the feeds and contributors known to, and classified by, the live version of my service at elmcity.info/events. That version uses private lists of trusted feeds, and of new and trusted contributors. I haven’t yet cut over to the newly-rewritten Azure version, but when I do, it will use these public lists instead.
The del.icio.us/elmcity snapshot reports that there are 41 Eventful contributors of which 37 are trusted and 4 are new.
Why are the four new contributors still sitting in the holding tank? One I mentioned yesterday. jheslin created a venue, but no events. I plan to delete that contributor and wait to see if he or she shows up again with actual event contributions.
That leaves TallWilly, blahblah25, and michellelewis. Why are they still sitting in the holding tank? Here’s the crucial point: I’m not sure. I know that I reviewed them when they showed up, and applied a policy. If it were written down, which until now it hasn’t been, it would use language like “legitimate” and “substantive” to define the kinds of contributions that move a new contributor into the trusted bucket. But I can’t actually say how I applied that policy in these cases.
So let’s investigate. First, TallWilly. Clicking through, I find that TallWilly is no longer an Eventful user. Obviously I’ll want to remove him from the new bucket. Implicit rule now stated: Must be an Eventful user.
Second, blahblah25. Clicking through, I find only one event. Seems legit, and so far I haven’t required more evidence than a single legit event, so why didn’t I promote blahblah25? Oh, I see. Jan 4, 1900 12:30 AM isn’t a reasonable start date. Implicit rule now stated: Date must be reasonable.
(Of course there’s more to the story here. blahblah25’s bogus date was either a human error or a software error, or both. Ideally the aggregator, when rejecting a contribution on that basis, would notify the contributor and invite a correction.)
Third, michellelewis. Why didn’t I decide to trust her? Turns out it was just a mistake! Clicking through, I find an entire schedule of concerts, including this one at Fritz Belgian Fries on April 3, 2009. That event, and future events posted by michellelewis, absolutely belong on the calendar.
I only discovered this mistake by reviewing the lists of new and trusted contributors. In the existing version of the system, I’m the only one who can do that. But in the new version, everyone can. More eyeballs, fewer bugs.
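One way to make the now-stated rules visible is to write them down as code. Here’s a Python sketch. It assumes a contributor record with an `exists` flag and a list of event start dates. The “reasonable date” window (no earlier than 2000, no more than two years ahead) is my own placeholder, not a rule from the post, and the judgment of legitimacy itself stays with a human.

```python
from datetime import datetime, timedelta

def reasonable_date(start, now):
    """Placeholder rule: not before 2000 and no more than two years in the future."""
    return datetime(2000, 1, 1) <= start <= now + timedelta(days=730)

def review(contributor, now):
    """Apply the stated rules and return (decision, reason) for a new contributor."""
    if not contributor["exists"]:
        return "remove", "must be an Eventful user"
    dates = contributor["event_dates"]
    if not dates:
        return "remove", "no events yet; wait for the account to reappear"
    if not any(reasonable_date(d, now) for d in dates):
        return "hold", "date must be reasonable"
    return "review", "has events with reasonable dates; a human judges legitimacy"

now = datetime(2009, 1, 15)  # hypothetical review date
print(review({"exists": False, "event_dates": []}, now))
print(review({"exists": True, "event_dates": [datetime(1900, 1, 4, 0, 30)]}, now))
print(review({"exists": True, "event_dates": [datetime(2009, 4, 3)]}, now))
```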
Even more interesting, to me, is the notion of developing and applying policy-driven business logic in a transparent way. Of course business processes can’t always work that way. But the default, now, is that none do. Sometimes, maybe more often than we imagine, we could flip that default. It would be an interesting experiment to try.

Feed validation revisited: The parallel universe of iCalendar feeds.
If you were tuned into the blogosphere back in 2001, you’ll recall lots of chatter about RSS feed validation. RSS came in multiple flavors. Anyone could whip up a feed purporting to be in one or another of those formats, and many of us did. There were all kinds of questions about how and why feeds did or didn’t conform to the various specifications.
Nowadays we have even more flavors. There’s RSS 2.0. And there’s Atom, which isn’t a member of the RSS family at all; it’s a different species of feed format. And yet you rarely hear about problems with feeds that can’t be read and processed by feedreaders.
I think there are two reasons why RSS/Atom-style feeds work pretty well nowadays. First, there’s the Feed Validator. Mark Pilgrim and Sam Ruby put a huge amount of effort into this excellent tool. Why? Here is their explanation:
Despite its relatively simple nature, RSS is poorly implemented by many tools. This validator is an attempt to codify the specification (literally, to translate it into code) to make it easier to know when you’re producing RSS correctly, and to help you fix it when you’re not.
The second reason is that RSS/Atom-style syndication has been happening in a lot of places for a long time now. A lot of people have used, and helped to refine, the tools and techniques.
Now I’m exploring the parallel world of calendar syndication, using ICS feeds instead of RSS/Atom feeds. And it feels like 2001 all over again. There are ICS feeds out there, but nowhere near as many as RSS/Atom feeds. And my hunch is that even when ICS feeds are published, they’re often unused, so there isn’t enough feedback to flush out problems. Finally, the ICS equivalent of the RSS/Atom Feed Validator — a service called iCalendar Validator, based on a Java library called iCal4j — isn’t anywhere near as comprehensive and informative as the RSS/Atom Validator.
Here’s a chart that lists the iCalendar feeds currently being collected by the elmcity.info calendar aggregator.
As you can see, the results are all over the map. Some purportedly valid feeds won’t load using one iCalendar library, some won’t load using another. Some purportedly invalid feeds do load.
I expect things will get worse before they get better. There are only a handful of different ICS producers represented here, but the two labeled homegrown were created directly or indirectly in response to my project. If we recapitulate the RSS/Atom experience with ICS, and lots more ad-hoc ICS feeds arrive on the scene, charts like this will go even redder.
To make them go green, we’ll need a more robust ICS validator.
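As a starting point, here’s a Python sketch of a few checks such a validator might make, based on RFC 2445. First it unfolds content lines (a line beginning with a space or tab continues the previous one). Then it flags blank lines and requires exactly one calendar-level PRODID and one VERSION. It’s a toy that covers only a sliver of the spec, and the function names are hypothetical.

```python
def unfold(text):
    """Split iCalendar text into content lines, joining folded continuation lines."""
    body = text.replace("\r\n", "\n")
    if body.endswith("\n"):
        body = body[:-1]  # a final line terminator is not a blank line
    lines = []
    for raw in body.split("\n"):
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop the single folding whitespace character
        else:
            lines.append(raw)
    return lines

def validate_ics(text):
    """Return a list of problems found; an empty list means these checks passed."""
    lines = unfold(text)
    problems = []
    if not lines or lines[0].upper() != "BEGIN:VCALENDAR":
        problems.append("first line should be BEGIN:VCALENDAR")
    depth = 0
    counts = {"PRODID": 0, "VERSION": 0}
    for number, line in enumerate(lines, start=1):  # numbers refer to unfolded lines
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
            continue
        name = line.split(":", 1)[0].split(";", 1)[0].upper()
        if name == "BEGIN":
            depth += 1
        elif name == "END":
            depth -= 1
        elif depth == 1 and name in counts:
            counts[name] += 1  # only calendar-level properties count
    for name, count in counts.items():
        if count != 1:
            problems.append(f"VCALENDAR has {count} {name} properties; exactly one is required")
    return problems
```

A real validator would also need to report where a problem sits in the original, folded text, and explain why it matters. That kind of explanation is what makes the RSS/Atom Feed Validator so useful.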

A conversation with Jeff Jonas about connecting dots.
On this week’s Interviews with Innovators show I spoke with Jeff Jonas whose work (and narration of that work on his blog) first captured my interest in 2007.
If you follow Jeff you’ll know what he means when he uses phrases like perpetual analytics, non-obvious relationship awareness, semantic reconciliation, sequence neutrality, and anonymous resolution. If not, and if you’re interested in how we can connect the dots across silos of data, I recommend that you peruse his blog first and then listen to this interview, which clarifies a couple of points I’d been wondering about.
One of Jeff’s tenets is that new information has to be able to answer old questions, and answer them in near real time. On the face of it that seems impossible. How can you compare a newly ingested fact with every existing fact in a database, and run every imaginable query?
Well of course you can’t, and don’t, visit every record in the database. You consult an index, and the interesting question becomes: What kind of index? In Jeff’s world, it’s an index based on keys that represent entities (people, places, organizations) and “features” (locations, relationships). And these entities are fuzzily defined. I think of them as clouds of associations. So for example the key for Jon Udell would point to items where Jon is misspelled as John. Most systems abhor this kind of variation, but Jeff embraces it, and I find that fascinating.
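Here is a toy sketch of the idea. It assumes nothing about how Jeff’s systems actually work. Every known spelling of an entity points to a single entity key, so a lookup is one dictionary probe rather than a visit to every record. The class and key names are hypothetical. Deciding which spellings belong to which entity is the hard part, and the sketch simply takes those mappings as given.

```python
from collections import defaultdict


class EntityIndex:
    """Toy index (hypothetical): every known spelling of an entity points to one
    entity key, and items are filed under that key."""

    def __init__(self):
        self.key_for = {}               # lowercased spelling -> entity key
        self.items = defaultdict(list)  # entity key -> items filed under it

    def add_variant(self, spelling, entity_key):
        self.key_for[spelling.lower()] = entity_key

    def _key(self, spelling):
        # Unknown spellings get a bucket of their own rather than a guessed match.
        return self.key_for.get(spelling.lower(), "unresolved:" + spelling.lower())

    def file(self, spelling, item):
        key = self._key(spelling)
        self.items[key].append(item)
        return key

    def lookup(self, spelling):
        # One dictionary probe per lookup; no scan over all stored items.
        return self.items.get(self._key(spelling), [])


idx = EntityIndex()
idx.add_variant("Jon Udell", "person:jon-udell")
idx.add_variant("John Udell", "person:jon-udell")  # a misspelling known to mean this person
idx.file("John Udell", "interview notes")
idx.file("Jon Udell", "blog post")
print(idx.lookup("jon udell"))  # ['interview notes', 'blog post']
```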
Another intriguing idea was reported by Phil Windley in his write-up on Jeff’s ETech talk:
Jeff treats query as data. When a query is made against the context, and gets no response, it’s stored in the database. Later if data shows up that matches the query, you get a match. Treating queries like data makes it so you don’t have to ask every question every day.
Here again, I wondered how you avoid running every query against every new fact. What does it mean for data to “match” a query? Part of the answer, as I understand it, is that both queries and data are indexed semantically, using keys that encompass clouds of associations.
Another part of the answer emerged in this interview. You have to be really sure about those associations. If you put a John Udell record into the Jon Udell bucket, you had better be certain that this is a legitimate misspelling in an item that refers to a particular instance of Jon Udell (i.e., me, not this guy), rather than a legitimate reference to one of the John Udells.
Now that I know about this constraint, the whole thing makes more sense.
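Phil Windley’s description of treating queries as data can be combined with keyed indexing in a toy sketch. This is hypothetical and not Jeff’s implementation. An unanswered query is filed under the same kind of key as the data, so an incoming fact re-checks only the queries filed under its own key, not every stored query. The sketch assumes the fact’s key has already been resolved with the kind of certainty described above. The class, method, and key names are illustrative.

```python
from collections import defaultdict


class QueryStore:
    """Toy sketch of 'queries as data': an unanswered query is stored under an
    entity key and re-checked only when a fact arrives under that same key."""

    def __init__(self):
        self.facts = defaultdict(list)    # entity key -> facts
        self.pending = defaultdict(list)  # entity key -> (query id, predicate)

    def ask(self, query_id, key, predicate):
        hits = [fact for fact in self.facts[key] if predicate(fact)]
        if not hits:
            # No answer yet: persist the query itself so later data can answer it.
            self.pending[key].append((query_id, predicate))
        return hits

    def ingest(self, key, fact):
        self.facts[key].append(fact)
        # Only queries filed under this key are evaluated, not every stored query.
        return [qid for qid, predicate in self.pending[key] if predicate(fact)]


store = QueryStore()
store.ask("q1", "person:jon-udell", lambda fact: fact.get("topic") == "calendars")
print(store.ingest("person:other", {"topic": "calendars"}))     # []
print(store.ingest("person:jon-udell", {"topic": "calendars"}))  # ['q1']
```

In this sketch, an answered query stays on file, so it keeps matching later facts. Whether to retire it after a match is a design choice the sketch leaves open.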

iCalendar validation issues #1 and #2: blank lines, PRODID and VERSION.
Sam Ruby offers the following advice to those of us who would like to improve the interoperability of iCalendar feeds:
Identifying real issues that prevent real feeds from being consumed by real consumers and describing the issue in terms that makes sense to the producer is what most would call value.
I’ll be documenting issues as I encounter them. Here’s the first: Should feeds use, or not use, blank lines between components? (A component is a chunk of text representing an event, or something else that can show up in an iCalendar file, like a todo item.)
The presence of blank lines is a reason why this feed is one of two I’m tracking that won’t parse in DDay.iCal.
The unmodified feed looks like this:
BEGIN:VEVENT
...stuff...
END:VEVENT

BEGIN:VEVENT
...stuff
END:VEVENT
Part of the “fix” is to make it look like this:
BEGIN:VEVENT
...stuff...
END:VEVENT
BEGIN:VEVENT
...stuff
END:VEVENT
But I’ve put “fix” in air quotes because, well, who’s wrong in this case? The feed producer (in this case, the Keene Chamber of Commerce), or the feed consumer (in this case, DDay.iCal)?
I looked at the spec and didn’t find evidence pointing one way or the other. Neither did this person:
> 1) yes, KOrganizer adds empty lines between VEVENT, VTODO and VJOURNAL. I just checked the specification (RFC 2445), and it doesn't say anything about blank lines... (neither explicitly allowed, nor explicitly not allowed)
This is a perfect example of why the process that Mark Pilgrim and Sam Ruby went through for RSS/Atom feeds will be so valuable for iCalendar feeds. Quite a few details that affect interoperability turn out to depend on assumptions and interpretations that aren’t explicit.
Maybe I’m misreading the spec, and it really does forbid blank lines between components. If so, great, the validator can enforce that rule. But maybe it neither allows nor forbids. In that case, the validator can say so, and suggest a best practice. In this case, my guess is that the best practice would be not to include blank lines.
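Here is one way a validator might "say so, and suggest a best practice" in this case. This minimal sketch uses a hypothetical function name. It reports blank lines as warnings rather than errors, because the spec doesn’t clearly settle the question.

```python
def blank_line_warnings(ics_text):
    """Flag blank lines as warnings with a suggested best practice, rather than
    rejecting the feed, since the spec doesn't clearly settle the question."""
    warnings = []
    for lineno, line in enumerate(ics_text.splitlines(), start=1):
        if not line.strip():
            warnings.append(
                f"line {lineno}: blank line; not clearly allowed or forbidden "
                "by RFC 2445, suggested best practice is to omit it"
            )
    return warnings


feed = (
    "BEGIN:VCALENDAR\r\n"
    "BEGIN:VEVENT\r\nSUMMARY:First\r\nEND:VEVENT\r\n"
    "\r\n"
    "BEGIN:VEVENT\r\nSUMMARY:Second\r\nEND:VEVENT\r\n"
    "END:VCALENDAR\r\n"
)
for warning in blank_line_warnings(feed):
    print(warning)
```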
But I said that removing the blank lines is only part of the “fix,” and here’s why. When I remove them, the feed still won’t parse in DDay.iCal, but for a different reason. Now the problem lies here:
BEGIN:VCALENDAR
X-WR-CALNAME:GKCC
BEGIN:VEVENT
...stuff...
In this case, the reason is clearly stated in the spec. A feed is supposed to include VERSION and PRODID properties like so:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT
If I inject those into the Chamber of Commerce feed, and remove blank lines, it parses in DDay.iCal.
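The repair just described can be automated. This minimal sketch assumes the feed is text with one property per line. It drops blank lines, then injects VERSION:2.0 and a placeholder PRODID right after BEGIN:VCALENDAR if either is missing. The function name and the default PRODID value are hypothetical. The sample feed is an abbreviated stand-in for the Chamber of Commerce feed, not its actual contents.

```python
def normalize_calendar(ics_text, prodid="-//Hypothetical//Feed Normalizer//EN"):
    """Remove blank lines and, if the calendar lacks them, inject VERSION and
    PRODID right after BEGIN:VCALENDAR. Raises ValueError if there is no
    BEGIN:VCALENDAR line."""
    lines = [ln for ln in ics_text.splitlines() if ln.strip()]
    start = lines.index("BEGIN:VCALENDAR")
    # Calendar-level properties sit between BEGIN:VCALENDAR and the first nested BEGIN.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("BEGIN:")), len(lines))
    present = {ln.split(":", 1)[0].split(";", 1)[0].upper()
               for ln in lines[start + 1:end]}
    missing = []
    if "VERSION" not in present:
        missing.append("VERSION:2.0")
    if "PRODID" not in present:
        missing.append("PRODID:" + prodid)
    lines[start + 1:start + 1] = missing
    return "\r\n".join(lines) + "\r\n"


chamber = (
    "BEGIN:VCALENDAR\r\nX-WR-CALNAME:GKCC\r\n\r\n"
    "BEGIN:VEVENT\r\nSUMMARY:Example event\r\nEND:VEVENT\r\n\r\n"
    "END:VCALENDAR\r\n"
)
print(normalize_calendar(chamber))
```

Running the function a second time on its own output changes nothing, so it can be applied blindly to every incoming feed.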
Note that the unmodified feed is reported to be valid by this iCal4j-based validator. A more robust validator, in the style of the Pilgrim/Ruby RSS/Atom validator, would fail the feed, and would cite the relevant part of the spec in its explanation of the failure.
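Here is a minimal sketch of a check that fails the feed with a citation. The function name is hypothetical. The citation paraphrases the calendar-properties rule in RFC 2445, section 4.6 ("Calendar Components"). A real validator would also need to handle folded lines and every other property the spec defines.

```python
# Paraphrase of the rule being enforced, so every failure message can cite it.
SPEC_REF = "RFC 2445 section 4.6: VERSION and PRODID are required, each at most once"
REQUIRED = ("VERSION", "PRODID")


def calendar_property_errors(ics_text):
    """Return validator-style errors for missing or repeated VERSION/PRODID."""
    lines = [ln for ln in ics_text.splitlines() if ln.strip()]
    if "BEGIN:VCALENDAR" not in lines:
        return ["no BEGIN:VCALENDAR line found"]
    start = lines.index("BEGIN:VCALENDAR")
    # Calendar properties are the lines between BEGIN:VCALENDAR and the first nested BEGIN.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("BEGIN:")), len(lines))
    names = [ln.split(":", 1)[0].split(";", 1)[0].upper()
             for ln in lines[start + 1:end]]
    errors = []
    for prop in REQUIRED:
        count = names.count(prop)
        if count == 0:
            errors.append(f"missing {prop} ({SPEC_REF})")
        elif count > 1:
            errors.append(f"{prop} appears {count} times ({SPEC_REF})")
    return errors


chamber = ("BEGIN:VCALENDAR\r\nX-WR-CALNAME:GKCC\r\n"
           "BEGIN:VEVENT\r\nEND:VEVENT\r\nEND:VCALENDAR\r\n")
for error in calendar_property_errors(chamber):
    print(error)
```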
The spec says, by the way, that both VERSION and PRODID are required elements. When I saw that DDay.iCal was rejecting the Chamber of Commerce feed, which contains neither, I figured that was why. And sure enough, it accepts this:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT
But it also accepts this:
BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:GKCC
BEGIN:VEVENT
And this:
BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
X-WR-CALNAME:GKCC
BEGIN:VEVENT
But not this:
BEGIN:VCALENDAR
PRODID:Keene Chamber of Commerce
BEGIN:VEVENT
Eventually I twigged to the fact that it’s evidently just looking for two (or more) non-empty lines between the BEGINs. For example, this parses:
BEGIN:VCALENDAR
FOO:BAR
BAZ:FOO
BEGIN:VEVENT
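Stating the inferred rule in code makes the mismatch with the spec plain. The first function below is a hypothetical emulation of what the experiments above suggest, not DDay.iCal’s actual logic. The second checks what the spec asks for. Running both over the five headers tested above shows where they disagree. The heuristic accepts the made-up FOO:BAR/BAZ:FOO header and the headers missing VERSION or PRODID, all of which the spec-based check rejects.

```python
def heuristic_accepts(header):
    """HYPOTHETICAL emulation of the behavior inferred from the experiments,
    not DDay.iCal's code: accept when at least two non-empty lines sit
    between BEGIN:VCALENDAR and the first BEGIN:VEVENT."""
    lines = header.splitlines()
    start = lines.index("BEGIN:VCALENDAR")
    end = lines.index("BEGIN:VEVENT", start)
    return sum(1 for ln in lines[start + 1:end] if ln.strip()) >= 2


def spec_accepts(header):
    """What RFC 2445 asks for: VERSION and PRODID among the calendar properties."""
    lines = header.splitlines()
    start = lines.index("BEGIN:VCALENDAR")
    end = lines.index("BEGIN:VEVENT", start)
    names = {ln.split(":", 1)[0].split(";", 1)[0].upper() for ln in lines[start + 1:end]}
    return {"VERSION", "PRODID"} <= names


HEADERS = {
    "VERSION+PRODID+X-WR": "BEGIN:VCALENDAR\nVERSION:2.0\nPRODID:Keene Chamber of Commerce\nX-WR-CALNAME:GKCC\nBEGIN:VEVENT",
    "VERSION+X-WR": "BEGIN:VCALENDAR\nVERSION:2.0\nX-WR-CALNAME:GKCC\nBEGIN:VEVENT",
    "PRODID+X-WR": "BEGIN:VCALENDAR\nPRODID:Keene Chamber of Commerce\nX-WR-CALNAME:GKCC\nBEGIN:VEVENT",
    "PRODID only": "BEGIN:VCALENDAR\nPRODID:Keene Chamber of Commerce\nBEGIN:VEVENT",
    "FOO+BAZ": "BEGIN:VCALENDAR\nFOO:BAR\nBAZ:FOO\nBEGIN:VEVENT",
}

for label, header in HEADERS.items():
    print(f"{label:20} heuristic={heuristic_accepts(header)} spec={spec_accepts(header)}")
```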
In practice this isn’t a big deal. None of the metadata matters to me, for my purposes, so my aggregator can just elide it before sending a feed to the parser. But the metadata might matter for someone, for some purpose. A proper validator would help ensure that it will be available to those people, for those purposes, by enabling feed producers and feed consumers to more easily produce and consume valid feeds.
For what it’s worth, I’m going to track this category of issue using the tag icalvalid, and I invite other interested parties to do the same. As in the case of the grl2020 tag, I know the tag can appear in a variety of places including del.icio.us, Technorati, WordPress, and nowadays of course Twitter. So I’ll create a metafeed that tracks icalvalid in all of those places.
Update: OK, here’s the icalvalid metafeed, based on this Yahoo Pipe.

iCalendar validation issue #3: Quoted-printable vs HTML.
Next up in my series of iCalendar validation examples: The Frost Free Library feed. It fails in three of the four parsers I tried here, and should have failed in all. It begins like so:
BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic=
rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR
It’s hard to know exactly what the feed producer thought it was doing here, but the feed should fail because no valid content line can begin with rosoft…. Adding a blank space at the beginning of all such lines will, I think, make the feed at least nominally valid.
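A validator can catch this class of error with a line-by-line check. The sketch below loosely approximates RFC 2445’s content-line grammar, which requires a name of letters, digits, and hyphens followed by ';' or ':'. It is not a full parser, and the function names are hypothetical. The second function applies the repair suggested above, turning each offending line into a folded continuation by prefixing a space.

```python
import re

# Loose approximation of RFC 2445's content-line grammar: a property name made of
# letters, digits and hyphens, followed by ';' (parameters) or ':' (the value).
CONTENT_LINE_START = re.compile(r"[A-Za-z0-9-]+[;:]")


def bad_content_lines(ics_text):
    """Return (line number, text) for lines that neither start a property nor
    continue a folded one."""
    bad = []
    for lineno, line in enumerate(ics_text.splitlines(), start=1):
        if not line or line[0] in " \t":
            continue  # blank lines are a separate issue; leading whitespace marks a continuation
        if not CONTENT_LINE_START.match(line):
            bad.append((lineno, line))
    return bad


def fold_bad_lines(ics_text):
    """Apply the repair suggested above: prefix each offending line with a space."""
    bad = {lineno for lineno, _ in bad_content_lines(ics_text)}
    lines = ics_text.splitlines()
    return "\r\n".join(" " + ln if i in bad else ln
                       for i, ln in enumerate(lines, start=1)) + "\r\n"


snippet = (
    "BEGIN:VEVENT\r\n"
    "DESCRIPTION;ENCODING=QUOTED-PRINTABLE:<p>Normal 0 false false false Mic=\r\n"
    'rosoftInternetExplorer4</p>=0D=0A<br class=3D"clear" />\r\n'
    "END:VEVENT\r\n"
)
print(bad_content_lines(snippet))
print(bad_content_lines(fold_bad_lines(snippet)))
```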
But a robust validator would have more to say on the subject. It would notice that this feed is trying to publish HTML content, and would point out that there’s an ALTREP (alternate text representation) parameter for this purpose. Setting aside the fact that this feed doesn’t seem to have any actual HTML content, I believe the right way to encode such content would be something like this:
BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:Frost Free Library | January 06, 2009 - February 05, 2009
PRODID:-//strange bird labs//Drupal iCal API//EN
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20090106T203000Z
DTEND;VALUE=DATE-TIME:20090106T203000Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Library Tea
DESCRIPTION;ALTREP="CID:xyz":Basic description here.
URL;VALUE=URI:http://www.frostfree.org/node/505
UID:http://www.frostfree.org/node/505
END:VEVENT
END:VCALENDAR
Content-Type:text/html
Content-Id:xyz
<html><body>
<p><b>Enhanced description here</b> Body of enhanced description.</p>
</body></html>
I don’t know to what extent ALTREPs are actually produced and consumed. My guess is rarely, and that producers might want to lean toward plain text with line folding when that’s sufficient. But that’s just my guess; I’d be interested to hear from folks who know.
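For producers who take the plain-text route, folding is mechanical. This minimal sketch assumes two rules from RFC 2445, section 4.1. First, lines SHOULD NOT exceed 75 octets, excluding the line break. Second, a fold is a CRLF followed by a single space. The sketch counts UTF-8 octets and breaks only between characters. The function name is hypothetical.

```python
def fold_line(line: str, limit: int = 75) -> str:
    """Fold one iCalendar content line so that no physical line exceeds `limit`
    octets (UTF-8), excluding the CRLF. Each fold is CRLF plus one space, and
    that space counts toward the next physical line's length."""
    pieces = []
    current, current_len = "", 0
    budget = limit  # the first physical line gets the full budget
    for ch in line:
        ch_len = len(ch.encode("utf-8"))
        if current_len + ch_len > budget:
            pieces.append(current)
            current, current_len = "", 0
            budget = limit - 1  # continuation lines spend one octet on the leading space
        current += ch
        current_len += ch_len
    pieces.append(current)
    return "\r\n ".join(pieces)


description = "DESCRIPTION:" + "Library Tea in the reading room. " * 5
print(fold_line(description))
```

A consumer reverses this by deleting every CRLF that is immediately followed by a single space or tab.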
