Loitering on the sidewalk of life: 2007

Tuesday, November 06, 2007

Hosting my own blogs

Well, I finally deployed Roller on the server I share with Ky. So, it looks like I will start blogging using that server rather than Blogger. I like Blogger, but it does not allow me the same kind of control over the blogging engine.

So, come over to my blog server.

Wednesday, September 05, 2007

The limits of mathematics

I have been fascinated by Gödel's incompleteness theorem for a long time. Of course, being fascinated by it does not mean I know very much about it. This is the same as my fascination with playing music. I know how to play tons of musical instruments. I just don't play any of them worth a dime. The seed of the fascination with the incompleteness theorem was planted when I was in third year in university, when my professor introduced us to the secrets of parsing. Now, how would that have anything to do with incompleteness? Well, that was when I started reading journal papers: I was so impressed by the ability of the computer to do parsing that I had to know everything about it. Again, that does not mean I knew about the topic in a deep way, just that I found tons of papers on parsing written by Knuth and other people that I did not understand very well: I knew what the different issues were, I just did not know why and how they could be solved. I remember trying to impress the professor by implementing Earley's algorithm for a tiny simple parsing assignment (in Simula 67!). I did not get a high mark, as he obviously did not like the prank.

That summer, I stayed in the libraries, roaming the different stacks of books. I stayed mostly on the computer science side of things. What I found was nothing short of astonishing: what I had studied until then was like a grain of sand on the beach. It was beyond insignificant, compared to all the sand on the beach of knowledge.

Then, for whatever reason, with the book "Gödel, Escher, Bach" partly to blame, I started drifting over to the philosophy of mathematics. Again, as with whatever I was fascinated with, I never spent very much time on anything in that topic, just surfing from one book to another. Even though I had attempted to read the GEB book a few times by then, I started wondering why these books were all written in the 1920s and 1930s. I did not understand at all.

Then, quite naturally, I guess, I moved over to reading Bertrand Russell's books, most of them his popular books. Obviously, I did not know what was philosophy and what was not. I just read everything he wrote that I could understand. But that was good: I was connected back to the foundations of mathematics: Russell's "Principia Mathematica". That was when I put all these things together: Gödel's theorem killed all of them. Suddenly, the whole field withered. I bet there are still people doing research in that field; I just don't think they are as excited as the mathematicians and logicians who studied the same field in the 1920s and 1930s.

Fast forward to ten years later, when someone forwarded Bill Joy's "Why the Future Doesn't Need Us" to me. The bibliography was wonderful, and it rekindled the biggest question I had after those years of thinking about the reason for the incompleteness theorem. Then it hit me, when I found a book called "The Limit of Mathematics". I tried to read it, but I wasn't too fascinated by the topic of the book. But I did think of the reason, or at least what I think is the reason, for the limits of mathematics: whereas nature has no collections, no groupings, and has only individuals, mathematics (and our thinking) must deal not with individuals but only with groups, collections, and (perceived) shared characteristics. Without that, we would have no theories. No science.

But why would that make any difference?

Because we lose information by grouping. If we have a theory that has only one or a few simple groups, then that theory can be complete and consistent. Any time we allow the grouping to be complex enough, we start losing information. Once you are in a state where information has been lost, you can't come back to the state before the information was lost. Or rather, you can, but only at the price of getting outside the universe you have constructed.

Again, so what? That has nothing to do with programming or software, the implicit theme of this blog. Well, I was a functional programming fanatic once. For a short time. When I was studying for my doctorate. Don't get me wrong, functional programming is a wonderful thing. My intellectual awakening can be traced back to functional programming: my reading of John McCarthy's 1960 paper. But. Functional programming is too neat. Too tidy.

My holy grail shifted quite a few years ago from a functional programming world to something equally unrealistic: the representation of the real world, with all its glorious mess. In 1999, I realized that the tools we have would not do it. I was ecstatic when I heard about the semantic web, before I found out how mathematically rigorous it was going to be. No. Rigor does not interest me. I want the messiness of reality.

Now, messiness does not mean being inaccurate. What we have now usually is very neat and tidy and rigorous. But it is not accurate. I can prove whatever I want with the wonderful schema I have designed just yesterday. But this rigor does not show whether it is accurate with respect to the reality it is supposed to represent.

I was hopeful for a little while when I was thinking about how ontologies could help me capture that messiness.

In a way, it is funny how deep mathematics is inside me. I know I must get out of this mathematics of generalization and move on to something that I can use to represent the real world, a world of individuals, something where we can handle truth in different layers, just like in the real world. I just can't. It is so much a part of me that I don't know what it is like not having this mathematics, the learned instinct to group things.

Perhaps I am trying to get back to the primordial way of looking at nature: everything and every phenomenon is considered in itself. But I don't want to lose the ability to accumulate knowledge either. So, I don't want that purely individualistic view. I want it just organized enough to allow me to differentiate individuals, but not so much that I end up with individuals that have no connections with each other. So, it is not simply having a unique identification for each object. After all, these objects are all the same, differing only in the values of their attributes.

That, I think, is also the reason why programming is so hard to get right. As humans, we think in layers, moving up and down at will, and we know, by intuition, the different layers of understanding we have. Programming, right now, has no such thing: you must program on the lowest layer. All the abstractions, all the frameworks, they all mask the details, but they do not allow us to move up and down the different layers as we wish. In other words, we avert our eyes when we work on something specific, and get the tools, the frameworks, the abstractions to mask the details of the lowest layer that, unfortunately, is never very far away. We have to know, on the lowest layer, all the possible paths through the program we are writing for it to handle the cases we want it to handle. That is why Visual Basic is so terrible whenever you use it for anything remotely complex: by design you sometimes can't get to the lowest layer, and because the ability not to consider the lowest layer is only an illusion, when the time comes to handle some lowest-layer detail that Visual Basic hides from you, you are stuck. On the other hand, C makes you program on the lowest layer, even when you can hide some of the complexities away, and that makes for very reliable, very robust systems. And you must think of all the possible ways things can go wrong.

Both are simply lousy if you need to handle the movement of time, the changing over time of the relationships among the things modeled in the programs. That is why all live systems are a mess, and every living database schema is impossible to understand unless you were present at the creation of the world.

I think that is again the influence of mathematics: we are trained to think in an instant in time. On the other hand, the real world does not stop. It keeps changing. Our programming languages, our tools, our systems, they are always constructed for a snapshot. Evolving anything to get it moving in time is so difficult, time-consuming, and particular to each case, that we pretend we don't need to handle the movement of time.

We need a different paradigm. A paradigm that allows us to amalgamate independent agents and get the amalgam to do something we want done. Instead of thinking about raw speed, we should aim for efficiency of action. Instead of evaluating 10 million possible chess moves per second, we get a program to reason like a human. After all, chess is an artificial game; it is not as if it has anything to do with reality itself. Change of relationship is actually change of relationship among the members of the amalgam.

Where am I? I must be losing my mind now.

Tuesday, August 07, 2007

Hiatus

I came back from a visit to my family last weekend. In the three weeks I was over there, I did not check my email once. The only times I was using the computer, I was teaching my niece Web tools or my nephew Mindstorms programming.

In other words, I was away from the Internet world. I did not do a single worthy thing. I did not even bother thinking about the projects I was working on.

Every day, I got up around 7 a.m., had my Nescafe (terrible taste), and talked to my mom about the old days. It was the most enjoyable thing I can imagine: the older I get, the more I want to maintain that continuity from the previous generation to my generation. And the tales are simply mesmerizing. The Japanese occupation. The hardship after the indiscriminate bombing by the Allied planes. The general chaos after the Japanese surrender and before the return of the British. The struggle for survival in the post-war years.

It is so difficult to imagine a life filled with so much drama, all tragic. We had it made, our generation did.

Saturday, June 30, 2007

Curiosity and the Computing Profession

Alan Kay said that computing professionals are not curious, especially about the past. How obvious!

Let's start from the top, the CTOs. Laugh now, laugh. I think most of us know that even if they are curious, they won't know enough to understand both the question and answer.

The middle managers? Well, if it is not about resources and deadlines, they don't know what it is. And how important would it be anyway?

The development managers? Lack of curiosity is almost a badge of honour for them. For that is why they are managing and not programming.

Project managers? Laugh some more.

The programmers? Well, since you must know all the fashionable buzzwords, why would you want to know anything that has nothing to do with the buzzwords? Let's see what is fashionable right now, and learn as little as possible about it. As long as it is enough to fool the interviewer, all is good.

I have known about this lack of curiosity since the beginning of my career. Initially, I attributed it to my colleagues being mostly educated in electrical engineering. I did not think they thought much about computer science. They programmed for a living. That is all, nothing more than that. Then I talked to a person who was advocating design patterns, and realized he was not a bit curious about where design patterns came from. Christopher Alexander? Who? After I explained who he is, his reaction was a shrug. I knew enough to change the topic.

But then, the author of the article made an obvious mistake too: he said Alan Kay invented object-oriented programming. Alan Kay is one of the gods of computer science. But he invented an object-oriented programming language. Not object-oriented programming itself.

Oh well.

Update: In case it is not clear, this is only a very general observation. I don't mean every CTO is clueless. Some obviously are not. The same for other roles.

Tuesday, June 26, 2007

Java Versus C, part 2

In my previous post on C versus Java, I talked about my reluctance to refactor. It is much harder to convince myself to change any part of the C program, especially when the changes are far-reaching or alter the structure of the program.

The application has been in pilot for a few weeks now. So far, I have one reboot-inducing bug, and some minor issues. The reboot-inducing bug was due to freeing memory that has already been freed.
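For readers who have not been bitten by this kind of bug, here is a minimal sketch of one common defence against freeing the same block twice: clear the pointer at the moment it is freed. This is an illustration only, not the code from the application. It uses the standard C library rather than the Palm OS memory manager, and the names RELEASE and release_twice_leaves_null are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper macro: free the block and clear the pointer in one
   step, so a later RELEASE on the same variable calls free(NULL), which
   standard C defines as a no-op. */
#define RELEASE(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 if releasing the same pointer twice leaves it NULL,
   or 0 if the allocation itself failed. */
int release_twice_leaves_null(void)
{
    char *record = malloc(32);
    if (record == NULL) {
        return 0;
    }
    strcpy(record, "pilot data");

    RELEASE(record);          /* first release frees the block */
    RELEASE(record);          /* second release is harmless: free(NULL) */

    assert(record == NULL);   /* the variable no longer dangles */
    return record == NULL;
}
```

The trick only protects the one variable that is cleared. If another pointer still holds the old address, freeing through it is still a double free, so it is a habit that helps, not a cure.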

I am quite satisfied with the application. And once in a while, I would find myself admiring the code I wrote. That never happened before.

This morning, when I was reading an article on TheServerSide.com about C++ versus Java, something just hit me. Even though the 'one small problem and Palm is going to reboot' is certainly not good for the health of the heart, the necessity to be ultra-organized to ensure that does not happen makes the code much better.

I avoided C++ because I was really worried about using too much memory. I also avoided it because hunting memory leaks is not exactly my cup of tea. Making matters worse is of course the amount of hassle it takes to debug a Palm app on the device. It is also because C++ is more opaque than C. Even in C, I found myself making everything as simple as possible. If a statement can't be understood in one glance, I change it so that it can be.

So, that is the advantage of using C versus Java for me: it forces me to think of the simplest solution to a problem, to organize well, to code defensively, and to be paranoid about what each statement would do to the system. So, if a C program works well, it is inevitably well-crafted. It will have to be for it to work well.

Still, I have many ideas to make the application much better, but I am scared to re-factor the application to use these ideas.

So, as usual, nothing is perfect.

Wednesday, June 06, 2007

Java Versus C

I just finished, more or less, a Palm OS application in C. Strictly speaking, it is not C, rather, it is using C++ to write a C program. In other words, I am not using any object-oriented features of C++.

Programming in that style of C is actually the only way to control exactly what is going on in terms of memory for Palm OS. You see, Palm OS has no memory protection; an error that causes a segmentation fault on Unix will cause a Palm OS device to reboot. Because of that, I became careful about where every byte is. With C++, it is tougher to account for the bytes, as the virtue of encapsulation also means opacity sometimes.

After the application was done, I started thinking about the experience I had, and the experience I had for the last few years, when I programmed in Java almost exclusively, other than some JavaScript and Perl. All of these languages handle memory for the programmer.

In terms of programming, all of the languages are fine. They have different styles. But I am not a language bigot; I use a language when it is the right language to use, or it is the best language to use. In a particular context.

I can now say that an incredibly big advantage of using Java (or JavaScript, or Perl...) is that refactoring is easy for these languages and very practical, especially with an IDE like Eclipse.

For C, especially for an operating system like Palm OS, refactoring is a hairy thing. Instead of trying to find code to refactor, I found myself making sure every change is a necessary change. After a few reboots that took forever to figure out, I became a lot more defensive in my programming, and I also became very defensive in terms of changing anything.

I believe that is the true cost of not having automatic memory management in a language: it robs the programmer of the courage to perform major refactoring.

Maybe Palm OS is an exception, as it does not have very good debugging facilities, and the simulator is one version down, the emulator two versions down, and device debugging is not fun, to say the least. But still, the disadvantage is real where quality is concerned. Refactoring is the only way to increase the quality of an application, especially if it changes frequently.

Thursday, January 18, 2007

What GWT means to me

I don't remember who said it, but someone said that human affairs simply don't escape from the same basic set of patterns. Over the years, the patterns will repeat over and over again, each time with different tools, technologies and understandings, but the basic patterns don't change much. I call that the Theory of Deficient Imagination in Humans. We just can't escape our bodies and everything embodied in them.

When I first heard that Google made GWT available for any programmer to use, I jumped for joy. It was one of the dreams I had for the web business, that someone would decide to treat all the browsers the way a previous generation of computer scientists treated the computers they had to program: build an abstraction layer on top to shield the programmer from them. That revolution happened more than 40 years ago, and this revolution in Web technology has just been started by Google. Eventually, I hope, programmers will forget how to build HTML pages and program in JavaScript. I think that even CSS will join this group of technologies, and be forgotten by most programmers. Instead, programmers will use toolkits like GWT or something in the same vein, where the browsers are simply shielded, and program in a high-level language they are familiar with, like Java for GWT. I bet Ruby, Python, and whatever languages strike the fancy of some programmers will also join in the revolution, and basically make HTML and JavaScript as important to know as assembly languages were when I finished university in the early 1990s.

Now, don't get me wrong, I have nothing against HTML or JavaScript. I have nothing against hand-coding them. In fact, the automatic generation tools I have seen until GWT are so bad in generating HTML and JavaScript that I think using them is simply foolish. On the other hand, even when I was struggling with assembly languages of various kinds fifteen years ago, I had nothing against them either. I just did not want to use them for big projects.

Am I equating HTML+JavaScript and assembly languages? You bet I am. Am I saying these sets of technologies are the same or at least similar? Not at all. JavaScript is a high-level object-oriented programming language. HTML is a markup language for presentation. Whereas assembly languages are, well, symbolic representations of machine code, the stuff that processors actually run on.

Still, these two sets of technologies share the same characteristics in some important ways.

1. Compatibility is an issue. There used to be many assembly languages. All similar, but different in significant ways, other than one or two brain-dead ones, like 8086. Or some really primitive ones, like 6502, which has only one general register. Porting programs from one machine to another was hell. Well, writing a JavaScript program that will run on all the major browsers is a pain, not yet hell, but worse in another way: when you write an assembly language program, you know it will run on one particular kind of machine, and you have no illusion that it might run on a different machine. When you write JavaScript programs, you hope they will run on the major browsers. So you try to program in a way that is browser compatible. That is good and bad, obviously; and the result is that most JavaScript scripts are only partially cross-browser compatible. Infuriating at times, if you use Firefox rather than IE exclusively.

2. Assembly languages are permissive: you have the whole machine to screw up. For processors without memory protection, this can be really destructive. But then, you don't want any restrictions at all for assembly languages. Same thing with browsers, they must be very permissive: since HTML and JavaScript are simply there to help in presentation, unless there is no way to show some content, the browser should try its best to render it. What this means is of course that the browser has a different understanding of what "not working" means. This is not good for web application programmers. They need to be told that something is not right as often as possible, just the opposite of being permissive.

A toolkit like GWT is like C in the 1980s: finally, we could forget about the architecture, especially if we were using GCC. It is not exactly 100% portable, but good enough for most programs. Many programs can be ported quite easily from architecture to architecture. Of course, C and C++ became less portable again when Microsoft started putting in tons of weird constructs in them in Visual C and C++. Even better, symbolic debuggers became the norm for debugging programs. We did not need to know the assembly language of the machine to debug the program any more.

So, that was the situation: programmers wrote in a high-level language, debugged in symbolic debuggers. The programs were still compiled into assembly language: the best of both worlds.

Of course, many people thought high-level languages were over-rated. They would rather program in assembly language. You see, these high-level languages were simply not efficient enough. And the assembly language code their compilers generated was not elegant and worse, impossible to read by humans. What if we had to debug the assembly code?

Compiler technologies improved greatly over the next 20 years, and rendered this objection more or less meaningless. And I don't know many recent computer science graduates who know how to program in any assembly language. In fact, some of them don't seem to know basic processor architecture at all. I guess they don't need to in order to write the vast majority of programs now.

While I love the audacity of the developers of GWT to come up with this most interesting technology, something that finally allows me to build real-world web applications easily, I am waiting for them to remove all abilities for the programmers to get to JavaScript and HTML. When I was writing simple recursive descent parsers for my toy languages, I always kept the asm() construct so that I could code in assembly language in case I needed the efficiency of something in assembly language. I never used it, actually. I can imagine the day when GWT is good enough that doing anything in JavaScript or HTML is simply not advisable.

So, what is my dream technology to build web applications? Well, perhaps a new language that is designed for this purpose. The GWT developers have to keep up with the evolving Java language. But if the language itself is meant for web applications, then there is no such problem. Of course, the whole thing should be seamless: you write a web application, without worrying about which part is part of the client, which part is part of the server. The compiler will figure it out for you. The deployment is not needed. To make the program available, it is simply that, a package that you execute, just like a normal program.

So, what do we need to make this into reality? Well, a server, like Tomcat, but not like Tomcat. You don't deploy to the server. You 'run' the program, like a Java program. The deployment to the server is simply automatic: the program runs whenever the process is there. You hit control-c, and the program dies. On the client side, why, you don't need to know anything more than the entry point to the program on the server. One single URL.

I wish I had the time to design a language for web applications, and build the server. And you know what, at that point, you really don't need the desktop; you only need a well-known browser.

I can't wait.