Michael Weber: Random Bits and Pieces

...Please don't assume Lisp is only useful for Animation and Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining, EDA/Semiconductor applications, Expert Systems, Finance, Intelligent Agents, Knowledge Management, Mechanical CAD, Modeling and Simulation, Natural Language, Optimization, Research, Risk Analysis, Scheduling, Telecom, and Web Authoring just because these are the only things they happened to list.

Kent M. Pitman

Lispin is yet another attempt to rid Lisp (or rather Scheme, at the time of writing) of parentheses:


;;; Try indented syntax yourself

defun factorial (n)
  if (<= n 1)
    the 1
    * n
      factorial (- n 1)

Lisp Logo (by Conrad Barski)

Curiously, the site links to my colorbox prototype as previous work. There is, however, a noteworthy difference in mindset between the two: I do not conflate source code and its presentation. It was not at all my intention to get rid of syntactical structure. Instead, I toyed with replacing its presentation by means of nested color boxes (think structure editor). The parentheses can be hidden (optionally, by the way), but they are still very much present in source code, because that makes the job of syntax-directed tools (parsers, editors, ...) easier!

Getting rid of the parentheses at the syntax level has been proposed a number of times. The advantages are not clear. The disadvantage, however, is the increased importance of whitespace, and I think this is gravely the wrong direction. Source code editors often do not feel obliged to preserve the exact whitespace characters in non-binary files, #\Tab expands to differing numbers of #\Space characters, etc. Thus, program semantics may change without clear visible clue, which is as wrong as it can get.
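
To make the #\Tab point concrete, here is a minimal Python sketch. The source string and the helper indent_widths are made up for illustration; the only library call involved is str.expandtabs. It merely shows that the same characters line up under one tab width and not under another.

```python
# Hypothetical three-line snippet: the second line is indented with four
# spaces, the third with a single tab character.
SOURCE = "if flag:\n    first()\n\tsecond()\n"


def indent_widths(text, tabsize):
    """Return the visual indentation of each line for a given tab width."""
    widths = []
    for line in text.expandtabs(tabsize).splitlines():
        widths.append(len(line) - len(line.lstrip(" ")))
    return widths


if __name__ == "__main__":
    # With tab width 4, both body lines appear at the same depth...
    print(indent_widths(SOURCE, 4))
    # ...with tab width 8, the third line appears nested one level deeper.
    print(indent_widths(SOURCE, 8))
```

Which of the two readings a tool picks depends on its settings, not on anything visible in the file.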

Haskell with its off-side rule suffers from the same issues, but at least it has a type system to catch such whitespace-related errors. In Python, Common Lisp and other dynamic languages, such errors could easily go unnoticed until run-time.

For new (Emacs-using) Lispers who feel the presence of parentheses is distracting, I would recommend trying out parenface instead. It allows assigning a low-contrast color to parentheses. It might be all that is needed, so that you can proceed immediately to the state of complaining about missing Lisp libraries. Ahem...

UPDATE 2006-12-07: c.l.l Lispin Discussion

I do not read comp.lang.lisp regularly, but there is an ongoing discussion about Lispin (via Paolo Amoroso).

UPDATE 2006-12-08: Reddit says...

This article is discussed on reddit, so let me clarify and expand on a few points:

  • pjdelport wrote:
    "I think the quote [In Python, Common Lisp and other dynamic languages, wrong indentation can easily go unnoticed.] meant Common Lisp with indentation-based syntax, instead of just parens."

    I meant to say: with whitespace-based indentation. Sorry for being unclear. Actually, I would say that Common Lisp as it is, is already indentation-based, sort of. Programmers rely on the indentation to understand code, not on counting parentheses. New Lisp programmers (and even seasoned ones) fall into the trap of wrongly-indented code from time to time, when they indent code manually instead of relying on their editor (try out paredit, it can save your day). Fortunately, with parentheses and an almost universally accepted indentation scheme around, it is easy to reindent Lisp code automatically, and spot the mistake. Not so with whitespace-based indentation.

    Parentheses are merely a structuring tool for editors and other source-transformation tools (like macros). However, they are visible, and people keep complaining about them. Their first solution is "I will change the syntax", instead of thinking in the direction of a structure editor (see, e.g., LavaPE).

  • Cynos wrote:
    "I came in precisely to comment about this. In Python, wrong indentation either doesn't compile or produces notably wrong results."

    Let me back up my claim with some data points. Exhibit A: pseudo-code with whitespace-based indentation:

    
    if <condition>:
        <stuff>
        <stuff>
    <oops>
    

    How can a compiler know that <oops> was supposed to be guarded by the condition?

    I have seen Python projects from students and watched them extend bodies of existing Python code (web applications, if you must know). They experienced specifically the issues I mentioned. The code appeared to work at first despite wrong indentation (and thus semantics) in places. What can be done about it? Testing. Right. How can we ever be sure that we really caught all problems? Rigorous testing. Unfortunately, this is not easy (and a subject for another rant, perhaps). So, again please, I am slow: why introduce more sources of errors?

    Sometimes, I could even witness whitespace problems when kibitzing over students' shoulders long enough: they copy and paste a perfectly good block of code from another source file. It looks exactly the same as in the other buffer. They add and adjust a few lines in the middle, run the program, and everything looks fine. At some point things start going wrong. If they are lucky, the web application throws a backtrace into the browser window. Unless bad indentation caused a logical error, in which case the application just does not behave as it should. But the code looks fine, so why does the error happen at all?!
    Another three Edit-(Compile-)Restart-Test cycles, and the poor programmers start reindenting everything in the general neighborhood of their last edits, until things perhaps appear to be working again.

    The problem surfaces especially when different IDEs were used for editing Python code: some parts were written in Eclipse with a Python plugin, others with vi or PyPE or whatever else, and perhaps with different #\Tab conversion settings.

    (Note: If somebody now retorts with coding standards, I will counter-retort with herding cats and single point of failure!)
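
Exhibit A can be made concrete with a small Python sketch. The two functions below are hypothetical and differ only in the indentation of a single line. Both are accepted by the interpreter, and a test that only exercises the coupon case cannot tell them apart.

```python
def checkout_guarded(price, has_coupon):
    """Intended version: both adjustments apply only with a coupon."""
    if has_coupon:
        price -= 5
        price //= 2
    return price


def checkout_outdented(price, has_coupon):
    """Same lines, but the halving slipped out of the 'if' body."""
    if has_coupon:
        price -= 5
    price //= 2  # outdented by one level: now runs unconditionally
    return price


if __name__ == "__main__":
    # With a coupon the two versions agree, so this check passes for both.
    print(checkout_guarded(25, True), checkout_outdented(25, True))
    # Without a coupon only the outdented version silently changes the price.
    print(checkout_guarded(25, False), checkout_outdented(25, False))
```

Nothing in the language can flag the second function, because it is a perfectly valid program that happens to mean something else.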

What can we learn from this? My argument here is not that Lisp syntax is superior. It is that a programming language should enable writing programs which give unambiguous visual clues to their underlying semantics, and that we should separate syntax from presentation. A parenthesis-less, indentation-based view for Lisp code is fine from this perspective, if it is unambiguous. If you don't like the parentheses, consider a structure editor instead of making the syntax whitespace-dependent.
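
To illustrate why explicit delimiters keep presentation recoverable, here is a rough Python sketch of a re-indenter driven only by parenthesis depth. The function reindent is hypothetical and deliberately naive: it ignores strings, comments and character literals, and it uses a fixed step per nesting level instead of the usual Lisp alignment rules. The point is merely that whitespace damage can be undone mechanically when the structure lives in the delimiters.

```python
def reindent(source, step=2):
    """Re-indent Lisp-like text using only parenthesis depth.

    Naive on purpose: parentheses inside strings or comments would be
    miscounted, and real Lisp editors align forms more cleverly.
    """
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()  # throw away whatever whitespace was there
        if not line:
            out.append("")
            continue
        out.append(" " * (step * depth) + line)
        depth += line.count("(") - line.count(")")
    return "\n".join(out)


if __name__ == "__main__":
    # Two differently mangled copies of the same form.
    mangled_a = "(if predicate\nconsequent\n      (progn\n alternative-1\n  alternative-2))"
    mangled_b = "(if predicate\n\t\tconsequent\n(progn\n    alternative-1\nalternative-2))"
    print(reindent(mangled_a))
    print(reindent(mangled_a) == reindent(mangled_b))
```

No comparable function can exist for an indentation-based syntax, since the whitespace it would have to discard is the only record of the structure.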

UPDATE 2008-02-09: Nice Try, no Cookie...

Sam wants to disagree, but lacks arguments. So, he decides to miss the point and rant instead. Fair enough, but then I am allowed to shoot back:

"Executive summary #2: Significant whitespace is fundamentally wrong because none of his editors are able to handle it, and the TAB-vs-SPACE issue can cause incorrect program behavior."

Well, not quite. I thought I wrote about it above, but let me try again to make myself clear. I argue that significant whitespace has no advantage, but it complicates syntax-directed tools: lexers need to keep track of arbitrary and arcane whitespace rules, and so do editors, pretty-printers, refactoring tools, etc. In addition, it is easy to confuse one whitespace character for another, because they all look the same! Whitespace is for presentation; I argue that the type and amount of whitespace present should not alter the meaning of a program. Also, the fact is that editors traditionally are very lax in their handling of whitespace. I gave examples, too.
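
As a rough illustration of the bookkeeping a lexer has to do, Python's standard tokenize module can be asked which layout tokens it derives from a piece of source. The snippets and the helper layout_tokens are made up for this sketch; tokenize.generate_tokens and tokenize.tok_name are part of the standard library.

```python
import io
import tokenize

# Hypothetical snippets: one with a nested body, one without any nesting.
NESTED = "if x:\n    y\nz\n"
FLAT = "x\ny\n"


def layout_tokens(source):
    """Return the names of the INDENT/DEDENT tokens the lexer synthesizes."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name in ("INDENT", "DEDENT"):
            names.append(name)
    return names


if __name__ == "__main__":
    # The block structure exists only as tokens inferred from leading whitespace.
    print(layout_tokens(NESTED))
    print(layout_tokens(FLAT))
```

Every other tool that wants to understand the block structure has to repeat this inference, or reuse the lexer.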

News at 11

"He fails outright, however, to mention how much trouble they have writing Lisp code for the first time."

Not only is this completely beside the point, I would also like to see that claim backed up before I buy it. Maybe we should ask people who use a Lisp for teaching about their experience.

Sam then continues to insist that whitespace should be significant, and to fix editor issues he argues for rules like:

"[...] it is equally true that such editors allow tabs only before a non-tab character. This is important: pay attention here, people! If you allow a tab after a space, then you run the risk of munging indents."

I find that completely arbitrary. Who came up with it?

"Mike continues, 'program semantics may change without clear visible clue, which is as wrong as it can get.' I've never had this happen."

The only argument I can see here seems to go along the lines of "It never happened to me, therefore it cannot happen", which is not exactly convincing.

More of the Same

Sam follows up with two more attacks on Lisp which continue to be beside the point, except that the second, about the syntax of IF, is even factually wrong:

"according to standard Lisp (or Scheme) indentation guidelines, the consequent and the alternate clauses are at the same level of indent, with nothing in between them. It gets worse: according to Lisp's definition of the (if) form, you get one, and only one consequent clause, but as many alternate clauses as you need. That means you must remember to use the (progn) or (block) form for the consequent, but it's not needed for the alternate. That means, in Mike's structured editor concept, you cannot tell the difference between them. Unless you consciously remember to use (progn) or friends. Aww, shucky darn; where is your god now?"

First, a structure editor would know the syntax of IF. Case in point: Emacs knows how to indent an IF form correctly, be it in Emacs Lisp (which matches the description he gives, except that it is nowhere near standard) or Scheme and Common Lisp, which both have (IF <exp> <exp> <exp>) syntax. Lastly, having code like the example below is quite uncommon:


(if predicate
    consequent
    (progn
      alternative-1
      alternative-2
      alternative-3))

Instead, the PROGN is replaced by a form which does a little more than just sequencing (for example binding forms like LET), if appropriate, or the whole IF is replaced by:


(cond (predicate
       consequent)
      (t
       alternative-1
       alternative-2
       alternative-3))

In fact, this can be done completely automatically, for example in Emacs with my recent Redshank mode.

Haskell does better than Python (here)

"Mike further demonstrates his ignorance of indentation-based systems by claiming Haskell does better than Python, because apparently, it's all part of the type system. 'Haskell with its off-side rule suffers from the same issues, but at least it has a type system to catch such whitespace-related errors.' What? Since when does lexical scoping (a function of the parser) have anything to do with typing (which occurs after it's parsed.)"

Oh dear. Scoping is mainly about the binding of identifiers. Anyway, a small Haskell program might illustrate the point I was trying to make.

module Main where

import Control.Monad.State

data TraceLevel = Quiet | Verbose deriving (Ord,Eq)
type C = StateT TraceLevel IO

main = evalStateT mainC Quiet

logC :: TraceLevel -> IO () -> C ()
-- assume that f should not be able to access C's
-- state, hence it has type IO () instead of C ()
logC x f = do
  l <- get
  if x <= l then lift f else return ()

mainC :: C ()
mainC = do
  logC Verbose $ do
    putStrLn "1"
    putStrLn "2"

logC is a logging function with support for different levels of verbosity. It is used in mainC. Now suppose the indentation of the first putStrLn gets mangled accidentally, so that it is indented further to the right. GHC will complain with a parse error: foo.hs:21:4: parse error on input `putStrLn'. This is probably what Sam had in mind.

Now suppose the second putStrLn happens to get mangled instead: it gets outdented, like so:

mainC = do
  logC Verbose $ do
    putStrLn "1"
  putStrLn "2"

GHC (rightly) throws a type error:

foo.hs:21:2:
    Couldn't match expected type `StateT TraceLevel IO ()'
           against inferred type `IO ()'
    In the expression: putStrLn "2"
    In the expression:
        do (logC Verbose) $ (do putStrLn "1")
         putStrLn "2"
    In the definition of `mainC':
        mainC = do (logC Verbose) $ (do putStrLn "1")
                 putStrLn "2"  

Now, where was my statement above wrong again?

Testing is NOT EASY either (if one wants some guarantees)

"Testing IS NOT HARD. The level of rigour needed to detect indentation problems is precisely enough unit testing to cover all execution paths through the function you're coding. And, that is precisely the amount of unit testing you need in shippable-quality production code! This is not a coincidence. Nor is it terribly hard to do! If you do find that you cannot exercise all execution paths of your function (probably where Mike gets the idea that rigorous testing is hard)"

Testing is easy, rigorous testing is not. Here is a small quiz: how many execution paths does the following program have?


(let ((i 0))
  (loop while (plusp (random 2))
        do (incf i))
  i)

Is it possible that the program returns 0? Yes, so that's at least one execution path. How about 1? Sure, another path (if it were the same path, the result would be the same). How about 1000? 10000? And so on. Their probabilities decrease, but they could happen.

Testing all execution paths is clearly not possible in general, but perhaps Sam meant an approximation like coverage of all edges in the control flow graph as a testing strategy. Unfortunately, this is not exhaustive testing, and bugs can still hide on obscure execution paths.
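
The gap between edge coverage and path coverage can be shown with a small Python sketch. The function scale and the two test inputs are hypothetical. Together, the two tests take every branch in both directions, yet the one combination they never exercise is the one that fails.

```python
def scale(x, double, invert):
    """Hypothetical function with two independent branches (four paths)."""
    factor = 1
    if double:
        factor = 2
    if invert:
        factor -= 2
    return x / factor


def run_edge_covering_tests():
    """Two tests that together cover every edge of the control flow graph."""
    assert scale(8, True, False) == 4.0   # first 'if' taken, second skipped
    assert scale(8, False, True) == -8.0  # first 'if' skipped, second taken
    return True


if __name__ == "__main__":
    print(run_edge_covering_tests())
    try:
        # The untested path: both branches taken, so factor becomes 0.
        scale(8, True, True)
    except ZeroDivisionError as error:
        print("bug on the untested path:", error)
```

With n independent branches, two tests can suffice for edge coverage while the number of paths grows as 2^n, and loops make it unbounded.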

There is a reason why a whole branch of computer science is concerned with proving the correctness of programs, and why some application areas already use more than mere testing to actually ensure the correctness of software.