Transcription

THIRTEENHints on programminglanguage designThis paper originated from Hoare's keynote address at the ACM S I G P L A Nconference in Boston, October 1973 (although it did not appear in theproceedings, it was distributed at the conference). It was subsequently printedas a Stanford Artificial Intelligence Memo (AIM-224, STAN-CS-73-403) inDecember of that year. The version printed here was published as [43].Hoare was active in committees working on the design and control ofprogramming languages over many years: he made many contributions to theA L G O L Bulletin (e.g. [13]), was a member of IFIP's WG 2.1 (1962-74) andeven chaired the ECMA TC10 group working on the standardization of PL/I.This paper is perhaps the most rounded of his published comments onlanguage-design philosophy. The earlier [25] makes more pointed references(including a very positive one to APL); [40] and [57] make further points.This paper must clearly be read in the context of its date of composition(1973). The relative weight of comment on debugging and reasoning aboutprograms clearly changed as a result of his own later research. Also, a richernotion of types would be appropriate today. But the sound advice in this papertranscends any minor aspects in which it might be considered to be out ofdate. (Many versions exist of the story about the Mariner I Venus probe. Allof them blame software; they differ as to the precise details.)Subsequent to this publication, Hoare and Wirth consulted for SRI on their'Yellow' language response to the 'Tinman' requirements. Their consistentadvice to simplify even this language was unheeded - but the final Adalanguage (the 'Green' proposal) was even more baroque.AbstractThis paper presents the view that a programming language is a tool that should assist theprogrammer in the most difficult aspects of his art, namely program design, documentation, and debugging. It discusses the objective criteria for evaluating a language design,C. A. R. Hoare, Hints on programming language design. In C. Bunyan (ed.) ComputerSystems Reliability, State of the Art Report Vol. 20, pp. 505-34. Reprinted with permission.Copyright 1974, Pergamon/Infotech.193

194E S S A Y S IN C O M P U T I N GSCIENCEand illustrates them by application to language features of both high-level languagesand machine-code programming. It concludes with an annotated reading list, recommended for all intending language designers.13.1Introductionlwould like in this paper to present a philosophy of the design andevaluation of programming languages that I have adopted anddeveloped over a number of years, namely that the primary purpose of aprogramming language is to help the programmer in the practice of his art. Ido not wish to deny that there are many other desirable properties of aprogramming language: for example, machine independence, stability ofspecification, use of familiar notations, a large and useful library, existingpopularity, or sponsorship by a rich and powerful organization. Theseaspects are often dominant in the choice of a programming language by itsusers, but I wish to argue that they ought not to be. I shall therefore expressmyself strongly. I fear that each reader will find some of my points wildlycontroversial; I expect he will find other points that are obvious and evenboring; I hope that he will find a few points that are new and worthpursuing.My approach is first to isolate the most difficult aspects of the programmer's task, and state in general terms how a programming-language designcan assist in meeting these difficulties. I discuss a number of goals, whichhave been followed in the past by language designers, and which I regard ascomparatively irrelevant or even illusory. I then turn to particular aspects offamiliar high-level programming languages, and explain why they are insome respects much better than machine-code programming, and in certaincases worse. Finally, I draw a distinction between language-feature designand the design of complete languages.13.2PrinciplesIf a programming language is regarded as a tool to aid the programmer, itshould give him the greatest assistance in the most difficult aspects of hisart, namely program design, documentation, and debugging.(1) Program design. The first, and very difficult, aspect of design isdeciding what the program is to do, and formulating this as a clear, precise,and acceptable specification. Often just as difficult is deciding how to do it:how to divide a complex task into simpler subtasks, and to specify the

HINTS ON PROGRAMMING-LANGUAGE DESIGN195purpose of each part, and define clear, precise, and efficient interfacesbetween them. A good programming language should give assistance inexpressing not only how the program is to run, but what it is intended toaccomplish; and it should enable this to be expressed at various levels, fromthe overall strategy to the details of coding and data representation. Itshould assist in establishing and enforcing conventions and disciplines thatwill ensure harmonious co-operation of the parts of a large program whenthey are developed separately and finally assembled together.(2) Programming documentation. The purpose of program documentation is to explain to a human reader the way in which a program works sothat it can be successfully adapted after it goes into service, to meet thechanging requirements of its users, or to improve it in the light of increasedknowledge, or just to remove latent errors and oversights. The view thatdocumentation is something that is added to a program after it has beencommissioned seems to be wrong in principle and counter-productive inpractice. Instead, documentation must be regarded as an integral part of theprocess of design and coding. A good programming language will encourage and assist the programmer to write clear self-documenting code,and even perhaps to develop and display a pleasant style of writing.The readability of programs is immeasurably more important than theirwriteability.(3) Program debugging. Program debugging can often be the mosttiresome, expensive, and unpredictable phase of program development,particularly at the stage of assembling subprograms written by manyprogrammers over a long period. The best way to reduce these problems isby successful initial design of the program, and by careful documentationduring the construction of code. But even the best-designed and bestdocumented programs will contain errors and inadequacies, which thecomputer itself can help to eliminate. A good programming language willgive maximum assistance in this. Firstly, the notations should be designed toreduce as far as possible the scope for coding error; or at least to guaranteethat such errors can be detected by a compiler, before the program evenbegins to run. Certain programming errors cannot always be detected in thisway, and must be cheaply detectable at run time; in no case can they beallowed to give rise to machine- or implementation-dependent effects, whichare inexplicable in terms of the language itself. This is a criterion to which Igive the name security. Of course, the compiler itself must be utterlyreliable, so that its user has complete confidence that any unexpected effectwas occasioned by his own program. And the compiler must be compactand fast, so that there is no appreciable delay or cost involved in correctinga program in source code and resubmitting for another run; and the objectcode too should be fast and efficient, so that extra instructions can beinserted even in large and time-consuming programs in order to help detecttheir errors or inefficiencies.

196ESSAYS IN COMPUTING SCIENCEA necessary condition for the achievement of any of these objectives isthe utmost simplicity in the design of the language. Without simplicity, eventhe language designer himself cannot evaluate the consequences of hisdesign decisions. Without simplicity, the compiler writer cannot achieveeven reliability, and certainly cannot construct compact, fast, and efficientcompilers. But the main beneficiary of simplicity is the user of the language.In all spheres of human intellectual and practical activity, from carpentry togolf, from sculpture to space travel, the true craftsman is the one whothoroughly understands his tools. And this applies to programmers too. Aprogrammer who fully understands his language can tackle more complextasks, and complete them more quickly and more satisfactorily, than if hedid not. In fact, a programmer's need for an understanding of his languageis so great that it is almost impossible to persuade him to change to a newone. No matter what the deficiencies of his current language, he has learnedto live with them; he has learned how to mitigate their effects by disciplineand documentation, and even to take advantage of them in ways that wouldbe impossible in a new and cleaner language which avoids the deficiency.It therefore seems especially necessary in the design of a new programming language, intended to attract programmers away from their currenthigh-level language, to pursue the goal of simplicity to an extreme, so that aprogrammer can readily learn and remember all its features, can select thebest facility for each of his purposes, can fully understand the effects andconsequences of each decision, and can then concentrate the major part ofhis intellectual effort on understanding his problem and his programs ratherthan his tool.A high standard of simplicity is set by machine or assembly codeprogramming for a small computer. Such a machine has an extremelyuniform structure, for example a main store consisting of 2 m wordsnumbered consecutively from zero up, a few registers, and a simple synchronous standard interface for communication with and control ofperipheral equipment. There is a small range of instructions, each of whichhas a uniform format; and the effect of each instruction is simple, affectingat most one register and one location of store or one peripheral. Even moreimportant, this effect can be described and understood quite independentlyof every other instruction in the repertoire. And finally, the programmer hasan immediate feedback on the compactness and efficiency of his code.Enthusiasts for high-level languages are often surprised at the complexity ofthe problems that have been tackled with such simple tools.On larger modern computers, with complex instruction repertoires andeven more complex operating systems, it is especially desirable that ahigh-level language design should aim at the simplicity and clear modulardescription of the best hardware designs. But the only widely usedlanguages that approach this ideal are FORTRAN, LISP, and A L G O L 60,

HINTS ON PROGRAMMING-LANGUAGE DESIGN197and a few languages developed from them. I fear that most more modernprogramming languages are getting even more complicated; and it isparticularly irritating when their proponents claim that future hardwaredesigns should be oriented towards the implementation of this complexity.13.3DiscussionThe previous two sections have argued that the objective criteria for goodlanguage design may be summarized in five catch phrases: simplicity,security, fast translation, efficient object code, and readability. Howeverdesirable these may seem, many language designers have adopted alternative principles that belittle the importance of some or all of these criteria,perhaps those that their own languages have failed to achieve.13.3.1SimplicitySome language designers have replaced the objective of simplicity by that ofmodularity, by which they mean that a programmer who cannot understandthe whole of his language can get by with a limited understanding of onlypart of it. For programs that work as the programmer intended this may befeasible; but if his program does not work, and accidentally invokes somefeature of the language that he does not know, he will get into serioustrouble. If he is lucky the implementation will detect his mistake, but he willnot be able to understand the diagnostic message. Otherwise, he is evenmore helpless. If to the complexity of his language is added the complexityof its implementation, the complexity of its operating environment, andeven the complexity of institutional standards for the use of the language, itis not surprising that when faced with a complex programming task so manyprogrammers are overwhelmed.Another replacement of simplicity as an objective has been orthogonalityof design. An example of orthogonality is the provision of complexintegers, on the argument that we need reals and integers and complex reals,so why not complex integers? In the early days of hardware design, somevery ingenious but arbitrary features turned up in order codes as a result oforthogonal combinations of the function bits of an instruction, on thegrounds that some clever programmer would find a use for them; and someclever programmer always did. Hardware designers have now learned moresense; but language designers are clever programmers and have not.The principles of modularity, or orthogonality, insofar as they contributeto overall simplicity, are an excellent means to an end; but as a substitute for

198E S S A Y S IN C O M P U T I N GSCIENCEsimplicity they are very questionable. Since in practice they have proved tobe a technically more difficult achievement than simplicity, it is foolish toadopt them as primary objectives.13.3.2SecurityThe objective of security has also been widely ignored; it is believed insteadthat coding errors should be removed by the programmer with theassistance of a so-called checkout compiler. But this approach has severalpractical disadvantages. For example, the checkout compiler and thestandard compiler are often not equally reliable. Even if they are, it isimpossible to guarantee that they will give the same results, especially on asubtly incorrect program; and, when they do not, there is nothing to helpthe programmer find the mistake. For a large and complex program theextra inefficiency of the debugging runs may be serious; and even on smallprograms the cost of loading a large debugging system can be high. Youshould always pity the fate of the programmer whose task is so difficult thathis program will not fit into the computer together with your sophisticateddebugging package. Finally, it is absurd to make elaborate security checkson debugging runs, when no trust is put in the results, and then removethem in production runs, when an erroneous result could be expensive ordisastrous. What would we think of a sailing enthusiast who wears hislife-jacket when training on dry land but takes it off as soon as he goes tosea? Fortunately, with a secure language, the security is equally tight forproduction and for debugging.13.3.3Fast translationIn the early days of high-level languages it was openly stated that speed ofcompilation was of minor importance, because programs would be compiled only once and then executed many times. After a while it was realizedthat the reverse was often true, that a program would be compiledfrequently while it was being debugged; but instead of constructing a fasttranslator, language designers turned to independent compilation, whichpermits a programmer to avoid recompiling those parts of his program thathe has not changed since the last time. But this is a poor substitute for fastcompilation, and has many practical disadvantages. Often it encourages oreven forces a programmer to split a large program into modules that are toosmall to express properly the structure of his problem. It entails the use ofwide interfaces and cumbersome and expensive parameter lists at inappropriate places. And even worse, it prevents the compiler from adequately

H I N T S ON P R O G R A M M I N G - L A N G U A G EDESIGN199checking the validity of these interfaces. It requires additional file space tostore bulky intermediate code, in addition to source code, which must, ofcourse, never be thrown away. It discourages the programmer from makingchanges to his data structure or representation, since this would involve aheavy burden of recompilation. And finally the linkage editor is oftencumbersome to invoke and expensive to execute. And it is all so unnecessary, if the compiler for a good language can work faster than the linkageeditor anyway.If you want to make a fast compiler even faster, I can suggest threetechniques, which have all the benefits of independent compilation andnone of the disadvantages.(1) Prescan. The slowest part of a modern fast compiler is the lexicalscan, which inputs individual characters, assembles them into words ornumbers, identifies basic symbols, removes spaces, and separates thecomments. If the source text of the program can be stored in a compactform in which this character handling does not have to be repeated,compilation time may be halved, with the added advantage that the originalsource program may still be listed (with suitably elegant indentation); andso the amount of file storage is reduced by a factor considerably greaterthan two. A similar technique was used by the PACT I assembler for theIBM 701.(2) Precompile. This is a directive that can be given to the compilerafter submitting any initial segment of a large program. It causes thecompiler to make a complete dump of its workspace, including dictionaryand object code, in a specified user file. When the user wishes to add to hisprogram and run it, he directs the compiler to recover the dump andproceed. When his additions are adequately tested, a further precompileinstruction can be given. If the programmer needs to modify a precompiledprocedure, he can just redeclare it in the block containing his mainprogram, and normal ALGOL-like scope rules will do the rest. Anoccasional complete recompilation will consolidate the changes after theyhave been fully tested. The technique of precompilation is effective only onsingle-pass compilers; it was successfully incorporated in the ElliottALGOL programming system.(3) Dump. This is an instruction that can be called by the user programduring execution, and causes a complete binary dump of its code andworkspace into a named user file. The dump can be restored and restarted atthe instruction following t h e dump by an instruction to the operatingsystem. If all necessary data input and initialization is carried out before thedump, the time spent on this as well as recompilation time can be saved.This provides a simple and effective way of achieving the F O R T R A N effectof block data, and was successfully incorporated in the implementation ofElliott ALGOL.

200E S S A Y S IN C O M P U T I N GSCIENCEThe one remaining use of independent compilation is to link a high-levellanguage with machine code. But even here independent compilation is thewrong technique, involving all the inefficiency of procedure call and all thecomplexity of parameter access at just the point where it hurts most. A farbetter solution is to allow machine code instructions to be inserted in-linewithin a high-level language program, as was done in Elliott ALGOL; or,better, provide a macro facility for machine code, as in PL/360.Independent compilation is a solution to yesterday's problems; today ithas grown into a problem in its own right. The wise designer will prefer toavoid rather than solve such problems.13.3.4Efficient object codeThere is another argument, which is all too prevalent among enthusiasticlanguage designers, that efficiency of object code is no longer important;that the speed and capacity of computers is increasing and their price iscoming down; and that the programming-language designer might as welltake advantage of this. This is an argument that would be quite acceptable ifused to justify an efficiency loss of ten or twenty percent, or even thirty andforty percent. But all too frequently it is used to justify an efficiency loss ofa factor of two, or ten, or even more; and worse, the overhead is not only intime taken but in space occupied by the running program. In no otherengineering discipline would such avoidable overhead be tolerated, and itshould not be in programming-language design, for the following reasons:(1) The magnitude of the tasks we wish computers to perform is growingfaster than the cost-effectiveness of the hardware.(2) However cheap and fast a computer is, it will be cheaper and faster touse it more efficiently.(3) In the future we must hope that hardware designers will pay increasingattention to reliability rather than to speed and cost.(4) The speed, cost, and reliability of peripheral equipment are notimproving at the same rate as those of processors.(5) If anyone is to be allowed to introduce inefficiency it should be the userprogrammer, not the language designer. The user programmer can takeadvantage of this freedom to write better-structured and clearer programs, and should not have to expend extra effort to obscure thestructure and write less clear programs just to regain the efficiency thathas been so arrogantly pre-empted by the language designer.There is a widespread myth that a language designer can afford to ignoremachine efficiency, because it can be regained when required by the use of a

HINTS ON PROGRAMMING-LANGUAGE DESIGN201sophisticated optimizing compiler. This is false" there is nothing that thegood engineer can afford to ignore. The only language that has beenoptimized with general success is FORTRAN, which was very specificallydesigned for that very purpose. But even in FORTRAN, optimization hasgrave disadvantages:(1) An optimizing compiler is usually large, slow, unreliable, and late.(2) Even with a reliable compiler, there is no guarantee than an optimizedprogram will have the same results as a normally compiled one.(3) A small change to an optimized program may switch off optimizationwith an upredictable and unacceptable loss of efficiency.(4) The most subtle danger is that optimization tends to remove from theprogrammer his fundamental control over and responsibility for thequality of his programs.The solution to these problems is to produce a language for which a simplestraightforward 'non-pessimizing' compiler will produce straightforwardobject programs of acceptable compactness and efficiency: similar to thoseproduced by a resolutely non-clever (but also non-stupid) machine-codeprogrammer. Make sure that the language is sufficiently expressive to enablemost other optimizations to be made in the language itself; and finally,make the language so simple, clear, regular, and free from side-effects thata general machine-independent optimizer can simply translate an inefficientprogram into a more efficient one with guaranteed identical effects, andexpressed in the same source language. The fact that the user can inspect theresults of optimization in his own language mitigates many of the defectslisted above.13.3.5ReadabilityThe objective of readability by human beings has sometimes been denied infavour of readability by a machine; and sometimes even been deniedin favour of abbreviation of writing, achieved by a wealth of defaultconventions and implicit assumptions. It is of course possible for a compileror service program to expand the abbreviations, fill in the defaults, andmake explicit the assumptions. But, in practice, experience shows that it isvery unlikely that the output of a computer will ever be more readable thanits input, except in such trivial but important aspects as improved indentation. Since in principle programs should be read by others, or re-read bytheir authors, before being submitted to the computer, it would be wise forthe programming language designer to concentrate on the easier task ofdesigning a readable language to begin with.

202E S S A Y S IN C O M P U T I N G13.4SCIENCEComment conventionsIf the purpose of a programming language is to assist in the documentationof programs, the design of a superb comment convention is obviously ourmost important concern. In low-level programming, the greater part of thespace on each line is devoted to comment. A comment is always terminatedby an end-of-line, and starts either in a fixed column or with a specialsymbol allocated for this purpose:L D A X [THIS IS A C O M M E N TThe introduction of free format into high-level languages prevents the useof the former method; but it is surprising that few languages have adoptedthe latter. ALGOL 60 has two comment conventions. One is to enclose thetext of a comment between the basic word comment and a semicolon:comment this is a comment;This has several disadvantages over the low-level convention"(1) The basic word comment is too long. It occupies space that would bebetter occupied by the text of the comment, and is particularlydiscouraging to short comments.(2) The comment can appear only after a begin or a semicolon, although itwould sometimes be more relevant elsewhere.(3) If the semicolon at the end is accidentally omitted, the compiler willwithout warning ignore the next following statement.(4) One cannot put program text within a comment, since a comment mustnot contain a semicolon.The second comment convention of ALGOL 60 permits a commentbetween an end and the next following semicolon, end, or else. This hasproved most unfortunate, since omission of a semicolon has frequently ledto the compiler ignoring the next following statement:. end this is a mistake A [i] " x;The FORTRAN comment convention defines as comment the whole of aline containing a C in the first column.C THIS IS A C O M M E N TIts main disadvantages are that is does not permit comments on the sameline as the code to which they refer, and that it discourages the use of shortcomments. An unfortunate consequence is that a well-annotatedFORTRAN program occupies many pages, even though the greater part ofeach page is blank. This in itself makes the program unnecessarily difficultto read and understand.The comment convention of COBOL suffers from the same disadvan-

HINTSON P R O G R A M M I N G - L A N G U A G EDESIGN203tages as that of FORTRAN, since it insists that commentary should be in aseparate paragraph.More recently designed languages have introduced special bracketingsymbols (e.g./* and *[) to enclose comments, which can therefore be placedanywhere in the program text where they are relevant:]* THIS IS A C O M M E N T */But there still remains the awkward problem of omitting or mispunchingone of the comment brackets. In some languages, this will cause omission ofstatements between two comments; in others it may cause the whole of therest of the program to be ignored. Neither of these disasters are likely tooccur in low-level programs, where the end-of-line terminates a comment.13.5SyntaxAnother aspect of programming-language design that is often consideredtrivial or arbritrary is its syntax. But this is also a mistake: the designershould select and observe the best possible syntactic framework for hislanguage, for two important practical reasons"(1) In a modern fast compiler a significant time can be taken in theassembly of characters into meaningful symbols (identifiers, numbers,and basic words) and in checking the context-free structure of theprogram.(2) When a program contains a syntactic error it is important that thecompiler should be able to pinpoint the error accurately, to diagnose itscause, recover from it, and continue checking the rest of the program.Recall the first American space probe to Venus, reportedly lost becauseF O R T R A N cannot recognize a missing comma in a D O statement. InF O R T R A N the statement:D O 17 I 1 10looks to the compiler like an assignment to a (probably undeclared)variable D0171:D 0 1 7 I 110.In low-level programming, the use of fixed field format neatly solves bothproblems. The position and length of each meaningful symbol is known,and it can be copied and compared as a whole without even examining theindividual characters; and if one field contains an error it can be immediately pinpointed, and checking can be resumed at the very next field.Fortunately, free-format techniques have been discovered that solve theproblems nearly as neatly as fixed format. The use of a finite-state machine

204E S S A Y S IN C O M P U T I N GSCIENCEto define the assembly of characters into symbols, and one of the morerestrictive forms of context-free grammars (e.g. precedence or topdown orboth) to define the structure of a program: these must be recommended toevery language designer. It is certainly possible for a machine to analysemore complex grammars, but there is every indication that the humanprogrammer will find greater difficulty, particularly if an error is presentor even only suspected. If a compiler cannot diagnose the syntax of anindividualstatement until it reaches the end of the program, what hope hasa poor human?As an example of what happens when a language departs from thebest-known technology, that of context-free syntax, consider the case of thelabelled END. This is a convention whereby any identifier between an ENDand its semicolon automatically signals the end of the procedure with thatname, and of any enclosed program structure even if it has no END of itsown. At first sight this is a harmless notational convenience, which PeterLandin might call 'syntactic sugar'; but in practice the consequences aredisastrous. If the programmer accidentally omits an END anywhere in hisprogram, it will automatically and without warning be inserted just beforethe next following labelled END, which is very unlikely to be where it waswanted. Landin's phrase for this would be 'syntactic rat poison'. Wiseprogrammers have therefore learned to avoid the labelled END, which is agreat pity, since if the labelled END had been used merely to check thecorrectness of the nesting of statements it would have been very useful, andpermitted earlier and cleaner error recovery, as well as remaining within thedisciplines of context-free languages. Here is a classic example

programmer in the most difficult aspects of his art, namely program design, documenta- tion, and debugging. It discusses the objective criteria for evaluating a language design, C. A. R. Hoare, Hints on programming language design. In C. Bunyan (ed.) Computer Systems Reliability, State of the Art Report Vol. 20, pp. 505-34. Reprinted with .