Wednesday, November 28, 2012


Copyright 2012 by Shawn H Corey. Some rights reserved.
Licensed under CC BY-SA 3.0

In reviewing my last two posts, I think I made a mistake. In creating a grammar, I overlooked the possibilities of the IDE. The integration of the IDE is supposed to make programming easier. In following a grammar, I have made it more like an editor. Although editors have been used for decades, this project is supposed to be a better way.

Sunday, November 25, 2012

Sunday, November 18, 2012

Grammar - Part 1

Time to look at some grammar. Here are the top few levels of it. You should consider this to be a rough draft; it is by no means final.

Expressions in square brackets, "[" and "]", may appear zero times or once. Expressions in curly braces, "{" and "}", may appear zero, one, or many times.


    module ::= module_declaration
               { uses_section }
               { configuration_declaration }
               subroutine_definitions
               [ local_declarations
                   { configuration_initiation } ]

A module is a name space which separates its contents from those of other modules.

Module Declaration

    module_declaration ::= "module" module_name

States the name of the module.

Uses Section

    uses_section ::= "use" module_name [ "from" file_spec ]
                     [ "import"
                         { subroutine_name } ]

Tells which modules this one uses. Imported subroutines can be used with just their names. If the module is not stored in a file with the same name, a file spec can be added so it will be found.
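To show how the bracket notation reads in practice, here is a minimal sketch of a recognizer for a single uses_section, written in Python as a stand-in. It assumes the input has already been split into tokens and that the import clause as a whole is optional; the function name `parse_uses_section` and the result layout are hypothetical, not part of the language.

```python
def parse_uses_section(tokens):
    """Parse one uses_section from a list of tokens.

    Grammar assumed here:
        "use" module_name [ "from" file_spec ] [ "import" { subroutine_name } ]
    """
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "use":
        raise SyntaxError('expected "use"')
    pos += 1
    if pos >= len(tokens):
        raise SyntaxError("expected a module name")
    result = {"module": tokens[pos], "file": None, "imports": []}
    pos += 1

    # [ "from" file_spec ] : square brackets, so zero times or once.
    if pos < len(tokens) and tokens[pos] == "from":
        if pos + 1 >= len(tokens):
            raise SyntaxError("expected a file spec")
        result["file"] = tokens[pos + 1]
        pos += 2

    # [ "import" { subroutine_name } ] : optional keyword, then any number of names.
    if pos < len(tokens) and tokens[pos] == "import":
        pos += 1
        while pos < len(tokens):
            result["imports"].append(tokens[pos])
            pos += 1

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


parsed = parse_uses_section("use geometry from lib/geo import slope distance".split())
```

The optional parts map onto plain `if` tests and the repeated part onto a `while` loop, which is why this style of grammar is easy to turn into code by hand.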

Configuration Declaration

    configuration_declaration ::= type configuration_variable_name

This is used to declare configuration variables so they may be used in the subroutines. Use of them in a subroutine is read only. Their values are set in the initiation section.

Subroutine Definitions

    subroutine_definitions ::= subroutine_definition
                               { subroutine_definition }

At least one subroutine must appear in a module.

Subroutine Definition

    subroutine_definition ::= [ "private" ] "subroutine" subroutine_name
                              [ given_section ]
                              [ returns_section ]
                              [ exceptions_section ]
                              block
                              "return" [ returns_list ]

A private subroutine cannot be imported by another module. A block is a list of statements (to be determined in another article).

Given Section

    given_section ::= "given"
                      type parameter_variable_name [ "←" default_value ]
                      { type parameter_variable_name [ "←" default_value ] }

This section lists the parameters needed by the subroutine. Parameters may be listed by position or set by name. A default value may be defined to make the parameter optional.

Returns Section

    returns_section ::= "returns"
                        type return_variable_name [ "←" default_value ]
                        { type return_variable_name  [ "←" default_value ] }

Lists the items returned, with the option of setting their default values. Items may be listed by position or by name.

Exceptions Section

    exceptions_section ::= "except"
                           "when" exception_identifier
                           { "when" exception_identifier }

This lists what exceptions the calling code must handle. Exceptions that are not handled cause fatal errors. Exceptions will be detailed in a future article.

Returns List

    returns_list ::= return_variable_name { "," return_variable_name }

A repeat of the returned variables to remind the programmer what the results are.

Local Declarations

    local_declarations ::= "local"
                           type local_variable [ "←" default_value ]
                           { type local_variable [ "←" default_value ] }

Local variables are used to hold data for the configuration initiation. This is so complex calculations need not be repeated. They may be calculated once and stored in local variables.

Note that because local variables are defined after the subroutines, they may not appear in the subroutines.

Configuration Initiation

    configuration_initiation ::= "initialize" configuration_variable_name
                                 "set" "read" "only" configuration_variable_name

Configuration variables are used to adapt the module to its environment. Note that the last step is to set them to read only.

Saturday, November 17, 2012

A Slope Subroutine

Another subroutine; this time with exceptions.

    subroutine slope
            Point 1st
            Point 2nd
            Number slope
            when infinite slope
            when points too close to determine slope


        Boolean overflowed ← FALSE
        Number underflowed ← 0

        Number Δy ← 2nd'y - 1st'y
            when overflow
                overflowed ← TRUE
            when underflow
                increment underflowed

        Number Δx ← 2nd'x - 1st'x
            when overflow
                overflowed ← TRUE
            when underflow
                increment underflowed

        if underflowed = 2
            declare points too close to determine slope

        if overflowed
            Δy ← 2nd'y ÷ 2 - 1st'y ÷ 2
            Δx ← 2nd'x ÷ 2 - 1st'x ÷ 2

        slope ← Δy ÷ Δx
            when divide by zero
                declare infinite slope
            when overflow
                declare infinite slope

    return slope

Friday, November 16, 2012

Design Philosophy

The design philosophy I am trying to follow.

Design Philosophy Statement

Let the computer do what it's best at, which is keeping track of the details and presenting them in a meaningful fashion.

More Statements

  • If something is tedious, let the computer do it.

  • If something is complicated, let the computer simplify it.

  • Out of sight is out of mind. Everything that's relevant to a problem must be on screen at the same time. Anything not relevant should not be present.

  • Details should be handled by the language; application concepts, by the programmer.

  • Things that behave differently should look different.

Thursday, November 15, 2012


It's time to discuss exceptions.

The Problem with Exceptions

Exceptions in many languages have a problem: they violate encapsulation.

For example: suppose module A calls a subroutine in module B, which calls one in C, and C throws exception X. Module B ignores X but A handles it. Now suppose we change C in module B to D, which throws exception Y. We now have to change A to handle Y instead of X. More than that, we have to go through every piece of code that uses B and make sure that Y is handled instead of X.

But the problem is more involved than that. If another module, E, calls both B and C, then the handler for X needs to be divided into two: one for X from C and one for Y from B.

And to top things off, if A fails to catch Y, an error report is made and the app dies. But the error just says an uncaught exception was created by D. It does not indicate where the correction needs to be made.

Changing C to D is not local; it violates encapsulation.

The Solution

The solution is simple: exceptions can only be thrown to their immediate caller.

For the example above: A calls B which calls C. C throws X but B must catch it. Then B reformats it and re-throws it as its own exception, Z. A now handles Z instead of X.

If we made the change in B from C to D, then B must catch Y and re-throw it as Z. All the changes are in B. No other changes are needed. This is encapsulation. (Everyone is happy.☺)

And if B fails to catch Y, an error report is made saying D threw Y but B failed to catch it. The programmer realizes that a change must be made to B for the app to be happy again. Error reporting is more comprehensive. (Everyone is happy once again.☺)
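Here is a minimal sketch of the A, B, C example in Python, used as a stand-in. Python does not enforce the immediate-caller rule, so the sketch only follows it by convention; the class and function names simply mirror the letters used above.

```python
class X(Exception):
    """Thrown by C."""


class Z(Exception):
    """B's own exception; the only one A needs to know about."""


def c():
    raise X("C failed")


def b():
    try:
        c()
    except X as error:
        # B catches X and re-throws it as its own exception, Z.
        # "from error" keeps X attached as the cause for error reports.
        raise Z("B could not finish") from error


def a():
    try:
        b()
    except Z as error:
        cause = type(error.__cause__).__name__
        return f"A handled {type(error).__name__} caused by {cause}"


report = a()  # "A handled Z caused by X"
```

If C were swapped for a D that throws Y, only the `except` clause inside `b` would change; `a` would still see nothing but Z.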

Wednesday, November 14, 2012

A Correction

I noticed a problem with the binary-tree walk subroutine, so I'm reposting both corrected versions.


Version 1:

    subroutine walk
            Node tree
            List values

        if not empty tree:
            push values, filter not empty, walk tree.left, tree.value, walk tree.right

    return values

Version 2:

    subroutine walk
            Node tree
            List values

        if not empty tree:
            List left  ← walk tree.left
            List right ← walk tree.right
            push values, filter not empty, left, tree.value, right

    return values

The filter function applies the Boolean subroutine to each item in a List and returns only those for which its results are TRUE.
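As a rough Python analogue of that description, assuming `None` stands in for an empty item (the names `filter_list` and `not_empty` are made up for this sketch):

```python
def not_empty(item):
    """Boolean subroutine: TRUE when the item has content."""
    return item is not None


def filter_list(predicate, items):
    """Return only the items for which the predicate is TRUE."""
    return [item for item in items if predicate(item)]


kept = filter_list(not_empty, [1, None, 2, None, 3])  # [1, 2, 3]
```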

Empty Variables

Some commands and functions for dealing with empty variables.



    erase variable

Removes the contents of a variable and makes it empty.



    Boolean bool ← empty variable

Determines if a variable is empty and returns TRUE if it is, FALSE if it isn't. This is the opposite of filled.



    Boolean bool ← filled variable

Determines if a variable has content and returns TRUE if it has, FALSE if it doesn't. This is the opposite of empty.
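The three operations can be sketched in Python with a private marker object playing the role of "empty". The `Variable` class and its method names are assumptions made for illustration; in the language itself these would be built in.

```python
_EMPTY = object()  # private marker meaning "no content"


class Variable:
    def __init__(self, content=_EMPTY):
        self._content = content

    def erase(self):
        """Remove the contents and make the variable empty."""
        self._content = _EMPTY

    def empty(self):
        """TRUE if the variable has no content."""
        return self._content is _EMPTY

    def filled(self):
        """TRUE if the variable has content; the opposite of empty."""
        return not self.empty()


count = Variable(5)
was_filled = count.filled()   # True
count.erase()
now_empty = count.empty()     # True
```

Using a dedicated marker rather than `None` means a variable holding `None` would still count as filled in this sketch.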

Monday, November 12, 2012


Time to look at the tools to use with my language.


There have been many attempts to create a graphical programming language or, at least, a graphical IDE. I think these are doomed to failure. Here's why.

When I started programming, flowcharts were The Tool to use. Being graphical by nature, they should have been easy to use. But they weren't.

Flowcharts took a long time to create. Drawing boxes and arrows took time and if you made a mistake, corrections were painful. To make a correction, you had to erase the problem area and squeeze the new diagram into the space. Then came pseudo-code.

Pseudo-code was much easier to use. For one thing, you didn't have to draw an arrow to the next step; just place it on the next line. In fact, there were no arrows at all.

Mistakes were easy to take care of. Just draw a line through them and rewrite the line. If the correction was too big for the paper, start over on a new one. (When using flowcharts, programmers were reluctant to start over because of the time already put into the flowchart.)

Writing via pseudo-code was much easier than drawing via flowcharts. And because of this, writing code has dominated programming.

But now, I think it's time for a change.

Point-and-Click IDE

Currently, text editors and IDEs have syntax highlighting. But they do not highlight the syntax, just the tokens. It's time to add more.

By incorporating the grammar of the language into the IDE, syntax errors are immediately detected. No more edit-compile-correct-errors cycles. Problems are shown as they happen and corrections can be made immediately.

And if the IDE knows the syntax, it can store the AST rather than the text. This allows many benefits:


Since the syntax is known, code can be created by point-and-click. The programmer points to the syntax he wants to add and the IDE places it at the cursor. The only things the programmer has to type are the identifiers: variable names, files, etc.

Style Becomes Preferences

The style of programming can be stored as preferences since none of it is stored in the AST. Things like the indentation of if's and loops can be whatever the programmer wants: 2 spaces, 4 spaces, 1cm, ½ inch. If a different programmer opens the file, his preferences are used. No more arguments over coding style; each programmer can have their own.

Reserved Words Can Be in Any Language

Since the token is stored in the AST, not the text, keywords can be programmer preferences. A programmer can change a keyword, like subroutine to function, and it makes no difference. Other programmers, when they look at the code, still see subroutine. Or not, if they changed their own.

And if the words can be changed, then the language can be too. A program written in English can be read in Arabic or Japanese. It only depends on the programmer's preference.
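To make the idea concrete, here is a small Python sketch of a stored tree being displayed under two different sets of preferences. The node classes, the `render` function, and the keyword tables are all hypothetical; a real IDE would have a far richer tree.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    target: str
    expression: str


@dataclass
class If:
    condition: str
    body: list


@dataclass
class Subroutine:
    name: str
    body: list


def render(node, keywords, indent="    ", depth=0):
    """Turn a tree node into display lines using one programmer's preferences."""
    pad = indent * depth
    if isinstance(node, Assign):
        return [f"{pad}{node.target} ← {node.expression}"]
    if isinstance(node, If):
        header = f"{pad}{keywords['if']} {node.condition}"
    elif isinstance(node, Subroutine):
        header = f"{pad}{keywords['subroutine']} {node.name}"
    else:
        raise TypeError(f"unknown node: {node!r}")
    lines = [header]
    for child in node.body:
        lines += render(child, keywords, indent, depth + 1)
    return lines


tree = Subroutine("reset", [If("overflowed", [Assign("count", "0")])])

# Same tree, two programmers, two sets of preferences.
english = render(tree, {"subroutine": "subroutine", "if": "if"}, indent="  ")
french = render(tree, {"subroutine": "fonction", "if": "si"}, indent="    ")
```

Nothing about the keywords or the indentation lives in `tree`; both are supplied only at display time.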

Finally, if the IDE stores the ABT, semantic errors are detected immediately. The semantics of a program are what link the variables to the statements via the symbol table. This means a programmer cannot use a variable that's out of scope or of the wrong type. Only valid choices are given, so errors are reduced.

Logical errors are still left for the programmers to make. ☺

Sunday, November 11, 2012


Here is a list for the types I envision for my language.

Core Types

Core types are those that are integrated into the language. They do more than simply hold data; they have an influence on how the app runs.


Booleans are used in the if statement.


Lists are used in subroutine calls for their parameters and returned values.


Any is used to indicate that any type is acceptable. Since a core type can be used for it, it too must be a core type.


Subroutines are first-class citizens of the language. This means they can be passed around by other subroutines and stored in variables. Since they can also be executed, they are core types.

Common Types

These types are used for most programming requirements.


Text types hold a sequence of characters. They also have an encoding and a locale. They can be parsed by Rules.


The number type is for arithmetic. Numbers are stored as BCD or decimal-floating point. This is to prevent non-integer numbers from doing unexpected truncations.
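Python's standard `decimal` module is a decimal floating-point type, so it can illustrate the difference, assuming the usual binary floating point for Python's built-in `float`:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
binary_sum = 0.1 + 0.2
binary_matches = binary_sum == 0.3            # False

# Decimal floating point stores the digits as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_sum == Decimal("0.3")  # True
```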


Records are programmer-defined types. They allow the app to build complex, tree structures.

Note that they cannot build cyclic structures or even DAGs. This may seem a severe limitation but XML can only build trees too. To build more complex structures, do what XML does: use indirect references.
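A sketch of what indirect references could look like, in Python: each record names its neighbours by key instead of containing them, so a cycle can be described without any record holding another. The `records` layout and the `reachable` helper are assumptions made for this example.

```python
# Each record refers to others by key, much like an XML id/idref pair.
records = {
    "a": {"links": ["b"]},
    "b": {"links": ["c"]},
    "c": {"links": ["a"]},  # closes the cycle a -> b -> c -> a
    "d": {"links": []},
}


def reachable(records, start):
    """Return the keys that can be reached from start, in sorted order."""
    seen = []
    pending = [start]
    while pending:
        key = pending.pop()
        if key in seen:
            continue  # already visited; this is what stops the cycle
        seen.append(key)
        pending.extend(records[key]["links"])
    return sorted(seen)


found = reachable(records, "a")  # ['a', 'b', 'c']
```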

Weird Types

These types were developed by the computer science community. As such, the novice programmer is unfamiliar with them. This could be a good thing and a bad thing. It's good because they have no expectations of them and no hindrance to learning. It's bad because they have to learn everything from scratch.


Strings hold a sequence of bytes. They can be parsed by formal regular expressions. They can be packed or unpacked, that is, a subsequence of bytes can be converted to and from other types.
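Python's standard `struct` module does the same kind of packing and unpacking, so it serves as a rough illustration. The format codes below are Python's, not something defined for this language.

```python
import struct

# Pack a 2-byte and a 4-byte unsigned integer, little-endian, no padding.
raw = struct.pack("<HI", 513, 70000)

# Unpack the whole sequence back into numbers.
first, second = struct.unpack("<HI", raw)

# Or convert only a subsequence: the 4 bytes starting at offset 2.
(tail,) = struct.unpack_from("<I", raw, 2)
```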

Streams: File, Folder, Pipe, Socket, Terminal

Streams are used for I/O. They often have an internal state; this is called being mutable. Because they are mutable, they cannot be used as subroutine arguments or return values.

Streams work with Text and Numbers, which covers the majority of all I/O. But sometimes an app needs to read and write Strings, raw sequences of bytes.


Rules are Perl-compatible regular expressions (PCRE). They can be passed around by subroutines. They are used to parse Text. Although they are called regular expressions, they are not; they are much more expressive and powerful. Perhaps they should be called Irregular Expressions. ☺

Because they can also be executed, they are core types. But since mastering them is a separate endeavour, I placed them in this section.
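One way to see the extra power is a backreference. Python's `re` module is not PCRE itself, but it supports the same `\1` feature, so it is used here as a stand-in. The pattern matches some a's, a b, and then exactly the same a's again, which a formal regular expression cannot do.

```python
import re

# (a+) captures a run of a's; \1 demands the identical run after the b.
mirror = re.compile(r"(a+)b\1")

balanced = mirror.fullmatch("aabaa") is not None     # True
unbalanced = mirror.fullmatch("aabaaa") is not None  # False
```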

Plug-in Types

Plug-in types can be used to extend the language. They replace operator overloading in other languages. But unlike operator overloading, they are not stuck with the syntax and semantics of the language; they can define their own.

Complex Numbers, Matrices

Formulas of these types use the common arithmetic syntax.

Integers, Date & Times, Money, Intervals, Units

These place restrictions on the standard types and may require special formatting and parsing when used with I/O.

Associated Lists

Since I'm a Perl Monger, I'm used to having associated lists, or hashes as they are known in the Perl community. So, I couldn't create a language without them. ☺

Saturday, November 10, 2012


Auto-threading is when the compiler decides when to thread. I decided that my language would allow the compiler to automatically use light-weight threads where it can. Light-weight threads are those that do not limit their memory access. They can read and write to any part of the app's memory. It is up to the compiler to prevent any conflict by threading only when the threads will not access the same memory. To support this, the language itself has the following features.

Copy on write

To speed up subroutine calls, parameters are passed by reference but should the subroutine change them, a copy is made. To do this, a reference count is kept¹. If it is more than 1, a copy is made before the write occurs. This allows each thread to work with its own copy of memory and avoid collisions.

¹ The reference count is also used for garbage collection. If it reaches zero, the memory is reclaimed and reused.
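A minimal sketch of the mechanism in Python, with the reference count kept by hand. The `Buffer` and `CowList` classes are hypothetical, and the sketch ignores locking, which a real threaded implementation would need around the count.

```python
class Buffer:
    """Shared storage with an explicit reference count."""

    def __init__(self, items):
        self.items = list(items)
        self.refs = 0


class CowList:
    def __init__(self, items=(), _buffer=None):
        self._buffer = _buffer if _buffer is not None else Buffer(items)
        self._buffer.refs += 1

    def share(self):
        """Pass by reference: the new handle points at the same buffer."""
        return CowList(_buffer=self._buffer)

    def append(self, item):
        if self._buffer.refs > 1:
            # Someone else can see this buffer, so copy before the write.
            self._buffer.refs -= 1
            self._buffer = Buffer(self._buffer.items)
            self._buffer.refs = 1
        self._buffer.items.append(item)

    def to_list(self):
        return list(self._buffer.items)


original = CowList([1, 2])
parameter = original.share()   # no copy yet; count is now 2
parameter.append(3)            # count was more than 1, so a copy is made first
```

After the write, `original` still holds 1 and 2 while `parameter` holds 1, 2 and 3, and each has a buffer with a count of 1.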

No Globals

There are no global variables. Since languages like Java already do not allow globals, this restraint shouldn't be a great burden to programmers. Subroutines can only write to their own variables, which include their arguments.

Read-only Module Configuration Variables

When a module is loaded, it goes through an initiation phase. During this phase, the module can write to its module-scoped variables. This allows the module to adapt to the system it is in. But once the phase is over, the variables become read-only. They cannot be changed during the running of the app.
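The two phases can be imitated in Python with an object that accepts writes until it is frozen. The `ModuleConfig` class, its `freeze` method, and the `line_ending` variable are invented for this sketch.

```python
class ModuleConfig:
    """Holds configuration variables; writable only until freeze() is called."""

    def __init__(self):
        # Bypass our own __setattr__ while setting up the flag.
        object.__setattr__(self, "_frozen", False)

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(f"{name} is read-only after initiation")
        object.__setattr__(self, name, value)

    def freeze(self):
        object.__setattr__(self, "_frozen", True)


config = ModuleConfig()
config.line_ending = "\n"   # initiation phase: writes are allowed
config.freeze()             # phase over: variables become read-only

try:
    config.line_ending = "\r\n"
    rejected = False
except AttributeError:
    rejected = True
```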

Complex Data Records: Binary Trees

In order to see how these restraints work, below is an example of a binary tree. Two subroutines are provided: insert a new value; and walk through the tree returning a list of its values. Both of them use the down-and-up technique of passing the thingy to modify down and then back up.

This is the record of a node in the binary tree:

    record Node:
        Any     value
        Node    left
        Node    right

You will notice that unlike other languages, there are no references (aka pointers). The language places a special marker in the variable to indicate the slot is empty, which can be tested for.


This is the subroutine to insert a value:

    subroutine insert
            Node tree
            Any  value
            Node tree

        if empty tree:
            tree.value ← value
        else if value < tree.value:
            tree.left ← insert tree.left, value
        else:
            tree.right ← insert tree.right, value

    return tree

The first part of the code declares the subroutine's interface. Note that the type Any is a special type that denotes any other type.

In the code, this line: tree.value ← value uses auto-vivification to create a new Node. All the elements of the new Node are created as empty. Then its value is set.

Finally, subroutines end with a return statement. Although there must be a return at the end of every subroutine, additional returns may be present inside.
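For comparison, here is a sketch of the same down-and-up insert in Python. It assumes `None` plays the role of the empty marker and that a value goes left when it is smaller and right otherwise; the frozen dataclass stands in for the rule that a subroutine never changes its caller's data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(tree, value):
    """Pass the tree down, build the changed part, and pass it back up."""
    if tree is None:
        return Node(value)
    if value < tree.value:
        return Node(tree.value, insert(tree.left, value), tree.right)
    return Node(tree.value, tree.left, insert(tree.right, value))


tree = None
for number in [5, 3, 8]:
    tree = insert(tree, number)
```

The caller always writes `tree = insert(tree, ...)`: the tree goes down as an argument and comes back up as the result.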


This subroutine walks through the tree and returns a list of values.

    subroutine walk
            Node tree
            List values

        if not empty tree:
            push values, walk tree.left
            push values, tree.value
            push values, walk tree.right

    return values

A fairly straightforward subroutine. But it can be rewritten as:

    subroutine walk
            Node tree
            List values

        if not empty tree:
            List left  ← walk tree.left
            List right ← walk tree.right

            push values, left, tree.value, right

    return values

Written this way, the compiler can auto-thread to walk the left and right sides of the tree.
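In Python the threading has to be spelled out by hand, but the shape is the same: the two recursive walks share no writable data, so they can run side by side. This sketch assumes `None` marks an empty subtree and only threads the top level; `walk_threaded` is a made-up name. In CPython it will not run faster for this kind of work; it only shows the independence the compiler would rely on.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def walk(tree):
    """Return the values of the tree in order, left to right."""
    if tree is None:
        return []
    return walk(tree.left) + [tree.value] + walk(tree.right)


def walk_threaded(tree):
    """Walk the left and right subtrees in separate threads, then combine."""
    if tree is None:
        return []
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(walk, tree.left)
        right = pool.submit(walk, tree.right)
        return left.result() + [tree.value] + right.result()


tree = Node(5, Node(3, Node(1)), Node(8))
values = walk_threaded(tree)  # [1, 3, 5, 8]
```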

Friday, November 09, 2012

Modular Paradigm

Welcome to my blog on my design decisions for my programming language. I haven't decided on a name for it yet since most of the good ones are taken. ☺

Decision: To Use the Modular Paradigm.

Now that OO has been in use for a while, some of its flaws have become apparent, which is why I chose the modular paradigm over OO. Some of my reasons are:

Which classes to use is not easily apparent.

Unless the app is simulating a model, the choice of which classes to create is fuzzy. I suppose this is common among all apps; if we knew how to do it, it's because we did something similar before. That means we could reuse the code and most of the work is already done. But in OO, there seems to be a reluctance to throw out classes. I think this is because, once completed, the developer has a hard time forgetting the class.

With modular design, on the other hand, there is seldom the sense that the module is complete. There is less reluctance to tinker with it. It may not be correct but that's easily changed.

Inheritance does not encourage code reuse.

In most projects, class inheritance forms a wide and shallow tree. Code reuse through inheritance is not common. In fact, if a class is not inherited and has only the root class for its ancestors, then it is no more than a module.

Objects have states.

This is the biggest reason I chose modular design; objects have states. The attributes of each object are state variables. The problem with them is that the developer has to remember their values and the more classes with attributes, the more likely he won't remember. And when a developer doesn't remember, bugs appear.

Advantages of Modular Design.

Modules are name spaces.

A subroutine need only have a unique name within its module. Different modules can reuse the same name.

Modules can be reused.

Reusing code is a result of good library practises. Modules lend themselves to this better than OO.

Modules are less likely to have state variables.

Modules are thought of as a set of subroutines. Unlike OO, they do not insist on storage within them. They are less likely to have state variables.