REBOL for COBOL programmers
November 26, 2018
In programming, the term "refactoring" refers to re-working code to make
it better without changing what it does. That could mean making it
better-looking, more compact, easier to maintain, and so on.
This document is an exercise in taking some code that I wrote and
refactoring it into a more REBOL-like form. At a few times in the past,
things I wrote made it into public view, and various REBOL experts were
kind enough to point out where I could have done better. This article
captures that help in case it could be of use to others.
An important concept in programming is the idea of egoless programming.
Don't be offended if someone shows you a better way. Learn from it and
adapt.
The target audience would be a beginner who has used the "crutch" of formatting REBOL like other programming languages in order to get some useful work done, and now is ready to move toward a programming style that is more like the REBOL as she should be spoken.
Note by Carl about using REBOL like other languages (paragraph 5)
The following examples are going to be REBOL code that starts out as looking a bit like other languages (COBOL, Python come to mind) and is transformed to looking a bit more REBOL-like. This assumes that the REBOL code you might find on the internet, written, perhaps, by the inventor of REBOL, has been written the "right" way.
A question you might ponder as you read the examples is what makes something more or less readable. Should code have lots of comments, some comments, or no comments? Should comments be gathered in one area, or is it more helpful to have them distributed throughout the code? Is it easier to read code that is spread out over many lines (provided they match the style guide), or are fewer lines of tighter code actually easier to read?
Apologies to those familiar with the concept, but in some computer languages, before the program can be run it must be "compiled," which means that a language translator transforms it into the actual instructions run by the computer hardware. What that means is that references to things in memory (values, instructions) are set up so that they are ready to go when the program runs. In an "interpreted" language like REBOL, references to values have to be looked up as the program runs. In other words, if the program adds BILL-AMOUNT and SALES-TAX in a compiled program, the memory locations of BILL-AMOUNT and SALES-TAX are figured out in advance, so when the program runs, the calculation could be as fast as one hardware instruction (depending on the hardware). But in an interpreted language, the program would have to look up BILL-AMOUNT and SALES-TAX in a table of words, locate the values for each, and then add the values. That's a lot of extra work. Normally one does not notice because computers are so fast, but in chewing through lots of data, lots of little things add up.
What that means for coding is that if you can reduce the code it actually can make a difference. Longer words take more time to parse. Every time a word appears in a program it has to be looked up in some sort of table. The REBOL syntax allows for streamlining the code.
This example was taken from a document on working with fixed-format and CSV files. If you have read that, this was copied straight out.
The problem is, given a string, add blanks to the right end to pad it out to a specified length. As a first pass, let's think through how we would solve this. First, let's decide to make the solution into a function that we could use over and over again. We will give the function a string and a length, and it will return that same string, padded on the right with enough blanks to make it as long as the specified length. If the specified length is less than the length of the input string, we will chop off the end because the final result we want is a string of the specified length.
What, in general, should we do? Probably, find out how long the input string is, subtract that length from the specified length, and that will tell us how many blanks to add. If the string already is longer than the specified length, then just chop it off and return only the specified number of characters.
So what would we need for coding? We will trim the input of leading and trailing blanks on the theory that removing the leading blanks is what we usually will want to do. Maybe we should have a "variable" for that. We will have to find the length of the input, calculate the number of blanks to add, and then assemble the padded output for return to the caller. So we might need "variables" for the results of those actions. Here is a function that should do the job.
    REBOL [
        title: "SPACEFILL function"
    ]

    SPACEFILL: func [
        "Left justify a string, pad with spaces to specified length"
        INPUT-STRING
        FINAL-LENGTH
        /local
            TRIMMED-STRING
            LENGTH-OF-TRIMMED-STRING
            NUMBER-OF-SPACES-TO-ADD
            FINAL-PADDED-STRING
    ] [
        TRIMMED-STRING: copy ""
        TRIMMED-STRING: trim INPUT-STRING
        LENGTH-OF-TRIMMED-STRING: length? TRIMMED-STRING
        either (LENGTH-OF-TRIMMED-STRING < FINAL-LENGTH) [
            NUMBER-OF-SPACES-TO-ADD: (FINAL-LENGTH - LENGTH-OF-TRIMMED-STRING)
            FINAL-PADDED-STRING: copy TRIMMED-STRING
            loop NUMBER-OF-SPACES-TO-ADD [
                append FINAL-PADDED-STRING " "
            ]
        ] [
            FINAL-PADDED-STRING: COPY ""
            FINAL-PADDED-STRING: copy/part TRIMMED-STRING FINAL-LENGTH
        ]
    ]

    ;; Uncomment to test
    ;print rejoin [{'} SPACEFILL " ABCD1234 " 10 {'}]
    ;halt

If you program in COBOL, this looks good. But for an interpreted language there are some considerations. Any time you have a word that is not needed, resources are required to parse the word, look it up in a table, allocate space in a table if it is a new word, and who knows what else. So if we can shorten it up, it will run faster. The REBOL syntax, with its way of having one function pass its results up the chain to other functions, allows for considerable tightening. Here is a more compact version.
    REBOL [
        title: "SPACEFILL function, improved"
    ]

    SPACEFILL: func [
        "Left justify a string, pad with spaces to specified length"
        INPUT-STRING
        FINAL-LENGTH
    ] [
        head insert/dup
            tail copy/part trim INPUT-STRING FINAL-LENGTH
            #" "
            max 0 FINAL-LENGTH - length? INPUT-STRING
    ]

    ;; Uncomment to test
    ;print rejoin ["'" SPACEFILL " ABCD1234 " 10 "'"]
    ;halt

First, let's be sure we understand it.
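REBOL reads an expression from left to right, but a function cannot run until its arguments have been evaluated. In a chain like this one, that means the innermost function finishes first, and its result is passed back to the function on its left. So to understand the line, we work our way in to the innermost function and then back out.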
The innermost function is "trim" which takes the spaces off both ends of the INPUT-STRING.
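One detail worth seeing in isolation is that "trim" changes the string it is given rather than returning a fresh one. A minimal sketch, where the words s and t are made up for the illustration:

```rebol
REBOL [title: "trim works in place (illustrative sketch)"]

;; s and t are hypothetical words used only for this demonstration.
s: copy "  ABCD1234  "   ; copy, so the literal in the script is left alone
t: trim s                ; trim returns the string it was given

probe s                  ; "ABCD1234"  (s itself was changed)
probe same? s t          ; true        (t is the very same string)
probe length? s          ; 8
```

This is why the compact SPACEFILL can ask for the length of INPUT-STRING later in the same line and get the trimmed length. It also means the caller's string is trimmed as a side effect of calling the function.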
The next function is the "copy/part" which makes a copy of the trimmed INPUT-STRING but for only as many characters as specified in the FINAL-LENGTH. The reason for this is that the caller might have asked for a final length less than the actual length of the data being padded. That makes no sense, but it must be accounted for. If the caller asked for a final length greater than the length of the INPUT-STRING, as would be normal, the "copy/part" will copy only as many characters as there actually are in the trimmed INPUT-STRING.
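Both cases of "copy/part" can be tried at the console. This is only a sketch; the word src is made up for the illustration:

```rebol
REBOL [title: "copy/part with short and long lengths (illustrative sketch)"]

;; src is a hypothetical word used only for this demonstration.
src: "ABCD1234"

probe copy/part src 4     ; "ABCD"      (length shorter than the data: chopped)
probe copy/part src 10    ; "ABCD1234"  (length longer than the data: all of it)
probe src                 ; "ABCD1234"  (the source string is not changed)
```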
The next function is "tail" which positions us to the end of the trimmed and copied string.
The next function is the "insert/dup" function which adds the "space" character (#" ") to the tail of the copied string, for a specified number of times. And what is that specified number? It is the maximum of zero (in case we don't have to add any) or however many more spaces we need to reach the desired length. And how many is that? It is the FINAL-LENGTH minus the number of characters we already have, which is the current length of the INPUT-STRING. Because "trim" has already changed INPUT-STRING in place, that is the trimmed length.
And finally, to make sure we return the whole padded string to the caller, we use "head" to position ourselves back at the start of the padded copy. This is needed because "insert" leaves us positioned just past what it inserted.
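The last two steps, inserting at the tail and then going back to the head, can be watched separately. A small sketch, with s and pos as made-up words for the illustration:

```rebol
REBOL [title: "insert/dup at the tail, then head (illustrative sketch)"]

;; s and pos are hypothetical words used only for this demonstration.
s: copy "AB"
pos: insert/dup tail s #" " 3   ; add three spaces at the end of s

probe pos         ; ""       (we are positioned just past the new spaces)
probe head pos    ; "AB   "  (head moves back to the start of the string)
probe length? s   ; 5
```

Without the "head" the function would hand back the position at the end of the string, which looks like an empty string to the caller.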
Now let's note the improvements.
There are no local variables, compared to our previous version. We trim the INPUT-STRING, but we don't have to store it in a temporary variable because we can just pass it up the line of function calls. Similarly, the LENGTH-OF-TRIMMED-STRING and NUMBER-OF-SPACES-TO-ADD are calculated on the fly and don't need temporary variables. And the FINAL-PADDED-STRING is not necessary because we just pad the copy of the INPUT-STRING and pass that back to the caller.
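As a quick check that the compact version still handles both the padding case and the chopping case, something like the following could be run. The words padded and chopped are made up for the test, and the inputs are copied first because "trim" changes the string it is handed:

```rebol
REBOL [title: "SPACEFILL check, both cases (illustrative sketch)"]

;; The compact function from above, repeated so this script runs on its own.
SPACEFILL: func [
    "Left justify a string, pad with spaces to specified length"
    INPUT-STRING
    FINAL-LENGTH
] [
    head insert/dup
        tail copy/part trim INPUT-STRING FINAL-LENGTH
        #" "
        max 0 FINAL-LENGTH - length? INPUT-STRING
]

;; padded and chopped are hypothetical words used only for this check.
padded:  SPACEFILL copy " ABCD1234 " 10   ; shorter than 10: two spaces added
chopped: SPACEFILL copy " ABCD1234 " 4    ; longer than 4: cut to four characters

print rejoin ["'" padded "'"]    ; 'ABCD1234  '
print rejoin ["'" chopped "'"]   ; 'ABCD'
```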
And finally, to go the last step in REBOL-izing the original SPACEFILL function, we will shorten up some of our variables and condense the code a bit to get:
    REBOL [
        title: "SPACEFILL function, improved"
    ]

    SPACEFILL: func [txt len] [
        head insert/dup tail copy/part trim txt len #" " max 0 len - length? txt
    ]

    ;; Uncomment to test
    ;print rejoin ["'" SPACEFILL " ABCD1234 " 10 "'"]
    ;halt

And so we come to the question, which version is "better"? REBOL was created for small programs, so maybe if you understand the wordy one better, and it does the required job, the wordy one is "better" because you can understand it better. The correct answer probably is that the final version is better because it uses REBOL as it was designed to be used. Maybe the longer version is the programming equivalent of pounding a nail into a board with the handle of your electric drill.
This is just a code snippet from a larger example, showing how things can be streamlined, without significant loss of clarity, by skillful use of the REBOL functions.
In this example, the SHOW-PICKED function is called when an item is picked from a text-list called TLIST, and the program feeds back the selected value in the info field called TPICKED.
    SHOW-PICKED: does [
        either TLIST/picked [
            set-face TPICKED TLIST/picked
        ] [
            set-face TPICKED "Nothing"
        ]
    ]

This is a nice straightforward COBOL-like statement. If something was picked, feed it back, otherwise feed back "Nothing." (Not relevant at this time is that it appears that if nothing is picked, the result is not none, and so this code does not actually work.)
The first optimization/refactoring can be to take out one of the calls to set-face. The "either" function will return the result of evaluating one of the two following blocks. In this case, that will be the result of TLIST/picked, or "Nothing." The result of "either" can be passed back to set-face.
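The fact that "either" hands back a value can be seen without any GUI at all. In this sketch the word picked is a made-up stand-in for TLIST/picked:

```rebol
REBOL [title: "either returns a value (illustrative sketch)"]

;; picked is a hypothetical stand-in for TLIST/picked.
picked: none
probe either picked [picked] ["Nothing"]   ; "Nothing"

picked: "Apple"
probe either picked [picked] ["Nothing"]   ; "Apple"
```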
    SHOW-PICKED: does [
        set-face TPICKED either TLIST/picked [TLIST/picked] ["Nothing"]
    ]

The "either" function means "this OR that" and there is another function that is a shortcut for that concept, the "any" function. The "any" function takes a block of stuff, and returns the first item in the block that is not false or none.
    SHOW-PICKED: does [
        set-face TPICKED any [TLIST/picked "Nothing"]
    ]

In this case, the two items in the block are 1) the result of TLIST/picked, or 2) the string "Nothing." Theoretically (see note above) if TLIST/picked is none, it will not be returned, but the string "Nothing" is NOT none, and so would be returned if the first item was not.
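The behavior of "any" is easy to try on plain values. The last line is an assumption made for illustration only: it supposes that an unpicked list might hand back something like an empty block instead of none, which would explain why the fallback never shows up.

```rebol
REBOL [title: "any returns the first usable value (illustrative sketch)"]

probe any [none "Nothing"]      ; "Nothing"
probe any [false "Nothing"]     ; "Nothing"
probe any ["Apple" "Nothing"]   ; "Apple"

;; Assumed case, for illustration: an empty block is neither false nor none,
;; so "any" returns it and never reaches the fallback string.
probe any [[] "Nothing"]        ; []
```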
REBOL was designed for this dense style of coding. If it is hard to read initially, you have my permission (for what it's worth) to back off from full REBOL-ness in order to get some work done. What you might find after a while is that you begin to drift toward REBOL-ness as you become handier with it. During this evolution, you should continue to look at scripts by the REBOL experts. REBOL, even used like COBOL, still is a bit of a power tool. There is even more power to be unleashed if you can embrace its style.