REBOL for COBOL programmers
Fixed-format file
Date written: December 30, 2015

This page explains a REBOL way to do a COBOL-like thing, namely, read a fixed-format text file.

The general plan

COBOL was created in the days of punch cards and various kinds of "unit-record" data. Fields were at specific character positions in records, and were just strings of characters. What a string of characters actually represented was defined in the program at compile time. In other words, a value of $123.45 would be represented as 12345 at some known spot in a data record, the fact that it was a decimal number was specified in the program with a PICTURE of 999V99, and a currency sign was supplied at printing time with an edit picture of $$$9.99.

In REBOL, such a data field would appear in its record just as you would read it, as $123.45, and the REBOL interpreter would know it was a currency amount because of its format. The value would also be in a known order in the record (it might be the fourth value, for example), but its character position could vary.

What is presented below is an idea for getting one's hands on data items in a fixed-format record. Basically, what one must do is copy out a string of characters of a given length at a given character position, but this trick provides a way to refer to those fields by name.

What is NOT covered is any datatype conversion. In other words, the procedure below will get one the digits 12345 from a fixed-format record, and will assign a name to that string of digits, but will not provide any indication of what kind of data 12345 is, whether or not there is a decimal point, and so on.

Sample data

The code presented below works in the general case, but the whole script is a demo consisting of the general code plus some statements to execute it and show the results. The code that shows the results assumes two small text files of test data, as shown. In each record, the name occupies positions 1 through 20 and the date occupies positions 21 through 31.

File rtest1.txt

    STEVEN              01-MAR-1951
    MR. SMITH           02-APR-1952

File rtest2.txt

    WILLIAM             01-FEB-1953
    MR. JOHNSON         02-MAY-1954

The specific plan

The approach presented below takes advantage of some features of REBOL. One feature is that words in REBOL (variables, in COBOL terms) can be created at run time. That is like a COBOL program defining WORKING-STORAGE variables at run time, which can't be done.

Another feature is the "object" feature. You can encapsulate variables, data, and executable code into an object, and then you can make instances of that object. What we will do is make an object for a fixed-format file. The object will contain all we need to read a file and assign variable names to the various fields in a record. Then, we can make instances of that object.

We want to use an object for a fixed-format file because then we can have one program that reads any number of such files, and have the code for a fixed-format file exist only once. Without the "object" concept, if we wanted to write a program that used two files, we would have to have two copies of the code, and then use "find and replace" on the various variable names so those two copies of code could exist in the same program.

The program

Shown below is the code for the fixed-format-file object, inside a demo program that invokes that object and prints some test results. Following the program code is a list of the test results. Here are some notes on how the object works.

The object is called FFF, for Fixed-Format File. In the test code, you will see that we make an object called FILE1 cloned from FFF, and a second object called FILE2 cloned from FFF. That way, we can refer to FILE1/data-name or FILE1/procedure-name.

To organize the code, and to follow a COBOL-like way of doing things, we have a procedure to "open" a fixed-format file. What does that do? We supply to that procedure two things. One is the name of the file, which we use when we read the entire file into memory.
The other is a block consisting of the field names we expect to find and, for each field name, its location in the form of a "pair" showing the starting position and the length. The test code shows an example. The field names are recognized by REBOL as "words" which can be assigned values, which we will do when we read a record. In the "open" procedure, we store those words and positions in a block for future reference.

Another COBOL-like procedure is one to "read" a "record." What that actually means is to get the next text line from the file of text lines that was brought into memory by the "open" procedure, and then to assign values to all the field names supplied to the "open" procedure. An end-of-file condition is detected by noting, at "open" time, the number of lines in the file, then counting records as we "read" them, and setting the EOF flag when we try to read more than we have.

The procedure for setting field names to values is the part that shows the power of REBOL. The first thing to do, obviously, is to check whether we are trying to read more lines than we have, and set the EOF flag if necessary. Otherwise, we pick off the next line and pull out the data. The pulling of the data is done in this handful of lines:

    RECORD: make object! []
    foreach [FIELDNAME POSITION] FIELDS [
        RECORD-AREA: head RECORD-AREA
        RECORD-AREA: skip RECORD-AREA (POSITION/x - 1)
        RECORD: make RECORD compose [(to-set-word FIELDNAME) copy/part RECORD-AREA POSITION/y]
    ]

The first step is to make a blank object called RECORD. This object will hold the field names and values, and we will refer to FILE1/RECORD/data-name when we want to reference a data value. Note the importance of this: the FFF object is general enough that it will work for any such file with any number of fields, and you specify names and positions for the fields when you "open" the file. That is the beauty of REBOL.
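The field list handed to the "open" procedure is ordinary REBOL data, so it can be inspected on its own at the console. Here is a minimal sketch, assuming a field list shaped like the one in the test code. The word FIELDLIST is just an illustrative name, and the pairs are written with a lowercase "x".

```rebol
REBOL [Title: "Field list sketch"]

;; Hypothetical field list in the shape the "open" procedure expects:
;; a word naming the field, then a pair of start-position x length.
FIELDLIST: [NAME 1x20 DOB 21x11]

;; Walk the block two items at a time, as the "read" procedure does.
foreach [FIELDNAME POSITION] FIELDLIST [
    ;; POSITION/x is the first number of the pair, POSITION/y the second.
    print [FIELDNAME "starts at" POSITION/x "for a length of" POSITION/y]
]
```

Because the names and positions are just a block, supporting a different record layout should only mean passing a different block. No code in the object has to change.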
The next step is to loop through the list of field names plus locations and, for each, set the field name word to the value, with the value being extracted as a substring from the line of text. To get the value, we position the RECORD-AREA word at the head of the record area, and then skip forward to the location of the data. This is done with the "skip" function, but note that because we are skipping, this is a zero-relative operation. In other words, if the field value starts at position 1, we don't want to skip 1, because that would put us at position 2. We want to skip the position MINUS 1. Then, when we are at that position, we copy off the number of characters indicated by the field length, which is the second number in the "pair" that defines the location of the data.

The setting of the field name to a value is one of those single lines of REBOL code that can be vexing to a beginner. Before the loop, we made a blank object called RECORD. Inside the loop, for each field, we make a new object called RECORD that is cloned from the existing RECORD. On the first pass, the existing RECORD is the empty object. When you make an object that is cloned from another object, and specify a block after the "make" function, the result is a new object that is the object from which you are cloning PLUS the block you specified after "make." So on the first pass, you clone a blank object and add to it the value of the first field, on the second pass you clone that object with the first field and add to it the second field, and so on to the end of the field name list.

Now, what exactly are you adding to the RECORD object on each pass? The "compose" function takes a block, evaluates the parts of it that are in parentheses, and returns a block with the results in place of those parenthesized parts; everything else passes through unchanged. The "to-set-word" function takes a word and returns a "set-word," which is a word followed by a colon. So the block after the "compose" function will contain a field name plus a colon, followed by the "copy/part" expression that, when "make" evaluates the block, extracts the value out of the data.
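To watch the loop described above at work on a single line, the sketch below runs it against a literal string rather than a file. The words LINE and AREA and the sample line are assumptions made for illustration. The "probe" inside the loop shows the block that "compose" produces on each pass, before "make" evaluates it.

```rebol
REBOL [Title: "Building a record object one field at a time"]

;; Hypothetical sample line and field list, laid out like the test data.
LINE: "MR. SMITH           02-APR-1952"
FIELDS: [NAME 1x20 DOB 21x11]

RECORD: make object! []
foreach [FIELDNAME POSITION] FIELDS [
    ;; Skipping (start - 1) characters from the head lands ON the start position.
    AREA: skip head LINE (POSITION/x - 1)
    ;; Only the parenthesized part is evaluated by "compose"; the
    ;; copy/part expression is left as it is for "make" to evaluate.
    probe compose [(to-set-word FIELDNAME) copy/part AREA POSITION/y]
    ;; Clone the record built so far and add one more field to it.
    RECORD: make RECORD compose [(to-set-word FIELDNAME) copy/part AREA POSITION/y]
]
probe RECORD
```

After the loop, RECORD/NAME and RECORD/DOB should be available as ordinary words in the object, even though neither name appears anywhere in the code except as data in the FIELDS block.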
In other words, that RECORD object will contain repetitions of "field-name: value" and the RECORD object becomes data inside the instance of the FFF object. That is the cool feature of REBOL compared to COBOL. Code is data and REBOL can write its own code (or data) at run time.

Here is a listing of the Fixed-Format File object inside a test program to show how it works.

    REBOL [
        Title: "Fixed-Format File object"
    ]

    FFF: make object! [

        FILE-ID: none       ;; file name passed to "open" function
        FIELDS: none        ;; [fieldname locationpair fieldname locationpair, etc]
        FILE-DATA: []       ;; whole file in memory, as block of lines
        RECORD-AREA: ""     ;; one line from FILE-DATA, for picking apart
        RECORD: none        ;; an object we will create to make new words available
        RECORD-NUMBER: 0    ;; for keeping track of which line we picked
        FILE-SIZE: 0        ;; number of lines in FILE-DATA
        EOF: false          ;; set when we "pick" past end

        ;; --------------------------
        OPEN-INPUT: func [
            FILEID [file!]      ;; will be a file name
            FIELDLIST [block!]  ;; will be sets of word! and pair!
        ] [
            ;; -- Save what was passed to us.
            FILE-ID: FILEID
            FIELDS: copy FIELDLIST
            ;; -- Read the entire file into memory and set various items in preparation
            ;; -- for reading the file a record at a time.
            FILE-DATA: read/lines FILE-ID
            FILE-SIZE: length? FILE-DATA
            RECORD-NUMBER: 0
            EOF: false
        ]

        ;; --------------------------
        READ-RECORD: does [
            ;; pick a line if there are lines left to be picked
            RECORD-NUMBER: RECORD-NUMBER + 1
            if (RECORD-NUMBER > FILE-SIZE) [
                EOF: true
                return EOF
            ]
            RECORD-AREA: copy pick FILE-DATA RECORD-NUMBER
            ;; Set the words passed to the "open" function to values extracted
            ;; out of the data, based on the locations passed to the "open" function.
            ;; Put those words and values in the RECORD object.
            RECORD: make object! []
            foreach [FIELDNAME POSITION] FIELDS [
                RECORD-AREA: head RECORD-AREA
                RECORD-AREA: skip RECORD-AREA (POSITION/x - 1)
                RECORD: make RECORD compose [(to-set-word FIELDNAME) copy/part RECORD-AREA POSITION/y]
            ]
        ]
    ]

    ;; Now test the object by making two instances of it.

    FILE1: make FFF []
    FILE1/OPEN-INPUT %rtest1.txt [NAME 1X20 DOB 21X11]
    FILE1/READ-RECORD
    print [FILE1/RECORD/NAME " " FILE1/RECORD/DOB]
    FILE1/READ-RECORD
    print [FILE1/RECORD/NAME " " FILE1/RECORD/DOB]
    FILE1/READ-RECORD
    probe FILE1/EOF
    probe FILE1/RECORD
    print "-------------------------"

    FILE2: make FFF []
    FILE2/OPEN-INPUT %rtest2.txt [NAME 1X20 DOB 21X11]
    FILE2/READ-RECORD
    print [FILE2/RECORD/NAME " " FILE2/RECORD/DOB]
    FILE2/READ-RECORD
    print [FILE2/RECORD/NAME " " FILE2/RECORD/DOB]
    FILE2/READ-RECORD
    probe FILE2/EOF
    probe FILE2/RECORD
    print "-------------------------"

    halt

Here is the result of running the above program. Note that the third "read" of each file sets the EOF flag and leaves the RECORD object from the previous read in place, and that the extracted values keep their trailing spaces.

    STEVEN                 01-MAR-1951
    MR. SMITH              02-APR-1952
    true
    make object! [
        NAME: "MR. SMITH           "
        DOB: "02-APR-1952"
    ]
    -------------------------
    WILLIAM                01-FEB-1953
    MR. JOHNSON            02-MAY-1954
    true
    make object! [
        NAME: "MR. JOHNSON         "
        DOB: "02-MAY-1954"
    ]
    -------------------------
    >>
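The test program reads a known number of records. For a file of unknown length, a COBOL programmer would probably reach for a priming read followed by a loop that runs until end-of-file. The sketch below is one way that might look, not the only one. To let it run on its own, it repeats a condensed copy of the FFF object and writes its own small test file. The file name rtest3.txt, the word RECORDS-READ, and the use of "trim/tail" to drop the padding are assumptions made for illustration.

```rebol
REBOL [Title: "Read-until-EOF sketch"]

;; Condensed copy of the FFF object, repeated here only so that
;; this sketch is self-contained.
FFF: make object! [
    FIELDS: none
    FILE-DATA: []
    RECORD-AREA: ""
    RECORD: none
    RECORD-NUMBER: 0
    FILE-SIZE: 0
    EOF: false
    OPEN-INPUT: func [FILEID [file!] FIELDLIST [block!]] [
        FIELDS: copy FIELDLIST
        FILE-DATA: read/lines FILEID
        FILE-SIZE: length? FILE-DATA
        RECORD-NUMBER: 0
        EOF: false
    ]
    READ-RECORD: does [
        RECORD-NUMBER: RECORD-NUMBER + 1
        if RECORD-NUMBER > FILE-SIZE [
            EOF: true
            return EOF
        ]
        RECORD: make object! []
        foreach [FIELDNAME POSITION] FIELDS [
            RECORD-AREA: skip (pick FILE-DATA RECORD-NUMBER) (POSITION/x - 1)
            RECORD: make RECORD compose [
                (to-set-word FIELDNAME) copy/part RECORD-AREA POSITION/y
            ]
        ]
    ]
]

;; Hypothetical test file, written here with the same layout as rtest1.txt.
write/lines %rtest3.txt [
    "STEVEN              01-MAR-1951"
    "MR. SMITH           02-APR-1952"
]

FILE3: make FFF []
FILE3/OPEN-INPUT %rtest3.txt [NAME 1x20 DOB 21x11]

RECORDS-READ: 0
FILE3/READ-RECORD                   ;; priming read, COBOL style
while [not FILE3/EOF] [
    RECORDS-READ: RECORDS-READ + 1
    ;; trim/tail on a copy removes the padding without touching the record.
    print [trim/tail copy FILE3/RECORD/NAME "was born on" FILE3/RECORD/DOB]
    FILE3/READ-RECORD               ;; read the next record, or set EOF
]
print [RECORDS-READ "records read"]
```

Testing the EOF flag at the top of the loop matters here because, as the results above show, a read past the end leaves the previous RECORD in place. Processing the record without checking EOF first would handle the last record twice.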