I may be able to help: two years ago I did exactly what you are doing now.
I had to design a MySQL data warehouse, including the ETL system, based exclusively on files from an RM COBOL ERP application running on Linux.
The application had more than 600 files, and it was still unclear how many of them would end up in the database. Most of the important files were indexed (on COMP fields, to make things harder), and one obvious requirement was that all relationships between files and their indexed keys be reproducible in the database. So I potentially needed every field of every file.
Given the number of files, handling them manually, one by one, was out of the question.
I saw only one pragmatic solution: automatic programming, i.e. writing a program that generates programs from a single source, the COBOL copybooks.
I had some restrictions (set by the client) on the technology I was allowed to use. I ended up with a VB.NET application that takes the COBOL copybooks as input and:
- Generates COBOL programs that convert the data into something usable, by reading the original indexed files and writing the records to a sequential text file.
- Generates VBA modules with all the code needed to import those data files from MS Access into MySQL (including CREATE TABLE statements and indexes).
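To give an idea of what the second kind of generated code has to produce, here is a minimal Python sketch (not the original VB.NET/VBA generator) that turns a list of field descriptions into a CREATE TABLE statement. The field dictionaries, the type mapping in `mysql_type`, the table name and the choice of indexed column are all assumptions made for illustration; the field names are borrowed from the invoice example further down.

```python
def mysql_type(field):
    """Map one field description to a MySQL column type (assumed mapping)."""
    if field["numeric"]:
        scale = field.get("scale", 0)
        if scale > 0:
            return f"DECIMAL({field['digits']},{scale})"
        # 9 digits always fit in a signed 32-bit INT, 18 in a BIGINT.
        return "INT" if field["digits"] <= 9 else "BIGINT"
    return f"VARCHAR({field['length']})"


def column_name(cobol_name):
    """COBOL names use hyphens; use underscores and lower case in SQL."""
    return cobol_name.replace("-", "_").lower()


def create_table(table, fields, keys=()):
    """Build a CREATE TABLE statement with one index per key field."""
    cols = [f"  `{column_name(f['name'])}` {mysql_type(f)}" for f in fields]
    cols += [f"  INDEX (`{column_name(k)}`)" for k in keys]
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(cols) + "\n);"


# Hypothetical field list; a real generator would derive it from the copybook.
fields = [
    {"name": "FS1", "numeric": False, "length": 1},
    {"name": "FS2B", "numeric": True, "digits": 8},
    {"name": "FS8", "numeric": True, "digits": 11, "scale": 2},
]
sql = create_table("facture", fields, keys=["FS2B"])
print(sql)
```

The point is only that column types and indexes can be derived mechanically once each field's type, length and scale are known.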
At the beginning of the project I ran into exactly the same issues as you, notably those damn REDEFINES. Listing and coding every possible copybook construct seemed, if not impossible, at least hazardous.
So I looked for another way, and found this:
CB2XML
COBOL copybook to XML converter: SourceForge
This saved me weeks of hard work on copybook parsing and interpreting.
It parses COBOL copybooks and turns them into an XML file that describes every PICTURE precisely, with a lot of useful attributes such as length and type. It fully supports COBOL'86 standards.
Example with an invoice file (facture in French):
       FD FACTURE.
       01 REC-FACTURE.
           03 FS1 PIC X.
           03 FS2.
               05 FS2A PIC 9.
               05 RFS2B PIC X(8).
               05 FS2B REDEFINES RFS2B PIC 9(8).
           03 FS3.
               05 FS3A PIC 9.
               05 FS3B PIC X(10).
           03 FS4.
               05 FS4A PIC 99.
               05 FS4B PIC 99.
               05 FS4C PIC 99.
           03 FS5 PIC X(5).
           03 FS6 PIC X(20).
           03 FS7 PIC 9.
           03 FS8 PIC S9(9)V99 COMP-3.
           03 FS9 PIC S9(9)V99 COMP-3.
           03 FS10 PIC 9.
           03 FS11 PIC S9(9)V99 COMP-3.
           03 FS12 PIC S9(9)V99 COMP-3.
           03 FS13 PIC S9(9)V99 COMP-3.
           03 FS14-15 OCCURS 10.
               05 FS14 PIC 9.
               05 FS15 PIC S9(9)V99 COMP-3.
               05 FS16 PIC S9(9)V99 COMP-3.
           03 FS17 OCCURS 10 PIC S9(9)V99 COMP-3.
           03 FS18 PIC 9(6).
           03 FS19 PIC 9.
           03 FILLER PIC X.
Turns into this:
<copybook filename="FD8.COP.CLEAN">
  <item display-length="428" level="01" name="REC-FACTURE" position="1" storage-length="428">
    <item display-length="1" level="03" name="FS1" picture="X" position="1" storage-length="1"/>
    <item display-length="9" level="03" name="FS2" position="2" storage-length="9">
      <item display-length="1" level="05" name="FS2A" numeric="true" picture="9" position="2" storage-length="1"/>
      <item display-length="8" level="05" name="RFS2B" picture="X(8)" position="3" redefined="true" storage-length="8"/>
      <item display-length="8" level="05" name="FS2B" numeric="true" picture="9(8)" position="3" redefines="RFS2B" storage-length="8"/>
    </item>
    <item display-length="11" level="03" name="FS3" position="11" storage-length="11">
      <item display-length="1" level="05" name="FS3A" numeric="true" picture="9" position="11" storage-length="1"/>
      <item display-length="10" level="05" name="FS3B" picture="X(10)" position="12" storage-length="10"/>
    </item>
    <item display-length="6" level="03" name="FS4" position="22" storage-length="6">
      <item display-length="2" level="05" name="FS4A" numeric="true" picture="99" position="22" storage-length="2"/>
      <item display-length="2" level="05" name="FS4B" numeric="true" picture="99" position="24" storage-length="2"/>
      <item display-length="2" level="05" name="FS4C" numeric="true" picture="99" position="26" storage-length="2"/>
    </item>
    <item display-length="5" level="03" name="FS5" picture="X(5)" position="28" storage-length="5"/>
    <item display-length="20" level="03" name="FS6" picture="X(20)" position="33" storage-length="20"/>
    <item display-length="1" level="03" name="FS7" numeric="true" picture="9" position="53" storage-length="1"/>
    <item display-length="11" level="03" name="FS8" numeric="true" picture="S9(9)V99" position="54" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="11" level="03" name="FS9" numeric="true" picture="S9(9)V99" position="60" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="1" level="03" name="FS10" numeric="true" picture="9" position="66" storage-length="1"/>
    <item display-length="11" level="03" name="FS11" numeric="true" picture="S9(9)V99" position="67" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="11" level="03" name="FS12" numeric="true" picture="S9(9)V99" position="73" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="11" level="03" name="FS13" numeric="true" picture="S9(9)V99" position="79" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="13" level="03" name="FS14-15" occurs="10" position="85" storage-length="13">
      <item display-length="1" level="05" name="FS14" numeric="true" picture="9" position="85" storage-length="1"/>
      <item display-length="11" level="05" name="FS15" numeric="true" picture="S9(9)V99" position="86" scale="2" signed="true" storage-length="6" usage="computational-3"/>
      <item display-length="11" level="05" name="FS16" numeric="true" picture="S9(9)V99" position="92" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    </item>
    <item display-length="11" level="03" name="FS17" numeric="true" occurs="10" picture="S9(9)V99" position="215" scale="2" signed="true" storage-length="6" usage="computational-3"/>
    <item display-length="6" level="03" name="FS18" numeric="true" picture="9(6)" position="275" storage-length="6"/>
    <item display-length="1" level="03" name="FS19" numeric="true" picture="9" position="281" storage-length="1"/>
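The `storage-length` and `position` values in this output can be cross-checked by hand. A packed-decimal (COMP-3) field holds two digits per byte plus a sign nibble, so S9(9)V99 (11 digits) takes 6 bytes, and the field after an array starts at the array's position plus its storage length times the number of occurrences. A small Python sketch of that arithmetic (the function names are mine, not part of cb2xml):

```python
def packed_length(digits):
    """Bytes used by a COMP-3 field: one nibble per digit plus a sign nibble."""
    return digits // 2 + 1


def next_position(position, storage_length, occurs=1):
    """1-based byte position of the field that follows this one."""
    return position + storage_length * occurs


print(packed_length(11))          # FS8: S9(9)V99 COMP-3
print(next_position(85, 13, 10))  # field after FS14-15 OCCURS 10
print(next_position(215, 6, 10))  # field after FS17 OCCURS 10
```

This prints 6, 215 and 275, which agree with the `storage-length` of FS8 and the `position` of FS17 and FS18 above.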
List of all XML attributes
I will be lazy here and just copy/paste my VB.NET code; a comment explains each attribute:
For Each Attribute As Xml.XmlAttribute In itemNode.Attributes
    Select Case Attribute.Name
        Case "name"           ' Field name
        Case "level"          ' COBOL level number
        Case "numeric"        ' True if the data type is numeric
        Case "picture"        ' Complete PICTURE string
        Case "storage-length" ' Storage length of the field, in bytes
        Case "usage"          ' For COMP fields: the original COMP type ("computational-x")
        Case "signed"         ' True if PIC S...
        Case "scale"          ' Number of digits after the decimal point
        Case "redefined"      ' True if the field is redefined afterwards
        Case "redefines"      ' For a REDEFINES: name of the redefined field
        Case "occurs"         ' Number of occurrences if the field is an array
        Case "position"       ' Position of the field within the record (1-based)
        Case "display-length" ' Display size
        Case "filename"       ' Name of the copybook file containing the FD
    End Select
Next
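For readers not on .NET, the same attribute walk can be sketched in Python with the standard `xml.etree.ElementTree` module. The XML below is an excerpt of the invoice output above with the attribute values unchanged; the `Field` tuple and the `walk` function are illustrative names, not part of cb2xml.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

SAMPLE = """
<copybook filename="FD8.COP.CLEAN">
  <item display-length="428" level="01" name="REC-FACTURE" position="1" storage-length="428">
    <item display-length="1" level="03" name="FS1" picture="X" position="1" storage-length="1"/>
    <item display-length="9" level="03" name="FS2" position="2" storage-length="9">
      <item display-length="1" level="05" name="FS2A" numeric="true" picture="9" position="2" storage-length="1"/>
      <item display-length="8" level="05" name="RFS2B" picture="X(8)" position="3" redefined="true" storage-length="8"/>
      <item display-length="8" level="05" name="FS2B" numeric="true" picture="9(8)" position="3" redefines="RFS2B" storage-length="8"/>
    </item>
    <item display-length="11" level="03" name="FS8" numeric="true" picture="S9(9)V99" position="54" scale="2" signed="true" storage-length="6" usage="computational-3"/>
  </item>
</copybook>
"""


class Field(NamedTuple):
    name: str
    level: str
    position: int              # 1-based position within the record
    storage_length: int
    picture: Optional[str]     # None for group items
    numeric: bool
    signed: bool
    scale: int
    usage: Optional[str]       # e.g. "computational-3", None for DISPLAY
    occurs: int                # 1 when the item is not an array
    redefines: Optional[str]


def walk(node):
    """Yield a Field for every <item> below node, in copybook order."""
    for item in node.findall("item"):
        a = item.attrib
        yield Field(
            name=a["name"],
            level=a["level"],
            position=int(a["position"]),
            storage_length=int(a["storage-length"]),
            picture=a.get("picture"),
            numeric=a.get("numeric") == "true",
            signed=a.get("signed") == "true",
            scale=int(a.get("scale", 0)),
            usage=a.get("usage"),
            occurs=int(a.get("occurs", 1)),
            redefines=a.get("redefines"),
        )
        yield from walk(item)  # descend into group items


fields = list(walk(ET.fromstring(SAMPLE.strip())))
for f in fields:
    print(f.name, f.position, f.storage_length, f.picture)
```

Optional attributes get a default here (`occurs` 1, `scale` 0), so downstream code does not have to test for their presence.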
With the help of this XML structure I achieved all my goals and more.
The generated COBOL programs that convert the indexed files (readable only with the RM COBOL runtime) into flat files deal with every field, arrays and REDEFINES included.
- For REDEFINES: I create a field for the "primary" PICTURE and one for each of its REDEFINES, and each field's type matches its COBOL PICTURE.
- For arrays: I create a field for each element, plus one big field containing the whole array "line".
- For COMPUTATIONAL fields: I just MOVE the original COMP field into a DISPLAY field with the exact same PICTURE.
Not all the fields serve a purpose once they are in the database, but at least everything is available all the time.
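Part of these flattening rules can be sketched in a few lines of Python (this is not the original VB.NET generator). Group items become `PIC X(n)` fields of their storage length, elementary items keep their PICTURE, and a REDEFINES is simply emitted next to the field it redefines. Arrays and COMP fields are left out, because the exact output chosen for them is not shown in this post; the function name and the abridged XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged cb2xml output (display-length attributes dropped for brevity).
SAMPLE = """
<copybook filename="FD8.COP.CLEAN">
  <item level="01" name="REC-FACTURE" position="1" storage-length="428">
    <item level="03" name="FS1" picture="X" position="1" storage-length="1"/>
    <item level="03" name="FS2" position="2" storage-length="9">
      <item level="05" name="FS2A" numeric="true" picture="9" position="2" storage-length="1"/>
      <item level="05" name="RFS2B" picture="X(8)" position="3" redefined="true" storage-length="8"/>
      <item level="05" name="FS2B" numeric="true" picture="9(8)" position="3" redefines="RFS2B" storage-length="8"/>
    </item>
  </item>
</copybook>
"""


def flat_fields(record, suffix="-DWH"):
    """Yield one level-03 line for every item below the 01 record."""
    for item in record.iter("item"):
        if item is record:
            continue  # the 01 level itself is written separately
        if "occurs" in item.attrib:
            raise NotImplementedError("arrays are not covered by this sketch")
        # Elementary items keep their PICTURE; a group item has none and
        # becomes a single alphanumeric field of its storage length.
        picture = item.get("picture") or f"X({item.get('storage-length')})"
        yield f"03 {item.get('name')}{suffix} PIC {picture}."


record = ET.fromstring(SAMPLE.strip()).find("item")
lines = [f"01 {record.get('name')}-DWH."] + list(flat_fields(record))
print("\n".join(lines))
```

For these first fields of the invoice record, the printed lines have the same shape as the start of the generated output record (FS1-DWH through FS2B-DWH).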
With the invoice file above, the copybook of the sequential text file becomes this:
Auto-generated COBOL
       FILE SECTION.
      * -----------------------------------------------------------
      * INPUT FILE
           COPY "FD8.COP" .
      * -----------------------------------------------------------
      * OUTPUT FILE
       FD FACTURE-DWH.
       01 REC-FACTURE-DWH.
           03 FS1-DWH PIC X.
           03 FS2-DWH PIC X(9).
           03 FS2A-DWH PIC 9.
           03 RFS2B-DWH PIC X(8).
           03 FS2B-DWH PIC 9(8).
           03 FS3-DWH PIC X(11).
           03 FS3A-DWH PIC 9.
           03 FS3B-DWH PIC X(10).
           03 FS4-DWH PIC X(6).
           03 FS4A-DWH PIC 99.
           03 FS4B-DWH PIC 99.
           03 FS4C-DWH PIC 99.
           03 FS5-DWH PIC X(5).
           03 FS6-DWH PIC X(20).
           03 FS7-DWH PIC 9.
           03 FS8-DWH