This is a COBOL Copybook parser in Python featuring the following options:
- Parse the Copybook into a usable format to use in Python
- Clean up the Copybook by processing REDEFINES statements and remove unused definitions
- Denormalize the Copybook
- Write the cleaned Copybook in COBOL
- Strip prefixes of field names and ensure that the Copybook only contains unique names
- Can be used from the command-line or included in own Python projects
Because I couldn't find a COBOL Copybook parser that fitted all my needs I wrote my own. It doesn't support all functions found in the Copybook, just the ones that I met on my path: REDEFINES, INDEXED BY, OCCURS.
On a day to day basis I use it so that Informatica PowerCenter creates only 1 table of my COBOL data instead multiple.
The code uses the pic parser code from pyCOBOL.
This code is licensed under GPLv3.
The easiest way to install python-cobol is using pip:
$ pip install python-cobolBelow is an example Copybook file before and after being processed.
Before:
	00000 * Example COBOL Copybook file                                     AAAAAAAA
	00000  01  PAULUS-EXAMPLE-GROUP.                                        AAAAAAAA
	00000       05  PAULUS-ANOTHER-GROUP OCCURS 0003 TIMES.                 AAAAAAAA
	00000           10  PAULUS-FIELD-1 PIC X(3).                            AAAAAAAA
	00000           10  PAULUS-FIELD-2 REDEFINES PAULUS-FIELD-1 PIC 9(3).   AAAAAAAA
	00000           10  PAULUS-FIELD-3 OCCURS 0002 TIMES                    AAAAAAAA
	00000                           PIC S9(3)V99.                           AAAAAAAA
	00000       05  PAULUS-THIS-IS-ANOTHER-GROUP.                           AAAAAAAA
	00000           10  PAULUS-YES PIC X(5).                                AAAAAAAAAfter:
	         01  EXAMPLE-GROUP.                                                     
	           05  FIELD-2-1 PIC 9(3).                                              
	           05  FIELD-3-1-1 PIC S9(3)V99.                                        
	           05  FIELD-3-1-2 PIC S9(3)V99.                                        
	           05  FIELD-2-2 PIC 9(3).                                              
	           05  FIELD-3-2-1 PIC S9(3)V99.                                        
	           05  FIELD-3-2-2 PIC S9(3)V99.                                        
	           05  FIELD-2-3 PIC 9(3).                                              
	           05  FIELD-3-3-1 PIC S9(3)V99.                                        
	           05  FIELD-3-3-2 PIC S9(3)V99.                                        
	           05  THIS-IS-ANOTHER-GROUP.                                           
	             10  YES PIC X(5).                                                 You can use it in two ways: inside your own python code or as a stand-alone command-line utility.
Do a git clone from the repository and inside your brand new python-cobol folder run:
$ python cobol.py example.cblThis will process the redefines, denormalize the file, strip the prefixes and ensure all names are unique.
The utility allows for some command-line switches to disable some processing steps.
	$ python cobol.py --help
	usage: cobol.py [-h] [--skip-all-processing] [--skip-unique-names]
	                      [--skip-denormalize] [--skip-strip-prefix]
	                      filename
	Parse COBOL Copybooks
	positional arguments:
	  filename              The filename of the copybook.
	optional arguments:
	  -h, --help            show this help message and exit
	  --skip-all-processing
	                        Skips unique names, denormalization and .
	  --skip-unique-names   Skips making all names unique.
	  --skip-denormalize    Skips denormalizing the COBOL.
	  --skip-strip-prefix   Skips stripping the prefix from the names.	The parser can also be called from your Python code. All you need is a list of lines in COBOL Copybook format. See example.py how one would do it:
import cobol
with open("example.cbl",'r') as f:
    for row in cobol.process_cobol(f.readlines()):
    	print row['name']It is also possible to call one of the more specialized functions within cobol.py:
- 
clean_cobol(lines) Cleans the COBOL by converting the cobol informaton to single lines 
- 
parse_cobol(lines) Parses a list of COBOL field definitions into a list of dictionaries containing the parsed information. 
- 
denormalize_cobol(lines) Denormalizes parsed COBOL lines 
- 
clean_names(lines, ensure_unique_names=False, strip_prefix=False, make_database_safe=False) Cleans names of the fields in a list of parsed COBOL lines. Options to strip prefixes, enforce unique names and make the names database safe by converting dashes (-) to underscores (_) 
- 
print_cobol(lines) Prints parsed COBOL lines in the Copybook format