The GDSII Stream Format
See also: stream_utils
Jim Buchanan 6/11/96

---------------------------------------------------------------------------

This file is for use by people having experience with GDSII format stream files and the CAD systems that read/write them. It won't make a lot of sense without that background.

Since knowing the library structure without knowing about records and data types would be of marginal use, and knowing about records and data types without knowing about the library structure would be worse, you might have to scan through this a few times before it makes sense.

Beyond that, let me say that the stream format is quite simple. I suspect that the people at Calma put a lot of thought into creating a file that would be as easy to read in and parse as possible. I suspect that they did this due to the modest computers that they had to work with. Their results were impressive here and with the GDSII system as a whole. A bit dated now, but in its day...

Speaking of dated, Stream Format allows records to be written out to multiple reels of tape. Handy on those old 9 track drives. When a file was written to a tape, it was written in 2048 byte physical blocks. The file was padded with NULL characters so that it was always a multiple of 2048 bytes. I've noticed that some (OK, many) stream files that were originally written to disk using more modern software also pad the file to a multiple of 2048 bytes using NULL characters.

---------------------------------------------------------------------------

A stream file consists of records. These records are built up of 16 bit words. This means that all stream files should have an even number of bytes.

The first two words, or four bytes, are called the "Record Header". A record can be as small as 4 bytes long. The GDSII Stream format manual says that a record may be infinitely long, but frankly, I don't see how it can get over 65535 bytes long, since the first two bytes of the record header are an unsigned integer that defines the length of the record.
The leftmost bit of the first byte is valued at 32768, the rightmost bit of the second byte is valued at 1.

The third byte is the record type. This will tell what part of the library this record describes. There are values for such things as the beginning of a structure, the beginning of a boundary, the end of a structure, and so on. There is a section below with hexadecimal values of the various record types and a brief description of the types.

The fourth, and last, byte in the record header is the data type. This, along with the record length, tells the parser what to expect in the rest of the record. There may be more than one piece of data, but the rest of the record will be of this type. You can tell how many pieces of data are in the record by knowing the number of bytes in the record and the size of the data type.

Actually, the data type seems redundant, since each record type has only one valid data type. Perhaps the Calma people were thinking of future needs? They sure did that with the layer numbers and data types. Calma allowed only 64 layers and data types, but the Stream Format has room for 65535 of each.

---------------------------------------------------------------------------

There are seven data types listed in the GDSII Stream Format Manual v6.0, one is listed as not being used at the time. I doubt anyone has had the chutzpah to start using it since.

The first is "No data present". The code is 0x00. This means that the entire record is 4 bytes long. An example of a record with no data would be ENDLIB, which marks the end of a library.

The second is called a "Bit array". The code is 0x01. It's simply two bytes. The meaning of each bit depends on the record type that the bit array is found in.

The third data type is a "Two-Byte Signed Integer". The code is 0x02. It is an integer between -32768 and 32767. It is stored in twos complement format, with the most significant byte first.
Some examples from the book:

    0x0001 =    1
    0x0002 =    2
    0x0089 =  137
    0xffff =   -1
    0xfffe =   -2
    0xff77 = -137

The fourth data type is a "Four-Byte Signed Integer". The code is 0x03. Same basic thing as a two byte integer, but with four bytes.

The fifth data type is the "Four-Byte Real". The code is 0x04. This is the one that seems to have never been used, so I'll describe the eight byte real in a bit more detail. Basically though, the first bit is the sign (1 = negative), the next 7 bits are the exponent, and you have to subtract 64 from this number to get the real value. The next three bytes are the mantissa, which is the numerator of a fraction with 2^24 as the denominator.

    value = (mantissa/(2^24)) * (16^(exponent-64))

In the above, we use the actual values of the fields in the stream file for the mantissa and exponent. The sign bit is applied separately.

The sixth data type is the "Eight Byte Real". The code is 0x05. This one gets a little more use. The first (most significant) bit of the first byte is the sign, 1 means negative, 0 means positive. The 7 least significant bits of the first byte are the exponent in "excess 64" notation. You must subtract 64 to get the true value. I'll show the subtraction in the formula below. The remaining 7 bytes are the mantissa, with a binary point to the left of the most significant figure. The formula below uses the unsigned integer value of these 7 bytes as the numerator of a fraction.

    value = (mantissa/(2^56)) * (16^(exponent-64))

The seventh and final data type is the "ASCII String". The code is 0x06. The length of this string is always equal to the length of the record minus the four bytes used for the record header. If this number is not even, a NULL character (0x00) is added to the end. This is another artifact of the 16 bit words that the stream file format assumes.

---------------------------------------------------------------------------

This is the format of a stream file. The records shown within square brackets '[]' are optional. The or bar '|' indicates one or the other.
The structure '{}+' is used to indicate one or more instances. The angle brackets '<>' indicate that further definition is below. Sort of an "include" element.

The actual stream file:

    HEADER
    BGNLIB
    [LIBDIRSIZE]
    [SRFNAME]
    [LIBSECUR]
    LIBNAME
    [REFLIBS]
    [FONTS]
    [ATTRTABLE]
    [GENERATIONS]
    [FORMAT | FORMAT {MASK}+ ENDMASKS]
    UNITS
    [{BGNSTR STRNAME [STRCLASS] [{<element>}+] ENDSTR}+]
    ENDLIB

An element portion of a stream file:

    <boundary> | <path> | <sref> | <aref> | <text> | <node> | <box>
    [{PROPATTR PROPVALUE}+]
    ENDEL

Boundary portion of an element:

    BOUNDARY [ELFLAGS] [PLEX] LAYER DATATYPE XY

Path portion of an element:

    PATH [ELFLAGS] [PLEX] LAYER DATATYPE [PATHTYPE] [WIDTH] [BGNEXTN] [ENDEXTN] XY

SREF portion of an element:

    SREF [ELFLAGS] [PLEX] SNAME [STRANS [MAG] [ANGLE]] XY

AREF portion of an element:

    AREF [ELFLAGS] [PLEX] SNAME [STRANS [MAG] [ANGLE]] COLROW XY

Text portion of an element:

    TEXT [ELFLAGS] [PLEX] LAYER TEXTTYPE [PRESENTATION] [PATHTYPE] [WIDTH] [STRANS [MAG] [ANGLE]] XY STRING

Node portion of an element:

    NODE [ELFLAGS] [PLEX] LAYER NODETYPE XY

Box portion of an element:

    BOX [ELFLAGS] [PLEX] LAYER BOXTYPE XY

---------------------------------------------------------------------------

A stream file may be broken up over multiple reels of tape. I haven't seen this since the days of 9 track tapes on reels, but just in case, here we go...

Tape 1:

    HEADER
    several complete stream records
    TAPENUM
    TAPECODE
    LIBNAME

Intermediate tape(s):

    TAPENUM
    TAPECODE
    LIBNAME
    more complete stream records
    TAPENUM

Last tape:

    TAPENUM
    TAPECODE
    LIBNAME
    more complete stream records
    ENDLIB

A concatenation of all of the tapes, without the tape id stuff (and I presume w/o the extra LIBNAMEs) should be a valid stream file as described above.

---------------------------------------------------------------------------

OK, here's the part you've been waiting for, what the records mean...

Record type  Data type

0x00 HEADER  0x02 INTEGER_2
    Start of stream, contains version number of stream file.
        < v3.0  0x0000    0
        v3.0    0x0003    3
        v4.0    0x0004    4
        v5.0    0x0005    5
        v6.0    0x0258  600

0x01 BGNLIB  0x02 INTEGER_2
    Beginning of library, plus mod and access dates.
        Modification: year, month, day, hour, minute, second
        Last access:  year, month, day, hour, minute, second

0x02 LIBNAME  0x06 STRING
    The name of the library, supposedly following Calma DOS conventions. Using later tools, such as ISS LTL-100, it seems more flexible than that, but it won't allow any old thing you want. If memory serves, Calma DOS allowed 6 characters in a file name, with a 2 character extension.

0x03 UNITS  0x05 REAL_8
    Size of db unit in user units, size of db unit in meters. To calculate the size of a user unit in meters, divide the second number by the first.

0x04 ENDLIB  0x00 NO_DATA
    End of the library.

0x05 BGNSTR  0x02 INTEGER_2
    Begin structure, plus create and mod dates in the same format as the BGNLIB record.

0x06 STRNAME  0x06 STRING
    Name of a structure. Up to 32 characters in GDSII. A-Z, a-z, 0-9, _, ?, and $ are all legal characters.

0x07 ENDSTR  0x00 NO_DATA
    End of a structure.

0x08 BOUNDARY  0x00 NO_DATA
    The beginning of a BOUNDARY element.

0x09 PATH  0x00 NO_DATA
    The beginning of a PATH element.

0x0a SREF  0x00 NO_DATA
    The beginning of an SREF element.

0x0b AREF  0x00 NO_DATA
    The beginning of an AREF element.

0x0c TEXT  0x00 NO_DATA
    The beginning of a TEXT element.

0x0d LAYER  0x02 INTEGER_2
    Layer specification. On GDSII this could be 0 to 63, LTL allows 0 to 255. Of course a 2 byte integer allows up to 65535...

0x0e DATATYPE  0x02 INTEGER_2
    Datatype specification. On GDSII this could be 0 to 63, LTL allows 0 to 255. Of course a 2 byte integer allows up to 65535...

0x0f WIDTH  0x03 INTEGER_4
    Width specification, negative means absolute. In data base units.

0x10 XY  0x03 INTEGER_4
    An array of XY coordinates in data base units.
        Path:     2 to 200 pairs in GDSII
        Boundary: 4 to 200 pairs in GDSII
        Text:     Exactly 1 pair
        SREF:     Exactly 1 pair
        AREF:     Exactly 3 pairs
                      1: Array reference point
                      2: column_space*columns+reference_x
                      3: row_space*rows+reference_y
        Node:     1 to 50 pairs in GDSII
        Box:      Exactly 5 pairs

0x11 ENDEL  0x00 NO_DATA
    The end of an element.

0x12 SNAME  0x06 STRING
    The name of a referenced structure.

0x13 COLROW  0x02 INTEGER_2
    Columns and rows for an AREF. Two 2 byte integers. The first is the number of columns. The second is the number of rows. Neither may exceed 32767.

0x14 TEXTNODE  0x00 NO_DATA
    "Not currently used" per GDSII Stream Format Manual, v6.0. Would be the beginning of a TEXTNODE element if it were.

0x15 NODE  0x00 NO_DATA
    The beginning of a NODE element.

0x16 TEXTTYPE  0x02 INTEGER_2
    Texttype specification. On GDSII this could be 0 to 63, LTL allows 0 to 255. Of course a 2 byte integer allows up to 65535...

0x17 PRESENTATION  0x01 BIT_ARRAY
    Text origin and font specification.
    bits 15 to 0, l to r
        bits 0 and 1: 00 left, 01 center, 10 right
        bits 2 and 3: 00 top, 01 middle, 10 bottom
        bits 4 and 5: 00 font 0, 01 font 1, 10 font 2, 11 font 3

0x18 SPACING  UNKNOWN
    "Discontinued" per GDSII Stream Format Manual, v6.0.

0x19 STRING  0x06 STRING
    Character string. Up to 512 char in GDSII.

0x1a STRANS  0x01 BIT_ARRAY
    Bits 15 to 0, l to r
    15=refl, 2=absmag, 1=absangle, others reserved for future use.

0x1b MAG  0x05 REAL_8
    Magnification, 1 is the default if omitted.

0x1c ANGLE  0x05 REAL_8
    Angular rotation factor in ccw direction. If omitted, the default is 0.

0x1d UINTEGER  UNKNOWN
    User integer, used only in V2.0. When instreamed, it should be converted to property attribute 126.

0x1e USTRING  UNKNOWN
    User string, used only in V2.0. When instreamed, it should be converted to property attribute 127.

0x1f REFLIBS  0x06 STRING
    Names of the reference libraries. Starts with name of the first library and is followed by the second.
    There are 44 bytes in each, NULLS are used for padding, including filling in an entire unused field.

0x20 FONTS  0x06 STRING
    Names of the textfont definition files. 4 44 byte fields, padded with NULLS if a field is unused or less than 44 bytes.

0x21 PATHTYPE  0x02 INTEGER_2
    Type of path ends.
        0: Square ended paths
        1: Round ended
        2: Square ended, extended 1/2 width
        4: Variable length extensions, CustomPlus
    The default is 0.

0x22 GENERATIONS  0x02 INTEGER_2
    Number of deleted or backed up structures to retain. Seems a bit odd in an archive... From 2-99, default is 3.

0x23 ATTRTABLE  0x06 STRING
    Name of the attribute definition file. Max size 44 bytes.

0x24 STYPTABLE  0x06 STRING
    "Unreleased feature" per GDSII Stream Format Manual, v6.0.

0x25 STRTYPE  0x02 INTEGER_2
    "Unreleased feature" per GDSII Stream Format Manual, v6.0.

0x26 ELFLAGS  0x01 BIT_ARRAY
    Flags for template and exterior data.
    bits 15 to 0, l to r
    0=template, 1=external data, others unused

0x27 ELKEY  0x03 INTEGER_4
    "Unreleased feature" per GDSII Stream Format Manual, v6.0.

0x28 LINKTYPE  UNKNOWN
    "Unreleased feature" per GDSII Stream Format Manual, v6.0.

0x29 LINKKEYS  UNKNOWN
    "Unreleased feature" per GDSII Stream Format Manual, v6.0.

0x2a NODETYPE  0x02 INTEGER_2
    Nodetype specification. On GDSII this could be 0 to 63, LTL allows 0 to 255. Of course a 2 byte integer allows up to 65535...

0x2b PROPATTR  0x02 INTEGER_2
    Property number.

0x2c PROPVALUE  0x06 STRING
    Property value. On GDSII, 128 characters max, unless an SREF, AREF, or NODE, which may have 512 characters.

0x2d BOX  0x00 NO_DATA
    The beginning of a BOX element.

0x2e BOXTYPE  0x02 INTEGER_2
    Boxtype specification. On GDSII this could be 0 to 63, LTL allows 0 to 255. Of course a 2 byte integer allows up to 65535...

0x2f PLEX  0x03 INTEGER_4
    Plex number and plexhead flag. The least significant bit of the most significant byte is the plexhead flag. Because of this, you can "only" have 2^24 plex groups. Or is that 2^24-1?
    I'm not sure if 0 is a valid plex group in a stream file.

0x30 BGNEXTN  0x03 INTEGER_4
    Path extension beginning for pathtype 4 in CustomPlus. In database units, may be negative.

0x31 ENDEXTN  0x03 INTEGER_4
    Path extension end for pathtype 4 in CustomPlus. In database units, may be negative.

0x32 TAPENUM  0x02 INTEGER_2
    Tape number for multi-reel stream file.

0x33 TAPECODE  0x02 INTEGER_2
    Tape code to verify that the reel is from the proper set. 12 bytes that are supposed to form a unique tape code.

0x34 STRCLASS  0x01 BIT_ARRAY
    Calma use only. In stream files created by non-Calma programs, this should be missing or all fields should be 0.

0x35 RESERVED  0x03 INTEGER_4
    Used to be NUMTYPES per GDSII Stream Format Manual, v6.0.

0x36 FORMAT  0x02 INTEGER_2
    Archive or Filtered flag.
        0: Archive
        1: Filtered

0x37 MASK  0x06 STRING
    Only in filtered streams. Layers and datatypes used for mask in a filtered stream file. A string giving ranges of layers and datatypes separated by a semicolon. There may be more than one mask in a stream file.

0x38 ENDMASKS  0x00 NO_DATA
    The end of mask descriptions.

0x39 LIBDIRSIZE  0x02 INTEGER_2
    Number of pages in library directory, a GDSII thing. It seems to have only been used when Calma INFORM was creating a new library.

0x3a SRFNAME  0x06 STRING
    Sticks rule file name.

0x3b LIBSECUR  0x02 INTEGER_2
    Access control list stuff for CalmaDOS, ancient. INFORM used this when creating a new library. Had 1 to 32 entries with group numbers, user numbers and access rights.

---------------------------------------------------------------------------
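
To tie the record header, the data types and the record table together, here is a minimal Python sketch of a record reader. It is only an illustration of the layout described in this file, not code from stream_utils or any Calma tool. The names decode_real8, decode_payload and read_records are made up for this example, and the sample bytes are hand-built rather than taken from a real library. It assumes the stream stops at ENDLIB and treats anything after that (such as NULL padding) as ignorable. The never-used Four-Byte Real (0x04) is deliberately left unhandled.

```python
import io
import struct

ENDLIB = 0x04  # record type that closes the library


def decode_real8(raw):
    """Decode one Eight-Byte Real: sign bit, excess-64 exponent, 7-byte mantissa."""
    sign = -1.0 if raw[0] & 0x80 else 1.0
    exponent = (raw[0] & 0x7F) - 64
    mantissa = int.from_bytes(raw[1:8], "big")
    return sign * (mantissa / 2**56) * 16.0**exponent


def decode_payload(data_type, payload):
    """Turn the bytes after the record header into Python values."""
    if data_type == 0x00:                      # no data present
        return None
    if data_type == 0x01:                      # bit array, two bytes
        return struct.unpack(">H", payload)[0]
    if data_type == 0x02:                      # two-byte signed integers
        return list(struct.unpack(">%dh" % (len(payload) // 2), payload))
    if data_type == 0x03:                      # four-byte signed integers
        return list(struct.unpack(">%di" % (len(payload) // 4), payload))
    if data_type == 0x05:                      # eight-byte reals
        return [decode_real8(payload[i:i + 8]) for i in range(0, len(payload), 8)]
    if data_type == 0x06:                      # ASCII string, NULL padded
        return payload.rstrip(b"\x00").decode("ascii")
    raise ValueError("unhandled data type 0x%02x" % data_type)


def read_records(stream):
    """Yield (record_type, data_type, value) until ENDLIB or end of input."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        # Two-byte unsigned length, then record type and data type bytes.
        length, record_type, data_type = struct.unpack(">HBB", header)
        if length < 4:                         # NULL padding or a corrupt header
            return
        payload = stream.read(length - 4)
        yield record_type, data_type, decode_payload(data_type, payload)
        if record_type == ENDLIB:
            return


# Hand-built sample: HEADER (version 600), LIBNAME "ABC", ENDLIB, then padding.
sample = bytes.fromhex("000600020258" "0008020641424300" "00040400" "00000000")
records = list(read_records(io.BytesIO(sample)))
for record_type, data_type, value in records:
    print("0x%02x 0x%02x %r" % (record_type, data_type, value))
```

A real reader would open the stream file in binary mode and pass the file object to read_records in place of the io.BytesIO wrapper. Walking the library structure (BGNSTR, elements, ENDEL and so on) would be layered on top of the tuples this yields.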