ISO9660 (CD Format)
(Last updated: Jan 4th 2004)

Intro
Physical Sector Layout
Logical Sector Layout
Volume Descriptors
Path Tables
Character Sets
Error Correction

Intro

Living abroad, I don't get to see much of my family and old friends, but they do send me videos on CD now and again. Recently, I got some that the freeware I was using simply couldn't read. Not wanting to spend money on a professional package, I thought it would just be a case of looking up the file format, extracting the files and playing them (from the first key frame after the damaged sectors if necessary).

Needless to say, it wasn't as straightforward as I hoped. There is surprisingly little concrete information out there on the format of CDs, BIN and IMG files. That's not to say you don't get a lot of hits on Google if you type in phrases like 'ISO', 'BIN' and 'VCD', but the results are just an endless repetition of script kiddies wanting help to rip off the latest film or software hit.

Anyway, I found the information in the end and was able to write code to extract the files from an ISO image. Admittedly, well before I got to that stage, I found a superior product (ISOBuster) that already did everything I wanted for free (and which handled problems I couldn't, had a GUI etc.) but by then it was too late... it had become personal - me against a host of 12cm platters!

So here, in one place, is the information I found on ISO9660. I haven't looked at common extensions such as Joliet because they weren't used in the VCDs I received. I'll leave that as an exercise for the reader...

Source code will be published when I've tidied it up a bit.

Links

Key Terminology

Physical Notes

General Notes

Byte Meaning

0 Years since 1900
1 Month
2 Day
3 Hours
4 Minutes
5 Seconds
6 Signed offset from GMT in 15 minute intervals
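
The 7 byte date layout above can be decoded in a few lines. This is only a sketch: the function name `decode_short_date` is made up for illustration, and treating a zero month or day as "no date recorded" is my assumption rather than something taken from the standard.

```python
from datetime import datetime, timedelta, timezone


def decode_short_date(raw: bytes):
    """Decode the 7 byte date layout from the table above."""
    if len(raw) != 7:
        raise ValueError("expected exactly 7 bytes")
    years, month, day, hour, minute, second = raw[:6]
    if month == 0 or day == 0:
        return None  # assumption: a zeroed field means no date was recorded
    # Byte 6 is a signed 8-bit count of 15 minute steps from GMT.
    quarter_hours = int.from_bytes(raw[6:7], "big", signed=True)
    tz = timezone(timedelta(minutes=15 * quarter_hours))
    return datetime(1900 + years, month, day, hour, minute, second, tzinfo=tz)


stamp = decode_short_date(bytes([103, 12, 23, 12, 30, 0, 4]))
print(stamp.isoformat())  # 2003-12-23T12:30:00+01:00
```

Returning a timezone-aware `datetime` keeps the GMT offset attached to the value instead of silently dropping it.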

Acronym Meaning

BCD Binary Coded Decimal
LEF Little Endian Format
BEF Big Endian Format
LBEF Both Endian Format (see General Notes)
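
As a rough illustration of the three formats, here is a sketch of helpers for reading them. The helper names are hypothetical. It assumes that an LBEF field stores the value twice, the little endian copy first and the big endian copy second (which is how I read the ISO9660 specification), so a 4 byte value occupies 8 bytes on disk.

```python
def read_lef(buf: bytes, offset: int, size: int) -> int:
    """Read an unsigned little endian integer of `size` bytes."""
    return int.from_bytes(buf[offset:offset + size], "little")


def read_bef(buf: bytes, offset: int, size: int) -> int:
    """Read an unsigned big endian integer of `size` bytes."""
    return int.from_bytes(buf[offset:offset + size], "big")


def read_lbef(buf: bytes, offset: int, size: int) -> int:
    """Read a both endian field whose total length is `size` bytes.

    Assumption: the first half is the little endian copy and the
    second half is the big endian copy of the same value.
    """
    half = size // 2
    little = read_lef(buf, offset, half)
    big = read_bef(buf, offset + half, half)
    if little != big:
        raise ValueError("the two copies of the value disagree")
    return little


# A logical block size of 2048 stored as a 4 byte LBEF field: 00 08 08 00
print(read_lbef(bytes.fromhex("00080800"), 0, 4))  # 2048
```

Comparing the two copies costs nothing and gives a cheap hint that a sector is damaged or that the offset is wrong.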

 

Physical Sector Layout

While reading my brother's disks I couldn't figure out why the Primary Volume Descriptor was well over a sector off where it was supposed to be. Effectively, the sector size appeared to be 2352 bytes, but ISO9660 said it had to be a power of two of 2048 bytes or more.

A search on Google (AKA the world's largest help-file) eventually found the answer, not in the ISO standard or other official documentation but on a virus prevention site!

The issue is that the 2048 byte sector length is correct when reading sectors from the CD as if it were a normal disk. However, if you perform a really low-level read of the disk (as CD-to-image tools do) the entire physical sector is recorded, including information such as the sector number and checksum. This additional information explains the additional size.

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type:
    0 = Empty
    1 = Mode 1
    2 = Mode 2
16 8 Sub-header, Green Book CDs only
24 2048* Logical Sector ("Data Field of a Sector" in ISO9660 speak)
2072 276 Error Correction data
2348 4 Checksum
2352    

* 2048 is the universally used value, but ISO9660 only requires it to be any power of 2 resulting in a value of 2048 or more.

In MODE2XA images (2336 bytes per sector), the first 16 bytes of the physical sector (the synchronisation marker and the 4 byte header) are left out.

So in short, you may see physical sector sizes of 2048 (ISO, MODE1), 2336 (MODE2XA) or 2352 (low-level reads, BIN). ISO9660 does not allow the logical sector size to change, but Yellow Book does and it seems several of the images I have take advantage of this - making it necessary to check the sector type field (and possibly the sub-header field as well) for each sector.
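
To make the header layout concrete, here is a small sketch that checks for the synchronisation marker and decodes the BCD time index and sector type. The function names are made up for illustration. The size guess only tells 2352 byte raw images apart from 2048 byte ones; it makes no attempt to recognise 2336 byte images, which have no marker to look for.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])


def bcd(byte: int) -> int:
    """Convert one 2 digit BCD byte (e.g. 0x74) to an integer (74)."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def parse_raw_header(sector: bytes) -> dict:
    """Decode the first 16 bytes of a 2352 byte physical sector."""
    if len(sector) < 16 or sector[:12] != SYNC:
        raise ValueError("no synchronisation marker, not a raw sector")
    minute, second, frame = (bcd(b) for b in sector[12:15])
    return {"minute": minute, "second": second,
            "frame": frame, "mode": sector[15]}


def guess_sector_size(image_start: bytes) -> int:
    """Guess 2352 (raw) or 2048 from the first bytes of an image."""
    return 2352 if image_start[:12] == SYNC else 2048


header = parse_raw_header(SYNC + bytes([0x00, 0x02, 0x16, 0x01]) + bytes(2336))
print(header)  # {'minute': 0, 'second': 2, 'frame': 16, 'mode': 1}
```

Because the sector type can change from sector to sector, a reader would call something like `parse_raw_header` on every sector rather than once per image.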

Update - Books in all Colours of the Rainbow

I've found out since then that the other Modes correspond to other standards. Also, there appear to be 784 bytes of EDC / ECC and 98 control bytes beyond the 2352 byte sector that are never visible to the PC.

Red Book defines the standard for Audio CDs and uses a physical sector structure of:

Red Book

Offset Length Details

0 2352 Data

Yellow Book defines the standard for Data CDs and provides the physical layer on which the logical ISO9660 layer is implemented. Yellow Book sectors have one of the following formats, depending on the mode:

Yellow Book Mode 0

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type:
    0 = Mode 0
16 2336 Zeros

Yellow Book Mode 1

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type:
    1 = Mode 1
16 2048 Logical Sector ("Data Field of a Sector" in ISO9660 speak)
2064 4 EDC (32-bit CRC checksum from the previous 2064 bytes)
2068 8 Intermediate field (all zeros)
2076 276 Error Correction data:
172 byte P-Parity followed by 104 byte Q-Parity.
I have no idea what that means, but it sounds impressive :-)

Yellow Book Mode 2 (Rare?)

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type:
    2 = Mode 2
16 2336 Logical Sector ("Data Field of a Sector" in ISO9660 speak)
2352    

Green Book defines the standard for CD-i; its two Mode 2 sector forms are also used by CD-ROM XA ('extended architecture') CDs:

Green Book Mode 2 Form 1

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type: 2
16 8 Sub-header
24 2048 Logical Sector ("Data Field of a Sector" in ISO9660 speak)
2072 4 EDC (Checksum)
2076 276 Error Correction data
2352    

Green Book Mode 2 Form 2

Offset Length Details

0 12 Synchronisation marker: 00 FF FF FF  FF FF FF FF  FF FF FF 00
12 1 2 digit minute index (BCD format)
13 1 2 digit second index (BCD format)
14 1 2 digit 1/75 second index (BCD format)
15 1 Sector type: 2
16 8 Sub-header
24 2324 Logical Sector ("Data Field of a Sector" in ISO9660 speak)
2348 4 EDC (Checksum)
2352    

Using Google to translate the one site that had any information about the sub-header in it, it appears that bit 5 of the 3rd byte of the sub-header is set for Form 2 and clear for Form 1.
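
Putting the sector tables and the Form bit together, a sketch for pulling the logical sector out of a 2352 byte physical sector might look like this. The function name `user_data` is hypothetical. It assumes every Mode 2 sector carries a Green Book sub-header; a plain Yellow Book Mode 2 sector has no sub-header and cannot be told apart from the header alone, so it would be misread here.

```python
def user_data(raw: bytes) -> bytes:
    """Return the logical sector held in one 2352 byte physical sector."""
    if len(raw) != 2352:
        raise ValueError("expected a 2352 byte physical sector")
    mode = raw[15]
    if mode == 1:
        return raw[16:16 + 2048]
    if mode == 2:
        # Assumption: a Green Book sub-header is present at offset 16.
        submode = raw[18]  # 3rd byte of the sub-header
        if submode & 0x20:  # bit 5 set means Form 2
            return raw[24:24 + 2324]
        return raw[24:24 + 2048]  # bit 5 clear means Form 1
    raise ValueError(f"unsupported sector type {mode}")
```

For a VCD image this would typically be called once per sector, keeping only the Form 1 results when rebuilding the ISO9660 file system and the Form 2 results when extracting the video stream.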

Other books include:

Colour Covers
Orange CD-WO
Blue Laserdisk
White VCD

 

Logical Sector Layout

ISO9660 calls sectors 0-15 the System Area, sectors 16+ the Data Area, and the whole thing (System Area + Data Area) the Volume Space.

Universally, Logical Sectors are 2048 bytes in length although ISO9660 allows them to be any power of 2 that results in a value of 2048 or more. If the physical sector size of the medium is less than 2048, the data from multiple sequential physical sectors are joined to make one logical sector.

Sector Contents

0-15 First 16 sectors are reserved for bootable disks. These sectors should be filled with zeros for non-bootable disks.
16 - X Volume Descriptor(s)
A CD can have multiple volumes on it. For each volume on the CD a corresponding Volume Descriptor appears here. Each Volume Descriptor takes an entire sector to describe.

Typically a CD will have just two Volume Descriptors: The Primary Volume Descriptor (which contains the pointer to the root directory information) and the Volume Descriptor Set Terminator (which indicates no more Volume Descriptors follow).
X+ Directory Records and File Data

ISO9660 also allows for Logical Blocks within each sector. Each block contains 2^(n+9) bytes (n >= 0), i.e. 512 bytes or more. Universally, n is selected to give the same size as the Logical Sector (i.e. 2048 bytes), but it is theoretically possible to have another value provided it is less than or equal to the size of the Logical Sector. Reading the ISO9660 specification it appears that logical blocks can cross logical sectors (i.e. unlike directory records).

Logical Block 0 maps to the start of the Volume Space, not the Data Area as you might have expected (i.e. as I would have expected :-)).
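
In code, the block arithmetic comes down to very little. The sketch below assumes a plain image with 2048 byte sectors and no physical headers (i.e. an ISO file rather than a BIN), where a Logical Block size is 2^(n+9) bytes; the function names are made up for illustration.

```python
import io


def block_size(n: int) -> int:
    """Logical Block size in bytes for a given n (n >= 0)."""
    return 2 ** (n + 9)


def block_offset(lbn: int, blk_size: int = 2048) -> int:
    """Byte offset of a Logical Block, counted from the start of Volume Space."""
    return lbn * blk_size


def read_block(image, lbn: int, blk_size: int = 2048) -> bytes:
    """Read one Logical Block from a seekable binary file object."""
    image.seek(block_offset(lbn, blk_size))
    data = image.read(blk_size)
    if len(data) != blk_size:
        raise EOFError(f"logical block {lbn} lies beyond the end of the image")
    return data


# Tiny in-memory stand-in for an image: 16 empty sectors, then one descriptor.
fake_image = io.BytesIO(bytes(16 * 2048) + b"\x01CD001" + bytes(2042))
print(read_block(fake_image, 16)[:6])  # b'\x01CD001'
```

Since block 0 is the start of Volume Space, no adjustment for the System Area is needed: block 16 is simply 16 * 2048 bytes into the image.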

Volume Descriptors

The interesting stuff starts with sector 16, which contains the Volume Descriptors that describe the disk. Technically, a single CD may contain multiple volumes, in which case the additional volumes have their own Volume Descriptors in the sequential Logical Sectors that follow (one per descriptor).

Volume Descriptor Header (logical sector 16)

There are five Volume Descriptor types defined in ISO9660 (counting the Set Terminator), of which the only one of serious interest is the Primary Volume Descriptor, but I'll give passing mention to the others later as well.

Offset Length Details

0 1 Volume Descriptor Type:
Value Meaning
0 Boot Record
1 Primary Volume Descriptor
2 Supplementary Volume Descriptor
3 Volume Partition Descriptor
4-254 Reserved
255 Volume Descriptor Set Terminator

(only Value = 1 is common, and that's what the rest of the section presumes, as offset 7+ is type dependent)

1 5 Standard Identification (always "CD001")
6 1 Volume Descriptor version (1)
7 2041* Contents depend on Volume Descriptor Type
2048    

* Presumes standard Logical Sector size of 2048 bytes
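
Walking the Volume Descriptors is a simple loop over sectors starting at 16. The sketch below assumes a plain 2048 byte per sector image; `iter_volume_descriptors` is a name I made up, not part of any library.

```python
import io

SECTOR_SIZE = 2048  # assumption: the standard Logical Sector size


def iter_volume_descriptors(image):
    """Yield (type, version, raw sector) for each Volume Descriptor.

    `image` is a seekable binary file object. Iteration stops after the
    Volume Descriptor Set Terminator (type 255) has been yielded.
    """
    sector = 16
    while True:
        image.seek(sector * SECTOR_SIZE)
        data = image.read(SECTOR_SIZE)
        if len(data) < SECTOR_SIZE or data[1:6] != b"CD001":
            raise ValueError(f"sector {sector} is not a Volume Descriptor")
        yield data[0], data[6], data
        if data[0] == 255:
            return
        sector += 1


# In-memory stand-in: System Area, one Primary descriptor, one Terminator.
primary = b"\x01CD001\x01" + bytes(2041)
terminator = b"\xffCD001\x01" + bytes(2041)
fake_image = io.BytesIO(bytes(16 * SECTOR_SIZE) + primary + terminator)
print([vd_type for vd_type, _, _ in iter_volume_descriptors(fake_image)])  # [1, 255]
```

Checking for "CD001" on every sector stops the loop from running off into file data if the terminator is missing or damaged.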

Boot Record (Volume Descriptor Type = 0)

Since booting is so system specific, ISO9660's Boot Record is very generic, consisting of no more than two identification strings and some reserved space. In addition to the Boot Record, the first 16 logical sectors (0-15) are reserved for system specific boot information.

Offset Length Details

7 32 Boot System Identifier (string)
39 32 Boot Identifier (string)
71 1977 Boot System Use (ISO9660 just defines this space as being available to the application)
2048    

 

Primary Volume Descriptor (Volume Descriptor Type = 1)

Offset Length Details

7 1 Unused & set to zero
8 32 System Identifier
40 32 Volume Identifier
72 8 Unused & set to zero
80 8 Volume space size, in sectors, in LBEF
88 32 Unused & set to zero
120 4 Volume set size, in LBEF
124 4 Volume sequence number, in LBEF
128 4 Logical Block size (typically 2048) in LBEF
132 8 Path table size in LBEF
140 4 Sector number (in LEF) for start of 1st LEF path table (type L)
144 4 Sector number (in LEF) for start of 2nd LEF path table, or 0 if no second table (type L)
148 4 Sector number (in BEF) for start of 1st BEF path table (type M)
152 4 Sector number (in BEF) for start of 2nd BEF path table, or 0 if no second table (type M)
156 34 Root Directory Record (see directory record format below)
190 128 Volume Set Identifier (string)
318 128 Publisher Identifier (string)
If this field starts with ASCII 5F (underscore) character, the rest of the field is the name of a file (in 8.3 format) in the root directory. If first character is a space, this information hasn't been supplied.
446 128 Data Prep Identifier (string, see Publisher Identifier for format)
574 128 Application Identifier (string, see Publisher Identifier for format)
702 37 Copyright Identifier (string, see Publisher Identifier for format)
739 37 Abstract File Identifier (string, see Publisher Identifier for format)
776 37 Bibliographical Identifier (string, see Publisher Identifier for format)
813 17 Volume Creation Date (string)
In YYYYMMDDHHMMSSsso text format, where ss = 1/100ths of a second and o = GMT offset as per 7 byte dates (e.g. 20031223123000000)
830 17 Last Modified Date (string, see Volume Creation Date for format)
847 17 Expiry Date (string, see Volume Creation Date for format)
864 17 Effective Date (string, see Volume Creation Date for format)
881 1 File structure version (1)
882 1 Reserved (zeros)
883 512 Reserved for application use
1395 653 Reserved (zeros)
2048    
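
As a sketch of how the table above translates into code, the following pulls a handful of fields out of a Primary Volume Descriptor sector. The name `parse_pvd` and the dictionary keys are hypothetical. For the LBEF fields it reads only the first copy of the value, which assumes the little endian copy is stored first.

```python
import struct


def parse_pvd(sector: bytes) -> dict:
    """Extract a few fields from a 2048 byte Primary Volume Descriptor."""
    if sector[0] != 1 or sector[1:6] != b"CD001":
        raise ValueError("not a Primary Volume Descriptor")

    def text(offset: int, length: int) -> str:
        raw = sector[offset:offset + length]
        return raw.decode("ascii", "replace").rstrip(" \x00")

    def le32(offset: int) -> int:
        return struct.unpack_from("<I", sector, offset)[0]

    def le16(offset: int) -> int:
        return struct.unpack_from("<H", sector, offset)[0]

    root = 156  # the 34 byte Root Directory Record starts here
    return {
        "system_id": text(8, 32),
        "volume_id": text(40, 32),
        "volume_space_size": le32(80),      # LE copy of an LBEF field
        "logical_block_size": le16(128),    # LE copy of an LBEF field
        "path_table_size": le32(132),       # LE copy of an LBEF field
        "l_path_table": le32(140),
        "m_path_table": struct.unpack_from(">I", sector, 148)[0],
        "root_extent": le32(root + 2),      # Extent Location of the root
        "root_length": le32(root + 10),     # length of the root in bytes
        "creation_date": text(813, 16),     # the 16 digits, offset byte dropped
    }
```

The two root entries are the useful ones: reading `root_length` bytes starting at Logical Block `root_extent` gives the Directory Records of the root directory, from which everything else can be reached.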

 

Directory Record

Directory records must finish in the same logical sector in which they began. A logical sector can contain multiple directory records. Any unused space at the end of a logical sector should be filled with zeros.

Note that Directory Records are contained in Logical Sectors (not Logical Blocks), but contain pointers to Logical Blocks (not sectors).

Offset Length Details

0 1 Record length
1 1 Extended Attribute (XAR) Length. The number of logical blocks taken up with additional OS specific information (e.g. file rights).
2 8 Extent Location

The index of the first Logical Block Number containing the XAR / File Data or Directory Entry, in LBEF. An Extent is simply ISO9660 speak for a sequential set of logical blocks.

Note: If XAR data is present this is placed in front of the file data. Thus the effective file start is this value plus the value in XAR Length field above - i.e. file data starts in the next block following the last XAR block.

10 8 Length (in bytes) of File Data or Directory Entry, in LBEF
Note: XAR blocks do not affect this value
18 7 Datetime stamp:
Byte Meaning
0 Years since 1900
1 Month
2 Day
3 Hours
4 Minutes
5 Seconds
6 Signed offset from GMT in 15 minute intervals
25 1 Flags:   
Bit (0=LSB) Meaning
0 Set for hidden files
1 Set for directories, Clear for files
2 Set for associated files
3 Set if a record format is specified in associated XAR data (I haven't looked in depth at this, but I think it means you should set this flag if it's a database file you have provided the schema of in the optional XAR blocks)
4 Set if permissions / protections / rights specified
5 Reserved (clear)
6 Reserved (clear)
7 Set if not the final record for the file. This only occurs if the file is split over multiple Extents and I can't see why you would want to do that.
26 1 Interleaved* file unit size (0)
27 1 Interleaved* file gap size (0)
28 4 Volume sequence number, in LBEF
32 1 Length of following field
33 N File or directory name / identifier.

For Files:
File name in the format 8.3;version (where version is a value between 1 and 32,767. Note this is stored as characters, not binary). See Character Sets for more information
Example: "MYFILE.TXT;3"
Note: 8.3 is ISO9660-1. ISO9660-2 allows longer values provided Filename + Extension does not exceed 30 characters. [??I may be wrong??]

For Directories:
"\0" indicates this entry refers to the directory itself (self, not the root of the disk)
"\1" indicates this entry refers back to the parent directory
Otherwise the directory name (cannot exceed 31 characters)

33+N [1] Padding byte (set to \0) that brings the record length so far (33 + N) up to an even number. Present only if N is even; omitted if N is odd, since 33 + N is then already even
Record Length - X X Reserved for system use. X must be an even number (padded with a zero if necessary). Also used to pad out final entry in a sector such that the sector is filled

* I'm guessing here, but I imagine if you have a slow device, interleaving would improve performance as you wouldn't keep having to move the head backwards to catch sequential sectors you were too slow to pick up first time.
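
The record layout above can be walked with a short loop. This is a sketch under a few assumptions: the function names and dictionary keys are made up, only the first (little endian) copy of each LBEF field is read, and a record length of zero is taken to mean that the rest of the sector is padding.

```python
import struct


def parse_directory_record(buf: bytes, pos: int = 0):
    """Parse one Directory Record starting at `pos`, or return None at padding."""
    length = buf[pos]
    if length == 0:
        return None  # zero fill: no more records in this sector
    xar_blocks = buf[pos + 1]
    extent = struct.unpack_from("<I", buf, pos + 2)[0]
    size = struct.unpack_from("<I", buf, pos + 10)[0]
    flags = buf[pos + 25]
    name_len = buf[pos + 32]
    raw_name = buf[pos + 33:pos + 33 + name_len]
    if raw_name == b"\x00":
        name = "."   # the directory itself
    elif raw_name == b"\x01":
        name = ".."  # the parent directory
    else:
        name = raw_name.decode("ascii", "replace")
    return {
        "length": length,
        "data_block": extent + xar_blocks,  # file data follows any XAR blocks
        "size": size,
        "hidden": bool(flags & 0x01),
        "is_dir": bool(flags & 0x02),
        "name": name,
    }


def iter_directory_sector(sector: bytes):
    """Yield every Directory Record held in one logical sector."""
    pos = 0
    while pos < len(sector):
        record = parse_directory_record(sector, pos)
        if record is None:
            break
        yield record
        pos += record["length"]
```

Because records never straddle a sector boundary, a directory spanning several sectors can be handled by calling `iter_directory_sector` once per sector and skipping the "." and ".." entries when recursing.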

Volume Descriptor Set Terminator (Volume Descriptor Type = 255)

Indicates no further Volume Descriptors follow.

Offset Length Details

7 2041 Zeros
2048    

 

Path Tables

Path tables contain a linked list of all directories on the CD and where their corresponding Directory Records on disk can be found. Path Tables were developed to provide a fast way for slow systems (especially those with slow seek times) to locate files deep in the folder structure without having to traverse all the intermediate Directory Records which, potentially, could be spread over the entire disk.

Since this information is redundant and I'm only interested in reading the entire disk I haven't looked into Path Tables in any depth. They are mentioned here only for completeness.
 

Character Sets

Technically, ISO9660 supports only a limited subset of ASCII characters in file and directory names (capitals A-Z, 0-9, !"%&'()*+,-./?) and an even more limited set of characters in volume names (just A-Z, 0-9 and underscores). Nearly universally, this is ignored.
 

Error Correction

Error correction used with computer data is much more complete than that used with Audio CDs. Audio CDs use something called Cross Interleaved Reed Solomon Code (CIRC for short) to correct errors, with a fallback of simply skipping the problematic sector (which equates to 1/75th of a second of audio lost).

Data CDs use CIRC as well (they use something else as well that I'll come to in a minute). CIRC consists of two parts - C1 and C2. C1 is useful for correcting small errors (1 - 3 bytes per sector), C2 for larger errors. Reading between the lines, this error correction is performed at the hardware level and the additional information used in the repair is not visible to the PC.

Since the PC could crash as the result of corrupted data, error correction is much more important to it. While it still applies CIRC it also applies additional EDC/ECC (Error Detection Code / Error Correction Code) on top. Unlike the data used by CIRC, this information is visible to the PC.

The additional room required to store this EDC/ECC information is why an Audio CD holds approximately 97 megs* more of information than a Data CD.

97 Megs?

CD Audio is sampled at 16-bit quality, 44100 times a second, in stereo. An audio CD holds 74 minutes of this information so, in bytes, the capacity of an audio CD comes to:

2 * 44100 * 2 * 74 * 60 = 783,216,000 bytes = 746.93 Megs

We all know that a standard Data CD holds 650 Megs, but here's a non-empirical way to obtain that number:

Each sector on an audio CD corresponds to 1/75th of a second. A normal Data CD has the same number of sectors as an Audio CD (presumably keeping the same track / sector alignment promotes maximum compatibility in the hardware). We know that we get 2048 bytes per sector, so the capacity of a Data CD is therefore:

74 * 60 * 75 * 2048 = 681,984,000 bytes = 650.39 Megs

Subtracting the capacity of a Data CD from an Audio CD:

746.93 - 650.39 = 96.54 Megs
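
The sums above are easy to check in a few lines, which also show that one 1/75th of a second of audio is exactly 2352 bytes, the physical sector size. The function names here are made up for illustration; a Meg is taken as 1024 * 1024 bytes, which is what the figures above use.

```python
MIB = 1024 * 1024
SECONDS = 74 * 60  # a 74 minute disc


def audio_capacity() -> int:
    """Bytes of audio: 2 bytes per sample, 44100 samples/s, 2 channels."""
    return 2 * 44100 * 2 * SECONDS


def data_capacity() -> int:
    """Bytes of data: 75 sectors per second, 2048 bytes per sector."""
    return SECONDS * 75 * 2048


def bytes_per_audio_sector() -> int:
    """Audio bytes in 1/75th of a second."""
    return 2 * 44100 * 2 // 75


print(f"{audio_capacity() / MIB:.2f}")                      # 746.93
print(f"{data_capacity() / MIB:.2f}")                       # 650.39
print(f"{(audio_capacity() - data_capacity()) / MIB:.2f}")  # 96.54
print(bytes_per_audio_sector())                             # 2352
```

Seen this way, the 96.54 Megs is simply the 304 bytes per sector (2352 - 2048) of header, EDC and ECC multiplied by the 333,000 sectors on the disc.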

According to http://www.gi.alaska.edu/crc/cdrom/cdrom.html, the difference between Mode1 and Mode2 CDs is that Mode2 doesn't bother with the EDC/ECC information.

EDC / ECC requires some mathematics / information theory that would require actual effort on my part - which is something I usually require imminent danger to trigger. So my investigation stops here :-)