Analysis of two commercial C64 tape loaders


This document is about commercial C64 tape loaders.

A short introduction to the C64 I/O hardware is needed in order to understand the meaning of the data in the Tap file format, which is not just a hardware-independent array of bytes.

Data encoding

This paragraph describes data transfer between the C64 processor and the cassette unit (aka C2N, aka Datassette).

Each bit of data or program sent to the C2N is encoded by the operating system using audio frequencies. Specifically, square waves with a 50% duty cycle, often referred to as pulses (this is also the term I’ll often be using moving forward), are used.
A sequence of bits is therefore encoded as a sequence of square waves on tape, back-to-back.

The standard Commodore encoding method uses three distinct pulses:

  • “long pulses” with a frequency of 1488 Hz (period of about 672 microseconds),
  • “medium pulses” at 1953 Hz (period of about 512 microseconds),
  • “short pulses” at 2840 Hz (period of about 352 microseconds).

It is evident that each name refers to the period (i.e. 1/frequency) of the square wave rather than to its frequency.
Each data bit is encoded by a pair of pulses (medium and short pulses are used). The structure of this loader is discussed in the CBM Loader article.

Sample of pulses coming from C2N READ pin during a CBM file reading:

..      ____        ______      ____   
  |    |    |      |      |    |    | 
  |    |    |      |      |    |    | 
  |____|    |______|      |____|    |..

Since pulse length detection triggers on descending (negative) edges, this sample produces the following sequence of pulses:

   <-352 us-><-  512 us  -><-352 us->
  |         |             |         |
  |         |             |         |
     short       medium      short
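The measurement shown above can be sketched in a few lines of Python. This is only an illustration: the function name and the "nearest nominal period" rule are my assumptions, not the method used by the C64 ROM, which works with timers and thresholds.

```python
# Nominal CBM pulse periods in microseconds (values from the list above).
CBM_PERIODS_US = {"short": 352, "medium": 512, "long": 672}

def classify_pulses(falling_edges_us):
    """Name each interval between two consecutive falling edges.

    falling_edges_us: timestamps (in microseconds) of the negative edges.
    Hypothetical helper: picks the nominal period closest to the measurement.
    """
    names = []
    for start, end in zip(falling_edges_us, falling_edges_us[1:]):
        length = end - start
        names.append(min(CBM_PERIODS_US,
                         key=lambda name: abs(CBM_PERIODS_US[name] - length)))
    return names

# Negative edges of the sample waveform drawn above.
print(classify_pulses([0, 352, 864, 1216]))
```

With the edges of the sample waveform, the intervals are 352, 512 and 352 microseconds, i.e. a short, a medium and a short pulse.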

On the other hand, a Turbo Loader usually uses just two pulses:

  • “short pulse”
  • “long pulse”

whose frequencies are chosen by its designer. The length of a pulse saved on tape decides whether the bit is a 1 or a 0. In fact, these loaders don’t usually use a sequence of pulses to encode a bit: just a single pulse per bit.

NOTE about the hardware

One thing to point out is that during a SAVE operation, on the WRITE line of the Datasette port, “pulse length” means the time between two consecutive low-to-high transitions. During a LOAD operation, pulse length is the time between two consecutive high-to-low transitions, since the 6526 READ line triggers on negative edges. In other words, the signal from the C2N to the C64 is the inverted version of the one from the C64 to the C2N. That’s why tape duplication hardware consists of an inverter (a BJT and 2 resistors in a so-called “common emitter” configuration): the signal coming from the C2N that performs the LOAD, which would normally go to the C64, needs to be inverted before being sent to the C2N that performs the SAVE operation.

Check at the end of this article for additional information on CIA 1 + 2 and Vectors.

TAP format

This paragraph summarizes the purpose of the Tap file format. For a detailed discussion of this topic you may consult the “tapformat.html” file in your CCS64 Emulator folder or online at the “computerbrains” CCS64 home page.

Designed by Per Hakan Sundell (author of the CCS64 C64 emulator) in 1997, this format attempts to duplicate the data stored on a C64 cassette tape, pulse after pulse. The difference between a WAV sampling of a C64 tape and its Tap file is that the Tap file is not a sampled version of the waveforms stored on tape: it only records the length of each pulse.

Each nonzero byte in the Tap file data area represents the time length of a single pulse. The conversion formula is given here:

Tap Data Byte=Pulse length (expressed in seconds)*C64 PAL Frequency/8

where “C64 Pal Frequency” is 985248 Hz. By calculating the constant 1E-6*C64 PAL Frequency/8, the following equivalent formula can be produced:

Tap Data Byte=Pulse length (microseconds)*0.123156

where “Pulse length” is the time interval between two negative edges of the received square wave. As an example, CBM pulses correspond to the following TAP values:

short:  352 * 0.123156 = 43.35, rounded to $2B
medium: 512 * 0.123156 = 63.06, rounded to $3F
long:   672 * 0.123156 = 82.76, rounded to $53

It is clear that this conversion introduces some information alteration due to quantization: two pulses with similar length may produce the same Tap value.
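The conversion and its quantization effect can be checked with a short Python sketch. The function names are hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the $2B/$3F/$53 values quoted for the CBM pulses.

```python
C64_PAL_CLOCK_HZ = 985248  # "C64 PAL Frequency" from the formula above

def tap_byte(pulse_us):
    """Pulse length in microseconds -> Tap data byte (rounded, an assumption)."""
    return round(pulse_us * 1e-6 * C64_PAL_CLOCK_HZ / 8)

def pulse_us(tap_value):
    """Tap data byte -> nominal pulse length in microseconds."""
    return tap_value * 8 / C64_PAL_CLOCK_HZ * 1e6

for name, length in (("short", 352), ("medium", 512), ("long", 672)):
    print(name, hex(tap_byte(length)))

# Quantization: two slightly different pulses share the same Tap value.
print(tap_byte(350) == tap_byte(352))
```

Going back from a Tap value to microseconds only gives the centre of the quantization step (about 349 microseconds for $2B), not the original length.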

NOTE about CBM pulse sizes: Tap imports of older tapes show better consistency with the above-mentioned values than imports of newer ones. In any case, the operating system can produce a correction factor which allows a very wide variation in tape speed without affecting reading. In fact, the sequence of short pulses written in the CBM leader is used to synchronize the read routine timing to the timing on the tape.

Commercial tape loaders

We’ll have a general discussion here about Turbo Loaders and the dedicated C64 I/O hardware.

Almost all commercial C64 tape software uses some form of Turbo Loader. The origin of these Turbo Loaders is rather obscure, since many software houses use the same routines.

A Turbo Loader is a routine that must be loaded into C64 RAM before it can be executed, so every Turbo Loader is stored in a standard CBM-encoded “boot” file. Usually part of the Turbo Loader is stored in the CBM file Header and is therefore loaded into the tape buffer (at $033C-$03FB). The CBM file Data is often used both for further Turbo Loader routines and to modify the table of vectors in low RAM, to cause the autostart of the Turbo Loader itself (e.g. it may modify $0326/$0327, where the output vector is located). When the standard LOAD ends, the operating system performs various operations, one of which is printing the “READY.” message on the C64 screen. By default, $0326/$0327 holds the start address of the on-screen print routine (remember that any of the 64K memory addresses can be identified by 2 bytes, least significant byte first and most significant byte second; for example, the pair of values 01-08 is a pointer to $0801). If the CBM loader loads a Data block that overwrites this vector with the Turbo Loader start address, the operating system executes the Turbo Loader instead of printing the “READY.” message at the end of the CBM LOAD.
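The low-byte-first pointer layout and the vector overwrite can be modelled as follows. This is a minimal sketch: the helper names are mine, the 64K bytearray merely stands in for C64 RAM, and $0351 is used only as a hypothetical loader start address.

```python
def read_vector(memory, address):
    """Read a 16-bit pointer stored low byte first."""
    return memory[address] | (memory[address + 1] << 8)

def write_vector(memory, address, target):
    """Store a 16-bit pointer low byte first."""
    memory[address] = target & 0xFF
    memory[address + 1] = (target >> 8) & 0xFF

ram = bytearray(65536)          # stand-in for the 64K address space

# The pair of values 01-08 is a pointer to $0801.
ram[0x002B], ram[0x002C] = 0x01, 0x08
print(hex(read_vector(ram, 0x002B)))

# A Data block that overwrites the output vector at $0326/$0327 with a
# (hypothetical) loader start address of $0351 stores the bytes 51-03.
write_vector(ram, 0x0326, 0x0351)
print(ram[0x0326:0x0328].hex())
```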

When executed, a Turbo Loader “replaces” the existing LOAD and allows a program or data to be loaded from tape at a faster speed than the normal LOAD. This is achieved by simply reducing the length of pulses stored onto the tape, in order to allow a far greater density of information storage per inch of tape.

Each pulse is flagged in the interrupt register on its falling (negative) edge. A widely used Turbo Loader scheme runs with interrupts disabled and sets a timer to a value between the two pulse lengths (which we will refer to as the “threshold” value); when the timer runs out, the interrupt register is checked to see whether the pulse came in or not. If the falling edge of the pulse sets the relevant interrupt flag before the timer runs out, the pulse was a “short” pulse (usually taken as a 0 bit), otherwise it was a “long” one (a 1 bit). Bits are then rotated into a byte storage until 8 bits have been read, thereby loading a full byte. This rotation may be to the left or to the right, which establishes the bit order (endianness): MSbF or LSbF.
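A small Python model of the threshold scheme may help. The pulse lengths and the threshold below are made-up numbers in arbitrary units, and the helper names are hypothetical; only the short = 0 / long = 1 convention and the two rotation directions come from the text.

```python
def decode_bits(pulse_lengths, threshold):
    """A pulse shorter than the threshold is a 0 bit, otherwise a 1 bit."""
    return [0 if length < threshold else 1 for length in pulse_lengths]

def pack_byte(bits, msb_first=True):
    """Rotate 8 bits into a byte: to the left (MSbF) or to the right (LSbF)."""
    value = 0
    for bit in bits:
        if msb_first:
            value = ((value << 1) | bit) & 0xFF   # like ROL with the bit in Carry
        else:
            value = (value >> 1) | (bit << 7)     # like ROR with the bit in Carry
    return value

SHORT, LONG, THRESHOLD = 200, 330, 265            # hypothetical lengths
pulses = [SHORT, LONG, SHORT, SHORT, SHORT, SHORT, SHORT, SHORT]
bits = decode_bits(pulses, THRESHOLD)
print(hex(pack_byte(bits, msb_first=True)), hex(pack_byte(bits, msb_first=False)))
```

The same eight pulses give $40 when grouped MSb first and $02 when grouped LSb first, which is why the bit order of a loader has to be known before its data can be interpreted.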

Before any byte can be read and stored, the Turbo Loader must set itself to be in sync with the bits on the tape. This is done by writing a certain string of bits at every byte interval. The routine then tries to align itself by recognising the value of the byte. An example of a header byte for aligning would be the value 64, hex $40 or in binary: 01000000. A series of these bytes is written as the header; only when this byte has been read in a number of times consecutively, the actual program can be read without risk of alignment errors.
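The alignment step can be sketched like this, assuming MSb-first grouping and an arbitrary requirement of three consecutive header bytes (the function names and the count are hypothetical). A value such as $40 works well because it contains a single 1 bit: a misaligned window over a run of $40 bytes yields a different value (for instance $80 or $20), never $40.

```python
def bits_to_byte(bits):
    """Group 8 bits MSb first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def find_alignment(bits, pilot=0x40, repeats=3):
    """Return the bit offset at which `repeats` consecutive pilot bytes start."""
    for start in range(len(bits) - 8 * repeats + 1):
        chunks = [bits[start + 8 * k:start + 8 * (k + 1)] for k in range(repeats)]
        if all(bits_to_byte(chunk) == pilot for chunk in chunks):
            return start
    return None

def byte_to_bits(value):
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]

# Three stray bits, then four header bytes, then one payload byte.
stream = [1, 0, 1] + byte_to_bits(0x40) * 4 + byte_to_bits(0x5A)
offset = find_alignment(stream)
print(offset)
```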

Non-IRQ tape loader

We document here a “non-IRQ based” loader step by step. Before starting, it’s necessary to have a good knowledge of the CIA Timers. In Appendix A and B I have reported an extract (with some changes where needed) from MAPC6410.TXT (the Project 64 etext of the book “Mapping The Commodore 64”). Those paragraphs cover just the CIA uses we are interested in.
In addition to the CIA Timers material, you might want to keep a copy of the “Commodore 64 Programmer’s Reference Guide” close by, to consult it while studying the ASM code.

Here we’ll see how this Turbo Loader performs the operations discussed above. Please consult the documentation about this loader (CHR Loader T1) that comes with Stewart Wilson’s Final TAP. Get a Tap version of “Cauldron” if you want to extract the listings reported here yourself.

CHR Loader T1-T3 routines are partly stored in the CBM file Header. The CBM file Data (loaded at $02A7-$0303) stores other routines and is used to cause the autostart of this loader. The autostart is performed through the IMAIN Vector at $0302-$0303. By default, this vector points to the address of the main BASIC program loop at $A483. This is the routine that is running when you are in direct mode (READY.). It executes statements, or stores them as program lines. The Cauldron loader sets the IMAIN Vector to point to $02AE; therefore, when the CBM LOAD ends, control is given to the Turbo Loader.

Using Final TAP to examine the mentioned Tap file, you can get the following listings.

* CBM Data block *

--New STOP routine (ignore it now)-
02A7  A9  80      LDA #$80
02A9  05  91      ORA $91
02AB  4C  EF  F6  JMP $F6EF

* Start of this Loader *
02AE  A9  A7      LDA #$A7
02B0  78          SEI
02B1  8D  28  03  STA $0328  ; Changes Vector to "Kernal STOP
02B4  A9  02      LDA #$02   ; Routine" into $02A7
02B6  8D  29  03  STA $0329

02B9  58          CLI

02BA  A0  00      LDY #$00   ; Inits some locations used by
02BC  84  C6      STY $C6    ; this loader
02BE  84  C0      STY $C0
02C0  84  02      STY $02

02C2  AD  11  D0  LDA $D011  ; Blanks screen
02C5  29  EF      AND #$EF
02C7  8D  11  D0  STA $D011

02CA  CA          DEX        ; A small pause here
02CB  D0  FD      BNE $02CA
02CD  88          DEY
02CE  D0  FA      BNE $02CA

02D0  78          SEI        ; Sets interrupt disable
                             ; status bit
02D1  4C  51  03  JMP $0351

--Read Bit subroutine--------------
02D4  AD  0D  DC  LDA $DC0D  ; Checks the interrupt register
02D7  29  10      AND #$10   ; to see if the pulse (negative
02D9  F0  F9      BEQ $02D4  ; edge on a C64) came in or not

02DB  AD  0D  DD  LDA $DD0D  ; Checks the countdown (bit 1 will
                             ; be 1 if the countdown ran out)

02DE  8E  07  DD  STX $DD07  ; Sets a new Timer B countdown

02E1  4A          LSR        ; Move bit 1 to the Carry bit
02E2  4A          LSR
02E3  A9  19      LDA #$19   ; Starts Timer B (one shot, force
02E5  8D  0F  DD  STA $DD0F  ; latch value being loaded)
02E8  60          RTS

--Back to prompt-------------------
02E9  20  8E  A6  JSR $A68E  ; Resets the CHRGET pointer

02EC  A9  00      LDA #$00
02EE  A8          TAY
02EF  91  7A      STA ($7A),Y

02F1  4C  74  A4  JMP $A474  ; Prints Ready and then
                             ; processes keyboard buffer

02F4  52 D5 0D    ;"R", "SHIFT+U", "RETURN" (stands for "RUN", followed by RETURN key)

02F7  00 00 00 00 00 00 00 00 00

0300  8B E3       ; default value, not changed

0302  AE 02       ; used to perform the Autostart

* CBM Header block *

033C-0350  File details (see CBM File header)

-Loader's Core---------------------
0351  78          SEI

0352  A9  07      LDA #$07   ; Sets a Threshold value via
0354  8D  06  DD  STA $DD06  ; Timer B countdown
0357  A2  01      LDX #$01

0359  20  D4  02  JSR $02D4  ; Tries to align bits of leader
035C  26  F7      ROL $F7    ; with MSbF until...
035E  A5  F7      LDA $F7
0360  C9  63      CMP #$63   ; ... a Lead-in byte is found.
0362  D0  F5      BNE $0359
0364  A0  64      LDY #$64   ; Sync train start value

0366  20  E7  03  JSR $03E7
0369  C9  63      CMP #$63   ; Reads the whole leader
036B  F0  F9      BEQ $0366

036D  C4  F7      CPY $F7
036F  D0  E8      BNE $0359
0371  20  E7  03  JSR $03E7
0374  C8          INY        ; Reads the whole Sync train
0375  D0  F6      BNE $036D

0377  C9  00      CMP #$00   ; After Sync there's a Check byte
0379  F0  D6      BEQ $0351  ; if it is $00 then Reload

037B  20  E7  03  JSR $03E7
037E  99  2B  00  STA $002B,Y  ; Loads a 10-byte header.
0381  99  F9  00  STA $00F9,Y  ; The following code (at $0392,
0384  C8          INY          ; $039E, $03D1, $03C6) tells us
0385  C0  0A      CPY #$0A     ; that it consists of: Load
0387  D0  F2      BNE $037B    ; address, End address,
                               ; Execution address and 2 flag
                               ; bytes, which state whether all
                               ; turbo files were loaded and
                               ; what to do once done

0389  A0  00      LDY #$00   ; Inits locations used to store
038B  84  90      STY $90    ; the Checksum info
038D  84  02      STY $02

--Load Loop------------------------
038F  20  E7  03  JSR $03E7    ; Reads a new byte

0392  91  F9      STA ($F9),Y  ; Stores it into RAM using the
                               ; Load address locations as
                               ; destination pointer

0394  45  02      EOR $02      ; computes the XOR Checksum
0396  85  02      STA $02      ; of Data

0398  E6  F9      INC $F9      ; Increases dest. pointer
039A  D0  02      BNE $039E
039C  E6  FA      INC $FA

039E  A5  F9      LDA $F9      ; Checks if dest. pointer (16
03A0  C5  2D      CMP $2D      ; bits) equals End Address
03A2  A5  FA      LDA $FA
03A4  E5  2E      SBC $2E
03A6  90  E7      BCC $038F    ; not yet finished? Restart!
--End of Load Loop-----------------

03A8  20  E7  03  JSR $03E7  ; Reads a closing byte (Checksum)

03AB  C8          INY
03AC  84  C0      STY $C0    ; Allows control of the motor
                             ; via software
03AE  58          CLI
03AF  18          CLC

03B0  A9  00      LDA #$00
03B2  8D  A0  02  STA $02A0

03B5  20  93  FC  JSR $FC93  ; Restores the Default IRQ
                             ; Routine. This subroutine
                             ; is used to turn the screen
                             ; back on and stop the cassette
                             ; motor.

03B8  20  53  E4  JSR $E453  ; Calls this subroutine to copy
                             ; the table of vectors to
                             ; important BASIC routines
                             ; to RAM, starting at location
                             ; $300. This prevents the Turbo
                             ; loader from being executed
                             ; again if control is given back
                             ; to the BASIC interpreter.

03BB  A5  F7      LDA $F7  ; Checks checksum
03BD  45  02      EOR $02
03BF  05  90      ORA $90
03C1  F0  03      BEQ $03C6

03C3  4C  E2  FC  JMP $FCE2  ; A wrong checksum causes a SOFT
                             ; Reset

03C6  A5  31      LDA $31    ; First flag byte: other files
03C8  F0  03      BEQ $03CD  ; need to be loaded?
03CA  4C  B9  02  JMP $02B9

03CD  A5  32      LDA $32    ; Second flag byte: use the Exec.
03CF  F0  03      BEQ $03D4  ; address or give back control
                             ; to BASIC?

03D1  6C  2F  00  JMP ($002F)  ; Jumps to Exec. address

03D4  20  33  A5  JSR $A533  ; Relinks Lines of a BASIC
                             ; Program.

03D7  A2  03      LDX #$03   ; Puts 3 chars in the Keyboard
03D9  86  C6      STX $C6    ; Buffer

03DB  BD  F3  02  LDA $02F3,X  ; Those are "R", "SHIFT+U"
03DE  9D  76  02  STA $0276,X  ; and "RETURN"
03E1  CA          DEX
03E2  D0  F7      BNE $03DB
03E4  4C  E9  02  JMP $02E9

--Read byte subroutine-------------
03E7  A9  07      LDA #$07   ; 8 bits to read...
03E9  85  F8      STA $F8
03EB  20  D4  02  JSR $02D4
03EE  26  F7      ROL $F7    ; ...grouping them with MSbF
                             ; ROL retrieves the Carry
                             ; bit where incoming bit was
                             ; stored (code at $02E1)
03F0  EE  20  D0  INC $D020  ; Performs border flashing
03F3  C6  F8      DEC $F8
03F5  10  F4      BPL $03EB
03F7  A5  F7      LDA $F7
03F9  60          RTS
03FA  00 00
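Reading the listing above, the block layout expected by this loader seems to be: lead-in bytes ($63), a sync train counting from $64 up to $FF, a non-zero check byte, a 10-byte header, the data, and a closing XOR checksum. The Python sketch below parses such a block from already-decoded bytes. It is my interpretation of the listing, not a reference implementation: the function name and the returned dictionary are hypothetical, the last two header bytes are left uninterpreted, and a zero check byte raises an error here whereas the real loader simply starts searching again.

```python
def parse_chr_block(stream):
    """Split one decoded block (a list of byte values) into its parts."""
    pos = 0
    while stream[pos] == 0x63:               # lead-in bytes
        pos += 1
    for expected in range(0x64, 0x100):      # sync train $64, $65, ... $FF
        if stream[pos] != expected:
            raise ValueError("broken sync train")
        pos += 1
    if stream[pos] == 0x00:                  # check byte after the sync train
        raise ValueError("zero check byte")
    pos += 1
    header = stream[pos:pos + 10]
    pos += 10
    load = header[0] | (header[1] << 8)
    end = header[2] | (header[3] << 8)       # the loop stops when pointer == end
    exec_addr = header[4] | (header[5] << 8)
    payload = stream[pos:pos + end - load]
    pos += end - load
    checksum = 0
    for value in payload:                    # same XOR as the code at $0394
        checksum ^= value
    if checksum != stream[pos]:
        raise ValueError("checksum mismatch")
    return {"load": load, "end": end, "exec": exec_addr,
            "flags": header[6:8], "payload": payload}

# A made-up block: 3 data bytes to be loaded at $0801-$0803.
data = [0x11, 0x22, 0x33]
block = ([0x63] * 4 + list(range(0x64, 0x100)) + [0x01]
         + [0x01, 0x08, 0x04, 0x08, 0x0D, 0x08, 0x00, 0x01, 0x00, 0x00]
         + data + [0x11 ^ 0x22 ^ 0x33])
info = parse_chr_block(block)
print(hex(info["load"]), hex(info["end"]), hex(info["exec"]), info["payload"])
```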

IRQ-based tape loader

I’ll assume you are familiar with hardware interrupts and ISRs (you don’t strictly need to know how they work on a C64, but some knowledge of interrupts in general is essential). If you know any data-link layer networking protocol, it can be useful to compare the Datasette to a network adapter: take the framing problems you have on the data-link layer (which is equivalent to our loader) and adapt them to our study.

First, here’s a summary of what we need to do when writing an IRQ-based loader:

  • We first need to disable the system of interrupts, by setting the interrupt disable status bit (this is done by a SEI instruction).
  • Then we have to disable all interrupts individually (by WRITING to $DC0D, which is an Interrupt Control Register when written to) and clear any latched interrupt request (by READING the clear-on-read register $DC0D, which is an Interrupt Latch Register when read from, e.g. bit 1 reads 1 when the CIA #1 Timer B countdown expires).
  • Now we have to set the start value of the timer we’ll be using to measure the length of the pulses coming from the tape (CIA #1 Timer A was chosen in the discussed loader). That’s done by WRITING the start value to $DC04/$DC05 (which is the CIA #1 16-bit Timer A latch value). The timer will count down to zero starting from the value we just chose (one-shot mode). We’ll restart the countdown every time we receive a pulse, to measure the pulse that comes after the one we just measured.
  • Then we have to enable the FLAG line interrupt (the interrupt that triggers when a pulse is read from the Datasette). The interrupt won’t trigger until we enable the system of interrupts. Before doing that, we have to declare where our Interrupt Service Routine is (by making the vector at $FFFE/$FFFF point to our ISR).
  • After enabling interrupts (CLI instruction), we are ready to measure the pulses coming from the Datasette, align our read routine with the bit stream (using the pilot byte information), synchronize (i.e. know where exactly the turbo frame starts), and finally read the header which tells us where to store the following data bytes in the RAM.
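The write/read behaviour of the register at $DC0D used in the steps above can be modelled with a small Python class. This is a simplified sketch of the 6526 behaviour as I understand it (class and attribute names are hypothetical): on a write, bit 7 selects whether the 1 bits in the value enable or disable interrupt sources; a read returns the pending sources and clears them.

```python
class InterruptControl:
    """Toy model of a CIA Interrupt Control Register (not cycle exact)."""

    def __init__(self):
        self.mask = 0    # enabled sources, bits 0-4
        self.latch = 0   # pending sources, bits 0-4

    def write(self, value):
        bits = value & 0x1F
        if value & 0x80:
            self.mask |= bits     # bit 7 set: enable the selected sources
        else:
            self.mask &= ~bits    # bit 7 clear: disable the selected sources

    def read(self):
        value = self.latch
        if self.latch & self.mask:
            value |= 0x80         # bit 7: an enabled source is pending
        self.latch = 0            # the register is clear-on-read
        return value

icr = InterruptControl()
icr.mask = 0x1F                   # pretend everything was enabled
icr.write(0x1F)                   # disable all five sources
icr.latch = 0x02                  # a stale Timer B request
stale = icr.read()                # reading it clears the latch
icr.write(0x90)                   # enable just the FLAG line (bit 4)
print(hex(stale), hex(icr.mask))
```

In this model, writing $1F and then $90 leaves only the FLAG line interrupt enabled.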

Disassembly of the code stored in the CBM Header and Data files of Terminator 2:

; ********************************************
; * Loader Setup-Part 1                      *
; * Description: Hardware setup instructions *
; ********************************************
02A7  78             SEI            ; Disable interrupts, since we are about to
                                    ; change the vector table at $FFFA-$FFFF, whose
                                    ; vectors point to 2 Interrupt Service Routines.

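02A8  A9 05          LDA #$05       ; Memory configuration: with bit 1 clear the
02AA  85 01          STA $01        ; Kernal ROM is switched out, so the vectors
                                    ; at $FFFA-$FFFF are fetched from RAM; I/O
                                    ; devices stay switched in (bit 2).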

02AC  A9 1F          LDA #$1F       ; CIA #1 Interrupt Control Register reset:
02AE  8D 0D DC       STA $DC0D      ;  disable Timer A interrupt               (bit 0)
                                    ;  disable Timer B interrupt               (bit 1)
                                    ;  disable TOD clock alarm interrupt       (bit 2)
                                    ;  disable serial shift register interrupt (bit 3)
                                    ;  disable FLAG line interrupt             (bit 4)

02B1  AD 0D DC       LDA $DC0D      ; Clear Interrupt Latch to prevent servicing
                                    ; interrupt requests not requested by our program.
                                    ; This register is clear-on-read.

02B4  A9 7C          LDA #$7C       ; CIA #1 Timer A Latch value setup.
02B6  8D 04 DC       STA $DC04     
02B9  A9 04          LDA #$04       
02BB  8D 05 DC       STA $DC05      ; (Threshold=$027C clock cycles)

02BE  A9 90          LDA #$90       ; CIA #1 Interrupt Control Register setup:
02C0  8D 0D DC       STA $DC0D      ;  enable just FLAG line interrupt (bit 4) (1)

; (1) This FLAG line is connected to the Cassette Read line of the Cassette Port.
;     The interrupt triggers on negative edges.

02C3  A9 51          LDA #$51       ; Maskable Interrupt Request Vector setup:
02C5  8D FE FF       STA $FFFE      ;  make this vector point to our IRQ handler (ISR)
02C8  A9 03          LDA #$03       ;  located at $0351, so that the only active
02CA  8D FF FF       STA $FFFF      ;  Interrupt (FLAG line) will cause its execution
                                    ;  on request.

02CD  A9 00          LDA #$00       ; Initialization of:
02CF  85 02          STA $02        ;  loop_break variable (see later)
02D1  85 03          STA $03        ;  buffer where to build a byte, pulse by pulse.

02D3  EA             NOP           

02D4  4C E5 02       JMP $02E5      ; Jump to Part 2
; ********************************************
; * Loader Setup-Part 1.END                  *
; ********************************************

; ********************************************
; * Checksum check subroutine                *
; * Description: Compares calculated and     *
; *              read checksum to detect a   *
; *              load error.                 *
; ********************************************

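02D7  A9 07          LDA #$07       ; Switch the BASIC and Kernal ROMs back in,
02D9  85 01          STA $01        ; keeping the I/O devices visible.
02DB  A5 05          LDA $05        ; Checksum computed while loading
02DD  C5 06          CMP $06        ; Compare it with the checksum read from tape
02DF  D0 01          BNE $02E2      ; Mismatch: load error
02E1  60             RTS            ; Checksums match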

02E2  4C E2 FC       JMP $FCE2      ; On checksum error, reset C64
; ********************************************
; * Checksum check subroutine.END            *
; ********************************************

; ********************************************
; * Loader Setup-Part 2                      *
; * Description: Hardware setup instructions *
; ********************************************

02E5  A9 E7          LDA #$E7       ; Non-Maskable Interrupt Hardware Vector setup:
02E7  8D FA FF       STA $FFFA      ;  make it point to our Load Loop at $03E7. (2)
02EA  A9 03          LDA #$03       
02EC  8D FB FF       STA $FFFB     

; (2) There are two possible sources for an NMI interrupt.  The first is the
;     RESTORE key, which is connected directly to the 6510 NMI line.  The
;     second is CIA #2, the interrupt line of which is connected to the 6510
;     NMI line.

02EF  A9 01          LDA #$01       ; Set CIA #2 Timer A high byte
02F1  8D 05 DD       STA $DD05     

02F4  A9 81          LDA #$81       ; CIA #2 Interrupt Control Register setup:
02F6  8D 0D DD       STA $DD0D      ;  enable Timer A interrupt (bit 0)

02F9  A9 99          LDA #$99       ; CIA #2 Control Register A setup:
02FB  8D 0E DD       STA $DD0E      ;  start timer A                (bit 0)
                                    ;  Timer A run mode is one-shot (bit 3)
                                    ;  Force latched value to be
                                    ;  loaded to Timer A counter    (bit 4)

02FE  D0 FE          BNE $02FE      ; C64 should hang here, but CIA #2 Timer A
                                    ; expiration causes the NMI request, which makes
                                    ; Program Counter move to $03E7.

; ********************************************
; * Loader Setup-Part 2.END                  *
; ********************************************

; *****************************
; * BASIC RAM vector area (3) *
; *****************************
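0300  8B E3          ; IERROR vector, default value ($E38B)
0302  A7 02          ; IMAIN vector, now pointing to the loader at $02A7
0332  ED F5          ; ISAVE vector, default value ($F5ED)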

; (3) Several important BASIC routines are vectored through RAM. Vectors
;     to all of these routines can be found in the indirect vector table.
;     The turbo loader changes those vectors to execute itself when the
;     CBM file is fully loaded (this is called "AUTOSTART").

; ***************************************************************
; * ISR                                                         *
; * Description: Interrupt Service Routine that handles FLAG    *
; *              line interrupts                                *
; ***************************************************************

; Each interrupt is triggered by a pulse read from tape, so we need to
; compare its size (counted by a timer) with a Threshold value, to
; decide if it's a Bit 0 pulse or a Bit 1 pulse.

0351  48             PHA            ; We'll be using A and Y registers
0352  98             TYA            ; so we save them on the processor stack,
0353  48             PHA            ; just as every Interrupt Service Routine does.

0354  AD 20 D0       LDA $D020      ; Perform border flash between 2 colors
0357  49 05          EOR #$05       
0359  8D 20 D0       STA $D020     

035C  AD 05 DC       LDA $DC05      ; Read the Timer value

035F  A0 19          LDY #$19       ; CIA #1 Control Register A re-initialized
0361  8C 0E DC       STY $DC0E      ; for the next pulse measurement:
                                    ;  Start Timer A                (bit 0)
                                    ;  Timer A run mode: one-shot   (bit 3)
                                    ;  Force latched value to be
                                    ;  loaded to Timer A counter    (bit 4)

0364  49 02          EOR #$02       ; This piece of code subtracts $200 clock cycles
0366  4A             LSR            ; from the Timer value. (4)
0367  4A             LSR            ; Carry is set when pulse is bigger than Threshold
                                    ; ie. [Latch value - $200] clock cycles.

0368  26 03          ROL $03        ; Group bits with MSb First

036A  A5 03          LDA $03       
036C  90 02          BCC $0370      ; IF AND ONLY IF the last bit of a byte was just
                                    ; read, a 0 will be moved from bit 7 of $03
                                    ; to the Carry by the "ROL $03" instruction,
                                    ; otherwise the Carry will be set (see code
                                    ; at $0379 to understand why).
                                    ; Therefore Carry is set IFF a complete byte
                                    ; is not yet available.(5)
                                    ; If a complete byte is available, it is kept
                                    ; by the A register.
; (4) Why not use the SBC instruction to subtract?
;     Answer: with SBC we would have to invert the carry bit (which holds
;     the borrow at the end of the instruction) to use it in the
;     following "ROL $03" instruction.
;     Also remember that SBC would need a SEC before it and that it
;     affects more Processor Status register bits (N, Z, C, and V).

; (5) This is a self-modified Branch which branches to different addresses during load,
;     to properly use the available byte just read.
;     It's a VERY common thing in IRQ loaders to use a self-modifying branch there.
;     When we are waiting for the FIRST Pilot Byte (to align the byte-oriented loader
;     to the bit-oriented pulse storage method), this branches to $0370.
;     When alignment was done, we need to read in the whole pilot sequence and the
;     Sync Byte, so that this branch branches to $0384.
;     When Sync Byte is found, we read a single byte we don't even use, at $0399.
;     And so on...

036E  B0 0D          BCS $037D      ; Always jumps

; -----------------------------------------------------------------------------------
0370  C9 40          CMP #$40       ; Check if this byte is the FIRST Pilot Byte
0372  D0 09          BNE $037D     
0374  A9 16          LDA #$16       
0376  8D 6D 03       STA $036D      ; Change the branch at $036C, to jump to $0384
; -----------------------------------------------------------------------------------

; This code is executed every time a complete byte has been handled, just
; before leaving the ISR (with the RTI).

0379  A9 FE          LDA #$FE       ; This will cause the "ROL $03" instruction to
037B  85 03          STA $03        ; always set Carry if a whole byte was not yet
                                    ; built in the byte buffer at $03.

037D  AD 0D DC       LDA $DC0D      ; Clear Interrupt Latch.
                                    ; This register is clear-on-read.

0380  68             PLA            ; Pop the values of A and Y registers from
0381  A8             TAY            ; the Processor stack before returning.
0382  68             PLA           

0383  40             RTI           
; ***************************************************************
; * ISR.END                                                     *
; ***************************************************************
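Two tricks used by the ISR above can be checked with a short Python model: the EOR #$02 / LSR / LSR sequence that turns the Timer A high byte into a bit value, and the $FE marker that makes "ROL $03" clear the Carry only when the eighth bit has been shifted in. This is a sketch with hypothetical function names; it ignores interrupt latency and assumes that pulses shorter than $7D cycles (timer high byte still $04) do not occur.

```python
LATCH = 0x047C   # Timer A latch value set at $02B4-$02BB

def carry_after_eor_lsr_lsr(timer_high):
    """Bit 1 of (high byte EOR $02) is what two LSRs move into the Carry."""
    return ((timer_high ^ 0x02) >> 1) & 1

def pulse_bit(elapsed_cycles):
    """Bit value for a pulse that lasted elapsed_cycles since the timer restart."""
    timer = LATCH - elapsed_cycles
    return carry_after_eor_lsr_lsr((timer >> 8) & 0xFF)

def shift_bits(bits, buffer=0xFE):
    """Model of "ROL $03": a 0 leaving bit 7 marks a complete byte."""
    complete = []
    for bit in bits:
        carry_out = buffer >> 7
        buffer = ((buffer << 1) | bit) & 0xFF
        if carry_out == 0:
            complete.append(buffer)
            buffer = 0xFE            # what the code at $0379 does
    return complete

# The threshold sits at LATCH - $200 = $027C cycles.
print(pulse_bit(0x027C), pulse_bit(0x027D))
print([hex(value) for value in shift_bits([0, 1, 0, 0, 0, 0, 0, 0])])
```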

; ********************************************
; * Read Pilot train and Sync byte           *
; ********************************************
0384  C9 40          CMP #$40       ; Read in the whole Pilot Byte sequence
0386  F0 F1          BEQ $0379      ; and stop when we read a different byte,
0388  C9 5A          CMP #$5A       ; checking if it is the Sync Byte
038A  F0 02          BEQ $038E     

038C  D0 52          BNE $03E0      ; If the Sync Byte doesn't match, retry
                                    ; alignment (seek the FIRST Pilot Byte again).

038E  A9 2B          LDA #$2B       
0390  8D 6D 03       STA $036D      ; Change the branch at $036C, to jump to $0399
0393  A9 00          LDA #$00       
0395  85 05          STA $05       
0397  F0 E0          BEQ $0379      ; (6)
; ********************************************
; * Read Pilot train and Sync byte.END       *
; ********************************************

; ********************************************
; * Read an unused byte                      *
; ********************************************
0399  A9 32          LDA #$32       ; Read byte is unused.
039B  8D 6D 03       STA $036D      ; Change the branch at $036C, to jump to $03A0
039E  D0 D9          BNE $0379      ; (6)
; ********************************************
; * Read an unused byte.END                  *
; ********************************************

; ********************************************
; * Read Header bytes                        *
; ********************************************
03A0  85 07          STA $07        ; Load header at $07..$0A:
03A2  EE A1 03       INC $03A1      ;  2 bytes: Load address
03A5  AD A1 03       LDA $03A1      ;  2 bytes: End address+1
03A8  C9 0B          CMP #$0B       
03AA  D0 CD          BNE $0379     
03AC  A9 45          LDA #$45       
03AE  8D 6D 03       STA $036D      ; Change the branch at $036C, to jump to $03B3
03B1  D0 C6          BNE $0379      ; (6)
; ********************************************
; * Read Header bytes.END                    *
; ********************************************

; ********************************************
; * Read Data bytes                          *
; ********************************************
03B3  A0 00          LDY #$00       
03B5  91 07          STA ($07),Y    ; Load data into memory
03B7  45 05          EOR $05        ; Compute checksum
03B9  85 05          STA $05       
03BB  E6 07          INC $07       
03BD  D0 05          BNE $03C4     
03BF  E6 08          INC $08       

03C1  EE 20 D0       INC $D020      ; Change the border flash base colors

03C4  A5 07          LDA $07        ; Check if we finished
03C6  C5 09          CMP $09       
03C8  A5 08          LDA $08       
03CA  E5 0A          SBC $0A       
03CC  90 AB          BCC $0379     
03CE  A9 67          LDA #$67       
03D0  8D 6D 03       STA $036D      ; Change the branch at $036C, to jump to $03D5
03D3  D0 A4          BNE $0379      ; (6)
; ********************************************
; * Read data bytes.END                      *
; ********************************************

; ********************************************
; * Read Checksum byte                       *
; ********************************************
03D5  85 06          STA $06        ; Load checksum byte

03D7  A9 FF          LDA #$FF       ; Sets the loop_break variable
03D9  85 02          STA $02       

03DB  A9 07          LDA #$07       ; Reset the pointer to where the next header is stored
03DD  8D A1 03       STA $03A1     

03E0  A9 02          LDA #$02       ; Restore the Branch at $036C to seek the FIRST
03E2  8D 6D 03       STA $036D      ; Pilot Byte.
03E5  D0 92          BNE $0379      ; (6)
; ********************************************
; * Read Checksum byte.END                   *
; ********************************************

; (6) This branch is always taken. It is a trick to avoid a JMP, which is not
;     relocatable: a JMP needs the absolute memory address to jump to, whereas
;     a branch instruction uses an offset from the Program Counter.

; ***************************************************************
; * NMI-ISR                                                     *
; * Description: keeps the CPU in a loop during which the FLAG  *
; *              line interrupts are serviced.                  *
; *              Executes code $0407 as soon as the load loop   *
; *              is over.                                       *
; ***************************************************************
03E7  58             CLI            ; Enable interrupts since we are ready to service
                                    ; our FLAG line interrupt requests.

03E8  A9 58          LDA #$58       ; Change the NOP at $02D3 into a CLI
03EA  8D D3 02       STA $02D3      ; to skip Part 2 of setup on next block load.

03ED  A9 0B          LDA #$0B       ; Show screen
03EF  8D 11 D0       STA $D011     

03F2  A5 02          LDA $02        ; Load Loop. The CPU loops here, servicing
03F4  F0 FC          BEQ $03F2      ; FLAG line interrupts, until a loop_break
                                    ; is requested (i.e. any bit in memory
                                    ; location $02 is set).

03F6  C6 02          DEC $02       
03F8  4C 07 04       JMP $0407     
03FB  20             
; ***************************************************************
; * NMI-ISR.END                                                 *
; ***************************************************************
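
The operand values written to $036D in the listing above can be checked against the targets named in its comments. The following Python sketch is only an illustration (the helper name `branch_target` is hypothetical); it applies the standard 6502 rule that a relative branch lands at the address of the next instruction plus a signed 8-bit offset.

```python
def branch_target(opcode_addr, operand):
    """Return the destination of a 6502 relative branch.

    opcode_addr: address of the branch opcode (its operand is at opcode_addr + 1).
    operand:     the raw offset byte that follows the opcode.
    """
    # Operands $80..$FF are negative offsets (two's complement).
    offset = operand - 0x100 if operand & 0x80 else operand
    # The offset is relative to the instruction that follows the 2-byte branch.
    return (opcode_addr + 2 + offset) & 0xFFFF


# Values stored at $036D (operand of the branch at $036C) by the loader.
for operand in (0x32, 0x45, 0x67, 0x02):
    print(f"${operand:02X} -> ${branch_target(0x036C, operand):04X}")

# The "always taken" BNE at $039E (bytes D0 D9) loops back to the read loop.
print(f"${branch_target(0x039E, 0xD9):04X}")
```

The first three operands resolve to $03A0, $03B3 and $03D5, which matches the comments in the listing.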


It should now be clear that this loader has the following structure:

Threshold: $027C clock cycles (TAP value=$50)
Endianness: MSbF

Pilot Byte: $40
Start of payload Byte (1): $5A


  1 byte: unused
  2 bytes: Load address (LSBF)
  2 bytes: End address+1 (LSBF)


  N bytes: Loaded into RAM


  1 byte: XOR checksum

(1) better known as "Sync Byte".
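
The layout above can be sketched as a small parser. This is a minimal Python sketch and not the loader's own code: it assumes the pulses have already been decoded into bytes, that the checksum accumulator starts at 0, and that the data length is "End address+1" minus "Load address", as the end-of-data test at $03C4 suggests. The name `parse_block` is hypothetical.

```python
PILOT_BYTE = 0x40
SYNC_BYTE = 0x5A


def parse_block(stream):
    """Parse one block from an iterable of already-decoded bytes.

    Returns (load_address, data, checksum_ok).
    """
    it = iter(stream)
    byte = next(it)
    while byte == PILOT_BYTE:          # skip the pilot sequence
        byte = next(it)
    if byte != SYNC_BYTE:
        raise ValueError("sync byte $5A not found after the pilot")

    next(it)                           # 1 byte: unused
    header = [next(it) for _ in range(4)]
    load = header[0] | (header[1] << 8)          # LSBF
    end_plus_1 = header[2] | (header[3] << 8)    # LSBF

    data = bytes([next(it) for _ in range(end_plus_1 - load)])

    checksum = 0                       # assumption: accumulator starts at 0
    for value in data:
        checksum ^= value              # same XOR as the EOR $05 in the listing
    stored = next(it)                  # 1 byte: XOR checksum
    return load, data, checksum == stored


# Hypothetical 3-byte payload loaded at $0801.
example = [0x40] * 3 + [0x5A, 0x00, 0x01, 0x08, 0x04, 0x08, 1, 2, 3, 1 ^ 2 ^ 3]
print(parse_block(example))
```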

By looking at the TAP file, we can also say that:

Bit 0: $36
Bit 1: $65
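
As a rough illustration of how these values combine with the threshold, the Python sketch below turns a list of TAP pulse values into bytes. It assumes that a pulse longer than the threshold is a 1 bit (as the Bit 0 and Bit 1 values suggest) and that the pulses are already byte-aligned, which the real loader establishes by looking for the Pilot Byte. The function name is hypothetical.

```python
THRESHOLD = 0x50   # TAP value of the threshold, from the summary above


def pulses_to_bytes(pulses):
    """Decode TAP pulse values into bytes, most significant bit first."""
    out = []
    value = 0
    count = 0
    for pulse in pulses:
        bit = 1 if pulse > THRESHOLD else 0
        # MSbF: the first bit received ends up in bit 7 after 8 shifts.
        value = ((value << 1) | bit) & 0xFF
        count += 1
        if count == 8:
            out.append(value)
            value = 0
            count = 0
    return bytes(out)


# Pilot Byte $40 = %01000000, using the observed pulse values.
pilot = [0x36, 0x65, 0x36, 0x36, 0x36, 0x36, 0x36, 0x36]
print(pulses_to_bytes(pilot).hex())
```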

A quick analysis on the TAP file will also tell us if this loader uses any trailer pulse.

Loader timings

The following picture shows the details about the timings for this loader:

Terminator 2 loader timings

Figuring out the threshold value

This note applies to loaders that use an IRQ handler routine driven by the FLAG line(1) interrupt. For some reason, their Threshold value is often omitted in docs, so here is a note about how to extract it. I will refer to the ASM code of a loader I just found, which uses CIA #1, Timer B. Other loaders may use a different combination, but the approach is the same.

Before changing the vector to main IRQ handler (by writing to $FFFE/$FFFF) we have:

LDA #$1F       ; disable Timer A interrupt
STA $DC0D      ; disable Timer B interrupt
               ; disable TOD clock alarm interrupt
               ; disable serial shift register interrupt
               ; disable FLAG line(1) interrupt
LDA #$A0       ; Timer B Countdown start value: $03A0
STA $DC06      ; Timer B (low byte)
LDA #$03
STA $DC07      ; Timer B (high byte)

The loader’s own IRQ handler looks this way:


LDY #$11       ; Re-Start Timer B
STY $DC0F      ; Force latched value to be loaded to Timer B counter
               ; Timer B counts microprocessor cycles

INC $D020

EOR #$02       ; Invert bit
ROR $A9        ; Move it to MSb of $A9 (Endianness: LSbF)
BCC done       ; Whole byte read?

What this code does is compare the Timer B countdown value with $0200. The initial value is $03A0 (clock cycles) and the timer counts DOWN to 0, so if the countdown is still above $0200 when a pulse is received on the FLAG line(1), fewer than $03A0 - $0200 = $01A0 clock cycles have elapsed, i.e. the pulse is shorter than $01A0 clock cycles. $01A0 is therefore the Threshold value (in clock cycles).

In our example, the TAP value is then:

TAP threshold byte = Threshold (in microseconds) * 0.123156 = $34

where the Threshold (in microseconds) is Threshold (in clock cycles) * 1e6/CPUFrequency.
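
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a PAL machine clocked at 985,248 Hz, which is the clock the 0.123156 factor corresponds to (one TAP unit every 8 clock cycles), and a hypothetical helper name `tap_threshold`.

```python
PAL_CLOCK_HZ = 985_248   # assumption: PAL C64 CPU clock


def tap_threshold(latch, compare_value, clock_hz=PAL_CLOCK_HZ):
    """Return (threshold in clock cycles, TAP threshold byte).

    latch:         Timer B start value written to $DC06/$DC07.
    compare_value: countdown value the handler compares against.
    """
    cycles = latch - compare_value            # cycles elapsed at the compare point
    micros = cycles * 1e6 / clock_hz          # Threshold in microseconds
    return cycles, round(micros * 0.123156)   # formula from the text


cycles, tap = tap_threshold(0x03A0, 0x0200)
print(f"${cycles:04X} cycles, TAP value ${tap:02X}")
```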

(1) on CIA #1, this FLAG line is connected to the Cassette Read line of the Cassette Port.

CIA + Vector information

Appendix A (CIA 1)

56320-56335 $DC00-$DC0F
Complex Interface Adapter (CIA) #1 Registers

Locations 56320-56335 ($DC00-$DC0F) are used to communicate with the Complex Interface Adapter chip #1 (CIA #1). This chip allows the 6510 microprocessor to communicate with peripheral input and output devices. The specific devices that CIA #1 reads data from and sends data to are the joystick controllers, the paddle fire buttons, and the keyboard.

In addition to its two data ports, CIA #1 has two timers, each of which can count an interval from a millionth of a second to a fifteenth of a second. Or the timers can be hooked together to count much longer intervals. CIA #1 has an interrupt line which is connected to the 6510 IRQ line. These two timers can be used to generate interrupts at specified intervals (such as the 1/60 second interrupt used for keyboard scanning, or the more complexly timed interrupts that drive the tape read and write routines).

Location Range: 56320-56321 ($DC00-$DC01)
CIA #1 Data Ports A and B

Data Port B can be used as an output by either Timer A or B. It is possible to set a mode in which the timers do not cause an interrupt when they run down (see the descriptions of Control Registers A and B at 56334-5 ($DC0E-F)). Instead, they cause the output on Bit 6 or 7 of Data Port B to change. Timer A can be set either to pulse the output of Bit 6 for one machine cycle, or to toggle that bit from 1 to 0 or 0 to 1. Timer B can use Bit 7 of this register for the same purpose.

Location Range: 56324-56327 ($DC04-$DC07)
Timers A and B Low and High Bytes

These four timer registers (two for each timer) have different functions depending on whether you are reading from them or writing to them. When you read from these registers, you get the present value of the Timer Counter (which counts down from its initial value to 0). When you write data to these registers, it is stored in the Timer Latch, and from there it can be used to load the Timer Counter using the Force Load bit of Control Register A or B (see 56334-5 ($DC0E-F) below).

These interval timers can hold a 16-bit number from 0 to 65535, in normal 6510 low-byte, high-byte format (VALUE=LOW BYTE+256*HIGH BYTE). Once the Timer Counter is set to an initial value, and the timer is started, the timer will count down one number every microprocessor clock cycle. Since the clock speed of the 64 (using the American NTSC television standard) is 1,022,730 cycles per second, every count takes approximately a millionth of a second. The formula for calculating the amount of time it will take for the timer to count down from its latch value to 0 is:


TIME=LATCH VALUE/CLOCK SPEED

where LATCH VALUE is the value written to the low and high timer registers (LATCH VALUE=TIMER LOW+256*TIMER HIGH), and CLOCK SPEED is 1,022,730 cycles per second for American (NTSC) standard television monitors, or 985,250 for European (PAL) monitors.

When Timer Counter A or B gets to 0, it will set Bit 0 or 1 in the Interrupt Control Register at 56333 ($DC0D). If the timer interrupt has been enabled (see 56333 ($DC0D)), an IRQ will take place, and the high bit of the Interrupt Control Register will be set to 1. Alternately, if the Port B output bit is set, the timer will write data to Bit 6 or 7 of Port B. After the timer gets to 0, it will reload the Timer Latch Value, and either stop or count down again, depending on whether it is in one-shot or continuous mode (determined by Bit 3 of the Control Register).

Although usually a timer will be used to count the microprocessor cycles, Timer A can count either the microprocessor clock cycles or external pulses on the CNT line, which is connected to pin 4 of the User Port.

Timer B is even more versatile. In addition to these two sources, Timer B can count the number of times that Timer A goes to 0. By setting Timer A to count the microprocessor clock, and setting Timer B to count the number of times that Timer A zeros, you effectively link the two timers into one 32-bit timer that can count up to 70 minutes with accuracy within 1/15 second.

In the 64, CIA #1 Timer A is used to generate the interrupt which drives the routine for reading the keyboard and updating the software clock. Both Timers A and B are also used for the timing of the routines that read and write tape data. Normally, Timer A is set for continuous operation, and latched with a value of 149 in the low byte and 66 in the high byte, for a total Latch Value of 17045. This means that it is set to count to 0 every 17045/1022730 seconds, or approximately 1/60 second.
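
The latch arithmetic quoted above is easy to check. The Python sketch below uses the clock figures given in this appendix; the function names are hypothetical.

```python
NTSC_CLOCK_HZ = 1_022_730   # figure quoted in this appendix
PAL_CLOCK_HZ = 985_250      # figure quoted in this appendix


def latch_value(low, high):
    """Combine the low and high timer bytes into the 16-bit latch value."""
    return low + 256 * high


def countdown_seconds(latch, clock_hz):
    """Time for a timer to count down from its latch value to 0."""
    return latch / clock_hz


latch = latch_value(149, 66)
print(latch, countdown_seconds(latch, NTSC_CLOCK_HZ))
```

With the default values (149 in the low byte, 66 in the high byte) the latch value is 17045, which at the NTSC clock is very close to 1/60 second.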

For tape reads and writes, the tape routines take over the IRQ vectors. Even though the tape write routines use the on-chip I/O port at location 1 for the actual data output to the cassette, reading and writing to the cassette uses both CIA #1 Timer A and Timer B for timing the I/O routines.

56324 $DC04 TIMALO
Timer A (low byte)

56325 $DC05 TIMAHI
Timer A (high byte)

56326 $DC06 TIMBLO
Timer B (low byte)

56327 $DC07 TIMBHI
Timer B (high byte)

56333 $DC0D CIAICR
Interrupt Control Register
Bit 0: Read / did Timer A count down to 0? (1=yes)
Write/ enable or disable Timer A interrupt (1=enable, 0=disable)
Bit 1: Read / did Timer B count down to 0? (1=yes)
Write/ enable or disable Timer B interrupt (1=enable, 0=disable)
Bit 2: Read / did Time of Day Clock reach the alarm time? (1=yes)
Write/ enable or disable TOD clock alarm interrupt (1=enable, 0=disable)
Bit 3: Read / did the serial shift register finish a byte? (1=yes)
Write/ enable or disable serial shift register interrupt (1=enable, 0=disable)
Bit 4: Read / was a signal sent on the flag line? (1=yes)
Write/ enable or disable FLAG line interrupt (1=enable, 0=disable)
Bit 5: Not used
Bit 6: Not used
Bit 7: Read / did any CIA #1 source cause an interrupt? (1=yes)
Write/ set or clear bits of this register (1=bits written with 1 will be set, 0=bits written with 1 will be cleared)

This register is used to control the five interrupt sources on the 6526 CIA chip. These sources are Timer A, Timer B, the Time of Day Clock, the Serial Register, and the FLAG line. Timers A and B cause an interrupt when they count down to 0. The Time of Day Clock generates an interrupt when it reaches the ALARM time. The Serial Shift Register interrupts when it compiles eight bits of input or output. An external signal pulling the CIA hardware line called FLAG low will also cause an interrupt (on CIA #1, this FLAG line is connected to the Cassette Read line of the Cassette Port).

Even if the condition for a particular interrupt is satisfied, the interrupt must still be enabled for an IRQ actually to occur. This is done by writing to the Interrupt Control Register. What happens when you write to this register depends on the way that you set Bit 7. If you set it to 0, any other bit that was written to with a 1 will be cleared, and the corresponding interrupt will be disabled. If you set Bit 7 to 1, any bit written to with a 1 will be set, and the corresponding interrupt will be enabled. In either case, the interrupt enable flags for those bits written to with a 0 will not be affected.

For example, in order to disable all interrupts from BASIC, you could POKE 56333, 127. This sets Bit 7 to 0, which clears all of the other bits, since they are all written with 1’s. Don’t try this from BASIC immediate mode, as it will turn off Timer A which causes the IRQ for reading the keyboard, so that it will in effect turn off the keyboard.

To turn on the Timer A interrupt, a program could POKE 56333,129. Bit 7 is set to 1 and so is Bit 0, so the interrupt which corresponds to Bit 0 (Timer A) is enabled.
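
The set/clear behaviour of writes to this register can be modelled in a few lines. This is a simplified Python sketch of the mask logic only (the class name is hypothetical, and reading the register is not modelled).

```python
class InterruptMask:
    """Model of the write side of the CIA Interrupt Control Register."""

    def __init__(self):
        self.mask = 0   # bits 0-4: Timer A, Timer B, TOD alarm, serial, FLAG

    def write(self, value):
        bits = value & 0x1F            # only the five source bits matter
        if value & 0x80:
            self.mask |= bits          # Bit 7 = 1: bits written with 1 are set
        else:
            self.mask &= ~bits & 0x1F  # Bit 7 = 0: bits written with 1 are cleared
        return self.mask


icr = InterruptMask()
icr.write(129)   # POKE 56333,129: enable the Timer A interrupt
icr.write(0x90)  # enable the FLAG line interrupt as well
icr.write(127)   # POKE 56333,127: disable all five sources
print(icr.mask)
```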

When you read this register, you can tell if any of the conditions for a CIA Interrupt were satisfied because the corresponding bit will be set to a 1. For example, if Timer A counts down to 0, Bit 0 of this register will be set to 1. If, in addition, the mask bit that corresponds to that interrupt source is set to 1, and an interrupt occurs, Bit 7 will also be set. This allows a multi-interrupt system to read one bit and see if the source of a particular interrupt was CIA #1. You should note, however, that reading this register clears it, so you should preserve its contents in RAM if you want to test more than one bit.

56334 $DC0E CIACRA
Control Register A

Bit 0: Start Timer A (1=start, 0=stop)
Bit 1: Select Timer A output on Port B (1=Timer A output appears on Bit 6 of Port B)
Bit 2: Port B output mode (1=toggle Bit 6, 0=pulse Bit 6 for one cycle)
Bit 3: Timer A run mode (1=one-shot, 0=continuous)
Bit 4: Force latched value to be loaded to Timer A counter (1=force load strobe)
Bit 5: Timer A input mode (1=count microprocessor cycles, 0=count signals on CNT line at pin 4 of User Port)
Bit 6: Serial Port (56332, $DC0C) mode (1=output, 0=input)
Bit 7: Time of Day Clock frequency (1=50 Hz required on TOD pin, 0=60 Hz)

Bits 0-3. This nybble controls Timer A. Bit 0 is set to 1 to start the timer counting down, and set to 0 to stop it. Bit 3 sets the timer for one-shot or continuous mode.

In one-shot mode, the timer counts down to 0, sets the counter value back to the latch value, and then sets Bit 0 back to 0 to stop the timer. In continuous mode, it reloads the latch value and starts all over again.

Bits 1 and 2 allow you to send a signal on Bit 6 of Data Port B when the timer counts. Setting Bit 1 to 1 forces this output (which overrides the Data Direction Register B Bit 6, and the normal Data Port B value). Bit 2 allows you to choose the form this output to Bit 6 of Data Port B will take. Setting Bit 2 to a value of 1 will cause Bit 6 to toggle to the opposite value when the timer runs down (a value of 1 will change to 0, and a value of 0 will change to 1). Setting Bit 2 to a value of 0 will cause a single pulse of a one machine-cycle duration (about a millionth of a second) to occur.

Bit 4. This bit is used to load the Timer A counter with the value that was previously written to the Timer Low and High Byte Registers. Writing a 1 to this bit will force the load (although there is no data stored here, and the bit has no significance on a read).

Bit 5. Bit 5 is used to control just what it is Timer A is counting. If this bit is set to 1, it counts the microprocessor machine cycles (which occur at the rate of 1,022,730 cycles per second). If the bit is set to 0, the timer counts pulses on the CNT line, which is connected to pin 4 of the User Port. This allows you to use the CIA as a frequency counter or an event counter, or to measure pulse width or delay times of external signals.

Bit 6. Whether the Serial Port Register is currently inputting or outputting data (see the entry for that register at 56332 ($DC0C) for more information) is controlled by this bit.

Bit 7. This bit allows you to select from software whether the Time of Day Clock will use a 50 Hz or 60 Hz signal on the TOD pin in order to keep accurate time (the 64 uses a 60 Hz signal on that pin).

56335 $DC0F CIACRB
Control Register B

Bit 0: Start Timer B (1=start, 0=stop)
Bit 1: Select Timer B output on Port B (1=Timer B output appears on Bit 7 of Port B)
Bit 2: Port B output mode (1=toggle Bit 7, 0=pulse Bit 7 for one cycle)
Bit 3: Timer B run mode (1=one-shot, 0=continuous)
Bit 4: Force latched value to be loaded to Timer B counter (1=force load strobe)
Bits 5-6: Timer B input mode
00 = Timer B counts microprocessor cycles
01 = Count signals on CNT line at pin 4 of User Port
10 = Count each time that Timer A counts down to 0
11 = Count Timer A 0's when CNT pulses are also present
Bit 7: Select Time of Day write (1=writing to TOD registers sets alarm, 0=writing to TOD registers sets clock)

Bits 0-3. This nybble performs the same functions for Timer B that Bits 0-3 of Control Register A perform for Timer A, except that Timer B output on Data Port B appears at Bit 7, and not Bit 6.

Bits 5 and 6. These two bits are used to select what Timer B counts. If both bits are set to 0, Timer B counts the microprocessor machine cycles (which occur at the rate of 1,022,730 cycles per second). If Bit 6 is set to 0 and Bit 5 is set to 1, Timer B counts pulses on the CNT line, which is connected to pin 4 of the User Port. If Bit 6 is set to 1 and Bit 5 is set to 0, Timer B counts Timer A underflow pulses, which is to say that it counts the number of times that Timer A counts down to 0. This is used to link the two timers into one 32-bit timer that can count up to 70 minutes with accuracy to within 1/15 second. Finally, if both bits are set to 1, Timer B counts the number of times that Timer A counts down to 0 and there is a signal on the CNT line (pin 4 of the User Port).

Bit 7. Bit 7 controls what happens when you write to the Time of Day registers. If this bit is set to 1, writing to the TOD registers sets the ALARM time. If this bit is cleared to 0, writing to the TOD registers sets the TOD clock.
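
To make the bit layout concrete, the Python sketch below decodes a Control Register B value into its fields, following the descriptions above (Bit 7 as in the paragraph just before this one). The function and key names are hypothetical.

```python
TIMER_B_SOURCES = (
    "microprocessor cycles",                   # bits 6-5 = 00
    "CNT line pulses",                         # bits 6-5 = 01
    "Timer A underflows",                      # bits 6-5 = 10
    "Timer A underflows while CNT is active",  # bits 6-5 = 11
)


def decode_crb(value):
    """Split a CIA Control Register B value into named fields."""
    return {
        "start": bool(value & 0x01),
        "output_on_port_b": bool(value & 0x02),
        "toggle_output": bool(value & 0x04),
        "one_shot": bool(value & 0x08),
        "force_load": bool(value & 0x10),
        "counts": TIMER_B_SOURCES[(value >> 5) & 0x03],
        "tod_write_sets_alarm": bool(value & 0x80),
    }


# $11 is the value a loader's IRQ handler may store in $DC0F to restart Timer B.
print(decode_crb(0x11))
```

For $11 this gives a started, continuous timer with a forced load of the latch, counting microprocessor cycles.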

Appendix B (CIA 2)

Locations 56576-56591 ($DD00-$DD0F) are used to address the Complex Interface Adapter chip #2 (CIA #2). Since the chip itself is identical to CIA #1, which is addressed at 56320 ($DC00), the discussion here will be limited to the use which the 64 makes of this particular chip. For more general information on the chip registers, please see the corresponding entries for CIA #1.

A significant (for our purposes) difference between CIA chips #1 and #2 is that the interrupt line of CIA #1 is wired to the 6510 IRQ line, while that of CIA #2 is wired to the NMI line. This means that interrupts from this chip cannot be masked by setting the Interrupt disable flag (SEI). They can be disabled from CIA’s Mask Register, though. Be sure to use the NMI vector when setting up routines to be driven by interrupts generated by this chip.

Appendix C (VECTORS)

792-793 $318-$319 NMINV
Vector: Non-Maskable Interrupt

This vector points to the address of the routine that will be executed when a Non-Maskable Interrupt (NMI) occurs (currently at 65095 ($FE47)).

There are two possible sources for an NMI interrupt. The first is the RESTORE key, which is connected directly to the 6510 NMI line. The second is CIA #2, the interrupt line of which is connected to the 6510 NMI line.

When an NMI interrupt occurs, a ROM routine sets the Interrupt disable flag, and then jumps through this RAM vector. The default vector points to an interrupt routine which checks to see what the cause of the NMI was.

If the cause was CIA #2, the routine checks to see if one of the RS-232 routines should be called. If the source was the RESTORE key, it checks for a cartridge, and if present, the cartridge is entered at the warm start entry point. If there is no cartridge, the STOP key is tested. If the STOP key was pressed at the same time as the RESTORE key, several of the Kernal initialization routines such as RESTOR, IOINIT and part of CINT are executed, and BASIC is entered through its warm start vector at 40962. If the STOP key was not pressed simultaneously with the RESTORE, the interrupt will end without letting the user know that anything happened at all when the RESTORE key was pressed.

Since this vector controls the outcome of pressing the RESTORE key, it can be used to disable the STOP/RESTORE sequence. A simple way to do this is to change this vector to point to an RTI instruction. The following

LDA #$C1
STA $0318

will accomplish this. To set the vector back:

LDA #$47
STA $0318

Note that this will cut out all NMIs, including those required for RS-232 I/O.

Location Range: 65530-65535 ($FFFA-$FFFF)
6510 Hardware Vectors

The last six locations in memory are reserved by the 6510 processor chip for three fixed vectors. These vectors let the chip know at what address to start executing machine language program code when an NMI interrupt occurs, when the computer is turned on, or when an IRQ interrupt or BRK occurs.

65530 $FFFA
Non-Maskable Interrupt Hardware Vector

This vector points to the main NMI routine at 65091 ($FE43).

65532 $FFFC
System Reset (RES) Hardware Vector

This vector points to the power-on routine at 64738 ($FCE2).

65534 $FFFE
Maskable Interrupt Request and Break Hardware Vectors

This vector points to the main IRQ handler routine at 65352 ($FF48).

Commodore’s ROM loader


On the Commodore 64 CBM’s ROM Loader uses 3 pulse types whose values have been observed to be close to:

  (S)hort  : TAP value $30
  (M)edium : TAP value $42
  (L)ong   : TAP value $56

In some of the literature from the 80s (e.g. Nick Hampshire’s “Commodore 64 Kernal and Hardware Revealed”) the following pulse frequencies are given, which seem to apply to the VIC 20 rather than to the Commodore 64:

  (S)hort  : 2840 Hz
  (M)edium : 1953 Hz
  (L)ong   : 1488 Hz

Any definition of these durations would be more appropriately expressed in clock cycles, rather than absolute timings. The reason is that the number of clock cycles that make up each pulse is independent of the machine, whereas the actual duration in seconds depends on the CPU frequency of the machine itself.
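
As a rough illustration of this point, the Python sketch below converts a TAP value into clock cycles and then into a frequency for a given CPU clock. It assumes that one TAP unit equals 8 clock cycles (the same relation behind the 0.123156 factor used earlier) and borrows the clock figures quoted in Appendix A; the function names are hypothetical.

```python
NTSC_CLOCK_HZ = 1_022_730   # figure quoted in Appendix A
PAL_CLOCK_HZ = 985_250      # figure quoted in Appendix A


def tap_to_cycles(tap_value):
    """Pulse length in clock cycles (assumption: 1 TAP unit = 8 cycles)."""
    return tap_value * 8


def pulse_frequency_hz(tap_value, clock_hz):
    """Frequency of a pulse of the given TAP value on a given machine clock."""
    return clock_hz / tap_to_cycles(tap_value)


for name, tap in (("S", 0x30), ("M", 0x42), ("L", 0x56)):
    print(name, tap_to_cycles(tap),
          round(pulse_frequency_hz(tap, PAL_CLOCK_HZ)),
          round(pulse_frequency_hz(tap, NTSC_CLOCK_HZ)))
```

The cycle counts stay the same on every machine, while the frequencies change with the clock passed in.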

Field observations suggest that either the number of clock cycles has been changed in different versions of the CBM Kernal, or the tape deck circuitry has changed, or (very unlikely but still a possibility) some files for a machine have been recorded on a different machine. In fact, we have examples of C64 files (mainly old ones) using pulses whose duration is typical of VIC 20 files. In support of these speculations we also have examples of files that miss the “end-of-data marker” -see later- at the end of certain CBM files.

Pulses are always interpreted as a pair:

  (S,M) = 0 bit
  (M,S) = 1 bit
  (L,M) = new-data marker
  (L,S) = end-of-data marker


Each data byte is organized as follows:

  (?,?) (?,?) (?,?) (?,?) (?,?) (?,?) (?,?) (?,?) (?,?) (?,?)
    |     |     |     |     |     |     |     |     |     |
    |    bit0  bit1  bit2  bit3  bit4  bit5  bit6  bit7   |
    |                                                     |
data marker                                             checkbit

Each byte is therefore encoded as a sequence of 20 pulses (10 pairs):

  1 data marker:

    data finishes when an "end-of-data marker" (L,S) is met.
    Older Kernal SAVE routines do not seem to save the "end-of-data marker",
    so it has to be assumed as a non-mandatory field.

The following data is NOT present if data marker is "end-of-data marker":

  8 bits of information in LSbF format.

  1 checkbit which is computed as:

    1 XOR bit0 XOR bit1 XOR bit2 XOR bit3 XOR bit4 XOR bit5 XOR bit6 XOR bit7. 
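
The byte layout described above can be sketched as follows. This is a minimal Python illustration, not the Kernal routine: it emits the new-data marker, the 8 bits in LSbF order and the checkbit, using the pair definitions given earlier. The function name is hypothetical.

```python
S, M, L = "S", "M", "L"

ZERO_BIT = (S, M)
ONE_BIT = (M, S)
NEW_DATA_MARKER = (L, M)


def encode_byte(value):
    """Return the 10 pulse pairs that encode one data byte."""
    pairs = [NEW_DATA_MARKER]
    checkbit = 1                       # checkbit = 1 XOR bit0 XOR ... XOR bit7
    for position in range(8):          # LSbF: bit0 is sent first
        bit = (value >> position) & 1
        checkbit ^= bit
        pairs.append(ONE_BIT if bit else ZERO_BIT)
    pairs.append(ONE_BIT if checkbit else ZERO_BIT)
    return pairs


pairs = encode_byte(0x89)              # first byte of the sync train
print(len(pairs) * 2, "pulses:", pairs)
```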


When a VIC20 or a C64 saves a file to tape with the following:

  SAVE "MY PROGRAM", 1    (relocatable program file: secondary address being
                           0 or any even number, i.e. bit 0 clear)
  SAVE "MY PROGRAM", 1, 1 (non-relocatable program file: secondary address
                           being 1 or any odd number, i.e. bit 0 set)

they create 4 files:

  silence (roughly 0.333 seconds, which allows the motor to reach full speed before recording data)

  HEADER
  HEADER REPEATED

  silence (roughly 0.333 seconds, which allows the motor to reach full speed)

  DATA
  DATA REPEATED


Additionally, if bit 1 of the secondary address is set, an End-of-tape marker is
saved after the DATA REPEATED file (load address/end address and filename are the
ones used in the first two HEADER files):

  silence (roughly 0.333 seconds, which allows the motor to reach full speed)

  HEADER - End-of-tape marker
  HEADER - End-of-tape marker, REPEATED

When a SEQuential file is saved to tape with:

  OPEN N, 1, 1, "MY SEQ DATA" (no "End-of-tape marker" is saved)
  OPEN N, 1, 2, "MY SEQ DATA" ("End-of-tape marker" is saved after all files)

  PRINT# N, "... DATA..."


it is segmented, if required, and encapsulated into HEADER files. Padding is
added automatically, if required, since the HEADER payload has a standard length
(191 bytes). An empty HEADER (all "File name" and "body" bytes are $20) comes
before the data blocks:

  HEADER - SEQ file header
  HEADER - SEQ file header, REPEATED

One or more of these follow, depending on the SEQ data size:

  silence (duration is variable)

  HEADER - Data block for SEQ file
  HEADER - Data block for SEQ file, REPEATED

If an "End-of-tape marker" is requested, the OS saves an additional empty HEADER
just after the last "Data block for SEQ file":

  silence (roughly 0.333 seconds, which allows the motor to reach full speed)

  HEADER - End-of-tape marker
  HEADER - End-of-tape marker, REPEATED 


A ten second leader is written on the tape before recording of the data or program commences. This leader has two functions; first it allows the tape motor to reach the correct speed, and secondly the sequence of short pulses written on the leader is used to synchronize the read routine timing to the timing on the tape. The operating system can thus produce a correction factor which allows a very wide variation in tape speed without affecting reading.

The exact amount of short pulses is:

  - $6A00 (10 seconds) for HEADER 

Inter-record gaps

Inter-record gaps are primarily used in ASCII files and their function is to allow the tape motor time to decelerate after being turned off and accelerate to the correct speed when turned on prior to a block read or write. Each inter-record gap is approximately two seconds long and is recorded as a sequence of short pulses in the same manner as the ten second leader.

The exact amount of short pulses is:

  - $1500 (2 seconds) for DATA, and HEADER when it contains "Data block
    for SEQ file" 

Interblock gaps

There is also a gap between each file and its replication.

The exact amount of short pulses is:



The sync sequence consists of a sync train (9 bytes).

Both HEADER and DATA blocks have the following sequence:

    $89 $88 $87 $86 $85 $84 $83 $82 $81

Both HEADER REPEATED and DATA REPEATED blocks have the same sequence with bit 7 clear:

    $09 $08 $07 $06 $05 $04 $03 $02 $01


For any HEADER the following information is sent after the sync sequence:

  1 Byte   : File type.

    $01= relocatable program
    $02= Data block for SEQ file
    $03= non-relocatable program
    $04= SEQ file header
    $05= End-of-tape marker

  Here starts what I refer to as HEADER "payload".
  In case File type is not $02, the following bytes have this meaning:

    2 Bytes  : Start Address (LSBF).
    2 Bytes  : End Address+1 (LSBF).
    16 Bytes : File Name (PETSCII format, padded with blanks).

  When File type is $02, SEQ file data starts immediately after File Type thus
  allowing the use of those 20 bytes to store additional data.

  After the File Name there is HEADER "body": 171 bytes, often used by commercial
  loaders to store executable loader code or any additional data and code the
  loader or program may require.
  It encapsulates Data for segmented SEQ files too, as discussed before.

  The default behaviour of the Kernal SAVE command is to pad the File Name with
  blanks so that the total length of the name portion equals 187 bytes.

  Last Byte: Data checkbyte, computed as:

    0 XOR all other HEADER bytes, from "File type" to end of "body".

  After the checkbyte there may or may not be an "end-of-data marker".
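
The HEADER layout can be sketched as a small builder. This is a minimal Python illustration for file types other than $02, not the Kernal SAVE routine: it assumes the file name is passed as already-encoded PETSCII bytes and pads both name and body with blanks ($20). The function name is hypothetical, and the sync sequence is not included.

```python
def build_header(file_type, start, end_plus_1, name, body=b""):
    """Return File type + payload (191 bytes) + checkbyte as bytes."""
    if len(name) > 16 or len(body) > 171:
        raise ValueError("name is limited to 16 bytes, body to 171 bytes")
    block = bytes([
        file_type,
        start & 0xFF, start >> 8,              # Start Address (LSBF)
        end_plus_1 & 0xFF, end_plus_1 >> 8,    # End Address+1 (LSBF)
    ])
    block += name.ljust(16, b" ")              # File Name, padded with blanks
    block += body.ljust(171, b" ")             # HEADER "body"
    checkbyte = 0
    for value in block:                        # 0 XOR all other HEADER bytes
        checkbyte ^= value
    return block + bytes([checkbyte])


header = build_header(0x01, 0x0801, 0x0810, b"MY PROGRAM")
print(len(header), header[:5].hex(), f"${header[-1]:02X}")
```

Because the checkbyte is the XOR of every preceding byte, XOR-ing the whole block including the checkbyte gives 0, which is a convenient way to verify a HEADER read from tape.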


For any DATA the following information is sent after the sync sequence:

  DATA body

  Last Byte: Data checkbyte, computed as:

    0 XOR all DATA "body" bytes.

  After the checkbyte there may or may not be an "end-of-data marker".


Some trailing short pulses follow both HEADER REPEATED and DATA REPEATED. The standard amount is $4E pulses.

C64 Notes

HEADER blocks always load into the Tape Buffer at $033C.

If the File Type is relocatable program the start address for loading will be $0801 regardless of what may be written in the ‘Start Address’ field.

Header and SEQ overview

ROM loader header structure

ROM loader SEQ data arrangement


  ;[Generated by 6510 Dasm v2.1b (c)2004-05 Luigi Di Fraia]

  ;load RAM from a device
  JF49E  86  C3        STX $C3         ;set destination address from XY
         84  C4        STY $C4         
         6C  30  03    JMP ($0330)     ;load RAM (normally F4A5)

  ;standard load RAM entry
  WF4A5  85  93        STA $93         ;set load/verify switch to load
         A9  00        LDA #$00       
         85  90        STA $90         ;clear ST
         A5  BA        LDA $BA         ;if current device is keyboard (0)
         D0  03        BNE $F4B2       
  BF4AF  4C  13  F7    JMP $F713       ;indicate Illegal Device # Error

  BF4B2  C9  03        CMP #$03        ;if current device is the screen
         F0  F9        BEQ $F4AF       ;indicate error
         90  7B        BCC $F533       ;if not serial bus device
         A4  B7        LDY $B7         ;and if no filename,
         D0  03        BNE $F4BF       
         4C  10  F7    JMP $F710       ;indicate File Name Missing Error

  BF4BF  A6  B9        LDX $B9         ;load X with secondary address
         20  AF  F5    JSR $F5AF       ;handle load messages
         A9  60        LDA #$60        ;set current secondary address
         85  B9        STA $B9         
         20  D5  F3    JSR $F3D5       ;perform open of serial bus device
         A5  BA        LDA $BA         ;let A = current device
         20  09  ED    JSR $ED09       ;send TALK on serial bus
         A5  B9        LDA $B9         ;fetch secondary address
         20  C7  ED    JSR $EDC7       ;and send on serial bus
         20  13  EE    JSR $EE13       ;input a byte on serial bus
         85  AE        STA $AE         ;set I/O end address
         A5  90        LDA $90         
         4A            LSR             
         4A            LSR             
         B0  50        BCS $F530       ;if ST doesn't indicate a timeout (read)
         20  13  EE    JSR $EE13       ;input a byte on serial bus
         85  AF        STA $AF         ;set high byte of end address
         8A            TXA             
         D0  08        BNE $F4F0       ;if secondary address (in X) is 0,
         A5  C3        LDA $C3         ;use destination address
         85  AE        STA $AE         ;as end address
         A5  C4        LDA $C4         ;ditto for high byte
         85  AF        STA $AF         
  BF4F0  20  D2  F5    JSR $F5D2       ;print LOAD or VERIFY
  BF4F3  A9  FD        LDA #$FD        ;clear timeout (read) bit
         25  90        AND $90         ;in ST
         85  90        STA $90         
         20  E1  FF    JSR $FFE1       ;check for Stop key
         D0  03        BNE $F501       ;if depressed
         4C  33  F6    JMP $F633       ;abort load

  BF501  20  13  EE    JSR $EE13       ;input a byte on serial bus
         AA            TAX             
         A5  90        LDA $90         ;if Timeout (read) set in ST
         4A            LSR             
         4A            LSR             
         B0  E8        BCS $F4F3       ;retry the read
         8A            TXA             
         A4  93        LDY $93         ;if in verify mode
         F0  0C        BEQ $F51C       
         A0  00        LDY #$00       
         D1  AE        CMP ($AE),Y     ;compare byte read to memory
         F0  08        BEQ $F51E       
         A9  10        LDA #$10       
         20  1C  FE    JSR $FE1C       ;and set verify error on mismatch
         2C            .BYTE $2C       ;skip next instruction
  BF51C  91  AE        STA ($AE),Y     ;load byte to memory
  BF51E  E6  AE        INC $AE         ;bump load address
         D0  02        BNE $F524       
         E6  AF        INC $AF         
  BF524  24  90        BIT $90         ;if not end of file
         50  CB        BVC $F4F3       ;repeat
         20  EF  ED    JSR $EDEF       ;else send UNTALK on serial bus
         20  42  F6    JSR $F642       ;close serial bus
         90  79        BCC $F5A9       ;and exit
  BF530  4C  04  F7    JMP $F704       ;indicate File Not Found Error

  BF533  4A            LSR             ;if input device is not 1 (cassette)
         B0  03        BCS $F539       
         4C  13  F7    JMP $F713       ;indicate Illegal Device #

  BF539  20  D0  F7    JSR $F7D0       ;fetch tape buffer pointer
         B0  03        BCS $F541       
         4C  13  F7    JMP $F713       ;if invalid, indicate Illegal Device #

  BF541  20  17  F8    JSR $F817       ;display msgs and test buttons for read
         B0  68        BCS $F5AE       
         20  AF  F5    JSR $F5AF       ;handle load messages
  BF549  A5  B7        LDA $B7         ;if file name present
         F0  09        BEQ $F556       
         20  EA  F7    JSR $F7EA       ;search tape for file name
         90  0B        BCC $F55D       ;if no errors, continue
         F0  5A        BEQ $F5AE       ;exit if end of tape
         B0  DA        BCS $F530       ;error if not found
  BF556  20  2C  F7    JSR $F72C       ;since no file name, get next tape hdr
         F0  53        BEQ $F5AE       ;exit if end of tape found
         B0  D3        BCS $F530       ;indicate File Not found Error
  BF55D  A5  90        LDA $90         ;check ST for unrecoverable read error
         29  10        AND #$10       
         38            SEC             
         D0  4A        BNE $F5AE       ;and exit if so
         E0  01        CPX #$01        ;if not Program Header
         F0  11        BEQ $F579       
         E0  03        CPX #$03       
         D0  DD        BNE $F549       
  BF56C  A0  01        LDY #$01       
         B1  B2        LDA ($B2),Y     
         85  C3        STA $C3         ;reset load address from tape buffer
         C8            INY             
         B1  B2        LDA ($B2),Y     ;high byte also
         85  C4        STA $C4         
         B0  04        BCS $F57D       
  BF579  A5  B9        LDA $B9         
         D0  EF        BNE $F56C       
  BF57D  A0  03        LDY #$03        ;index low byte of end address
         B1  B2        LDA ($B2),Y     
         A0  01        LDY #$01       
         F1  B2        SBC ($B2),Y     ;compute length of block to load
         AA            TAX             
         A0  04        LDY #$04       
         B1  B2        LDA ($B2),Y     
         A0  02        LDY #$02       
         F1  B2        SBC ($B2),Y     
         A8            TAY             
         18            CLC             
         8A            TXA             
         65  C3        ADC $C3         
         85  AE        STA $AE         ;and set the end address of I/O area
         98            TYA             
         65  C4        ADC $C4         
         85  AF        STA $AF         
         A5  C3        LDA $C3         
         85  C1        STA $C1         ;set tape load address
         A5  C4        LDA $C4         
         85  C2        STA $C2         
         20  D2  F5    JSR $F5D2       ;display load messages
         20  4A  F8    JSR $F84A       ;load from cassette
         24            .BYTE $24       ;skip next instruction
  BF5A9  18            CLC             ;clear error flag
         A6  AE        LDX $AE         ;exit with end address in XY
         A4  AF        LDY $AF         
  BF5AE  60            RTS             


  ;get next file header from cassette
  SF72C  A5  93        LDA $93         ;save load/verify switch on stack
         48            PHA             
         20  41  F8    JSR $F841       ;read a block from tape
         68            PLA             
         85  93        STA $93         ;restore load/verify flag
         B0  32        BCS $F769       ;exit if read error
         A0  00        LDY #$00       
         B1  B2        LDA ($B2),Y     ;get first character in tape buffer
         C9  05        CMP #$05        ;if code for End of Tape
         F0  2A        BEQ $F769       ;return
         C9  01        CMP #$01       
         F0  08        BEQ $F74B       ;if not code for Program Header
         C9  03        CMP #$03        ;or "?"
         F0  04        BEQ $F74B       
         C9  04        CMP #$04       
         D0  E1        BNE $F72C       ;or Data Header, try next block
  BF74B  AA            TAX             
         24  9D        BIT $9D         ;if in direct mode,
         10  17        BPL $F767       
         A0  63        LDY #$63        ;point to message FOUND
         20  2F  F1    JSR $F12F       ;and print it
         A0  05        LDY #$05       
  BF757  B1  B2        LDA ($B2),Y     
         20  D2  FF    JSR $FFD2       ;print a file name character
         C8            INY             
         C0  15        CPY #$15        ;and repeat
         D0  F6        BNE $F757       ;for all characters
         A5  A1        LDA $A1         
         20  E0  E4    JSR $E4E0       ;pause
         EA            NOP             ;filler for patch
         18            CLC             
         88            DEY             
         60            RTS             


  ;read a block from cassette
  SF841  A9  00        LDA #$00       
         85  90        STA $90         ;clear ST
         85  93        STA $93         ;set load/verify switch to load
         20  D7  F7    JSR $F7D7       ;set tape buffer to I/O area
  SF84A  20  17  F8    JSR $F817       ;handle msgs and test sense for read
         B0  1F        BCS $F86E       
         78            SEI             ;disable IRQ
         A9  00        LDA #$00       
         85  AA        STA $AA         ;set gap
         85  B4        STA $B4         ;set no sync established
         85  B0        STA $B0         ;set no special speed correction yet
         85  9E        STA $9E         ;initialize error log index for pass 1
         85  9F        STA $9F         ;and pass2
         85  9C        STA $9C         ;set no byte available yet
         A9  90        LDA #$90        ;set Flag mask
         A2  0E        LDX #$0E        ;index for cassette read IRQ address
         D0  11        BNE $F875       ;JMP

  ;write a block to cassette
  SF864  20  D7  F7    JSR $F7D7       ;initialize tape buffer pointer
  SF867  A9  14        LDA #$14
         85  AB        STA $AB         ;20 sync patterns
  SF86B  20  38  F8    JSR $F838       ;test sense and display msgs for output
  BF86E  B0  6C        BCS $F8DC
         78            SEI
         A9  82        LDA #$82        ;mask for ICR1 to honor TB1
         A2  08        LDX #$08        ;IRQ index for cassette write, part 1

  ;common code for cassette read & write
  BF875  A0  7F        LDY #$7F       
         8C  0D  DC    STY $DC0D       ;clear any pending mask in ICR1
         8D  0D  DC    STA $DC0D       ;then set mask for TB1
         AD  0E  DC    LDA $DC0E
         09  19        ORA #$19        ;+force load, one shot and TB1 to CRA1
         8D  0F  DC    STA $DC0F       ;to form CRB1
         29  91        AND #$91
         8D  A2  02    STA $02A2       ;and CRB1 activity register
         20  A4  F0    JSR $F0A4       ;condition flag bit in ICR2
         AD  11  D0    LDA $D011       
         29  EF        AND #$EF       
         8D  11  D0    STA $D011       ;disable the screen
         AD  14  03    LDA $0314       ;save standard IRQ vector
         8D  9F  02    STA $029F
         AD  15  03    LDA $0315
         8D  A0  02    STA $02A0
         20  BD  FC    JSR $FCBD       ;set new IRQ for cassette depending on X
         A9  02        LDA #$02
         85  BE        STA $BE         ;select phase 2
         20  97  FB    JSR $FB97       ;initialize cassette I/O variables
         A5  01        LDA $01
         29  1F        AND #$1F
         85  01        STA $01         ;start cassette motor
         85  C0        STA $C0         ;set tape motor interlock
         A2  FF        LDX #$FF       
  BF8B5  A0  FF        LDY #$FF       
  BF8B7  88            DEY             
         D0  FD        BNE $F8B7       ;delay 0.3 seconds
         CA            DEX             
         D0  F8        BNE $F8B5       
         58            CLI             
  BF8BE  AD  A0  02    LDA $02A0       ;test high byte of IRQ save area
         CD  15  03    CMP $0315       ;to determine if end of I/O
         18            CLC             
         F0  15        BEQ $F8DC       ;exit if so
         20  D0  F8    JSR $F8D0       ;else test Stop key
         20  BC  F6    JSR $F6BC       ;scan keyboard
         4C  BE  F8    JMP $F8BE       ;repeat


  ;set IRQ vector depending upon X
  SFCBD  BD  93  FD    LDA $FD9B-8,X   ;move low byte of address
         8D  14  03    STA $0314       ;into low byte of IRQ vector
         BD  94  FD    LDA $FD9B-7,X   ;then do high byte
         8D  15  03    STA $0315       
         60            RTS


  ;IRQ vectors
         .WORD $FBCD
         .WORD $EA31
         .WORD $F92C


  ;cassette read IRQ routine
  BF92C  AE  07  DC    LDX $DC07       ;get TBH1
         A0  FF        LDY #$FF       
         98            TYA             ;and the complement of TBL1
         ED  06  DC    SBC $DC06       ;(time elapsed)
         EC  07  DC    CPX $DC07       ;if high byte not steady,
         D0  F2        BNE $F92C       ;repeat
         86  B1        STX $B1         ;else save high byte
         AA            TAX             
         8C  06  DC    STY $DC06       ;reset TBL1 to maximum
         8C  07  DC    STY $DC07       ;ditto TBH1
         A9  19        LDA #$19        ;force load, one-shot and Timer B
         8D  0F  DC    STA $DC0F       ;into CRB1
         AD  0D  DC    LDA $DC0D       
         8D  A3  02    STA $02A3       ;save ICR1
         98            TYA             
         E5  B1        SBC $B1         ;complement high byte
         86  B1        STX $B1         ;save low byte
         4A            LSR             ;elapsed time in A, ZB1
         66  B1        ROR $B1         ;/ 2
         4A            LSR             
         66  B1        ROR $B1         ;/ 4
         A5  B0        LDA $B0         ;get speed correction
         18            CLC             
         69  3C        ADC #$3C        ;+240 microseconds
         C5  B1        CMP $B1         ;if cycle shorter
         B0  4A        BCS $F9AC       ;dismiss
         A6  9C        LDX $9C         ;if byte available
         F0  03        BEQ $F969       
         4C  60  FA    JMP $FA60       ;receive it

  BF969  A6  A3        LDX $A3         ;test bit count and if beyond last bit,
         30  1B        BMI $F988       ;do end of byte
         A2  00        LDX #$00        ;assume bit value of 0
         69  30        ADC #$30        ;add 432 microseconds
         65  B0        ADC $B0         ;+ 2 * speed correction
         C5  B1        CMP $B1         ;if cycle shorter
         B0  1C        BCS $F993       ;record a 0
         E8            INX             ;assume bit value of 1
         69  26        ADC #$26        ;get 584 microseconds
         65  B0        ADC $B0         ;+ 3 * speed correction
         C5  B1        CMP $B1         ;if cycle shorter
         B0  17        BCS $F997       ;record a 1
         69  2C        ADC #$2C        ;get 760 microseconds
         65  B0        ADC $B0         ;+ 4 * speed correction
         C5  B1        CMP $B1         ;if cycle shorter
         90  03        BCC $F98B       
  BF988  4C  10  FA    JMP $FA10       ;go do end of byte

  BF98B  A5  B4        LDA $B4         ;if sync established
         F0  1D        BEQ $F9AC       
         85  A8        STA $A8         ;set erroneous bits
         D0  19        BNE $F9AC       
  BF993  E6  A9        INC $A9         ;for a 0, increment 0/1 balance
         B0  02        BCS $F999       
  BF997  C6  A9        DEC $A9         ;for a 1, decrement 0/1 balance
  BF999  38            SEC             
         E9  13        SBC #$13        ;0/1 cutoff level
         E5  B1        SBC $B1         ;-cycle width
         65  92        ADC $92         
         85  92        STA $92         ;accumulated for speed correction
         A5  A4        LDA $A4         
         49  01        EOR #$01        ;flip cycle indication
         85  A4        STA $A4         
         F0  2B        BEQ $F9D5       ;if first cycle,
         86  D7        STX $D7         ;save bit value
  BF9AC  A5  B4        LDA $B4         ;if no sync yet
         F0  22        BEQ $F9D2       ;return from IRQ
         AD  A3  02    LDA $02A3       ;if ICR1 mask
         29  01        AND #$01       
         D0  05        BNE $F9BC       
         AD  A4  02    LDA $02A4       ;and last CRA1 mask shows no TA1 flag
         D0  16        BNE $F9D2       ;exit from IRQ
  BF9BC  A9  00        LDA #$00       
         85  A4        STA $A4         ;clear cycle count
         8D  A4  02    STA $02A4       ;and last CRA1 mask
         A5  A3        LDA $A3         ;if bit count indicated end of byte,
         10  30        BPL $F9F7       
         30  BF        BMI $F988       ;go do end of byte
  BF9C9  A2  A6        LDX #$A6       
         20  E2  F8    JSR $F8E2       ;schedule timer
         A5  9B        LDA $9B         ;if parity calculated does not match
         D0  B9        BNE $F98B       ;set erroneous bit flag
  BF9D2  4C  BC  FE    JMP $FEBC       ;exit from IRQ

  BF9D5  A5  92        LDA $92         ;if second cycle
         F0  07        BEQ $F9E0       ;check accumulated over/under time
         30  03        BMI $F9DE       
         C6  B0        DEC $B0         
         2C            .BYTE $2C       ;skip next instruction
  BF9DE  E6  B0        INC $B0         ;adapt speed correction accordingly
  BF9E0  A9  00        LDA #$00       
         85  92        STA $92         ;reset accumulated over/under time
         E4  D7        CPX $D7         ;if 2nd cycle = complement of cycle 1
         D0  0F        BNE $F9F7       ;include bit
         8A            TXA             
         D0  A0        BNE $F98B       ;if two 0 cycles
         A5  A9        LDA $A9         ;and 0/1 balance
         30  BD        BMI $F9AC       
         C9  10        CMP #$10        ;at least 16 "0" cycles extra
         90  B9        BCC $F9AC       
         85  96        STA $96         ;set sync detected
         B0  B5        BCS $F9AC       
  BF9F7  8A            TXA             
         45  9B        EOR $9B         ;calculate parity
         85  9B        STA $9B         
         A5  B4        LDA $B4         ;if no sync yet
         F0  D2        BEQ $F9D2       ;exit
         C6  A3        DEC $A3         ;decrement pending bit count
         30  C5        BMI $F9C9       ;after last bit, check parity
         46  D7        LSR $D7         ;include bit
         66  BF        ROR $BF         ;in byte being read
         A2  DA        LDX #$DA       
         20  E2  F8    JSR $F8E2       ;schedule timer
         4C  BC  FE    JMP $FEBC       ;exit from IRQ

  BFA10  A5  96        LDA $96         ;if sync detected
         F0  04        BEQ $FA18       
         A5  B4        LDA $B4         ;and not yet established
         F0  07        BEQ $FA1F       
  BFA18  A5  A3        LDA $A3         ;or last bit done
         30  03        BMI $FA1F       
         4C  97  F9    JMP $F997       ;allow byte reception

  BFA1F  46  B1        LSR $B1         ;compute new speed correction value
         A9  93        LDA #$93       
         38            SEC             
         E5  B1        SBC $B1         
         65  B0        ADC $B0         
         0A            ASL             
         AA            TAX             
         20  E2  F8    JSR $F8E2       ;schedule timer
         E6  9C        INC $9C         ;indicate byte available
         A5  B4        LDA $B4         ;if not yet established
         D0  11        BNE $FA44       
         A5  96        LDA $96         ;but sync detected
         F0  26        BEQ $FA5D       
         85  A8        STA $A8         ;set error bits
         A9  00        LDA #$00       
         85  96        STA $96         ;clear sync detected
         A9  81        LDA #$81        ;set TA1 bit
         8D  0D  DC    STA $DC0D       ;in ICR1
         85  B4        STA $B4         ;set sync established
  BFA44  A5  96        LDA $96         ;move sync status
         85  B5        STA $B5         ;to saved sync status
         F0  09        BEQ $FA53       
         A9  00        LDA #$00        ;if not detected,
         85  B4        STA $B4         ;indicate sync not established
         A9  01        LDA #$01       
         8D  0D  DC    STA $DC0D       
  BFA53  A5  BF        LDA $BF         ;clear TA mask in ICR1
         85  BD        STA $BD         ;save byte read
         A5  A8        LDA $A8         
         05  A9        ORA $A9         ;accumulate possible errors
         85  B6        STA $B6         
  BFA5D  4C  BC  FE    JMP $FEBC       ;exit from IRQ
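
The threshold cascade at $F969 above can be summarised in C. This is only a sketch to help reading the listing: the names are made up, widths are expressed in units of 4 clock cycles (the value left in $B1 by the two LSR/ROR pairs), and the 8-bit wrap-around of the original arithmetic is ignored.

```c
/* Pulse classes as seen by the ROM read IRQ (names are illustrative). */
enum pulse_class {
	PULSE_NOISE,	/* shorter than the minimum: dismissed */
	PULSE_SHORT,	/* counted as a "0" cycle */
	PULSE_MEDIUM,	/* counted as a "1" cycle */
	PULSE_LONG,	/* handled by the end of byte code */
	PULSE_TOO_LONG	/* beyond the last threshold: flagged as erroneous */
};

/*
 * width: pulse width in units of 4 clock cycles (value held in $B1).
 * corr:  signed speed correction (value held in $B0).
 */
static enum pulse_class classify_pulse(int width, int corr)
{
	int limit = 0x3C + corr;	/* about 240 microseconds */

	if (width <= limit)
		return PULSE_NOISE;
	limit += 0x30 + corr;		/* about 432 microseconds */
	if (width <= limit)
		return PULSE_SHORT;
	limit += 0x26 + corr;		/* about 584 microseconds */
	if (width <= limit)
		return PULSE_MEDIUM;
	limit += 0x2C + corr;		/* about 760 microseconds */
	if (width <= limit)
		return PULSE_LONG;
	return PULSE_TOO_LONG;
}
```

With no speed correction, the three standard pulses (352, 512 and 672 clock cycles, i.e. widths 88, 128 and 168) fall into the short, medium and long classes respectively.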


  ;schedule CIA1 Timer A depending on parameter in X
  SF8E2  86  B1        STX $B1         ;save entry parameter
         A5  B0        LDA $B0         ;get speed correction
         0A            ASL             ;* 2
         0A            ASL             ;* 4
         18            CLC             
         65  B0        ADC $B0         ;add speed correction
         18            CLC             
         65  B1        ADC $B1         ;and parameter
         85  B1        STA $B1         ;save low order
         A9  00        LDA #$00       
         24  B0        BIT $B0         ;if speed correction is positive
         30  01        BMI $F8F7       
         2A            ROL             ;set high order in A
  BF8F7  06  B1        ASL $B1         ;* 2
         2A            ROL             
         06  B1        ASL $B1         ;* 4
         2A            ROL             
         AA            TAX             
  BF8FE  AD  06  DC    LDA $DC06       ;wait until no change of
         C9  16        CMP #$16        ;TBL1 changing
         90  F9        BCC $F8FE       ;while it still must be read
         65  B1        ADC $B1         ;add low order offset to TBL1
         8D  04  DC    STA $DC04       ;and store in TAL1
         8A            TXA             
         6D  07  DC    ADC $DC07       ;add high order offset to TBH1
         8D  05  DC    STA $DC05       ;and store in TAH1
         AD  A2  02    LDA $02A2       
         8D  0E  DC    STA $DC0E       ;set CRA1 from CRB1 activity register
         8D  A4  02    STA $02A4       ;and save it
         AD  0D  DC    LDA $DC0D       
         29  10        AND #$10       
         F0  09        BEQ $F92A       ;if Flag bit is not set
         A9  F9        LDA #$F9        ;set exit address on stack
         48            PHA             
         A9  2A        LDA #$2A       
         48            PHA             
         4C  43  FF    JMP $FF43       ;and simulate an IRQ

  BF92A  58            CLI             ;else allow IRQ and exit
         60            RTS 

Adding new scanners to TAPClean

This is a cookbook intended to be used by FinalTAP and TAPClean scanner designers. It gives guidelines and code examples to follow when writing NEW scanners. To integrate your new scanner into the above-mentioned tools, check the document about adding new scanners to FinalTAP (Stewart Wilson).

Definition of Terms

File: plain data, ie. the information itself (a picture, a tune, a program, etc).
Chunk: the wrapped version of a file, as found on tapes. Usually this means a stream of pulses made up of a lead-in sequence, a sync pattern, usually a header section, a data section, and possibly a trailer.


To understand what’s going on here, we need to take a step back and point out what loader designers had to bear in mind before writing their own loader.

Basically, to encode data on a sequential medium, the following things have to be provided:

  • a way to encode bits
  • a way to recognize the start of a chunk of data while reading it in from tape
  • a way to do a byte alignment while reading in a sequence of bits from tape

Usually, but not always, commercial tape loaders use a frequency shift keying (FSK) with just two frequencies. That is: bit 0 is encoded with a shorter duration (higher frequency) square wave and bit 1 with a longer duration (lower frequency) one.
To break down information into a stream of bits and sequentially write these to tape, it is necessary to choose if it’s the Most Significant bit (MSb) that has to be written first and then each subsequent one, up to the Least Significant bit (LSb), or the other way round. That’s usually referred to as endianness, and therefore endianness is either MSb First (MSbF) or LSb First (LSbF).
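
To make the two choices above concrete (threshold between the two pulse widths, and bit order), here is a minimal sketch of how eight pulses could be turned into a byte. It is not the code used by FinalTAP or TAPClean: the function name, the enum and the "one pulse per bit" layout are assumptions made for the example.

```c
#include <stddef.h>

enum bit_order { MSB_FIRST, LSB_FIRST };	/* illustrative names */

/*
 * Decode 8 pulses (one width value per pulse) into one byte.
 * A pulse wider than `threshold` is a 1 bit, otherwise a 0 bit.
 * Returns the byte, or -1 if fewer than 8 pulses are available.
 */
static int decode_byte(const unsigned char *pulses, size_t count,
		       unsigned char threshold, enum bit_order order)
{
	int value = 0;
	size_t i;

	if (count < 8)
		return -1;

	for (i = 0; i < 8; i++) {
		int bit = pulses[i] > threshold;

		if (order == MSB_FIRST)
			value = (value << 1) | bit;	/* first pulse ends up in bit 7 */
		else
			value |= bit << i;		/* first pulse ends up in bit 0 */
	}
	return value;
}
```

The same eight pulses give two different bytes depending on the bit order, which is why endianness has to be established before anything else can be decoded.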

Let’s assume each piece of information (i.e. each file) has been broken down into its own stream of bits (i.e. its own chunk) and saved to tape. How do we know where each chunk starts? Most loaders use a pattern that tells them a new chunk is beginning. That pattern is known as the lead-in sequence. It can be the same byte value, or the same bit value, repeated many times. As soon as that value changes into some known value (referred to as the sync pattern), the loader is said to have synchronized with the stream coming from tape. In other words, the loader can be sure that what follows is the information that had previously been saved to tape.
The information (i.e. the files) can then be loaded into the computer memory to be used.
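
The lead-in plus sync idea can be sketched as follows. The example works on an already decoded stream of bits (one 0/1 value per array element, MSb first) and slides one bit at a time until a run of lead-in bytes is found, which is also how byte alignment is obtained. All names are hypothetical and a real scanner works on pulses rather than on bits.

```c
#include <stddef.h>

/* Read 8 bits (one 0/1 value per element), MSb first; -1 if out of range. */
static int bits_to_byte(const unsigned char *bits, size_t nbits, size_t pos)
{
	int value = 0;
	size_t i;

	if (pos + 8 > nbits)
		return -1;
	for (i = 0; i < 8; i++)
		value = (value << 1) | (bits[pos + i] & 1);
	return value;
}

/*
 * Look for at least `pmin` lead-in bytes followed by the sync byte.
 * Returns the bit offset of the first byte after the sync, or -1.
 */
static long find_chunk_start(const unsigned char *bits, size_t nbits,
			     int pilot, int sync, size_t pmin)
{
	size_t pos;

	for (pos = 0; pos + 8 <= nbits; pos++) {	/* slide bit by bit */
		size_t p = pos, run = 0;

		while (bits_to_byte(bits, nbits, p) == pilot) {
			run++;
			p += 8;				/* now byte aligned */
		}
		if (run >= pmin && bits_to_byte(bits, nbits, p) == sync)
			return (long)(p + 8);
	}
	return -1;
}
```

Requiring a minimum number of lead-in bytes before accepting the sync value is what keeps random data from being mistaken for the start of a chunk.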

Usually the loader itself has to be loaded into the computer memory from tape, so there must be a built-in loader in the computer’s ROM that can load a standard chunk from tape and execute it. In turn, the newly loaded code can load subsequent chunks using a different keying mechanism (that’s why this code is referred to as “tape loader” or “turbo loader”, the latter due to the fact that a custom loader is used to load data faster than the built-in loader).
The built-in loader is often referred to as “CBM tape loader” or “ROM Loader”. It’s the loader that is executed when one types LOAD at the BASIC prompt. It is beyond the scope of this document to illustrate how a turbo loader is stored inside a standard chunk and executed. If you are interested in that piece of information be sure to read the article about commercial turbo loaders.

Scanner Design

Bear in mind that a scanner is the product of reverse engineering of the turbo loader code AND inspection of the TAP file. You need to be proficient in 65xx ASM and CIA to do so. Again, check the article about commercial turbo loaders if you would like some help with that.

A FinalTAP or TAPClean scanner is composed of two sections: a search section and a describe section. The search section of each active scanner is run first and used to identify the chunks within the TAP file that use each supported loader. In this way, identified chunks can be correctly decoded (or described) at a later stage.

The search section attempts to recognize a turbo chunk by hunting for its known structure within the whole TAP file data: lead-in sequence + sync pattern, and size of the chunk based on the file length retrieved from the chunk header or from the standard chunk.

Once a chunk has been recognized, it has to be added to the internal database of recognized chunks by means of the function addblockdef(int lt, int sof, int sod, int eod, int eof, int xi). The meaning of those parameters is as per below:

  • lt is chunk type, as declared in an enum in mydefs.h
  • sof is the tape image offset of the first pulse that belongs to the chunk
  • sod is the tape image offset of the first pulse that belongs to data section
  • eod is the tape image offset of the pulse corresponding to the first bit of the last byte of the data section (that includes the data checksum, if any is present after data)
  • eof is the tape image offset of the last pulse that belongs to the chunk. That is usually the last pulse of the trailer if there’s one, otherwise it is the last pulse of the last byte pointed to by eod
  • xi is an extra information parameter, a 32bit value, used to pass information to the describe section.
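
As an illustration of how the four offsets relate to each other, here is a small sketch for a loader that uses one pulse per bit. The structure and the function are hypothetical helpers written for this example, not part of FinalTAP or TAPClean, and eof is taken literally as the last pulse of the chunk (so, without a trailer, the last bit of the last data byte).

```c
/* Offsets handed to addblockdef(); the struct itself is illustrative. */
struct chunk_offsets {
	int sof, sod, eod, eof;
};

#define PULSES_PER_BYTE 8	/* assumes one pulse per bit */

/*
 * pilot_start:  offset of the first lead-in pulse
 * sync_start:   offset of the first sync pulse
 * pre_bytes:    bytes between lead-in and data (sync + header)
 * data_bytes:   bytes in the data section, trailing checksum included
 * trail_pulses: pulses in the trailer (0 if there is none)
 */
static struct chunk_offsets chunk_layout(int pilot_start, int sync_start,
		int pre_bytes, int data_bytes, int trail_pulses)
{
	struct chunk_offsets c;

	c.sof = pilot_start;
	c.sod = sync_start + pre_bytes * PULSES_PER_BYTE;
	/* first bit of the last data byte */
	c.eod = c.sod + (data_bytes - 1) * PULSES_PER_BYTE;
	/* last pulse of the chunk */
	c.eof = c.eod + PULSES_PER_BYTE - 1 + trail_pulses;
	return c;
}
```

The resulting values would then be passed, together with the chunk type and the extra information, to addblockdef().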

More recently (May 2011) the function addblockdefex() has been added, which takes an extra parameter: addblockdefex(int lt, int sof, int sod, int eod, int eof, int xi, int meta1). The extra parameter is described below:

  • meta1 is used for additional information exchange between the search and describe stages where xi alone is not enough.

It is recommended NOT to set xi and meta1 to pointers for data allocated via malloc().

Different scenarios

Now we can talk about the different scenarios that are likely to be found when reverse engineering one of those turbo loaders and looking at the distribution of values within the tape image.
What you should end up with is a table of information like the following one, specific to the Accolade turbo loader:

Threshold: 0x01EA (490) clock cycles (TAP value: 0x3D)
Bit 0 pulse: 0x29 (average value)
Bit 1 pulse: 0x4A (average value)
Endianness: MSbF

Pilot byte: 0x0F (amount of bytes: 8)
Sync byte: 0xAA

Header:

  • 16 bytes: Filename
  • 02 bytes: Load address (LSBF)
  • 02 bytes: Data size (LSBF)
  • 01 byte : XOR Checksum of all Header bytes

Data:

  • Data is split in sub-blocks of 256 bytes each, or less for the last one.
  • Each sub-block is followed by its XOR checksum byte. There are no pauses between sub-blocks.

Trailer: 8 Bit 0 pulses + 1 longer pulse.
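
The sub-block layout described in the table can be checked with a few lines of C. This is just a sketch: the function name is made up, and it assumes that each checksum is the plain XOR of the bytes of its own sub-block, starting from 0.

```c
#include <stddef.h>

#define SUBBLOCK_SIZE 256

/*
 * raw:       decoded bytes: sub-block, checksum, sub-block, checksum, ...
 *            (the caller guarantees the buffer holds all of them)
 * data_size: payload size from the header (checksum bytes not counted)
 * Returns the number of sub-blocks whose XOR checksum does not match.
 */
static int count_bad_subblocks(const unsigned char *raw, size_t data_size)
{
	size_t done = 0, pos = 0;
	int bad = 0;

	while (done < data_size) {
		size_t n = data_size - done;
		unsigned char x = 0;
		size_t i;

		if (n > SUBBLOCK_SIZE)
			n = SUBBLOCK_SIZE;	/* last sub-block may be shorter */
		for (i = 0; i < n; i++)
			x ^= raw[pos + i];
		if (x != raw[pos + n])		/* checksum byte follows the sub-block */
			bad++;
		pos += n + 1;
		done += n;
	}
	return bad;
}
```

Counting the failures instead of stopping at the first one makes it possible to report how many sub-blocks are damaged.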

The way we give FinalTAP and TAPClean the information contained above is by means of an ft array entry. Based on the above table, the entry for this loader inside the ft array is the following one:

/* name,     en,   tp,   sp,   mp,  lp,   pv,   sv,   pmin, pmax, has_cs. */
{"ACCOLADE", MSbF, 0x3D, 0x29, NA,  0x4A, 0x0F, 0xAA, 4,    NA,   CSYES},


  • en is endianness,
  • tp is threshold (TAP value)
  • sp is bit 0 pulse (or short pulse for those loaders that use 3 pulses to encode data)
  • mp is med pulse (only significant for those turbo loaders that use 3 pulses to encode data)
  • lp is bit 1 pulse (or long pulse for those loaders that use 3 pulses to encode data)
  • pv is pilot (i.e. lead-in) value, it may be a byte or a bit
  • sv is sync value, it may be a byte or a bit (note that this is just the first one in case there’s a sync pattern made up of multiple bytes)
  • pmin is the minimum amount of pilot bytes requested for a chunk to be identified during the search stage. The suggested value for pmin is 1/2 of the pilot size usually found on TAPs for very short pilot sequences (e.g. 8 bytes) and 3/4 of the pilot size for longer pilot sequences.
  • pmax is the maximum number of pilot bytes to be used during the search stage. It is not usually used. Experienced designers can use this value to gather additional control over the search stage.
  • has_cs is the flag with which we tell the program if the chunk in question has got a data checksum, so that it can give us correct stats about failed checksum checks.
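
Two of the values above are easy to derive with a bit of arithmetic. Since a TAP data byte counts clock cycles in units of 8, a threshold of 0x01EA (490) clock cycles becomes 490 / 8 = 61 = 0x3D. The pmin suggestion can be coded too; in the sketch below both function names and the cutoff between "very short" and "longer" pilots are assumptions of mine.

```c
/* One TAP data byte stands for 8 clock cycles. */
#define CYCLES_PER_TAP_UNIT 8

/* Convert a threshold in clock cycles to a TAP value (truncating). */
static int cycles_to_tap(int cycles)
{
	return cycles / CYCLES_PER_TAP_UNIT;
}

/* Made-up cutoff: pilots up to this size are treated as "very short". */
#define SHORT_PILOT 16

/* Suggest pmin from the pilot size usually found on TAPs. */
static int suggest_pmin(int pilot_size)
{
	if (pilot_size <= SHORT_PILOT)
		return pilot_size / 2;		/* 1/2 for very short pilots */
	return pilot_size * 3 / 4;		/* 3/4 for longer pilots */
}
```

For the Accolade loader, whose pilot is 8 bytes long, this gives the pmin value of 4 used in the entry above.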

Each search section uses common code, thanks to the definition of THISLOADER. This means that a designer can safely copy and paste code from an existing scanner to a new one, without being concerned about moving scanner specific information into the new scanner. Usually there’s no need to change this convenient way to do things. If your new scanner is going to support more than one variant, please use a variable (eg. variant) and some enums to describe the variants. THISLOADER has been introduced to index the ft array only.



en = ft[THISLOADER].en;
tp = ft[THISLOADER].tp;
sp = ft[THISLOADER].sp;
lp = ft[THISLOADER].lp;
sv = ft[THISLOADER].sv;


for (i = 20; i > 0 && i < tap.len - BITSINABYTE; i++) {

	eop = find_pilot(i, THISLOADER);

	if (eop > 0) {

		/* Valid pilot found, mark start of file */
		sof = i;
		i = eop;

		/* Check if there's a valid sync byte for this loader */
		if (readttbyte(i, lp, sp, tp, en) != sv)
			continue;	/* no sync here: keep searching */

		/* Valid sync found, mark start of data */


Based on additional turbo chunk inspection, you should be able to provide the following information (the meaning of each field is discussed later on) in your C source code. This comes in handy when you need to write a new scanner by copying code from the newly available scanners:

 * Status: Beta
 * CBM inspection needed: No
 * Single on tape: No
 * Sync: Byte
 * Header: Yes
 * Data: Sub-blocks
 * Checksum: Yes (for each sub-block)
 * Post-data: No
 * Trailer: Yes
 * Trailer homogeneous: Yes (bit 0 pulses)

That’s it: if you need code for a new scanner that uses one sync byte, or that has a header, or whose data is divided in sub-blocks, just get the code from this file, both for the search and the describe sections.
Yes, it is really THAT easy, and it is the reason for which I designed the new scanners the way they are.

CBM inspection needed

One option for tape loader designers was to store both the turbo loader code and a table with information about how many files to load from tape (and where in RAM to load them) inside the standard chunk. That’s one approach, and it causes us serious headaches if the table is encrypted or placed in a point inside the standard chunk that changes on a per tape basis rather than being loader specific. I.e.: different tapes may use the same encoding scheme, the same loader code, but they may store that table in different locations.
The other (clever) option was to place information about each file inside the chunk, thus providing what’s often called a chunk “header” (it may also be in a chunk of its own, of course). That header can contain, for example, the name of the data file that follows, where in RAM to load it, and how many bytes to load (or, equivalently, the location of the last byte to load). If each file has got its header, we don’t have to bother seeking table entries inside the standard chunk.
We will tell these two approaches apart by stating whether “CBM inspection needed” is yes or no.

A way to pass information from the search to the describe routines in FinalTAP and TAPClean is to use the extended info field of the blk structure (the single unit of the file database). That way, once we extract information from the standard chunk during the search stage, we do not have to extract it again in the describe stage.
The extra info field is a 32-bit integer in which we can pack two 16-bit values. It is mainly intended to pack together the load address and the end address to pass to the describe function. Expert designers may find it useful to pack different information as well.

A snippet of code from cult.c follows:

/* Store the info read from CBM part as extra-info */
xinfo = s + (e << 16);

A more complex scenario can be found in actionreplay.c, which shows that this field can be reused for different purposes:

/* Pass details over to the describe stage */
xinfo = dt << 24; /* threshold */
xinfo |= (hd[CHKBYOFFSET]) << 16; /* checksum */
xinfo |= e; /* end address */
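
For completeness, here is how the two layouts above could be packed and unpacked. The helper names are hypothetical (the real describe sections do their own extraction), and unsigned arithmetic is used so that shifting an address of $8000 or above by 16 bits stays well defined.

```c
/* Two 16-bit addresses in one 32-bit value: s in the low half, e in the high half. */
static unsigned int pack_addresses(unsigned int s, unsigned int e)
{
	return (s & 0xFFFFu) | ((e & 0xFFFFu) << 16);
}

static unsigned int unpack_start(unsigned int xinfo)
{
	return xinfo & 0xFFFFu;
}

static unsigned int unpack_end(unsigned int xinfo)
{
	return (xinfo >> 16) & 0xFFFFu;
}

/* Three-field layout: threshold (bits 24-31), checksum (16-23), end address (0-15). */
static unsigned int ar_threshold(unsigned int xinfo)
{
	return (xinfo >> 24) & 0xFFu;
}

static unsigned int ar_checksum(unsigned int xinfo)
{
	return (xinfo >> 16) & 0xFFu;
}

static unsigned int ar_end(unsigned int xinfo)
{
	return xinfo & 0xFFFFu;
}
```

Whatever layout is chosen, the search and the describe sections of a scanner have to agree on it, since the field carries no type information.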

Single on tape

Let’s assume we are unlucky: the turbo chunks do not contain any header. All the information is inside the standard chunk. If the turbo chunk is unique on tape, we may find the information about it inside the standard chunk and that’s it. But what if there is more than one standard chunk on the tape, each followed by a turbo chunk whose load details are in its respective standard chunk? In FinalTAP and TAPClean, standard chunks are recognized and acknowledged during the first scan, so when searching for a turbo chunk at a later time it may be harder to know which standard chunk belongs to it.
That’s why we have to know if the turbo file is “Single on tape” or not.
An example of the worst scenario occurs with Biturbo: CBM inspection is needed because there are usually multiple files on the same tape (usually magazine tapes). A brilliant technique has been developed to process this case: we don’t need to create new (buggy) code for that.


After the lead-in sequence has been read, a sync pattern is expected. If it’s found, the loader can reliably read in the data that follows, or even first read backwards and acknowledge a sequence of pre-pilot bytes (check the CHR scanner for an example).
If a sync pattern is not found, there’s probably a disturbance in the lead-in sequence that is not a sync pattern, at least not yet. Therefore the scanner (just as the loader code itself does) has to go back one step and try to read in the remaining part of the lead-in sequence. Eventually a sync pattern will be found.
The sync pattern can be just a bit (when the lead-in sequence consists of the same bit value repeated, the sync bit is usually the other value), or a byte, or a sequence of bytes (i.e. 2 or more bytes).
Since the code to read in those different sync patterns depends on the pattern itself, we have to specify whether it is a bit, a byte, or a sequence (either of bits or bytes). Then we can copy and paste the right code from an existing scanner to do the job.
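
The simplest case, a single sync bit closing a lead-in made of the other bit value, can be sketched as below. This is only an illustration under stated assumptions, not scanner code: find_sync_bit() is a hypothetical name, the bits are assumed to be already decoded into an array of 0/1 values, and the minimum lead-in length is a parameter I made up for the example.

```c
#include <stddef.h>

/*
 * Hypothetical sketch. A lead-in is a run of at least 'minlead' bits equal
 * to 'lead'; the sync is the first bit with the other value.
 * Returns the index of the first bit after the sync bit (which equals 'len'
 * if the sync bit is the very last bit), or -1 if nothing is found.
 */
static long find_sync_bit(const unsigned char *bits, size_t len,
			  unsigned char lead, size_t minlead)
{
	size_t i = 0;

	while (i < len) {
		size_t run = 0;

		/* Measure the run of lead-in bits starting at i */
		while (i + run < len && bits[i + run] == lead)
			run++;

		/* Long enough, and closed by a bit of the other value */
		if (run >= minlead && i + run < len)
			return (long)(i + run + 1);

		/* Too short: skip the run and the bit that broke it */
		i += run + 1;
	}

	return -1;
}
```

A run that is too short is treated as a disturbance and the search simply resumes after it, which is the same idea as going back to read the rest of the lead-in.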

One example that uses a pattern of bytes follows:

#define SYNCSEQSIZE	17	/* amount of sync bytes */


/* Expected sync pattern */
static int sypat[SYNCSEQSIZE] = {
	0x10, 0x0F, 0x0E, 0x0D, 0x0C, 0x0B, 0x0A, 0x09, 
	0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01,
	/* ...the 17th value is not shown in this excerpt */
};

int match;			/* condition variable */


/* Decode a byte sequence (possibly a valid sync train) */
for (h = 0; h < SYNCSEQSIZE; h++)
	pat[h] = readttbyte(i + (h * BITSINABYTE), lp, sp, tp, en);

/* Note: no need to check if readttbyte is returning -1, for 
         the following comparison (DONE ON ALL READ BYTES)
         will fail all the same in that case */

/* Check sync train. We may use the find_seq() facility too */
for (match = 1, h = 0; h < SYNCSEQSIZE; h++)
	if (pat[h] != sypat[h])
		match = 0;

/* Sync train doesn't match */
if (!match)
	continue;

/* Valid sync train found, mark start of data */
sod = i + BITSINABYTE * SYNCSEQSIZE;


As I said before, turbo chunks can have a small header that may contain the file name/ID, load address, data size/end address, and additional information. If there’s a header, there’s common code to read it in and some define(s) to describe its structure (length and contents).
The presence of a header with load address and data size/end address may be crucial if there are multiple turbo files of the same type on one tape. If those files lack a header, it means the loader has a table of addresses that is used to load the turbo files in RAM. One part of this table may be inside the standard chunk, so that we can retrieve it easily, but other records may be anywhere in the following turbo files, in which case that information is too hard to retrieve.

One example of a header inside each chunk is given below:

#define HEADERSIZE	4	/* size of block header */

#define LOADOFFSETH	1	/* load location (MSB) offset inside header */
#define LOADOFFSETL	0	/* load location (LSB) offset inside header */
#define ENDOFFSETH	3	/* end location (MSB) offset inside header */
#define ENDOFFSETL	2	/* end location (LSB) offset inside header */


/* Read header */
for (h = 0; h < HEADERSIZE; h++) {
	hd[h] = readttbyte(sod + h * BITSINABYTE, lp, sp, tp, en);
	if (hd[h] == -1)
		break;
}
if (h != HEADERSIZE)
	continue;

/* Extract load and end locations */
s = hd[LOADOFFSETL] + (hd[LOADOFFSETH] << 8);
e = hd[ENDOFFSETL]  + (hd[ENDOFFSETH]  << 8);

// Prevent int wraparound when subtracting 1 from end location
if (e == 0)
	e = 0xFFFF;

/* Plausibility check */
if (e < s)
	continue;

Another example is the following one:

#define HEADERSIZE	5	/* size of block header */

#define FILEIDOFFSET	0	/* file ID offset inside header */
#define LOADOFFSETH	2	/* load location (MSB) offset inside header */
#define LOADOFFSETL	1	/* load location (LSB) offset inside header */
#define DATAOFFSETH	4	/* data size (MSB) offset inside header */
#define DATAOFFSETL	3	/* data size (LSB) offset inside header */


/* Read header */
for (h = 0; h < HEADERSIZE; h++) {
	hd[h] = readttbyte(sod + h * BITSINABYTE, lp, sp, tp, en);
	if (hd[h] == -1)
		break;
}
if (h != HEADERSIZE)
	continue;

/* Extract load location and size */
s = hd[LOADOFFSETL] + (hd[LOADOFFSETH] << 8);
x = hd[DATAOFFSETL] + (hd[DATAOFFSETH] << 8);

/* Compute C64 memory location of the _LAST loaded byte_ */
e = s + x - 1;

/* Plausibility check */
if (e > 0xFFFF)
	continue;
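
To try the size-based layout in isolation, the extraction and the plausibility check can be wrapped in a small function. This is a sketch under my own assumptions rather than scanner code: header_to_range() is a hypothetical name, the header bytes are assumed to be already decoded, and the rejection of a zero data size is an extra check that does not appear in the excerpt above.

```c
#include <stdint.h>

#define HEADERSIZE	5	/* size of block header */

/*
 * Hypothetical helper. Header layout as in the second example:
 * byte 0 = file ID, bytes 1-2 = load location (LSB, MSB),
 * bytes 3-4 = data size (LSB, MSB).
 * Stores the load location in *s and the location of the last loaded
 * byte in *e. Returns 0 on success, -1 if the header is not plausible.
 */
static int header_to_range(const uint8_t hd[HEADERSIZE],
			   unsigned *s, unsigned *e)
{
	unsigned size;

	*s = hd[1] | (hd[2] << 8);
	size = hd[3] | (hd[4] << 8);

	/* Assumption: an empty file is not a valid chunk */
	if (size == 0)
		return -1;

	/* Location of the last loaded byte */
	*e = *s + size - 1;

	/* Data must not extend past the end of the 64 KB address space */
	if (*e > 0xFFFF)
		return -1;

	return 0;
}
```

For instance, a header with load location $0801 and size $1000 gives an end location of $1800, while load location $F000 with size $1001 is rejected.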


Data is usually a continuous sequence of bytes, but sometimes it was split into sub-blocks inside the same turbo chunk, each followed by its own checksum value. In the latter case there’s usually a checksum byte after every 256 bytes of data. The search section must therefore take into account the overhead produced by those checksums, which makes the data section inside the chunk longer than the data size.
Turbo loaders that use sub-blocks include, among others, Accolade and Ocean new 4. An example of the overhead calculation is provided below, from the accolade.c scanner.

/* Compute size */
x = e - s + 1;

/* Compute size overhead due to internal checksums */
bso = x / 256;
if (x % 256)
	bso++;

/* Point to the first pulse of the last checkbyte (that's final) */
/* Note: - 1 because "bso" also includes the last checkbyte! */
eod = sod + (HEADERSIZE + x + bso - 1) * BITSINABYTE;

/* Initially point to the last pulse of the last checkbyte */
eof = eod + BITSINABYTE - 1;
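
The arithmetic is easy to check in isolation. Below is a minimal sketch, with hypothetical function names and the sub-block size passed as a parameter, assuming one pulse per bit and 8 bits per byte as in the excerpt above.

```c
/* Hypothetical helpers, not part of accolade.c. */

/* Number of checkbytes for x data bytes, one per sub-block of 'blk'
   bytes; a shorter final sub-block still gets its own checkbyte */
static unsigned checkbytes(unsigned x, unsigned blk)
{
	return (x + blk - 1) / blk;
}

/* Total pulses taken by header, data and checkbytes
   (assuming 1 pulse per bit, 8 bits per byte) */
static unsigned long chunk_pulses(unsigned hdr, unsigned x, unsigned blk)
{
	return (unsigned long)(hdr + x + checkbytes(x, blk)) * 8;
}
```

With 600 bytes of data and 256-byte sub-blocks there are 3 checkbytes (after bytes 256, 512 and 600); with a hypothetical 4-byte header the chunk takes (4 + 600 + 3) * 8 = 4856 pulses.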


Some loader designers decided to protect data with one or more checksums: if the calculated checksum does not match the expected one, a problem probably occurred while loading data from tape, and the data cannot be reliably used. Some loaders just cause a soft reset when a load error occurs; others give the user the chance to reload the file from its beginning.
The presence of a checksum is crucial: once we find that all checksums in a tape match the expected values, we can almost surely say that data integrity has not been compromised at any point of the digitization process, nor by time.
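
The check itself is usually only a few lines. The sketch below assumes a plain XOR of all data bytes, which is just one possible algorithm: each loader defines its own (some use a sum, for example), so the right one has to be read from the loader code. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR of all data bytes (assumed algorithm) */
static uint8_t xor_checksum(const uint8_t *data, size_t len)
{
	uint8_t cb = 0;
	size_t i;

	for (i = 0; i < len; i++)
		cb ^= data[i];

	return cb;
}

/* Returns 1 if the computed checksum matches the one read from tape */
static int checksum_ok(const uint8_t *data, size_t len, uint8_t expected)
{
	return xor_checksum(data, len) == expected;
}
```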


Some turbo chunks have additional bytes just after the data section that are not checksums. Sometimes they are just padding bytes. Sometimes they can be used to detect where the data chunk ends, which is useful in cases where different loaders use a similar data structure but only one of them has those additional bytes (example: TDI F2 and TDI F1).


Wise loader designers put some lead-out bytes just after the data section, to be sure data was properly read in. We usually know the total amount of trailer bytes, so it’s good to check for at most that number. The reason is that sometimes there may be no evident separator between the trailer of one chunk and the lead-in sequence of the following chunk. On a real C64 that’s not a problem, because the trailer is not read in and decoded. FinalTAP and TAPClean have to read it in to acknowledge it, so we have to take care to read in the correct amount of trailer pulses.

#define MAXTRAILER	8	/* max amount of trailer pulses read in */

Trailer homogeneous

Some loaders use a homogeneous trailer, made up of only bit 1 or only bit 0 pulses; others use a combination of both.
Some examples follow.

Homogeneous trailer, made up of short pulses:

/* Trace 'eof' to end of trailer (bit 0 pulses only) */
h = 0;
while (eof < tap.len - 1 &&
		h++ < MAXTRAILER &&
		readttbit(eof + 1, lp, sp, tp) == 0)
	eof++;

Non homogeneous trailer:

/* Trace 'eof' to end of trailer (both bit 1 and bit 0 pulses) */
h = 0;
while (eof < tap.len - 1 &&
		h++ < MAXTRAILER &&
		readttbit(eof + 1, lp, sp, tp) >= 0)
	eof++;


Do not copy and paste code from the old scanners. If any feature you need in your own scanner is not available within the new scanners, just ask me to help with that. Old code is inconsistent and partly buggy.
New code is consistent and robust. Consistent means that the same thing is always done the same way, and variables always have the same name, scope, and usage, so that it is easier to read and debug the new code. Robust means we learned from those who came before us while fixing their bugs, so that we got rid of those bugs in the new code.
If you end up with a scanner of your own by copying from old scanners, do NOT expect:

  • me to help with issues that arise with it or to debug your code, and
  • FinalTAP and TAPClean maintainers to insert it in the development trees.
