Monday, 20 March 2017

Doing it in less than Bill Gates. Loading Microsoft 4K BASIC on ALTAIR 8800 with Papertape

Two years ago, I demonstrated the loading of Micro-Soft's first software product on their first platform.  If you didn't already know, it was BASIC for the ALTAIR 8800 computer.  The story is well known in the vintage computer community.  It was a real success for Bill Gates and Paul Allen and they are nothing less than artists!

In preparation for the demonstration at World of Commodore 2014, I found there were multiple bootstrap programs.  It depended on hardware.  The manual has two, a 21 byte and a 20 byte version.  Over the course of a month, I studied the boot strap and tape in detail and found it really just loads another bootstrap program with better capabilities.  It may seem that 20 bytes is not a lot, but there should be a warning in the manual "fingers will get sore after repeated use of the small switches on the ALTAIR".  Further, each byte in turn lead to a greater probability of making an error.

The show went well, my friend Jeff Brown made a great video of the event "A re-tracing of how Paul Allen loaded BASIC on the MITS Altair 8800 from paper tape".  I really must thank him for the idea of loading 4K BASIC!  He saw my ALTAIR on a visit where we repaired his PET 2001 and suggested it.  After he released his youtube video, I stumbled upon another video called "Bill Gates talks about Microsoft and the Altair 8800".  At 3:26 into the video, Bill talks about the bootstrap loader and says Paul wrote the first loader in 46 bytes and he (Bill) later wrote it in 17 bytes.  I thought to myself, yes, 17 bytes is better than 20 bytes... especially if you are repeatedly demonstrating it.  I played with the idea of figuring out those 17 bytes then set out to make it as small as possible.  Well, that's when I got it down to just 14 bytes and now down further (with some encouragement) to 12 bytes!

Last summer, I acquired a Teletype ASR33 in working condition and decided it was time return to World of Commodore 2016 with a smaller bootstrap loader.  I narrowed what the bootstrap loader needs to work.
It needs to:
  • initialize a memory pointer
  • read a byte coming from the papertape
  • load it to memory
  • change the memory address
  • loop back and repeat
  • break out of this loop to execute the code
Well, Bill's code was pretty good and hard to beat, it used a stack to leverage smaller (shorter) instructions to reduce the byte count.  But I found 2 things that would save some steps (bytes).  The first was the way to break out of the loop, why not have the code written to decrement the memory pointer and let the code overwrite the bootstrap to break out.  The second save would be in how to determine if a byte was ready to read from the papertape.  Bill's bootstrap has to check for a status flag from the UART to indicate a new byte was shifted into the UART.  Well, if a new byte was NOT shifted into the UART, then the UART receive register would have the same byte that was previously there.  The trick is then simply to compare to the last byte saved to memory, and if it changed, then it would be understood that it's a new byte that is to be written.  The writes will then happen for all bytes that are different from the previous, so this now requires some extra "special coding" on the papertape.  Rule #1, no sequential bytes can be the same value.  Rule #2, we will not be guaranteed an absolute location in memory.  Of course, this means that the bootstrap loader cannot directly load 4K BASIC, but following in Bill's example, why not have the bootstrap load another bootstrap???  And since Bill's 20 byte bootstrap to load the 174 byte loader works, we can just load a program that will load that program.  **News Flash**, comments below suggested I re-evaluate some earlier work and now it's possible to do it in just 12 bytes!!! Without further adieu, here's the 12 bytes.

Octal
041     LXI   H,              ;Initialize Memory pointer to 0x01DB
333     IN 1                    ;Read UART data register (skip the status register)
001
276     CMP M              ;Compare with data stored at memory pointer
312     JZ 0001              ;Loop back if data is the same as stored
001
000
053     DCX H                ;When it's a different byte, decrement the memory pointer
167     MOV M,A          ;and save in memory
303     JMP 0001           ;Loop back for the next byte.
001
000

Now, you may notice, there's no way to break out of this loop, it will keep writing to memory until it overwrites itself... well that's great news!!!  The bytes targeted to overwrite the jump will be 01 or 00.  So, when the program finally gets overwritten the final jump will become JMP 0101 *OR* NOP
A long section of 00/01 is punched on the tape and these result in instructions that have no impact on program flow.  Next there are some instructions that align the PC in an unambiguous way and finally a section of code that simply stores Bill's original bootstrap to 0000, waits for the "AE" leader as required, then branches to 0000 to continue with the original bootstrap.

Since the nuked jump can be to 0101, the program to load the program must be after that point, which is fine because the memory pointer starts as 01DB, which happen to be the instructions to input port 1.

Also, the boot strap program could be shorten even further, but only with a significant hardware modification to wait for bytes in hardware.  The Tarbell 1101 Disk controller uses such a technique, hats off to Don Tarbell for a clever design!

That pretty much wraps up loading 4K basic with 5 bytes less than Bill.  It should be noted... at 110 baud, the extra papertape takes more time than it would take to switch in the extra bytes.  From start to finish, it takes about 9 minutes to load doing it Bill's way, and 10 minutes doing it my way.  But my way takes only 1 minute at the switches, while Bill's way takes 2 minutes.

Sincerely,
Josh Bensadon

A friend sent me this picture... why does Bill look so concerned?



My friend Walter (right) and me at WoC 2016.  ALTAIR and Teletype ASR33 in background.

My friend Jeff Brown (left) and me at WoC 2014.

Do two 8 bit machines stack up to a 16 bit machine?




-----------------------------------------------------------------------------------------------
The program to load the program on the papertape is....

Code to insert the original boot loader to 0000:  
56 BYTES. (38h) Plus 3 (since JUMP 0203)
Puts lowest starting address at 023B.

DB 01    IN 01
FE AE    CPI AE
C2 0201    JNZ 0102  LOOP BACK AND WAIT UNTIL TAPE LEADER
31 1400    LXI SP,0014
21 0300    LXI H,0003
E5    PUSH H
21 C0E9    LXI H,
E5    PUSH H
21 2D77    LXI H,
E5    PUSH H
21 BDC8    LXI H,
E5    PUSH H
21 DB01    LXI H,
E5    PUSH H
21 0FD8    LXI H,
E5    PUSH H
21 DB00    LXI H,
E5    PUSH H
21 1200    LXI H,
E5    PUSH H
21 0F31    LXI H,
E5    PUSH H
11 21AE    LXI D,  (Can't use H, 21 21 code repeats)
D5    PUSH D
        (Can't JP 0000, 00 code repeats)
21 0100    LXI H,0001
2B    DCX H
E9    PCHL    Jump to 0000


*** I punched a new tape for the 12 byte system and have tested it, works perfectly!  I must thank kgsws for encouraging me to go back to my notes and code it with one more trick.  The trick of reusing data bytes as instruction bytes ***



















22 comments:

  1. Computer Notes Volume 1, Issue 6, 1975
    Page Twenty-One
    Author Bill Gates

    "I've written a bootloader that only takes 13 bytes of keyed-in data, but anything smaller than 20 bytes isn't easy to use."

    Sorry for that, but as far as I know Gates never published his 13 bytes version, so you still could try to figure that one out :)

    Best regards,
    Udo Munk

    ReplyDelete
    Replies
    1. "isn't easy to use", see my comment to kgsws

      Delete
    2. I could shave one more byte off my bootstrap loader by only loading H with 03, but then it would add 300 bytes of filler to the papertape. It would still be "easy to use" but take even longer to load the papertape at the show. There's only so much patience one can endure!

      Delete
  2. If you make the assumption HL defaults to zero, it can be 12 bytes I think.

    00 NOP ;Keep bytes in jumps different.
    DB 01 IN 1 ;Read UART
    BE CMP M ;Compare last data
    CA 01 00 JZ 0001 ;Loop if data unchanged
    77 MOV M,A ;Save in memory when different
    2C INR L ;Increment LSB of pointer.
    C3 01 00 JMP 0001 ;Loop back for the next byte.

    The program get overwritten by the tape. Twice.

    Tape first overwrites the mini-loader without changing anything, then has 244 bytes for the "real" bootloader. The memory pointer then wraps around back to zero.

    We overwrite the mini-loader a second time, but now change the offset of the last byte to where the "real" bootloader starts (0xC maybe?).

    Dunno how reliable it would be, but still... 12 bytes. :-)

    Cheers!
    -- Ian Davis

    ReplyDelete
    Replies
    1. Ian, Thanks, but I don't think we can make the assumption HL is zero. That might work fine on an emulator. I've always thought a good emulator should randomize the register and RAM contents. PS. The data sheet clearly indicates all registers and flags are undefined.

      Delete
  3. hmm, what about this slight modification?
    21 DB 01 LXI H, 01DB
    BE CMP M
    CA 01 00 JZ 0001 ; notice target location
    2B DCX H
    77 MOV M,A
    C3 01 00 JMP 0001 ; notice target location

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. kgsws, I thought of doing this, but if you look at how the jump gets overwritten, you cannot guarantee bootstrap 100% of the time. The modified jump can either be 0103 or 0203, depends if the byte at the initialized HL is by chance equal to the leader byte or the random/last byte in the UART (depending if you start the papertape before or after the ALTAIR). I spent many hours trying to figure out how to guarantee the load location, but it cannot be done without adding more bytes to the bootstrap loader or preloading that one memory location (which adds more switch tasks than just 1 bytes). But it is a brilliant use of overlapping the code!

      Delete
    3. Ah ok, i do not know ALTAIR hardware stuff. It should be possible to overwrite JMP instruction itself then.
      Like this (bytes on tape):
      AE ** third stage mark **
      21 01 00 LXI H, 0001 ; it will jump to 0101 before writing "JMP 0121"
      C3 00 01 JMP 0100
      Or repeating 00 01 should work too, JMP becames NOP.

      Delete
    4. I don't quite see where you're going with that. The problem with overwriting, is that you can't be sure which value will overwrite the jump target. To be sure, it will take more bytes.

      Delete
    5. Look at repeating 01 00 case, it is easier.
      There is obvious case that last instruction will be "C3 01 01 JMP 0101" since 00 is overwritten with 01.
      However, if it happens to be 00 instead, you will get "C3 01 00 JMP 0001" (no change) and so bootstrap will continue. Next time it will overwrite 01 with 01. No change again and bootstrap will run once more. Last run is final run, because C3 will be replaced with 00, which is NOP opcode and that will break the loop.
      If i understand correctly, there is 3 byte uncertainty which makes two different outcomes.

      Delete
    6. And right you are! That should make it in just 12 Bytes! Well done! I'll cut a new tape and try it out on my system.

      Delete
    7. That's nice. I wonder if it works well.
      I might have something else. However, i have to make one rule. You have to start tape first and then run bootstrap.
      Can i assume that IN 01 will always return same byte, if you run tape first (with that byte repeating)?

      Delete
    8. Yes, I double checked the idea with my old notes, it was something I meant to get back to but there were other hurdles.

      You are correct to assume IN 01 will always return same byte. IN 01, will read the RX Register of the UART. This register will be undefined upon power up. It will then be written to by the byte received by the UART's shift register. This write happens regardless of whether or not the register was read. ie. The UART doesn't hold on to it for safe keeping, if you miss it, then it's gone. Can the papertape itself be used to provide the next instruction(s) for the bootloader???

      Delete
    9. With this code, you get executed whatever is on the tape:
      0000: 21 00 00 LXI H, 0000
      0003: DB 01 IN 01
      0005: 77 MOV M,A
      0006: E9 PCHL
      0007
      There is only very limited amount of instructions that you can use. It is pretty much useless.
      There are other multiple variations giving you more choices. For example this is most complex one:
      0000: 11 00 00 LXI D, 0000
      0003: 4F MOV C,A
      0004: DB 01 IN 01
      0006: 12 STAX D
      0007: 79 MOV A,C
      0008: C3 00 00 JMP 0000
      000B
      You can use much more instructions, you can even use A register for AND + OR math and PCHL to jump (or RET can jump to C9 + conditional returs). You can, with varying complexity, get over 80 unique values to any of HLBA register. Whether those values are useful, i do not know. It is already 11 bytes and it requires tape to be run first. Which might not even be worth trying.
      Remember, specified instruction gets executed many times, you can't use addition and similar.

      Delete
    10. I'm trying to go over what can be done with that... I'm just not seeing anything that can store a program. Most instructions cause relative action (ie, shift, increment, add, etc), and you already pointed out they can't be of use. Time to move on to the next project for me!

      Delete
  4. It is impossible to write a bootloader that takes less than 11 bytes. I have discovered a truly marvelous proof of this, which this reply textbox is too narrow to contain. -- Pierre de Fermat

    ReplyDelete
    Replies
    1. Brilliant! I would love to see that!
      With hardware modifications it can be done in 9 bytes (but that requires specialized hardware).
      It's basically this:
      LD H,3
      IN 1
      LD (HL),A
      DEC HL
      JP 0002

      The special hardware would put the processor in a wait state until the data comes in. I have designed such hardware for my JAIR 8080 beta version, to interface a slow PIC chip. Of course, if hardware modifications are on the menu, then it can all be done through DMA, or even better by the use of Shadow ROM (something my later version of JAIR 8080 uses).

      But, to your 11 bytes, why can't your reply fit in this box if it's only talking about 11 bytes?



      Delete
    2. This comment has been removed by the author.

      Delete
  5. Oh Josua Tree,

    Check out the famous meatball "Pierre de Fermat" he was a mathemagician who scrawed "that the equation an + bn = cn had no solutions in positive integers, if n is an integer greater than 2" in the margin of a book or on the wall of a truck stop (I forget which).

    ReplyDelete
    Replies
    1. LoL, Yes, I remember this guy... I totally forgot his name. I'm not good with remembering names, but do remember the skepticism.

      Delete
    2. Yes, the famous Ferman's Last Theorem. Wiles' proof was one of the greatest mathematical achievements of the past fifty years (not that I understand much of the proof).

      Delete