An Exploration of Cno

    Why look at Cno?

    Cno is the stripped and mysterious first binary that forms part of ching(6), which first appears in V7 Unix. But, as we will see, that's not its origin! It is a simple binary compared to its sister phx, but perhaps unique enough that it was treated differently afterwards: phx was basically unchanged, but cno was altered several times! (see The strange case of the ching(6) in the Unix)

    As an exercise in understanding V7 Unix and to recover its long-lost source we will disassemble cno. We'll use a few tools to do this: I have a copy of cno on my Linux system and the first tasks are to use od and strings to look for likely ways in.

    Using strings

    After strings cno we have predictably little. Examining the file directly for strings, we find some important additions: the word log.a as well as %s, and then a long sequence d o x f e g c s l L u r D O X * U, which is the list of conversion characters used by the printf() family. There are also some date-related strings following that. As we know there was a log file and it had a particular format, we can surmise that this is part of the code for that. It tells us definitely that C was the language used, which tells us a lot about how the program will be structured. If we cheat a little and look at the phx binary, we can see that "hexagrams.r" appears. That also tells us that C is being used, as "r" and "a" are fopen() mode strings: r for reading and a for appending.

    All this is at the end of the file, which tells us that there will be mostly library code appended by the linker, and references to it will be resolved in the preceding code, so it will be useful to find those addresses to find the system calls in the code.

    Code-hunting

    The C compiler in Unix emitted code for the Unix assembler, as, to translate; this means it output code in the syntax of that assembler! as had some significant differences from the standard macro assembler from RT-11, particularly with input/output syntax. This is useful, because that gives us another place to hunt for code. Another way in (use the source, Luke!) is through the optimizations used by the C compiler. This was before the era of optional optimization! One important one is explained in the compiler's source code for the 2nd pass (usually the pass for resolving symbols):

    
          /* Notice addresses of the form
           * $xx,xx(r)
           * and replace them with
           * (pc),xx(r)
           */
          

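    $ is "AT&T syntax" for # in DEC assembler syntax and marks an immediate operand, which the PDP-11 encodes as autoincrement mode on the PC, (pc)+, with the value stored in the word after the instruction. Remember that source precedes destination in PDP-11 operands, so here the immediate is the source and xx(r) is the destination. When the immediate value and the index displacement are the same number xx, the compiler writes the source as (pc) instead: the single word following the instruction is then read once as the source value and again as the displacement, which saves a word.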
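    To make the saving concrete, here is a minimal sketch in C that builds the machine words for both forms. It assumes the standard PDP-11 double-operand layout (4-bit opcode, then 6-bit source and destination fields); the function name emit_mov_imm_to_indexed is made up for illustration and is not taken from the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each 6-bit operand field is a 3-bit mode followed by a 3-bit register. */
enum { OP_MOV = 01, MODE_DEFERRED = 1, MODE_AUTOINC = 2, MODE_INDEX = 6, REG_PC = 7 };

static uint16_t operand(unsigned mode, unsigned reg)
{
    return (uint16_t)((mode << 3) | reg);
}

/* Hypothetical helper: emit `mov $value, disp(reg)` into out[] and return
 * the number of words used. When value == disp the shorter
 * `mov (pc), disp(reg)` form is emitted, as the compiler comment describes. */
size_t emit_mov_imm_to_indexed(uint16_t value, uint16_t disp, unsigned reg,
                               uint16_t out[3])
{
    assert(reg < 8);
    uint16_t dst = operand(MODE_INDEX, reg);

    if (value == disp) {
        /* source (pc): the displacement word doubles as the immediate */
        out[0] = (uint16_t)((OP_MOV << 12) | (operand(MODE_DEFERRED, REG_PC) << 6) | dst);
        out[1] = disp;
        return 2;
    }
    /* source (pc)+: a separate immediate word precedes the displacement */
    out[0] = (uint16_t)((OP_MOV << 12) | (operand(MODE_AUTOINC, REG_PC) << 6) | dst);
    out[1] = value;
    out[2] = disp;
    return 3;
}
```

    With value and disp both 010 and register r5, this yields the two words 011765 000010 rather than the three words 012765 000010 000010, so a word of 011765 or similar in the od listing is a hint that compiler output is nearby.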

    System calls

    Let's look for system calls. Helpfully, they're in a file /usr/include/sys.s in V7 and are numbered. SYS is just a synonym for TRAP in AT&T syntax, so its machine code is 104400 and the masked bits are any number from 1 to 60. There are lots of calls, and some will be false positives. Here is a list in ascending order, with doubles and false positives weeded out:

    
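          104401 - exit
          104403 - read
          104404 - write
          104405 - open
          104406 - close
          104410 - creat  (octal 10 = call 8)
          104415 - time   (octal 15 = call 13)
          104421 - break  (octal 21 = call 17)
          104423 - lseek  (octal 23 = call 19, plain seek in V6)
          104424 - getpid (octal 24 = call 20)
          104430 - getuid (octal 30 = call 24)
          

    There are many references to 104400 but that is a false positive. One trap to avoid: od prints these words in octal while the call numbers are normally quoted in decimal, so 104410 is call 8 (creat), not call 10 (unlink, which would appear as 104412), and likewise for every call above 7. Read that way the list holds few surprises: break is memory allocation, time fits the date-related strings, getpid and getuid are interesting but need more context, and creat and lseek belong with all the file calls which most of this stuff is dealing with. At least some of the calls will be for stdin/stdout and we know it writes a logfile.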
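    The hunt for sys words can be sketched in C as below. This is only an illustration: the function names are invented, and the name table covers just the low-numbered calls I am sure of from the classic numbering. The point it makes is that the call number is the low byte of the word, so an octal dump has to be converted before it is looked up in a decimal list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the call number if `word` is in the trap range 104400..104777,
 * or -1 if it is not a sys instruction at all. */
int sys_number(uint16_t word)
{
    return ((word & 0177400) == 0104400) ? (int)(word & 0377) : -1;
}

/* Partial, illustrative table keyed by the decimal call number. */
const char *sys_name(int number)
{
    switch (number) {
    case 1:  return "exit";
    case 3:  return "read";
    case 4:  return "write";
    case 5:  return "open";
    case 6:  return "close";
    case 8:  return "creat";
    case 10: return "unlink";
    case 13: return "time";
    case 20: return "getpid";
    case 24: return "getuid";
    default: return "?";
    }
}

/* Count candidate sys words in a dump, skipping 104400 (call 0),
 * which the text treats as a false positive. */
size_t count_sys_words(const uint16_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    size_t hits = 0;
    for (size_t i = 0; i < count; i++)
        if (sys_number(words[i]) > 0)
            hits++;
    return hits;
}
```

    A hit is still only a candidate: any data word or address that happens to fall in the trap range will match, so each one needs checking against the surrounding code.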

    Library calls

    These are trickier: calls to library routines are JSR instructions, and jumps into shared code are plain JMPs, so there is even more opportunity for false positives, particularly with subroutines. What we can be sure of are RTS instructions, the returns from subroutines.
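    One way to cut down the false positives is to resolve where each candidate call actually lands. The sketch below, in C with invented function names, assumes the usual PDP-11 rule for PC-relative operands: the target is the address just past the offset word, plus the offset, modulo 64K.

```c
#include <assert.h>
#include <stdint.h>

/* True for `jsr Rn, target` with a PC-relative destination (mode 67),
 * whatever the link register: 004767 (jsr pc) and 004567 (jsr r5) both match. */
int is_relative_jsr(uint16_t word)
{
    return (word & 0177077) == 0004067;
}

/* Resolve a PC-relative operand. `offset_word_addr` is where the offset
 * word itself is stored; the PC has already stepped past it when the
 * offset is added, and the sum wraps at 16 bits. */
uint16_t pc_relative_target(uint16_t offset_word_addr, uint16_t offset_word)
{
    assert((offset_word_addr & 1) == 0);   /* PDP-11 words are even-aligned */
    return (uint16_t)(offset_word_addr + 2 + offset_word);
}
```

    For instance, the pair 004767 177736 in this binary has its offset word at 0112 in the od listing, and resolves to 052; several calls that resolve to the same place are good evidence of a real subroutine.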

    cno: Black Box Theory

    When attempting to understand the internals of a machine or a program, it's helpful to get something to go on by considering what it takes as input and what it outputs. For instance, we know V7 cno wrote a log file of coin tosses in a particular format; we know that it could take random input or generate its own; and we know it output six numbers for phx to interpret.

    od Listing: Header

          0000000 000407 007146 000766 002152 000000 000000 000000 000001
          

    This first line of the od listing represents the a.out executable header:

    
          struct exec {
              int         a_magic;  /* magic number */
              unsigned    a_text;   /* size of text segment */
              unsigned    a_data;   /* size of initialized data */
              unsigned    a_bss;    /* size of uninitialized data */
              unsigned    a_syms;   /* size of symbol table */
              unsigned    a_entry;  /* entry point */
              unsigned    a_unused; /* not used */
              unsigned    a_flag;   /* relocation info stripped */
          };
          

    For now, 0407 means a normal executable, and the next four fields are the sizes of each segment. Note that the a_syms field is empty and the a_flag field is 01: the binary is stripped, so there is no symbol table or relocation information. This also means that addresses have been resolved, so when we look for jumps and address references, we should find them hard-coded in the executable. We have 0766 (502) bytes of initialized variables, and the 02152 (1130) bytes reserved for uninitialized variables is more than twice that again, so we'll probably see a cast or two to some larger numbers.
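    As a cross-check, the header words can be decoded mechanically. This is a sketch in C, assuming the eight-word layout of the V7 a.out.h (with an entry-point word between a_syms and a_unused) and a 0407 file whose data follows the text directly; the helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight 16-bit words of the a.out header, in file order. */
struct aout_header {
    uint16_t magic, text, data, bss, syms, entry, unused, flag;
};

struct aout_header parse_header(const uint16_t *words, size_t count)
{
    assert(words != NULL && count >= 8);
    struct aout_header h = {
        words[0], words[1], words[2], words[3],
        words[4], words[5], words[6], words[7]
    };
    return h;
}

/* Where the initialized data begins in the file: header (020 bytes) plus text. */
uint32_t data_file_offset(const struct aout_header *h)
{
    return 020u + h->text;
}

/* Where BSS begins in memory for a 0407 image: text, then data, then BSS. */
uint32_t bss_start_address(const struct aout_header *h)
{
    return (uint32_t)h->text + h->data;
}
```

    Fed the words from the first od line, this puts the data at file offset 07166 and the start of BSS at address 010134, two numbers worth keeping in mind when absolute addresses turn up in the code.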

    od Listing: crt0

    
          0000020 170011 010600 011046 005720 010066 000002 004767 000356
          0000040 022626 010016 004737 006040 104401
          

    The next lines in the file make up the C runtime startup code (crt0) that sets up the stack-based executing environment. It disassembles to the following:

    
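          setd             / floating point unit into double mode
          mov sp, r0       / r0 points at argc on the stack
          mov (r0), -(sp)  / push a copy of argc
          tst (r0)+        / step r0 past argc, so it points at argv[0]
          mov r0, 2(sp)    / store the argv pointer above argc
          jsr pc, _main    / call main(argc, argv)
          cmp (sp)+, (sp)+ / pop the two arguments
          mov r0, (sp)     / main's return value becomes exit's argument
          jsr pc, *$_exit  / mode 37 for *$expr.
          sys exit         / only reached if _exit returns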
          

    The BSS space reserved for an initial stack would later morph into _environ under V7, but this isn't a V7 crt0.s: it's the much simpler V6 version. Our suspicion that this might be the case is strengthened by an interesting fact about many Unix binaries: the assembled crt0.s differs between releases, and the one in this binary indicates that it is a V6 binary, not a V7 binary. But let's be thorough and do some more work.

    od Listing: further on

    
          0000040 022626 010016 004737 006040 104401 004567 006264 016701
          0000060 007106 070127 031425 062701 015415 010167 007072 010100
          0000100 000167 006252 004567 006232 004767 177736 010001 005000
          0000120 071027 000021 005401 010146 004767 177716 010001 005000
          0000140 073026 010100 000167 006206 004567 006166 005746 005065
          0000160 177770 000402 060465 177770 005367 007210 002407 117700
          0000200 007200 042700 177400 005267 007170 000404 012716 007362
          0000220 004737 001140 010004 022704 177777 001354 005765 177770
          0000240 001004 012716 000001 004737 000040 016500 177770 000167
          

    Picking up at byte 052, let's disassemble further:

    
          / _getrand
          jsr r5, csv            / c save registers (standard library)
          mov word_7170, r1      / we grab the seed from here
          mul $31425, r1         / multiply by 13077
          add $15415, r1         / add 6925
          mov r1, word_7170      / return result, see text!
          mov r1, r0
          jmp cret               / pop regs
          / -- end of _getrand
          
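          / this is _getrnum
          jsr r5, csv            / another subroutine
          jsr pc, _getrand       / this is calling the previous subroutine
          mov r0, r1
          clr r0                 / r0:r1 is now the 32-bit dividend
          div $21, r0            / divide by 17: the remainder lands in r1
          neg r1                 / a negative count means shift right
          mov r1, -(sp)          / save the shift count on the stack
          jsr pc, _getrand
          mov r0, r1
          clr r0
          ashc (sp)+, r0         / pop the count and shift r0:r1 right by it
          mov r1, r0             / return the low word
          jmp cret               / end
          / end of _getrnum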
          
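          / this is _getques
          jsr r5, csv
          tst -(sp)           / room for one local
          clr -10(r5)         / the running total starts at zero
          br loc_170          / 000402: skip the add on the way in
          loc_164:
          add r4, -10(r5)     / 060465 177770: add the character to the total
          loc_170:
          dec word_7404       / the buffer count in __iob
          blt loc_214
          movb *word_7402, r0 / fetch the byte the __iob pointer addresses
          bic $-400, r0       / keep the low byte only
          inc word_7402       / incrementing the __iob pointer
          br loc_224
          loc_214:
          mov $7362, (sp)     / $__iob, (sp)
          jsr pc,  *$_filbuf  / buffer empty, so refill it
          loc_224:
          mov r0, r4
          cmp $-1, r4         / end of input?
          bne loc_164
          tst -10(r5)         / major difference here to v7-generated code
          bne loc_252
          mov $1, (sp)
          jsr pc, sub_6040    / this is another oddity.
          loc_252:
          mov -10(r5), r0
          jmp cret           / pop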
          

    At this point if we compare V7-generated code, we can see there are great similarities but also important differences. Subroutine setup is much simpler here. The assembler optimizes code differently in _getques and its I/O routine differs. This is further confirmation that here is a V6 binary pretending to be a V7 one.
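    Reading the three routines straight off the machine words, they appear to compute the following. This is a sketch in C rather than recovered source: uint16_t stands in for the PDP-11's 16-bit int, the starting seed is unknown, and set_seed and sum_question are hypothetical names added so the logic can be exercised without the stdio buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t seed;   /* starting value is an assumption */

void set_seed(uint16_t value) { seed = value; }

/* mul $31425 / add $15415: multiply by 13077, add 6925, keep 16 bits. */
uint16_t getrand(void)
{
    seed = (uint16_t)((uint32_t)seed * 13077u + 6925u);
    return seed;
}

/* The first getrand() result, modulo 17, becomes a right-shift count
 * (div $21, neg, ashc); the second result is the value that gets shifted. */
uint16_t getrnum(void)
{
    unsigned count = getrand() % 17u;
    assert(count <= 16);
    return (uint16_t)((uint32_t)getrand() >> count);
}

/* The loop in _getques adds each input byte (0..255) to a 16-bit total
 * until end of input; this stand-in takes the bytes from a buffer. */
uint16_t sum_question(const unsigned char *bytes, size_t length)
{
    uint16_t total = 0;
    for (size_t i = 0; i < length; i++)
        total = (uint16_t)(total + bytes[i]);
    return total;
}
```

    Starting from a seed of 0, getrand() returns 6925, and a fresh getrnum() from the same starting point returns 936 (the second value, 59934, shifted right by 6925 % 17 = 6).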

    od: Finding the Seed Routine

    So the first subroutine here had me wondering, clearly not main() but a function, and the answer comes from the Reno code by Guy Harris: it is indeed the original seed routine!

    
          unsigned getrand()
          {
              return(seed = (seed*13077) + 6925);
          }
          

    At this point, I tried the attack from the other end: a known-plaintext attack, writing a C program to try and get the same result. I tried a separate function with a global variable seed and a main-local sum. The results, both in object and assembler form, were enlightening. If cno had not been stripped, the object code would have at once shown the structure of the program. By comparison, the VAX 32V code is not as helpful for identifying global variables and functions.

    Another thing to consider is whether cno was written at a time when the system itself was evolving its assembler output. If the seed is calculated in a separate function, not only is seed moved using a different register, there is also extra stack management code. Cno did not exhibit the stack management code, so calculating the seed in main() might get closer to the original code.

    To my surprise, seed was not moved via the PC but r5 and there was still some stack management code missing. It is still possible that cno used global variables instead of local ones, but there are still questions about this. Given that the assembler output suggests that the seed value was taken from a subroutine, it might be the linker which is evolving or changing things from object code to executable.

    od: Finding the Throw Routine

    Most rewrites implement the throw as filling an array but using a different algorithm. The original is more complicated; the Reno C code runs:

    
          char string [7]; /* space for 6 digits plus terminator */
          char *change();  /* pre-ANSI forward dec */
          
          int table[2][2][2] = {
              {{6,7},{7,8}},
              {{7,8},{8,9}},
          };
          
          char *
          change()
          {
              register int i;
          
              /* each index must be 0 or 1, hence the mask */
              for (i=0;i<6;i++)
                  string[i] = table[getrand()&01][getrand()&01][getrand()&01] + '0';
              string[i] = '\0';
              return(string);
          }
          

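    The Reno code actually goes a little further in randomness than this but we already know the original random code. The table in cno starts at byte 7172 of the file, just after the seed at 7170. By the header, the initialized data begins at file offset 020 + 07146 = 07166, so both words fall inside the DATA segment: the seed looks to be stored as initialized data right beside the table, rather than in BSS.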

    Known-plaintext attack

    I bit the bullet and compiled a version of the Reno rewrite on V7: it turns out to be pretty much what the original code was, except for the logfile. So, going back to the previous disassembly section, we can see that the helper functions all come before main(), either because they were written that way for scoping reasons or because that's just the way the linker reorganized the file. For instance, here is char *change() from the IDA Pro disassembly; note how IDA writes call for internal function calls where as uses jsr pc syntax:

    
          sub_262:                                / CODE XREF: sub_416+62
                           jsr     R5, sub_6342   / csv
                           sub     #6, SP
                           clr     R4
          
          loc_274:                                / CODE XREF: sub_262+116↓j
                           call    sub_104        / _getrnum
                           bic     #-2, R0
                           mov     R0, -10(R5)
                           call    sub_104
                           bic     #-2, R0
                           mov     R0, -12(R5)
                           call    sub_104
                           bic     #-2, R0
                           mov     R0, -14(R5)
                           asl     R0
                           add     -12(R5), R0
                           asl     R0
                           add     -10(R5), R0
                           asl     R0
                           mov     7152(R0), R0   / this is _table
                           add     #60, R0 ; '0'
                           movb    R0, 10134(R4)  / this is _string
                           inc     R4
                           cmp     #6, R4
                           bgt     loc_274
                           clrb    10134(R4)      / string
                           mov     #10134, R0     / again
                           jmp     loc_6356
          

    I don't know if Harris disassembled the VAX version or the PDP11 version but that is amazingly accurate. It would seem he either forgot or gave up implementing the log file, which is all that is left for me to do. The disassembly of main() indicates that this is probably where it got called, and the actual writing was done in a function.
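    The asl/add chain in the listing is just the three-dimensional subscript flattened by hand, and it can be checked with a small sketch in C. The function names are mine; the table values come from the Reno code, and the "first", "second" and "third" arguments are the masked results of the three calls in the order the listing makes them.

```c
#include <assert.h>
#include <stddef.h>

/* The 2x2x2 table laid out flat, as it sits in the data segment. */
static const int flat_table[8] = { 6, 7, 7, 8, 7, 8, 8, 9 };

/* Mirror the instruction sequence: R0 starts as the third result, and each
 * asl doubles it before the next index is added. The final asl scales the
 * element number to a byte offset, since a PDP-11 int is two bytes. */
size_t flat_byte_offset(unsigned first, unsigned second, unsigned third)
{
    assert(first <= 1 && second <= 1 && third <= 1);
    size_t r0 = third;
    r0 <<= 1; r0 += second;   /* asl R0 ; add -12(R5), R0 */
    r0 <<= 1; r0 += first;    /* asl R0 ; add -10(R5), R0 */
    r0 <<= 1;                 /* asl R0 */
    return r0;
}

/* mov 7152(R0), R0 ; add #60, R0 : fetch the entry and make it a digit. */
char line_digit(unsigned first, unsigned second, unsigned third)
{
    return (char)(flat_table[flat_byte_offset(first, second, third) / 2] + '0');
}
```

    So the last call supplies the most significant subscript. Because each table entry is simply 6 plus the number of ones among the three bits, the order of the subscripts makes no difference to the digit produced, which always falls between '6' and '9'.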

    Recovering the V6 code

    Most of the existing rewrite can be kept; the issue is that V6 was a very different beast to program for than V7.

    This essay is a work in progress.

    Last updated: 2016-05-14
