UC Berkeley and Amdahl - ByrdSight Technical Consulting

ACCOMPLISHMENTS: UC Berkeley MS and Amdahl internship (see bottom of page for Amdahl) RISC Microprocessor Design

Was the Teaching Assistant (TA) for VLSI Design class taught by John Ousterhout
Was TA and Research Assistant (RA) for CS292R, the advanced architecture class under David Patterson for the development of the SOAR (Smalltalk On A RISC) microprocessor
Was the hardware microarchitect of SOAR. Developed the architecture and the simulation model written in a version of LISP called SLANG (a Simulation LANGuage)
My Masters Thesis was a separate project, advised by Al Despain, covering the architecture and implementation (including layout) of a vector co-processor for the Motorola MC68000 microprocessor. The coprocessor was capable of Direct Memory Access (DMA) to/from memory and of performing integer tensor operations, and used a (for then) novel regular-array column compression carry-save multiplier with a parallel-prefix final adder stage

DETAILS: UC Berkeley MS CS

One or Two year Masters Program? Prior to enrolling at Berkeley, and while working at GTE Sylvania, I had taken 4-5 graduate level EE/CS courses at Stanford with a focus on communications. But rather than take another 4-5 courses to obtain an MSEE from Stanford, I elected to enroll at Berkeley to pursue a masters degree that was structured as a two year program and included a thesis.

Seduced by RISC: I entered Berkeley in the Fall of ’81 intending to focus on communications, with David Messerschmitt as my thesis advisor, but it was not long before I switched to computer/microprocessor architecture with Al Despain as my advisor. However, I spent the majority of my time as a member of the SOAR (Smalltalk On A RISC) team under David Patterson (affectionately known as “coach” by his grad students), and the SOAR experience under Dave was to profoundly (and positively) affect how I approach computer systems design and implementation challenges.

I was the T.A. for the Mead and Conway introductory VLSI course taught by John Ousterhout, as well as both the T.A. and an R.A. for the CS292X advanced architecture course (the SOAR project) under Dave. My role on the SOAR team was as micro-architect. SOAR was an investigation to see how RISC (Reduced Instruction Set Computing) approaches could be applied to an Object Oriented System (OOS) such as Smalltalk that supported dynamic data typing and object-oriented (garbage collected) storage management. An early ’83 block diagram of the SOAR architecture is shown below.

UC Berkeley Smalltalk-on-a-RISC SOAR Microprocessor architecture block diagram

SOAR (Smalltalk On A RISC) Micro-architecture Drawing

Simulating the SOAR Architecture: SOAR was simulated using a LISP based, cycle accurate, event driven simulator for digital logic called SLANG (A Simulation LANGuage), which was first used on the RISC I processor at Berkeley. SLANG was developed by John Foderaro and later refined by Korbin Van Dyke, and is embedded in Franz Lisp (SLANG is actually a LISP system with added functions and some pre-defined variables). John was one of the main developers of Franz LISP. One has to get used to parentheses when coding in SLANG. An example of some SLANG code for SOAR is shown below.

UC Berkeley Smalltalk-on-a-RISC SOAR SLANG simulation program

SOAR (Smalltalk On A RISC) Architecture Simulation Code written in SLANG

Dealing with Slow SLANG: The SLANG simulation of SOAR (the largest design to have used SLANG at that time) ran too slowly on Berkeley’s DEC VAX 11/780. The simulation speed was about one full machine cycle every 2 wall clock seconds, which meant that simulations of meaningful size took a long (too long) time.

In order to reach even this simulation speed, SLANG had to be consciously coded to minimize the number of events being evaluated in the event queue when we knew there was no chance of a signal changing state. This required careful coding, and the resulting description was not as “bullet proof”, but it was necessary to achieve usable simulation speeds.
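As a rough illustration of why event counts matter (this is Python, not SLANG, and not the original SOAR model), the sketch below simulates a single AND gate with an event queue. The signal names, the stimulus, and the `suppress_unchanged` flag are all hypothetical. With the flag on, an event is only queued when a value actually changes.

```python
import heapq

def simulate(stimulus, suppress_unchanged):
    """Tiny event-driven simulation of y = a AND b with a one-tick gate delay.

    stimulus: list of (time, signal_name, value) input events.
    Returns (final signal values, number of events processed).
    """
    signals = {"a": 0, "b": 0, "y": 0}
    queue = list(stimulus)
    heapq.heapify(queue)              # events are ordered by time
    processed = 0
    while queue:
        time, name, value = heapq.heappop(queue)
        processed += 1
        changed = signals[name] != value
        signals[name] = value
        if name in ("a", "b"):
            if suppress_unchanged and not changed:
                continue              # input did not change, so y cannot change
            new_y = signals["a"] & signals["b"]
            if suppress_unchanged and new_y == signals["y"]:
                continue              # output would not change, so queue nothing
            heapq.heappush(queue, (time + 1, "y", new_y))
    return signals, processed

# Hypothetical stimulus: several input events re-drive a value that is already set.
stimulus = [(0, "a", 1), (1, "a", 1), (2, "b", 0), (3, "b", 1), (4, "a", 1)]
print(simulate(stimulus, suppress_unchanged=False))  # 10 events processed
print(simulate(stimulus, suppress_unchanged=True))   # 6 events processed
```

Both runs end in the same state. Here the suppression is an automatic check and therefore safe; the hand-coded version described above relied on the designer knowing that a signal could not change, which is where the risk came from.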

This careful, empirically measured analysis of SLANG coding style (and its impact on event queue activity) vs actual system state transitions paid significant dividends in simulation efficiency. I was to apply similar techniques later at Apple, on a much bigger and more complex system design, when I wrote highly optimized Verilog (a then new event based simulator) to minimize event queue operations and improve simulation speed for the Newton. But you really had to know what you were doing as it was risky . . .

First SOAR Paper: I graduated before SOAR was completed and taped out, and Joan Pendleton took over the micro-architect role. However, before I departed, Dave Ungar, Rick Blau, Dain Samples, Dave Patterson and I wrote a paper describing the team’s work. The first page of that paper is shown below:

Cover Page of Smalltalk-on-a-RISC SOAR microprocessor paper

Berkeley SOAR (Smalltalk On A RISC) Architecture Paper

Masters Thesis: My master’s thesis could have centered around the design and simulation of the SOAR architecture, but instead I chose to design and lay out (using the Caesar layout tool developed by John Ousterhout) a vector co-processor for the Motorola MC68000 microprocessor. The goal was to improve integer vector and matrix mathematics performance of the Motorola 68000 microprocessor (which was at that time being designed into the first Apple Macintosh). This took the form of a separate chip – a coprocessor – that could directly read and write main memory (DMA) and perform vector computations autonomously after being set up and kicked off by the 68000. The 68000, although 32-bit internally, used a 16-bit external data bus and a 24-bit external address bus.

A (then) novel aspect of the work was the use of a regular rectangular multiplier array, called a column compression carry-save multiplier, that was suggested by my thesis advisor, Al Despain. The design also used parallel-prefix, fan-in 3 final stage adders, which were again fairly novel at the time.
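The arithmetic idea behind such a multiplier can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis design: it handles unsigned operands only, compresses columns greedily rather than in a regular rectangular array, and uses a fan-in 2 (Kogge-Stone style) prefix adder instead of the fan-in 3 adders mentioned above. All function names are hypothetical.

```python
def partial_product_columns(a, b, width):
    """Column c collects the AND bits a_i & b_j with i + j == c."""
    cols = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols

def compress_columns(cols):
    """Carry-save reduction: full adders turn 3 bits into a sum and a carry
    until no column holds more than 2 bits. No carry ripples here."""
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in cols]
        for c, col in enumerate(cols):
            full = len(col) // 3
            for k in range(full):
                x, y, z = col[3 * k: 3 * k + 3]
                nxt[c].append(x ^ y ^ z)                  # sum stays in column c
                carry = (x & y) | (x & z) | (y & z)
                if c + 1 < len(cols):                     # top carry is always 0:
                    nxt[c + 1].append(carry)              # product fits 2*width bits
            nxt[c].extend(col[3 * full:])                 # leftovers pass through
        cols = nxt
    return cols

def prefix_add(x, y, nbits):
    """Parallel-prefix adder: carries computed in log2(nbits) combine steps."""
    mask = (1 << nbits) - 1
    g, p = x & y, x ^ y               # per-bit generate and propagate
    half_sum = p
    dist = 1
    while dist < nbits:
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2
    carries = (g << 1) & mask         # carry into bit i comes from bits 0..i-1
    return half_sum ^ carries

def multiply(a, b, width=8):
    cols = compress_columns(partial_product_columns(a, b, width))
    row0 = sum(col[0] << c for c, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << c for c, col in enumerate(cols) if len(col) > 1)
    return prefix_add(row0, row1, 2 * width)   # the only carry-propagating add

print(multiply(13, 11))   # 143
```

The point of the structure is that carry propagation, the slow part of addition, happens once at the end rather than once per partial product.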

SLANG was also used to simulate the architecture. The implementation technology was 4 micron single metal, single poly NMOS, and clocking was performed using a two-phase non-overlapping clock which was the standard clocking methodology at Berkeley at that time.

The chip was designed, simulated (using Spice for circuit level simulation, and the Crystal tool for higher level timing analysis), and laid out using the Caesar layout tool. Performance improvement over the 68000 was up to 15x for long vectors. I find it affirming that even now, new embarrassingly parallel workloads such as Deep Learning can benefit greatly from parallel autonomous vector or tensor operations performed by an arithmetic coprocessor like Google’s TPU.

The layout of the multiplier array was completed and verified, but I estimated another 2-3 months of work would be required to tape out the chip to MOSIS for fabrication.

One of my main “takeaways” from the experience was to exercise, in future, the discipline required to complete functional simulation before beginning layout. I was too eager to implement my “clever” layout ideas.

My thesis cover page and abstract are shown below, along with a graph showing expected performance improvement over a standard MC68000.

Master's Thesis Cover Page A Vector Assist co-Processor for the MC68000 Microprocessor

Berkeley Masters Thesis Cover Page

Forks in the Road not Taken: At one point John Ousterhout invited me to join the Magic (a VLSI layout program) team effort, which I declined as my heart was more in the computer architecture and hardware systems area than EDA tools. If I had said yes, my career could have well gone down a very different path. But modern computer system design and implementation never stray far from either EDA tools or software/firmware – be it simulators, layout and synthesis tools, compilers, driver code, real-time operating systems, kernel code, etc.

Dave Patterson also urged me to stay and go for a PhD. But instead, aided by a strong recommendation from Dave, I joined the Macintosh team at Apple, see: APPLE-MACINTOSH

Thanks to Berkeley: My graduate experience at Berkeley definitely reinforced my belief in the value of a two year masters program that included a thesis. I am very grateful to Berkeley and the teaching staff there (Al Despain, David Patterson, John Ousterhout, Bob Brodersen, Richard Newton, Paul Hilfinger, et al.) for enabling such a fantastic opportunity and learning experience. Where else could a masters level graduate student, in 1982/83, have been part of a superb multi-disciplinary team oriented processor design (SOAR), as well as have learned the skills and had access to the resources to, by himself, design a (for then) sophisticated co-processor chip from soup to nuts?

Dave Patterson (“coach”) has the skillset to technically lead, to inspire, and to manage a multi-disciplinary team of graduate students to execute on complex systems level projects. He could therefore take on bigger projects, projects that were more interesting to DARPA, secure the funds to execute those programs, and in the process crank out a succession of grad students and PhDs that would go on to have a profoundly positive impact on the industry.

Thanks to DARPA: I believe it appropriate to acknowledge DARPA’s role in making the RISC I, RISC II, SOAR and SPUR (Symbolic Processing Using RISCs) projects at Berkeley possible. I believe they also supplied funding for the MIPS work at Stanford. Without that money all the fantastic computer architecture/systems work done at Berkeley during the 1980’s, and all the PhDs and graduate students who emerged to do subsequent great work, might well not have happened.

There are those out there who believe that government funding of advanced R&D projects such as those that took place at Berkeley and Stanford in the 80’s is unnecessary or inappropriate. The reality is that they are wrong.

A FEW MEMORIES:

I still remember that when the elevator door opened to the 5th floor of Evans Hall, you would immediately see in front of you on the Seminar board the list of talks being given that day, often by traveling professors. I loved that environment, of having such a rich tapestry of expertise and presentations at your fingertips, and I miss it. Perhaps that is why I still much enjoy attending SystemX and other seminars at Stanford. I remember one day the elevator door opened, and on the board in large black whiteboard marker was the talk entitled “WHY RISC SUCKS”. It was a brilliant title because that seminar was packed. I think it was a researcher from IBM who gave the talk.

I remember the SOAR team traveling to Stanford to meet with the MIPS group under John Hennessy to exchange computer architecture ideas. The room had a lot of beanbag chairs. They would say disparaging things about register windows, and we would respond with disparaging comments about software table walk on page fault. All in good fun and for the exchange of academic ideas.

RESOURCES:

The button below is a download of a retrospective on SOAR I wrote for the Apple Newton team in mid ’88. At that time the Newton software group was surveying potential OS and development systems candidates for Newton with a bias towards OOS (Object Oriented Systems). As I had worked with Dave Ungar on SOAR I was later to bring him in to talk to the team to provide his perspectives and to present his work on the Self programming language.

A Smidgin of SOAR

UCB SOAR SLIDE SHOW:

SOAR Team Photo
SOAR Chip Die Photo

Berkeley Smalltalk on a RISC (SOAR) Team Photo

Berkeley Smalltalk on a RISC (SOAR) chip die photo

ACCOMPLISHMENTS: Amdahl Summer Internship. Processor Design

Working with Lloyd Dickman and Richard Bishop of Amdahl’s Advanced Architecture Group, developed the design and implementation for a RISC based interrupt control processor for Amdahl’s IBM compatible mainframe computers. This design was implemented in ECL gate array technology.

DETAILS

Likely due to a recommendation from Dave Patterson (as I had seen Lloyd Dickman at a seminar or two at Berkeley), between my first and second year at Berkeley I was offered a great opportunity for a summer internship position at the Advanced Architecture Group at Amdahl under Lloyd and Richard Bishop. My task was to design a RISC style interrupt control processor for Amdahl’s IBM compatible mainframe computers, as the existing interrupt architecture was too slow (too much latency). The implementation technology was hot, fast Fujitsu ECL gate array.

I remember spending a Saturday at Richard Bishop’s (Amdahl’s Chief Architect) house on Russian Hill in San Francisco where he asked me to work on the architecture with him. He showed me “how real architects design processors” (he was only half kidding I think) – and we proceeded to fill a notebook full of photocopied pages of the architecture block diagram, one page for each instruction type and for each pipeline/clock stage. We then used colored pens to highlight the busses and resources used for each instruction for each clock cycle to verify there were no conflicts and that there were sufficient architectural resources. I was later to employ the same technique (for initial design) on SOAR (Smalltalk On A RISC) at Berkeley, for my masters project (A Vector Co-Processor for the Motorola MC68000) at Berkeley, and for a RISC inspired implementation of a “fast” 6502 (faster than the 68000) while with the IC Technology Group at Apple.
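The colored-pen check is essentially a pipeline reservation table. A minimal Python sketch of the same idea follows; the instruction types, stage counts, and resource names are invented for illustration and do not describe the Amdahl design. It assumes one instruction issues per cycle.

```python
from collections import defaultdict

# Hypothetical per-stage resource usage for two made-up instruction types.
# Entry s lists the resources an instruction needs s cycles after it issues.
USAGE = {
    "alu":  [{"ifetch_bus"}, {"reg_read"}, {"alu"}, {"reg_write"}],
    "load": [{"ifetch_bus"}, {"reg_read"}, {"alu"}, {"data_bus"}, {"reg_write"}],
}

def find_conflicts(program, usage=USAGE):
    """Issue one instruction per cycle and report every resource that two or
    more in-flight instructions claim in the same cycle."""
    claims = defaultdict(list)        # (cycle, resource) -> issuing instructions
    for issue_cycle, instr in enumerate(program):
        for stage, resources in enumerate(usage[instr]):
            for res in resources:
                claims[(issue_cycle + stage, res)].append(issue_cycle)
    return sorted((cycle, res, users)
                  for (cycle, res), users in claims.items()
                  if len(users) > 1)

print(find_conflicts(["alu", "alu"]))    # []
print(find_conflicts(["load", "alu"]))   # [(4, 'reg_write', [0, 1])]
```

In the second case the longer load pipeline and the following ALU instruction both want the register write port in cycle 4, which is the kind of overlap two highlighted pages would reveal.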

Richard very much liked the San Francisco lifestyle, with fine coffee shops just a walk down the hill from his house on Russian Hill, but they only had a garage for one car. I still remember driving around some very expensive real estate for quite a while, looking for a place to street park his Porsche.

The CAD tools at Amdahl were fairly primitive. The tools at Berkeley were much better, and the MainSAIL based VLSI Technology tools (based on work done at Stanford) I was to use at Apple just over a year later were better still.