High Performance, Low Cost…

What is HyperRAM?

On this page we describe some promising initial work using a new type of memory, known as HyperRAM.

Data storage is a vital consideration in any new instrument design. Both the Parallax Propeller and XMOS Startkit have limited available memory - 32 kB HUB RAM in the former and 64 kB in the latter. While this may be sufficient for many applications, add-on external memory must be employed in more demanding situations.

There are two basic types of memory - static and dynamic. Static memories are readily available up to several MB in capacity, while dynamic memories can store hundreds of MB. In terms of cost per byte, static RAM is significantly more expensive than dynamic RAM but is much easier to use. Both differences come down to the construction of the individual memory cells - in static RAM each cell is a small latch built from several transistors, while in dynamic RAM each cell is a tiny capacitor (plus one access transistor). The logic state (0 or 1) of a static RAM cell, once written, is retained for as long as power is maintained. The same is not true for a dynamic RAM cell, whose stored charge leaks away and must be refreshed periodically to retain its state. It is this refresh requirement that significantly complicates the use of dynamic RAM.

While the interface to a static RAM chip is relatively straightforward, a large number of pins are consumed. For example the interface to a byte-wide 1Mx8 static RAM would require 20 address lines (1M = 2^20), 8 data lines and several additional control signals (CS*, RD*, WR*). A Parallax Propeller design incorporating such a RAM chip will have no pins left to do anything! The XMOS Startkit has ~44 useable I/O pins, still leaving a handful of spare pins for additional interfacing. Indeed several XMOS-based data acquisition systems have been built using 1MB and 2MB SRAM chips and these are briefly described here.
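The pin arithmetic above is easy to check in a few lines. This is a minimal sketch in plain C (not xc); the function names are made up for illustration.

```c
#include <assert.h>

/* Number of address lines needed to select one of `depth` locations,
   i.e. the smallest n with 2^n >= depth. */
unsigned address_lines(unsigned long depth)
{
    assert(depth > 0);
    unsigned n = 0;
    while ((1UL << n) < depth)
        n++;
    return n;
}

/* Total pins for a parallel SRAM: address + data + control strobes. */
unsigned sram_pin_count(unsigned long depth, unsigned data_bits,
                        unsigned control_lines)
{
    return address_lines(depth) + data_bits + control_lines;
}
```

For the 1Mx8 part, sram_pin_count(1UL << 20, 8, 3) gives 31 pins, which is all but one of the Propeller's 32 I/O pins.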

Dynamic RAM interfacing, while not considered further here, is also demanding with regard to pin count. Very recently, however, the semiconductor company Integrated Silicon Solution Inc. (ISSI) announced a new type of RAM called HyperRAM - a family of high-speed CMOS, self-refresh dynamic RAM (DRAM) devices. Quoting the ISSI literature, “the HyperRAM family devices include a self-refresh logic that will refresh all the rows automatically so that the host system is relieved of the need to refresh the memory. The automatic refresh of a row can only be done when the memory is not being actively accessed by the host system. The refresh logic waits for the end of any active access before doing a refresh, if a refresh is needed at that time. If a new read or write begins before the refresh is completed, the memory will drive RWDS HIGH during the Command-Address period to indicate that an additional initial latency time is required at the start of the new access in order to allow the refresh operation to complete before starting the new access”.

What makes HyperRAM particularly attractive is its comparatively simple interface requirements. The 8Mx8 part # IS66WVH8M8ALL/BALL connects to a user’s system via just 12 pins - an 8 bit data bus and 4 I/O signals - CS*, CK, RESET* and read/write data strobe (RWDS).

The partial schematic opposite shows typical connections to an XMOS Startkit. Several shields incorporating the HyperRAM chip have been developed to explore the use of this memory device in scientific imaging and data acquisition/waveform generation applications.

Descriptions of these projects can be found in the following sections.

HyperRAM soldering

The HyperRAM part mentioned above comes in a ball grid array (BGA) package. The PCB land pattern consists of a 5 x 5 array of tiny dots, with one of the 25 dots missing in the top right corner marking row A, column 1.

It was relatively easy to solder the BGA to the PCB: a generous amount of solder flux was first applied to the PCB, then the chip was placed onto the PCB using tweezers and very carefully aligned to the silkscreen outline. The soldering was done in a reflow oven (a K-Mart toaster oven controlled by a Beta Layout reflow controller). Once the solder had flowed the package settled down nicely onto the PCB. On cool-down, a visual inspection showed no gaps when looking side-on at the edges of the BGA package relative to the PCB top surface, indicating that the soldering was likely successful. The downside of using BGA packages is that, apart from this visual inspection, testing the integrity of the connections is not possible without sophisticated X-ray equipment that is not available to a hobbyist!

HyperRAM interface code

While HyperRAM is theoretically capable of extremely high data transfer rates (exceeding 100 MB/s) there are several important considerations that make such rates very challenging to achieve.

Data transfers to HyperRAM are performed in bursts. A burst consists of a 6 byte preamble indicating whether the operation is a read or a write, and a start address. After that come the data bytes - a double data rate (DDR) is achieved by ensuring that each byte is set up and ready just prior to each clock edge (both low-to-high and high-to-low). But there is one very important catch - there is a limitation to the burst duration - each burst must not take longer than 4 us in light of the HyperRAM chip’s self-refresh requirements. The actual number of bytes that can be sent or received within this 4 us window is then dictated by the rate at which the host processor can perform its I/O operations.
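The 6 byte preamble can be packed as sketched below. This is plain C rather than xc; the function name hr_pack_ca is hypothetical, and the bit layout is an assumption based on the usual HyperBus command/address convention, so it should be checked against the ISSI datasheet before use.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the 6 byte command/address preamble, most significant byte first.
   Assumed bit layout (HyperBus convention, verify against the datasheet):
     bit 47      1 = read, 0 = write
     bit 46      1 = register space, 0 = memory space
     bit 45      1 = linear burst, 0 = wrapped burst
     bits 44-16  word address bits 31..3
     bits 15-3   reserved (zero)
     bits 2-0    word address bits 2..0 */
void hr_pack_ca(uint8_t ca[6], int is_read, int is_register,
                int is_linear, uint32_t word_addr)
{
    assert(ca != 0);
    uint64_t v = 0;
    if (is_read)     v |= (uint64_t)1 << 47;
    if (is_register) v |= (uint64_t)1 << 46;
    if (is_linear)   v |= (uint64_t)1 << 45;
    v |= (uint64_t)(word_addr >> 3) << 16;   /* upper address bits */
    v |= (uint64_t)(word_addr & 0x7u);       /* lower three address bits */
    for (int i = 0; i < 6; i++)
        ca[i] = (uint8_t)(v >> (8 * (5 - i)));
}
```

With this layout, hr_pack_ca(ca, 0, 1, 1, 0x800) produces 60 00 01 00 00 00, which matches the preamble of the register 0 write shown further down this page. The word address 0x800 is an assumption chosen to reproduce that sequence. A linear memory read from address 0 starts with the byte A0.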

The xc code I’ve developed makes use of two XMOS clock blocks to generate two clocks, a fast clock running at 25 MHz (clk2) and a “slow” clock running at 12.5 MHz (clk1). The code fragment below declares a bi-directional 8 bit port DATA (connected to the HyperRAM’s data bus, D7..0) and configures it so that both input and output operations on this port are referenced to the rising edges of clk2 (i.e. new data appears every 40 ns). Clk1 is then configured to drive the CLK input pin of the HyperRAM chip, with the result that on every clk1 edge (every 40 ns), new data gets clocked into/out of HyperRAM.

out port CLK = XS1_PORT_1E ;
out port CS = XS1_PORT_1J ;
port DATA = XS1_PORT_8B ; // both input and output (bi-directional)
in port RWDS = XS1_PORT_1G ;

in port * movable RWDSip = &RWDS;
out port * movable RWDSop ;

clock clk1 = XS1_CLKBLK_1 ;
clock clk2 = XS1_CLKBLK_2 ;

configure_clock_rate(clk1, 100, 8) ; // a 12.5 MHz clock, giving an 80 ns period, and 40 ns between clock edges
configure_clock_rate(clk2, 100, 4) ; // a 25 MHz clock, 40 ns between rising edges

configure_port_clock_output(CLK, clk1) ;

configure_out_port (DATA , clk2 , 0);
configure_in_port (DATA , clk2);

n.b. Clocks are started and stopped by name (clk1/clk2) as the argument to the in-built xc functions start_clock() and stop_clock().

Once that’s done, the following two lines show how data is sent to or received from the bi-directional port DATA :

DATA <: wr_buffer[0] ; // output byte to port
DATA :> rd_buffer[0] ; // input byte from port

In order for the above scheme to work correctly (without getting data errors) the phasing of the two clocks (clk1/clk2) needs to be carefully managed. To this end, the XMOS simulator tool proved to be an invaluable aid. The simulator in xTIMEcomposer gives the user a powerful, logic analyzer-style graphical display that provides an accurate state and timing view of the resulting I/O operations when the user’s code is executed. Tweaks could then be made to the code until the timing was exactly as required by the HyperRAM chip.

With the aid of the simulator it was established that 64 byte chunks of data could be transmitted/received while simultaneously remaining within the 4 us timing constraint and also ensuring integrity of timing between data and clock signals. A 64 byte burst in 4 us yields an effective data transfer rate of 16MB/sec (but see later for additional performance figures).

Sample Waveforms from the XMOS Simulator Tool

Below we show some sample waveforms generated by the XMOS simulator. The screen captures below show the two register writes that initialise the HyperRAM. The traces show activity on the CS* and CLK signals as well as on the 8 bit data bus (D7..0). The first write to register 0 sends the 8 byte hex sequence 60 00 01 00 00 00 8F E9 - this configures the HyperRAM chip for a 3 clock latency and 64 byte burst length, as described on page 10 of the data sheet.
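The two settings named above can be pulled out of the 16 bit register value 8F E9 as follows. This is a hedged sketch in plain C with hypothetical function names; only the codes for 3 clocks and 64 bytes are confirmed by the text here, and the remaining table entries are assumptions to be verified against the data sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Initial latency in clocks, taken from bits 7..4 of register 0.
   Only code 0xE (3 clocks) is confirmed on this page; the other
   entries are assumptions to be checked against the data sheet. */
unsigned hr_cr0_latency_clocks(uint16_t cr0)
{
    switch ((cr0 >> 4) & 0xFu) {
    case 0x0: return 5;
    case 0x1: return 6;
    case 0xE: return 3;
    case 0xF: return 4;
    default:
        assert(0 && "latency code not known to this sketch");
        return 0;
    }
}

/* Burst length in bytes, taken from bits 1..0 of register 0.
   Only code 1 (64 bytes) is confirmed on this page. */
unsigned hr_cr0_burst_bytes(uint16_t cr0)
{
    static const unsigned bytes[4] = { 128, 64, 16, 32 };
    return bytes[cr0 & 0x3u];
}
```

Applied to 0x8FE9, the low byte E9 gives latency code 0xE and burst code 1, i.e. 3 clocks and 64 bytes.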

A second write of the hex byte sequence 60 00 01 00 01 00 02 to register 1 sets a default refresh interval of 4 us (data sheet, page 13). Notice that each register write transaction involves 8 clock edges (resulting in transmission of 8 bytes) and takes a total of ~ 375 ns. In the screen capture shown here, the clock high and low periods are each 40 ns for a CLK frequency of 12.5 MHz, as mentioned earlier.

The next screen capture shows a memory write transaction. Immediately after CS* goes low, 6 bytes (48 bits) are transferred on the next 6 clock edges. This sets up the HyperRAM for a write operation and conveys the starting address for the linear burst of data to follow. The next 5 clocks are needed to provide fixed latency for the HyperRAM’s self-refresh system. Following this, 64 bytes of data (here 0x00-0x3f) are transferred and CS* is then pulled high.

During this write transaction the port direction of the RWDS signal (not shown here) must be changed; the time taken to do so results in the visible gap in the clock sequence. In the xc code being simulated here, the RWDS port is acted upon via the use of movable pointers and it will be possible to eliminate this timing gap by simply declaring RWDS as a bi-directional port. As currently shown, the write transaction takes 3.81 us, for a data transfer rate of 16.8 MB/s.

Finally we show the simulator output for a memory read transaction. Again, there is an initial 6 byte transmission (a read command plus address), followed by 5 latency clocks. The DATA port direction is then changed (to an input) and 64 bytes of data are read from the HyperRAM pins D7..0 - these are timed to occur at the midpoints of the high and low clock states - before CS* is finally brought high.

In the simulator output below the data bus shows the value 0x00 during the entire 64 byte read, although during an actual read sequence these values would be changing according to the data being read back. The entire read transaction completes in 3.44 us, giving a data transfer rate of 18.6 MB/s.

Read/Write Timings for Alternative Buffer Sizes

The HyperRAM chip can be configured to make use of other data buffer burst lengths (32/128 bytes). A small improvement in read/write data transfer rates will be achieved for 128 byte transfers (vs. 64 byte) as a greater fraction of the time spent would then be devoted to the actual data transfers after the initial command/address setup. Consulting the HyperRAM data sheet one sees that if the chip is operating at near-ambient temperatures, the 4 us window for read/write transactions can be relaxed. For example, one can double the refresh interval (and with it the time available for each burst) by changing the last byte in the register 1 setup sequence shown above from a 2 (= 4us) to a 0 (= 8us).

Since each additional byte being read or written requires 40 ns, increasing the buffer size to 128 bytes from 64 bytes will take an additional 2.56 us. By changing the refresh interval to 8 us one can accommodate the larger data transfer and also increase the read and write data transfer rates to 21.3 MB/s and 20.1 MB/s, respectively.

Timings for 32 byte read/write transactions have also been determined - these are 2.16 us for a read, and 2.53 us for a write transaction, giving data transfer rates of 14.8 MB/s and 12.6 MB/s. As expected these are slightly worse figures than for the 64 byte transfer discussed earlier.
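The figures for the three buffer sizes all follow from the measured 64 byte timings (3.44 us read, 3.81 us write) and the 40 ns cost of each byte. A minimal sketch of that arithmetic in plain C, with hypothetical function names, and with 1 MB taken as 10^6 bytes as in the rates quoted above:

```c
#include <assert.h>

#define HR_NS_PER_BYTE 40   /* one byte per clock edge, edges 40 ns apart */

/* Scale a measured 64 byte burst time (in ns) to another burst length. */
long hr_burst_time_ns(long t64_ns, int nbytes)
{
    assert(nbytes > 0);
    return t64_ns + (long)(nbytes - 64) * HR_NS_PER_BYTE;
}

/* Transfer rate in MB/s, which is the same as bytes per microsecond. */
double hr_rate_mb_per_s(int nbytes, long time_ns)
{
    assert(time_ns > 0);
    return (double)nbytes * 1000.0 / (double)time_ns;
}
```

For example, hr_burst_time_ns(3440, 128) gives 6000 ns and hr_rate_mb_per_s(128, 6000) gives about 21.3 MB/s, while hr_burst_time_ns(3810, 32) gives 2530 ns and about 12.6 MB/s.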

Testing HyperRAM functionality

In order to verify the correct functioning of the HyperRAM, a LabVIEW™ VI has been developed. This VI enables the user to read and write either single or multiple blocks of 64 byte bursts of selected data to the HyperRAM and display the results.

There are a number of controls on the front panel whose functions are as follows. The Dataset# control is used to fill the 64 byte buffer with various data patterns/functions. Read_Buffer# and Write_Buffer# each set a starting block #, while the # of blocks to read/write are set using #Blocks. Four additional buttons control whether read/write operations take place as single or multiple block transfers. For the HyperRAM chip being evaluated here there are a total of 131,072 blocks and since each block holds 64 bytes this yields a total capacity of 8,388,608 bytes = 8 megabytes.

After each block read, data is sent back to the host PC via the serial port at 1.5 Mb/s (set in the baud rate field). In the screen capture shown below, Dataset 2 was first selected and used to fill the first eighth of HyperRAM (16384 blocks, each of 64 bytes) with a sawtooth repeating pattern and after doing so, the data is read back. As seen in the graph window (HyperRAM_n) at upper right, 1048576 bytes (1MB) of data have been written and read back correctly. Owing to the large number of points acquired here, the data in the graph window just appears as a series of horizontal lines, with the sawtooth nature of the data only recognisable when the graph is expanded.

By exercising the HyperRAM with the various controls on this panel one can successfully verify reliable operation of writes/reads. Using 64 byte burst transfers as described above we currently achieve a data rate of >16 megabytes/second during the actual HyperRAM read/write routines (and one can go slightly faster by increasing the buffer size to 128 bytes - see earlier notes).
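The fill-and-read-back test can be sketched in a few lines. This is plain C, not the LabVIEW VI or the xc firmware: a plain array stands in for the HyperRAM, all names are hypothetical, and the exact sawtooth (a byte ramp 0..255 that repeats) is an assumption, since the pattern produced by Dataset 2 is not spelled out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HR_BLOCK_BYTES 64
#define SIM_BLOCKS     16384   /* first eighth of the 8 MB part */

/* Plain array standing in for the HyperRAM (a test double only). */
uint8_t sim_hr[SIM_BLOCKS * HR_BLOCK_BYTES];

void sim_write_block(uint32_t block, const uint8_t *buf)
{
    assert(block < SIM_BLOCKS);
    memcpy(&sim_hr[block * HR_BLOCK_BYTES], buf, HR_BLOCK_BYTES);
}

void sim_read_block(uint32_t block, uint8_t *buf)
{
    assert(block < SIM_BLOCKS);
    memcpy(buf, &sim_hr[block * HR_BLOCK_BYTES], HR_BLOCK_BYTES);
}

/* Assumed sawtooth: byte value ramps 0..255, then repeats. */
uint8_t sawtooth(uint32_t byte_index)
{
    return (uint8_t)(byte_index & 0xFFu);
}

/* Write the pattern to `nblocks` blocks starting at block `first`. */
void hr_fill(uint32_t first, uint32_t nblocks)
{
    uint8_t buf[HR_BLOCK_BYTES];
    for (uint32_t b = 0; b < nblocks; b++) {
        for (uint32_t i = 0; i < HR_BLOCK_BYTES; i++)
            buf[i] = sawtooth(b * HR_BLOCK_BYTES + i);
        sim_write_block(first + b, buf);
    }
}

/* Read the blocks back and return the number of mismatched bytes. */
uint32_t hr_verify(uint32_t first, uint32_t nblocks)
{
    uint8_t buf[HR_BLOCK_BYTES];
    uint32_t errors = 0;
    for (uint32_t b = 0; b < nblocks; b++) {
        sim_read_block(first + b, buf);
        for (uint32_t i = 0; i < HR_BLOCK_BYTES; i++)
            if (buf[i] != sawtooth(b * HR_BLOCK_BYTES + i))
                errors++;
    }
    return errors;
}
```

Calling hr_fill(0, 16384) followed by hr_verify(0, 16384) covers the same 1,048,576 bytes as the test described above. On real hardware the two sim_ functions would be replaced by the 64 byte burst write and read routines.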

Using HyperRAM in a Scientific Grade Imaging System

Elsewhere we have discussed area array CCD image sensors. A full frame readout of image data from this type of sensor far exceeds the available memory capacity of either the Propeller or the XMOS systems we typically use for instrument development. One solution to this problem has been described here, in which data is written on a line-by-line basis to an SD card on-the-fly during image readout.

Inspired by the successful HyperRAM experimentation above I’ve also developed an XMOS shield with a HyperRAM chip to store spectral images. Below is a photo of a SITe imaging system at left with an XMOS-based HyperRAM control shield at right. In this implementation an 8 bit data bus is shared between the HyperRAM and the AD9826 correlated double sampling chip on the imager board. Separate chip select and output enable signals routed to these two chips are used to avoid bus conflicts. In this design there is spare I/O capacity to allow for 7 uncommitted I/O pins and a ground to be brought out to the IDC header at the far bottom right.

Using HyperRAM (HR) in Data Acquisition Systems

The ISSI IS66WVH8M8ALL/BALL chip can hold 8MB of data, making it an excellent choice for building very low cost, deep memory data systems - either for capturing and storing large amounts of analog input (ADC) data or for retrieving such data and outputting it “on-the-fly” to a DAC - as would be required for arbitrary waveform generation.

In view of the way data is read from/written to HR it will be useful to briefly describe how the chip can be applied in these applications. In either case, the user will typically specify a “dwell time”, or sampling interval between points in the captured or generated waveform.

As was discussed earlier, read/write transactions with the HR always occur in blocks of bytes, with each transfer needing to be completed within a certain time due to refresh constraints. In the system we have described above, the block size is 64 bytes and the transfer time is <4 us.

Let us first consider the case of analog input and assume we are using a 16 bit ADC such as the Linear Technology LTC1867. This ADC has a maximum sampling rate of 200 ksps, or 5 usec per point. At 2 bytes per sample, our 8 MB HR chip has the capacity to store up to 4 million samples.

To do so, all that one requires is to maintain a block counter, a 64 byte buffer space and a buffer index. The block counter is initialized to zero at the start of a new acquisition sequence. As each new sample comes in (from reading the ADC) we store it in the next pair of bytes in the buffer space. Once the buffer is full (i.e. 32 samples = 64 bytes) we write the buffer to the current HR block and then increment the block counter and reset the buffer index in readiness for the next sample. This process is then repeated until all samples have been acquired.
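The bookkeeping just described can be sketched as follows. This is plain C rather than the xc firmware, and the names are hypothetical. A small array stands in for the HR so the logic can be run on a PC, and the high-byte-first ordering within each pair of bytes is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HR_BLOCK_BYTES 64
#define SIM_BLOCKS     4

/* Small array standing in for the HR (a test double only). */
uint8_t sim_hr[SIM_BLOCKS][HR_BLOCK_BYTES];

/* Stand-in for the 64 byte burst write routine. */
void hr_write_block(uint32_t block, const uint8_t *data)
{
    assert(block < SIM_BLOCKS);
    memcpy(sim_hr[block], data, HR_BLOCK_BYTES);
}

typedef struct {
    uint8_t  buf[HR_BLOCK_BYTES];  /* one block's worth of samples */
    unsigned index;                /* next free byte in buf */
    uint32_t block;                /* HR block the buffer will go to */
} hr_recorder;

void hr_recorder_init(hr_recorder *r)
{
    assert(r != 0);
    r->index = 0;
    r->block = 0;
}

/* Store one 16 bit sample; write the buffer to HR once it is full. */
void hr_recorder_push(hr_recorder *r, uint16_t sample)
{
    assert(r->index + 2 <= HR_BLOCK_BYTES);
    r->buf[r->index++] = (uint8_t)(sample >> 8);     /* high byte first (assumed) */
    r->buf[r->index++] = (uint8_t)(sample & 0xFFu);
    if (r->index == HR_BLOCK_BYTES) {
        hr_write_block(r->block, r->buf);
        r->block++;        /* move on to the next HR block */
        r->index = 0;      /* start refilling the buffer */
    }
}
```

Every 32nd call to hr_recorder_push() triggers one block write, so the burst transfer only has to fit between two consecutive samples once per 32 samples.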

This strategy works fine for acquiring an uninterrupted data stream providing the user’s sampling interval is longer than the time required to perform the block transfer to HR. In practice one can take data in this manner at around 200 ksps. When operating at this full speed the ADC/HR combination can acquire 16 bit readings every 5 usec for roughly 20 seconds of continuous data acquisition!

ADCs with higher resolution (e.g. 24 bit) are even more easily accommodated because of their much slower data rates – for example the LTC2448 has a maximum sampling rate of 8 kHz. In this case the output data format is 4 bytes/point (24 bits of usable data) and consequently the 64 byte buffer fills after every 16 samples. Data acquired at that speed could be collected for over 4 minutes before filling the HR.
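The recording times quoted for the two ADCs follow directly from the 8,388,608 byte capacity. A short sketch of the arithmetic in plain C (hypothetical function names):

```c
#include <assert.h>

#define HR_CAPACITY_BYTES 8388608UL   /* 131,072 blocks of 64 bytes */

/* Number of samples that fit in the HR. */
unsigned long hr_sample_capacity(unsigned bytes_per_sample)
{
    assert(bytes_per_sample > 0);
    return HR_CAPACITY_BYTES / bytes_per_sample;
}

/* Seconds of continuous recording at a given sample rate. */
double hr_record_seconds(unsigned bytes_per_sample, double samples_per_s)
{
    assert(samples_per_s > 0.0);
    return (double)hr_sample_capacity(bytes_per_sample) / samples_per_s;
}
```

With 2 bytes per sample at 200 ksps this gives 4,194,304 samples and just under 21 seconds; with 4 bytes per sample at 8 kHz it gives 2,097,152 samples and about 262 seconds.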

Arbitrary waveform generation is accomplished in a very similar manner. In the approach being used here, the waveforms are first computed in LabVIEW™ and are downloaded from there into the HR. The playback process uses the same scheme of buffer, buffer index and block counter as described above.

For the purposes of discussion we describe operations using an LTC2600, 16 bit DAC, that has a 10 usec settling time for a 2.5V step change at its input.

To generate a waveform from data stored in HR we must first load a block of data from HR into the 64 byte buffer. We then index into the buffer to get the next two bytes to combine into a 16 bit output that is passed to the DAC. Output continues in this manner until data in the buffer is exhausted, at which time the block counter is incremented, a new buffer is fetched and the process repeats.
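Playback can be sketched the same way. Again this is plain C with hypothetical names rather than the xc firmware: a small array stands in for the HR, and the high-byte-first ordering of each sample is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HR_BLOCK_BYTES 64
#define SIM_BLOCKS     4

/* Small array standing in for the HR (a test double only). */
uint8_t sim_hr[SIM_BLOCKS][HR_BLOCK_BYTES];

/* Stand-in for the 64 byte burst read routine. */
void hr_read_block(uint32_t block, uint8_t *data)
{
    assert(block < SIM_BLOCKS);
    memcpy(data, sim_hr[block], HR_BLOCK_BYTES);
}

typedef struct {
    uint8_t  buf[HR_BLOCK_BYTES];  /* block currently being played */
    unsigned index;                /* next unread byte in buf */
    uint32_t block;                /* next HR block to fetch */
} hr_player;

void hr_player_init(hr_player *p)
{
    assert(p != 0);
    p->index = HR_BLOCK_BYTES;     /* empty buffer: forces a fetch on first use */
    p->block = 0;
}

/* Return the next 16 bit value for the DAC, fetching a block when needed. */
uint16_t hr_player_next(hr_player *p)
{
    if (p->index == HR_BLOCK_BYTES) {
        hr_read_block(p->block, p->buf);
        p->block++;
        p->index = 0;
    }
    uint16_t hi = p->buf[p->index++];   /* high byte first (assumed) */
    uint16_t lo = p->buf[p->index++];
    return (uint16_t)((hi << 8) | lo);
}
```

Each call to hr_player_next() would be made once per dwell time, with the returned value passed on to the DAC. One call in every 32 also performs a block read.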

Again, the high capacity of the HR chip allows waveforms having up to 4 million points to be stored and the data transfer rate to HR is more than adequate to keep up with the 100kHz upper data rate of the DAC.

As mentioned elsewhere, Linear Technology have low cost evaluation boards for each of the aforementioned data converter ICs. The XMOS Startkit shield shown below has an on-board HR chip, an IDC header (at J3) for making the connection to the evaluation board as well as an SD card for additional permanent data storage.

Using an XMOS Startkit, this HR data acquisition shield and a suitable LT evaluation board one can construct a high performance analog data acquisition system or an arbitrary waveform generator for less than $100!

Final Thoughts

In the investigations described here we have shown that one can obtain read/write data transfer rates in excess of 20 MB/s using the ISSI IS66WVH8M8ALL/BALL 8MB HyperRAM chip. At this point we’ve basically reached the I/O bandwidth limit of the XS1-U8A-64-FB96 processor on-board the XMOS Startkit.

Considering the small physical size of the HyperRAM package, its very low cost (< $5, qty 1) and its memory capacity of 8 MB, the speed of data transfers reported here makes HyperRAM a very attractive option for dramatically boosting memory capacity in XMOS-based systems.

At this stage, no attempt has yet been made to use the HyperRAM in a Propeller-based system. The P8X32 chip would not be able to achieve the data rates reported here, but an FPGA-based P2 emulation might well be up to the task and would be an interesting project for someone to tackle.