The Direct Memory Access (DMA) controller is an important component of a System on Chip (SoC) architecture. It plays a significant role in increasing the speed of data transfer between memory and peripherals over an on-chip bus, the Advanced Microcontroller Bus Architecture (AMBA) (Flynn, 1997). AMBA has become one of the most widely used industry-standard on-chip bus architectures.
The DMA controller (DMAC) design adopts the AMBA specification for easy integration into an SoC, following Tiwari and Dahigaonkar (2011) and Liang et al. (2000). Several DMAC architectures are available, such as the flyby DMAC described by the Ganssle Group, multi-channel transmission for multiple users and a dedicated channel for large data throughput. Flyby is a dedicated-channel DMAC that works on the principle of non-buffered data transmission. Its main advantage is an improved data transmission rate. However, fast movement of data blocks between devices on the same part of the system, which is common in audio and video applications, requires the data to be buffered. According to Olugbon et al. (2005), flyby mode is not applicable in that case because a non-buffering DMAC cannot perform a write immediately after a read in a single cycle. Flyby is therefore not the right choice for such high-speed data transfer. The method described in this research achieves the speed of flyby while buffering the data.
AMBA is specified by Advanced RISC (Reduced Instruction Set Computer) Machines (ARM Ltd., 2007). The AMBA specification defines three buses: the Advanced High-performance Bus (AHB), the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). As described for the AMBA-based architecture by Flynn (1997), the DMAC acts as a master on the AHB bus when it transfers data. When it transfers data between AHB slaves and an APB peripheral, the DMAC must buffer the data and pass it through the APB bridge to reach the APB peripherals.
|Fig. 1:||DMAC AMBA based architecture
Buffering the data reduces the data transfer rate, as shown in Fig. 1. To enhance the data transfer rate, the AHB operation and the APB operation should run in parallel. For that, the DMAC should lie between the AHB bus and the APB bus and take on the APB bridge functions, close to the approach of Ma (2009). This architecture achieves parallelism in the sense of Hwang (1993). It can directly control the address, data and control signals on the APB bus. Note that Fig. 1 shows ASB, the older AMBA system bus, in place of AHB.
All AMBA bus peripherals need to access data in memory directly, and the DMAC makes this possible. This study introduces parallel reading from and writing to a dual-port RAM, using an asynchronous FIFO as the buffering mechanism.
MATERIALS AND METHODS
This section explains how the design functions by considering the DMAC architecture. The DMAC consists of two parts, a transmit DMA engine and a receive DMA engine, so that writing and reading operations can proceed in parallel. In addition, a dual-port RAM is shared by the transmitter and the receiver, as shown in Fig. 2.
Data moves from peripherals to memory and from memory to peripherals through the DMAC. On both the transmitter and the receiver side, the DMAC has an internal buffering mechanism based on a FIFO.
On the transmitter side, shown in Fig. 3, the peripheral sends a request to the DMAC for a write operation. In response, the DMAC grants permission to write into the FIFO, and for every write the write pointer of the Tx (transmit) FIFO is incremented by one. Once the write operation has completed, or while the peripheral is still writing into the Tx FIFO, the Tx DMA state machine may read data from the Tx FIFO, and for every read the read pointer is incremented by one.
Similarly, when a peripheral wants to read data from memory, it sends a read request to the receive DMAC, as shown in Fig. 4. In response, the receive DMA engine reads the data from memory and writes it into the Rx (receive) FIFO for the external peripheral. The FIFO pointers are incremented accordingly for every read and write operation on the Rx FIFO. For both the transmit and the receive DMA controller, the state machine is designed with three states.
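The pointer behavior described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the authors' Verilog: the class name `Fifo`, the default depth of 8 and the `full`/`empty` helpers are assumptions made for illustration. It also ignores clock-domain crossing, which a real asynchronous FIFO has to handle (commonly with Gray-coded pointers).

```python
class Fifo:
    """Behavioral FIFO sketch: one pointer per side, each advancing by one per access."""

    def __init__(self, depth=8):  # depth is an assumed value, not taken from the paper
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0  # incremented by one on every successful write
        self.rd_ptr = 0  # incremented by one on every successful read

    def count(self):
        # Number of words currently stored.
        return self.wr_ptr - self.rd_ptr

    def full(self):
        return self.count() == self.depth

    def empty(self):
        return self.count() == 0

    def write(self, word):
        """Store one word; returns False (and stores nothing) when the FIFO is full."""
        if self.full():
            return False
        self.mem[self.wr_ptr % self.depth] = word
        self.wr_ptr += 1
        return True

    def read(self):
        """Return the oldest word, or None when the FIFO is empty."""
        if self.empty():
            return None
        word = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr += 1
        return word
```

In this model the pointers grow without bound and are reduced modulo the depth when indexing, which keeps the full and empty tests simple. A hardware implementation would use fixed-width pointers that wrap.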
|Fig. 2:||Proposed architecture DMAC
DMA state machine: The DMA state machine consists of two individual state machines, one for the transmitter and one for the receiver. On the transmitter side (Fig. 5), the DMA engine starts in the idle state. Once Tx_req arrives to request writing data into memory, it jumps to the state TX_FF_RD_MEM_WR_ST. In this state, the DMA engine reads data from the FIFO and writes it to memory at the request of the external user. It writes to memory for a count equal to the signal Tx_dcnt. The three states on the transmitter side are:
||TX_IDLE: Idle state for the transmitter
||TX_FF_RD_MEM_WR_ST: In this state, data is read from the FIFO and written into the memory on the transmitter side
||TX_ISSUE_STATUS_ST: In this state, the transmitter DMA issues the status of the write operation
On the other side, the receive DMA engine also consists of three states, shown in Fig. 6. In RX_IDLE, the DMA engine waits for a request. Once it receives one, it jumps to the memory-read, FIFO-write state, RX_MEM_RD_FF_WR_ST. In this state, the DMA engine reads data from memory and writes it to the FIFO at the user's request. Once it gets the signal TX_ff_ff_full, it jumps to the status state and reports the status of the DMA state machine. The three states on the receiver side are:
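The three transmitter states can be sketched as a clocked state machine. The Python model below is a hedged illustration rather than the authors' Verilog: the class `TxDma`, the `done` flag, the `base_addr` argument and the return from the status state to idle after one cycle are assumptions, since the text names only the states, Tx_req and Tx_dcnt.

```python
from collections import deque
from enum import Enum


class TxState(Enum):
    TX_IDLE = 0
    TX_FF_RD_MEM_WR_ST = 1
    TX_ISSUE_STATUS_ST = 2


class TxDma:
    def __init__(self, fifo, memory):
        self.fifo = fifo        # deque standing in for the Tx FIFO
        self.memory = memory    # dict standing in for the dual-port RAM
        self.state = TxState.TX_IDLE
        self.addr = 0
        self.remaining = 0
        self.done = False       # hypothetical status flag

    def step(self, tx_req=False, base_addr=0, tx_dcnt=0):
        """Advance the state machine by one clock."""
        if self.state is TxState.TX_IDLE:
            self.done = False
            if tx_req:
                # Latch the transfer parameters and start moving data.
                self.addr, self.remaining = base_addr, tx_dcnt
                self.state = TxState.TX_FF_RD_MEM_WR_ST
        elif self.state is TxState.TX_FF_RD_MEM_WR_ST:
            # Move one word per clock from the FIFO to memory, if one is available.
            if self.remaining > 0 and self.fifo:
                self.memory[self.addr] = self.fifo.popleft()
                self.addr += 1
                self.remaining -= 1
            if self.remaining == 0:
                self.state = TxState.TX_ISSUE_STATUS_ST
        else:  # TX_ISSUE_STATUS_ST
            self.done = True    # assumed: status is issued for one cycle
            self.state = TxState.TX_IDLE
```

For example, with `TxDma(deque([10, 20, 30]), {})`, one call to `step(tx_req=True, base_addr=4, tx_dcnt=3)` followed by three calls to `step()` copies the three words to addresses 4 to 6 and leaves the machine in TX_ISSUE_STATUS_ST.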
|Fig. 3:||Transmitted DMAC
|Fig. 4:||Received DMAC
|Fig. 5:||Transmitted state machine
||RX_IDLE: Idle state for the receiver
||RX_MEM_RD_FF_WR_ST: In this state, data is read from memory and written into the FIFO on the receiver side
||RX_ISSUE_STATUS_ST: In this state, the receiver DMA issues the status of the read operation
|Fig. 6:||Received state machine
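The receiver transitions can be written as a pure next-state function, which makes the exit condition on the FIFO-full signal explicit. This is a minimal Python sketch under assumptions: the names `rx_next_state`, `run_rx`, `rx_req` and `ff_full` are hypothetical, the FIFO is a plain list, and the return to idle after the status state is assumed rather than stated in the text.

```python
from enum import Enum


class RxState(Enum):
    RX_IDLE = 0
    RX_MEM_RD_FF_WR_ST = 1
    RX_ISSUE_STATUS_ST = 2


def rx_next_state(state, rx_req, ff_full):
    """Next-state function for the receive side."""
    if state is RxState.RX_IDLE:
        return RxState.RX_MEM_RD_FF_WR_ST if rx_req else RxState.RX_IDLE
    if state is RxState.RX_MEM_RD_FF_WR_ST:
        return RxState.RX_ISSUE_STATUS_ST if ff_full else RxState.RX_MEM_RD_FF_WR_ST
    return RxState.RX_IDLE  # assumed: status is issued for one cycle


def run_rx(memory, base_addr, fifo_depth):
    """Copy words from memory into a list-based FIFO until it is full.

    Assumes memory holds at least fifo_depth words starting at base_addr.
    Returns the FIFO contents and the sequence of states visited.
    """
    state, fifo, addr, trace = RxState.RX_IDLE, [], base_addr, []
    rx_req = True  # a single request pulse on the first clock
    while True:
        trace.append(state)
        if state is RxState.RX_MEM_RD_FF_WR_ST and len(fifo) < fifo_depth:
            fifo.append(memory[addr])  # memory read, FIFO write
            addr += 1
        nxt = rx_next_state(state, rx_req, len(fifo) == fifo_depth)
        rx_req = False
        if state is RxState.RX_ISSUE_STATUS_ST:
            return fifo, trace
        state = nxt
```

Keeping the transition logic separate from the data movement mirrors the usual split between next-state logic and datapath in an HDL description.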
Signal description of the DMAC: All input and output ports are listed in Table 1, which gives each signal together with its direction, width and function.
|Table 1:||DMAC signal description|
Simulation results: The DMAC design consists of an asynchronous FIFO, a DMA engine on both the transmit side (when a peripheral wants to write) and the receive side (when a peripheral wants to read) and a common dual-port RAM. The whole design is implemented in Verilog HDL for easy integration into an SoC. Simulation was carried out for the individual blocks and for the complete top-level block architecture using the ModelSim simulator from Mentor Graphics. The simulation results of the AMBA-based DMAC show a read immediately following a write within a single clock.
The write operation into the FIFO by peripheral 1 is shown in Fig. 7, and Fig. 8 shows the read operations that take place at the same time at the request of a second peripheral, peripheral 2, to access the bus. Hence, two peripherals request the controller in parallel, and it grants bus access to the two peripherals alternately for writing to and reading from the dual-port RAM, using the FIFO as the buffering mechanism. Here, the peripheral first writes into the FIFO, and the DMAC then reads the data and writes it into the dual-port RAM.
As soon as a peripheral gets access to the bus, it sends the address, data and write-enable signals for a write operation, or the address and read-enable signals for a read operation, to the control unit. The data is then written into or read from the memory at the addresses specified by the peripheral.
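The port-level behavior described here, with one side presenting address, data and write enable while the other presents address and read enable, can be sketched as a simple dual-port RAM model. The Python class below is illustrative only: `DualPortRam`, its `clock` method and the default size of 16 words are hypothetical, and the choice that a read in the same cycle as a write to the same address returns the old contents is an assumption, not something the text specifies.

```python
class DualPortRam:
    """Behavioral dual-port RAM sketch: one write port and one read port."""

    def __init__(self, size=16):  # size is an assumed value
        self.mem = [0] * size

    def clock(self, wr_en=False, wr_addr=0, wr_data=0, rd_en=False, rd_addr=0):
        """Model one clock edge; both ports may be active in the same cycle."""
        # The read samples the contents from before this edge (assumed behavior).
        rd_data = self.mem[rd_addr] if rd_en else None
        if wr_en:
            self.mem[wr_addr] = wr_data
        return rd_data
```

A call such as `ram.clock(wr_en=True, wr_addr=4, wr_data=0x66, rd_en=True, rd_addr=3)` writes one location and reads another in the same modeled cycle, which is the parallelism the shared RAM is meant to provide.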
Synthesis results: The design was then synthesized with the Xilinx ISE 13.2 synthesis tool, with reference to Xilinx Corporation (1998). The device family selected for synthesis is Virtex 7 and the target device is xc7vx330t-3-ffg1157.
|Table 2:||DMAC design results for improved speed|
The synthesis results are tabulated in Table 2. It shows that the design achieved a maximum frequency of 306.24 MHz with a minimum clock period of 3.265 ns, while utilizing only 162 slice LUTs.
The advanced synthesis report shows the digital logic block generated in the implementation of the DMAC. The synthesis tool also generates the netlist. The logic block shown in Fig. 9 represents the top level of the design, which internally consists of the FIFO and the DMAC state machine.
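As a quick consistency check, the reported maximum frequency and minimum period should be reciprocals of each other. The short Python sketch below (the function name `period_ns_to_mhz` is hypothetical) converts the period to a frequency. The small difference from the reported value is consistent with the period having been rounded to three decimal places.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz (f = 1 / T)."""
    return 1e3 / period_ns


reported_mhz = 306.24   # maximum frequency quoted from the synthesis results
reported_ns = 3.265     # minimum period quoted from the synthesis results

computed_mhz = period_ns_to_mhz(reported_ns)
# The two reported figures agree to within 0.1 MHz.
agrees = abs(computed_mhz - reported_mhz) < 0.1
```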
This work introduces an improved DMA method for SoC applications. It increases the data transfer rate between processor and memory in an SoC. The AMBA-based implementation enhances the data transfer speed by using a buffering mechanism with an asynchronous FIFO. The asynchronous FIFO buffers the data between processor and memory and provides parallel reading and writing operations to improve the speed of data access. Compared with the polling method, the data transfer method in microprocessor-based SoCs and the interrupt method, the proposed methodology offers a similar but more efficient way to enhance the speed of data transfer using the AMBA interface. It extends the methods of Ma (2009) and Hwang (1993). The design achieved a maximum frequency of 306.24 MHz with 162 slice LUTs. The results show that a high data rate is achieved with low utilization of chip area. It is therefore well suited to embedded SoCs with RISC-based processors.
|Fig. 7:||Data writing to FIFO and reading from FIFO and then writing to memory
|Fig. 8:||Reading FIFO writing to memory
|Fig. 9:||Logic block generated for DMAC
The AMBA-based DMAC module is implemented in Verilog HDL. Simulation was done with the Mentor Graphics ModelSim simulator and synthesis with the Xilinx ISE 13.2 synthesizer. The result is an AMBA-based, fast data rate DMAC core for easy integration into SoC products. The design describes how peripherals access data in the dual-port RAM and how the DMAC controls the address, data and control signals independently, allowing the AHB bus and the APB bus to run in parallel. The DMAC can adapt its buffered data transfer mode to the speed of the peripheral. Experimental results show that the DMAC has the advantage of a high transfer rate and is well suited for integration into SoC products. It achieved a maximum frequency of 306.24 MHz and a minimum clock period of 3.265 ns.
This project was supported by the Deanship of Scientific Research at Salman Bin Abdul Aziz University, Ministry of Higher Education, Kingdom of Saudi Arabia.