Last modified on 10/12/16 14:29:59


The SPECTRA European project

The Embb project is supported by the Celtic-Plus project SPECTRA.

Embb, a generic hardware and software architecture for digital signal processing

Embb is a generic hardware and software architecture dedicated to data flow applications such as Software Defined Radio (SDR), image processing, network processing and security. In short, Embb is especially useful for applications that:

  • Have strong digital processing requirements that call for hardware acceleration (fast Fourier transforms, high-speed enciphering and deciphering, ...).
  • Consist of a series of very different kinds of digital processing algorithms that cannot all be implemented on the same hardware accelerator (channel convolutional decoding, modulation, vector processing, bitwise interleaving, ...).
  • Have hard real-time constraints.
  • Can be represented as data flows and controlled by a General Purpose micro-Processor (GPP).
  • Are very flexible in nature (for example, SDR applications targeting several radio access technologies from Bluetooth to LTE-A).

In an SDR context, Embb can be used to implement most of the demanding digital signal processing. It is controlled by a software application that runs on its general purpose control processor, hence the term "Software Defined".

Embb in the context of an SDR application

The main characteristic of Embb is that it attempts to resolve the tension between flexibility and energy efficiency. It does this by assembling a collection of Digital Signal Processing (DSP) units, each of them dedicated to one family of DSP algorithms (e.g. vector processing plus Fourier transforms). The DSP units are specialized enough to be energy efficient but flexible enough to accommodate all the needed variants of their algorithmic family. Embb is an alternative to classical DSP processors and Application Specific Instruction set Processors (ASIP), which are usually very flexible but not always sufficiently energy efficient for the target application. The following figure represents a typical Embb platform for SDR applications, with its control GPP, the GPP peripherals and local bus, and five DSP units connected to each other and to the GPP through the central interconnect.

Example of an Embb instance for SDR

DSP units

All DSP units share a common architecture depicted by the following figure. They embed:

  1. The Processing Sub-System (PSS), heart of the unit, implementing the family of digital signal processing algorithms.
  2. The Memory Sub-System (MSS), a local working memory. This memory is mapped into the memory space of the GPP. It is used to store the input data and output results of the DSP unit. Before a processing operation can take place, the input data must first be moved into the MSS of the target unit. After the processing completes, the output results must be read back from the MSS, unless another processing operation in the same unit uses them as input. The MSS size varies from one unit to another and ranges from 0 (no memory at all) to 512 kB. All its interfaces are standardized and identical in all units, except its interfaces with the Processing Sub-System (PSS), which are custom and depend on the specific needs of the unit.
  3. The Control Sub-System (CSS), a control and interface module. The CSS contains the interface and status registers of the DSP unit. Some of these registers are common to all units, some are specific. The CSS is responsible for interfacing the unit with the central interconnect and, as its name says, for controlling the unit. It is the only mandatory module in a DSP unit. It optionally embeds:
    • A Direct Memory Access controller (DMA)
    • An 8-bit micro-controller (UC)

Internal architecture of Embb DSP units
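
The data flow through the MSS described above (move the inputs in, process, read the results back or reuse them in place) can be illustrated with a small toy model. This is only a sketch: the class name `ToyDspUnit`, its methods and the byte-inversion "algorithm" are hypothetical stand-ins and do not correspond to any actual Embb interface.

```python
class ToyDspUnit:
    """Toy model of a DSP unit: a local memory (MSS) plus one processing routine (PSS).

    All names here are illustrative assumptions, not part of Embb.
    """

    def __init__(self, mss_size):
        self.mss = bytearray(mss_size)  # local working memory

    def write_mss(self, offset, data):
        """Move input data into the MSS (the job of the GPP or of a DMA engine)."""
        if offset < 0 or offset + len(data) > len(self.mss):
            raise ValueError("transfer does not fit in the MSS")
        self.mss[offset:offset + len(data)] = data

    def read_mss(self, offset, length):
        """Read results back from the MSS."""
        return bytes(self.mss[offset:offset + length])

    def process(self, src, dst, length):
        """Stand-in PSS operation: bitwise-invert `length` bytes from src to dst."""
        for i in range(length):
            self.mss[dst + i] = self.mss[src + i] ^ 0xFF


unit = ToyDspUnit(64)
unit.write_mss(0, b"\x00\x0f\xff")   # 1. move the input data into the MSS
unit.process(src=0, dst=16, length=3)  # 2. process inside the unit
first = unit.read_mss(16, 3)          # 3. read the results back
unit.process(src=16, dst=32, length=3)  # or chain: reuse the results in place
second = unit.read_mss(32, 3)
```

The second `process` call shows the case mentioned in the text where a result stays in the MSS because another operation of the same unit consumes it directly.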

Currently available DSP units

INTL, a general purpose interleaver

INTL takes a set of input samples and a permutation table and outputs a permuted set of samples. The bit width of the samples can be anything between one and eight bits. INTL is also useful for rate matching and frame equalization, thanks to several special features such as force-zero-sample, force-one-sample, skip-sample and repeat-sample. When de-interleaving, if some input samples were repeated by the transmitter, INTL can replace each set of repeated samples by their average value or by the last occurrence. The largest applicable permutation is 32768 samples, but larger permutations can be obtained in multiple passes (a 65536-sample permutation, for instance, requires four 32768-sample passes).
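
As a rough software illustration of table-driven interleaving with repetition, skipping and forced zeros, the sketch below uses a plain list of indices as the permutation table. The table encoding (an index per output sample, `None` for a forced zero) is an assumption made for this example and is not the actual INTL table format.

```python
FORCE_ZERO = None  # hypothetical table marker: emit a zero instead of an input sample


def interleave(samples, table):
    """Output sample k is samples[table[k]], or 0 where the table says FORCE_ZERO.

    An index listed several times repeats a sample; an index never listed skips it.
    """
    return [0 if idx is FORCE_ZERO else samples[idx] for idx in table]


def deinterleave(received, table, length, combine="average"):
    """Undo `interleave`. Repeated samples are merged by average or last occurrence."""
    groups = [[] for _ in range(length)]
    for value, idx in zip(received, table):
        if idx is not FORCE_ZERO:
            groups[idx].append(value)
    out = []
    for group in groups:
        if not group:
            out.append(0)  # skipped by the transmitter: nothing was received
        elif combine == "average":
            out.append(sum(group) // len(group))
        else:
            out.append(group[-1])
    return out


table = [2, 0, 0, FORCE_ZERO, 3]       # sample 0 repeated, sample 1 skipped
sent = interleave([10, 20, 30, 40], table)
noisy = [30, 12, 8, 0, 40]             # the two copies of sample 0 differ after the channel
averaged = deinterleave(noisy, table, 4, combine="average")
last = deinterleave(noisy, table, 4, combine="last")
```

Here `sent` is `[30, 10, 10, 0, 40]`, and the two received copies of sample 0 (12 and 8) are merged into 10 when averaging or into 8 when keeping the last occurrence.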

MAPPER, a general purpose modulator

MAPPER takes a set of input samples and a constellation table, and outputs a set of 2x16-bit complex samples. Each input sample is used to address the constellation table, which contains the possible complex output values. MAPPER can implement any mapping from BPSK (one-bit input samples) to 65536-QAM (16-bit input samples). MAPPER can optionally apply a rotation and/or a homothetic transformation (uniform scaling) to the output samples.
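
The table lookup followed by an optional rotation and scaling can be sketched in a few lines. This example uses Python floating-point complex numbers for readability, whereas MAPPER works on 2x16-bit fixed-point samples; the function name `map_symbols`, its parameters and the QPSK table contents are assumptions for illustration only.

```python
import cmath
import math

# Hypothetical 4-entry constellation table (QPSK): input sample -> complex point.
QPSK = [complex(1, 1), complex(-1, 1), complex(1, -1), complex(-1, -1)]


def map_symbols(symbols, table, angle=0.0, scale=1.0):
    """Look each input sample up in the table, then rotate and scale the result."""
    factor = scale * cmath.exp(1j * angle)  # rotation and scaling as one complex factor
    return [table[s] * factor for s in symbols]


plain = map_symbols([0, 3], QPSK)
scaled = map_symbols([0], QPSK, scale=2.0)
rotated = map_symbols([0], QPSK, angle=math.pi / 2)  # quarter turn: 1+1j -> about -1+1j
```

A hardware unit would additionally have to round and saturate the rotated values to 16 bits, which this sketch does not model.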

ADAIF, a general purpose interface with up to four Analogue-to-Digital Converters (ADC) and up to four Digital-to-Analogue Converters (DAC)

Each receiving channel (RX) and transmitting channel (TX) is driven by an independent external sampling clock. ADAIF takes care of the re-synchronization between the sampling clocks and the system clock of the Embb instance. ADAIF can be configured at run-time to raise interrupts upon a given state of its internal sample FIFOs. It has standard slave and master interfaces with the central interconnect, plus a custom interface with PP, a (not yet available) general purpose re-timing filter, I/Q imbalance and frequency offset adjuster. Pairs of RX and TX channels can be synchronized for FDD or TDD operation with programmable receiving and transmitting periods. Several ADAIF commands can be used during operation to fine-tune the synchronization with a remote receiver or transmitter.

FEP, a general purpose vector processor

FEP operates on vectors of 8-bit or 16-bit integers, or of 2x8-bit or 2x16-bit complex numbers. It can be used for component-wise addition, subtraction, product, square, square of modulus and non-linear transforms based on a Look-Up Table. It also implements Fourier transforms. In a single operation it can compute a result vector plus the sum, max, min, argmax and argmin of its components. When fetching the input vectors from its local memory and storing the output vectors to it, FEP can apply sophisticated addressing schemes (repetitions, puncturing, periodic, self-wrapping...) and on-the-fly transforms (conjugate, negate...).
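
The idea of producing a result vector together with its sum, max, min, argmax and argmin in one operation can be sketched as follows. The function name `vector_op` and the dictionary layout are hypothetical, and the sketch works on unbounded Python integers, so it does not model the 8-bit or 16-bit widths of the real unit.

```python
import operator


def vector_op(x, y, op):
    """Apply `op` component-wise and return the result vector plus its statistics."""
    result = [op(a, b) for a, b in zip(x, y)]
    biggest = max(result)
    smallest = min(result)
    stats = {
        "sum": sum(result),
        "max": biggest,
        "min": smallest,
        "argmax": result.index(biggest),   # index of the first maximum
        "argmin": result.index(smallest),  # index of the first minimum
    }
    return result, stats


product, product_stats = vector_op([1, -2, 3], [4, 5, -6], operator.mul)
```

For these inputs the product vector is `[4, -10, -18]`, with sum -24, maximum 4 at index 0 and minimum -18 at index 2.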

Other units will be added. The next best candidates are:

  • a general purpose Viterbi decoder,
  • a general purpose Turbo decoder,
  • several new versions of FEP with more or less internal parallelism,
  • a general purpose channel coder.

If the DSP unit you need is not available you can, of course, design it yourself. If you do so, please consider contributing it to the Embb project.

Gluing the DSP units together: interconnect, bridges, interrupt controllers, GPP…

Embb also comes with:

  • A configurable interconnect, 64 bits wide, based on the Advanced Virtual Component Interface (AVCI) point-to-point communication protocol.
  • A configurable interrupt controller that gathers all interrupt request lines from all DSP units and signals them to the GPP through one single line. It supports interrupt masking and prioritization.
  • Bridges between the interconnect and the GPP.
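
The masking and prioritization performed by the interrupt controller can be modelled in software as below. This is only a sketch under stated assumptions: the bit-field encoding, the convention that a mask bit set to 1 enables a line, and the convention that a lower priority value is more urgent are all choices made for this example, not documented Embb behaviour.

```python
def next_interrupt(pending, mask, priorities):
    """Return the line to serve next, or None if no enabled line is pending.

    pending, mask: integers used as bit-fields, one bit per interrupt line.
    priorities: one value per line; lower means more urgent (assumption).
    """
    active = pending & mask  # masked-out lines are ignored
    candidates = [line for line in range(len(priorities)) if (active >> line) & 1]
    if not candidates:
        return None
    return min(candidates, key=lambda line: priorities[line])


def gpp_line(pending, mask):
    """The single request line to the GPP is raised if any enabled line is pending."""
    return (pending & mask) != 0


# Lines 0, 1 and 3 are pending, line 0 is masked out.
chosen = next_interrupt(pending=0b1011, mask=0b1110, priorities=[0, 3, 1, 2])
```

In this example lines 1 and 3 remain after masking, and line 3 is chosen because its priority value (2) is lower than that of line 1 (3).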

Two types of GPP are currently supported, ARM and SPARC, corresponding to the two example platforms provided: systems based on the Xilinx Zynq SoC family, and classical FPGA-based systems using the LEON3 SPARC core by Aeroflex Gaisler.


Of course, the Embb hardware would not be very useful without a strong software basis to support it. In an Embb instance, each DSP unit can run independently of the others. Inside a single DSP unit, the PSS and the DMA engine (if present) can also run in parallel, allowing simultaneous data processing and input/output data transfers. This is one of the main sources of performance and efficiency compared to more classical architectures. Managing this high degree of parallelism under data dependency, memory management and real-time constraints is very challenging.

Embb can be used with any operating system, provided that software drivers for the Embb components exist for the target OS. Currently, software drivers are provided for the MutekH OS. MutekH offers the usual parallel programming facilities. It is highly customizable and very lightweight, which makes it a perfect choice for parallel software applications with hard real-time constraints and strong performance requirements. MutekH has native POSIX threads support. Embb parallel applications are usually multi-threaded. A typical software architecture for Embb uses one or two threads per DSP unit (one to manage the signal processing by the PSS and the other to manage the data transfers using the DMA engine, if present), plus one or more global control threads that launch and stop the former threads and communicate with the environment. The threads synchronize and communicate through classical parallel programming primitives such as semaphores, mutexes or atomic variables, all offered by MutekH.
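
The two-threads-per-unit pattern can be illustrated with a small desktop sketch. It uses Python's `threading` module as a stand-in for the POSIX threads offered by MutekH; the function `run_unit`, the list used as local memory and the "sum of a block" processing are all hypothetical and only show how a semaphore lets the transfer thread and the processing thread cooperate.

```python
import threading


def run_unit(blocks):
    """One transfer thread feeds a local buffer, one processing thread consumes it."""
    loaded = threading.Semaphore(0)  # counts blocks available in local memory
    lock = threading.Lock()          # protects the shared buffer
    local_memory = []                # stands in for the MSS of the unit
    results = []

    def dma_thread():
        # Manages the data transfers into the unit.
        for block in blocks:
            with lock:
                local_memory.append(block)
            loaded.release()         # signal: one more block is ready

    def pss_thread():
        # Manages the processing; here the "algorithm" is just a sum.
        for _ in blocks:
            loaded.acquire()         # wait until a block has been transferred
            with lock:
                block = local_memory.pop(0)
            results.append(sum(block))

    threads = [threading.Thread(target=dma_thread), threading.Thread(target=pss_thread)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


sums = run_unit([[1, 2], [3, 4], [5]])
```

Because blocks are consumed in arrival order, `sums` is `[3, 7, 5]` regardless of how the two threads are scheduled. In a real application a global control thread would start and stop such pairs, one pair per DSP unit.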

The software drivers for Embb automatically take care of most of the complexity of an embedded application. At startup, the MutekH kernel autonomously enumerates all the available hardware devices thanks to a Read-Only Memory (ROM) embedded in the Embb interconnect. The ROM is located at an address known to the kernel and its format is also known. It contains detailed information about the DSP units that are present, their capabilities and whether they are equipped with a DMA engine, and it lists all the interrupts that these hardware devices can fire, with their priorities. This allows the OS to remain the same whatever the particular Embb instance, the DSP units it embeds, their memory mapping and their interrupts. This automatic discovery mechanism is the key element of the plug-and-play feature of Embb. A software driver is attached to each discovered device, if a compatible driver exists.

Most Embb software drivers are merely request-response queue managers with, in some cases, priority management: they enqueue incoming requests from the application threads and submit them to the hardware devices when the devices are ready to proceed. Upon completion of DSP processing or DMA transfers, the hardware devices raise interrupts that are handled by the proper software driver using response queues. The application threads can use the driver API to get information about the current state of a specific device and to retrieve the results of completed requests.
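
The request-response queue behaviour of a driver can be sketched as follows. This is a toy model, not the MutekH driver API: the class `ToyDriver`, its method names and the rule that a lower priority value is served first are assumptions made for this example.

```python
import heapq
import itertools


class ToyDriver:
    """Toy request-response queue manager for one device (illustrative only)."""

    def __init__(self):
        self._requests = []             # heap of (priority, arrival order, request)
        self._responses = []            # completed (request, result) pairs
        self._order = itertools.count() # keeps equal priorities in arrival order
        self.current = None             # request being processed by the device

    def submit(self, request, priority=0):
        """Enqueue a request; start it at once if the device is idle."""
        heapq.heappush(self._requests, (priority, next(self._order), request))
        self._start_next()

    def _start_next(self):
        if self.current is None and self._requests:
            _, _, self.current = heapq.heappop(self._requests)

    def on_interrupt(self, result):
        """Completion interrupt: store the response, then feed the device again."""
        self._responses.append((self.current, result))
        self.current = None
        self._start_next()

    def poll(self):
        """Return the oldest completed (request, result) pair, or None."""
        return self._responses.pop(0) if self._responses else None


driver = ToyDriver()
driver.submit("fft-a", priority=1)   # device idle: starts immediately
driver.submit("fft-b", priority=1)   # queued
driver.submit("urgent", priority=0)  # queued, but more urgent than fft-b
driver.on_interrupt("done-a")        # fft-a completes, "urgent" is started next
after_first = driver.current
```

After the first completion interrupt, `driver.current` is `"urgent"` because it overtakes `"fft-b"` in the queue, and `driver.poll()` returns `("fft-a", "done-a")`.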

The Embb distribution

The Embb distribution comprises:

  • The synthesizable VHDL models of all components.
  • VHDL simulation environments for unit testing and for validation of complete Embb instances.
  • Makefiles and scripts for VHDL simulation. Mentor Graphics Modelsim is the only currently supported simulator but supporting others should be rather straightforward.
  • Makefiles and scripts for VHDL synthesis. Cadence RTL Compiler, Mentor Graphics Precision RTL and Xilinx Vivado are the currently supported logic synthesizers.
  • Complete examples of Embb instances for FPGA targets using the Aeroflex Gaisler LEON3 and the ARM Cortex A9 core of Xilinx Zynq (the Zynq-based ZedBoard by Digilent is one of the supported prototyping boards).
  • SystemC models of the currently available components. These SystemC models are designed for easy integration in the SoCLib modelling framework. Using the SoCLib interconnects and Instruction Set Simulators of CPUs, it is possible to quickly assemble virtual prototypes of Embb instances.
  • Examples of complete virtual prototypes.
  • Software drivers of all DSP units, DMA engines, interrupt controllers for the MutekH Operating System.
  • Example SDR applications that can be run on a SystemC virtual prototype, in VHDL simulation or on target hardware.
  • libembb, a software library with an Application Programming Interface similar to that of the MutekH software drivers, but intended for pure software emulation and algorithmic validation. The library offers all the functionalities of the available Embb DSP units. The computations are bit accurate, and applications built on top of libembb run on a regular desktop or laptop.
  • Example SDR emulation applications.
  • Documentation.

Important note: the VHDL source code of Embb is not yet available as a separate archive. To download the VHDL source code, please clone the Embb Git repository. If you do not have access to the repository, please ask for credentials by sending an e-mail to contact hyphen embb at telecom hyphen paristech dot fr.

Getting started