Digital signal processors (DSPs) are central to signal processing in many high-performance applications and end-user devices. Keeping power dissipation low in a reconfigurable architecture that allows hardware modules to be reconfigured dynamically is a challenging task. This paper examines one approach to low-power programmable DSPs, which seeks to solve this problem by matching the model of computation of a given task to a well-defined architectural granularity without sacrificing performance. Finally, it evaluates this approach, identifies shortcomings in its findings, and points out areas that need further research.
Reconfigurable processing, in contrast to traditional processing, enables dynamic synthesis of logic circuits that can adapt to, and be optimized for, a desired function. Designing an efficient hardware description, one that maps logic gates and their behavior onto the needs of an application area such as DSP, remains an active research area, driven by the ever-increasing need to keep power consumption low and extend operating time. DSPs are used in digital radios, computer disk drives, high-resolution printers, satellites and many end-user products. They are specialized processors dedicated to tasks that require very fast computation at the bit level with the least possible energy dissipation.
They usually have a small form factor and require no heat sink, which makes them suitable for many embedded applications. Another approach to processing digital signals is to use general-purpose (GP) processors such as Reduced Instruction Set Computer (RISC) processors. These processors offer high computational power but are poorly suited to many embedded high-performance applications because they are often power-hungry and require a number of support ICs to operate. Other approaches to low-power processing, targeted at wireless applications, include GP processors with extended instruction sets, GP co-processor architectures, application-specific processors (such as ASICs) and processors with power-scalable performance.
To extend the time between battery recharges of mobile terminals, energy-efficient designs that couple dedicated hardware with specialized processing elements (logic gate arrays) have been proposed, increasing the time these terminals can operate in the network. Multimodal software radios and multichannel cellular base stations are two applications that have moved much of their processing to software in order to adapt dynamically to the requested services and standards. A closely related approach was published by Paul Graham and Brent Nelson, who used a DSP hybrid consisting of a dedicated DSP core and reconfigurable logic, so that the system can be reprogrammed without altering the DSP core's internal architecture.
This paper evaluates yet another approach to minimizing power in digital signal processing applications, based on dynamically matching the architecture to the computation. It rests on the principle that the building blocks making up a system's architecture should be dictated by the system's computational needs at any given time, and that this choice in turn determines the power the system consumes. Under this hypothesis, if the computational requirements of a given task can be determined, the system can be reconfigured to suit that task: resources can be dedicated to each task in turn and unneeded resources disabled, increasing overall efficiency and reducing total power consumption.
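To make the matching principle concrete, the following Python sketch models a pool of hardware modules and enables only those a task requires. The module names, their power figures and the assumption that a disabled module draws no power are all hypothetical simplifications for illustration, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    power_mw: float  # hypothetical power drawn while the module is enabled


def configure(pool, required):
    """Enable only the modules a task requires.

    Disabled modules are idealized as drawing no power (fully gated).
    Returns the enabled module names and their total power in mW.
    """
    available = {m.name for m in pool}
    missing = set(required) - available
    if missing:
        raise ValueError(f"task needs unavailable modules: {sorted(missing)}")
    enabled = [m for m in pool if m.name in required]
    return [m.name for m in enabled], sum(m.power_mw for m in enabled)


# Hypothetical module pool and per-module power figures.
POOL = [
    Module("multiplier", 12.0),
    Module("alu", 4.0),
    Module("address_gen", 2.0),
    Module("bit_unit", 3.0),
]

all_on_mw = sum(m.power_mw for m in POOL)
names, task_mw = configure(POOL, {"multiplier", "alu"})
print(names, task_mw, all_on_mw)
```

With every module on, the hypothetical pool draws 21 mW; enabling only the multiplier and ALU for a multiply-accumulate style task draws 16 mW. A real design would also pay leakage and reconfiguration costs, which this sketch deliberately ignores.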
2. Principles of Reconfiguration
The principle behind reconfigurable processing as a solution to low-power programmable DSP is derived from adaptive computing systems (ACS), which build a computer system from field-programmable gate arrays (FPGAs) whose basic digital logic can be reconfigured dynamically, yielding large performance improvements over traditional multiprocessor architectures. The level of reconfiguration that can be performed depends on the amount of resources, such as interconnect networks and logic functionality, that the system needs. Rabaey defined a concept called the granularity of the programming model, which sets the level of system reconfiguration according to the model of computation of the target task.
Reconfiguration can be applied at various granularities (or levels), each with its own preferred and optimal application domain. First, the stored-instruction approach, which involves no hardware reconfiguration and is used in most traditional general-purpose processors, requires data and instructions to share a bus. Second, the reconfigurable dataflow architecture exploits bus access by using a network of buses interconnecting the functional units; a Complex Programmable Logic Device (CPLD) can be considered to adopt this architecture. Third, the reconfigurable datapath architecture uses bit-slicing to increase concurrency and bypass the traditional one-dimensional flow of data, using an asymmetric network of reconfigurable buses and a mesh of bit-level elements. An example is the FPGA.
Finally, the gate-level model represents the finest granularity of reconfiguration, operating at the transistor and gate levels. The example given is the Application Specific Integrated Circuit (ASIC), although an ASIC's logic is fixed at fabrication and so marks the non-reconfigurable limit of this model. The performance of each model depends on the granularity of the modules used to describe it, the interconnection structure between those modules, and the distribution and accessibility of program storage across them. In practice, however, mapping actual computation tasks onto any of these models can incur area, power and performance penalties.
3. Power-Saving Strategies
The reconfigurable processing approach for DSPs proposed by Rabaey focuses mainly on minimizing power consumption in programmable DSPs by combining all four models of computation described above. Two key power-saving strategies are used to achieve substantial energy reduction and simplify the granularity of computation. The first is to lower the supply voltage and clock frequency while exploiting concurrency, in the form of pipelining and parallelism, to preserve throughput. However, because each task demands a certain level of computation and degree of concurrency, both of which ultimately affect power consumption, a balance between concurrency and energy consumption must be struck.
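The benefit of trading clock frequency for concurrency follows from the standard first-order expression for CMOS switching power, P = αCV²f. The Python sketch below compares one unit at nominal voltage and clock with two parallel units each clocked at half the rate. The 0.58× supply voltage and the 2.15× effective capacitance (duplicated hardware plus multiplexing overhead) are illustrative assumptions, not values from the paper, and leakage is ignored.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order switching power P = activity * C * Vdd^2 * f (no leakage)."""
    return activity * c_eff * vdd ** 2 * freq


# Reference: a single unit at nominal voltage and clock (normalized units).
p_ref = dynamic_power(c_eff=1.0, vdd=1.0, freq=1.0)

# Two parallel units, each clocked at f/2, so throughput is unchanged.
# Assumed (illustrative) values: the slower clock lets Vdd drop to 0.58x,
# and duplication plus multiplexing raises switched capacitance to 2.15x.
p_par = dynamic_power(c_eff=2.15, vdd=0.58, freq=0.5)

print(f"parallel / reference power = {p_par / p_ref:.2f}")
```

Because throughput is the same in both cases, the power ratio (about 0.36 here) is also the energy-per-operation ratio. The same function illustrates the balance described above: if adding concurrency raises the capacitance overhead faster than the supply voltage can be lowered, the savings shrink and eventually disappear.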
The second, and preferred, approach is to reduce energy waste by keeping energy overhead to a minimum. This is achieved by making the implementation as specific (and as simple) as possible to the computation it needs at each moment. By the same reasoning, the model of computation, or granularity, adopted for a single task can be scaled up to describe an entire system whose functionality changes in a dynamic environment, using a multi-granularity architecture. Multi-granularity allows the programming of a domain-specific processor instance that can implement a range of algorithms or functions within a particular domain of interest. The combined granularity of the functional elements, which determines the power consumption of the entire system, depends on the computational requirements and properties of that domain. In summary, the architectural template for a good energy model should address granularity at multiple levels of computation. Its properties are: application-specific modularity, preservation of the locality inherent in algorithms, use of energy on demand, and static configuration of the interconnect network and computation resources without multiplexing.
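The idea of matching each part of a domain-specific system to its own granularity can be sketched as a simple assignment problem. In the Python example below, the sub-task names, operation counts and per-operation energies are hypothetical placeholders chosen only to show the mechanism: each sub-task is assigned the model with the lowest energy per operation for that sub-task, and the total is compared with running everything on the stored-instruction model.

```python
# Hypothetical energy per operation (pJ) of each sub-task on each model.
# A mismatched model pays overhead (instruction fetch, multiplexed
# interconnect, unused bit width), so the best model differs per sub-task.
ENERGY_PJ = {
    "control": {"stored_instruction": 50.0, "dataflow": 80.0,
                "datapath": 120.0, "gate_level": 200.0},
    "filtering": {"stored_instruction": 100.0, "dataflow": 15.0,
                  "datapath": 25.0, "gate_level": 40.0},
    "bit_coding": {"stored_instruction": 150.0, "dataflow": 60.0,
                   "datapath": 20.0, "gate_level": 8.0},
}

# Hypothetical operation counts per frame for each sub-task.
OPS_PER_FRAME = {"control": 1_000, "filtering": 50_000, "bit_coding": 200_000}


def best_model(task):
    """Return the model with the lowest energy per operation for a sub-task."""
    costs = ENERGY_PJ[task]
    return min(costs, key=costs.get)


def frame_energy(assignment):
    """Total energy (pJ) per frame for a {sub-task: model} assignment."""
    return sum(OPS_PER_FRAME[t] * ENERGY_PJ[t][m] for t, m in assignment.items())


multi = {task: best_model(task) for task in ENERGY_PJ}
single = {task: "stored_instruction" for task in ENERGY_PJ}

print(multi)
print(f"matched: {frame_energy(multi):,.0f} pJ, "
      f"stored-instruction only: {frame_energy(single):,.0f} pJ")
```

With these made-up figures the matched assignment uses about 2.4 µJ per frame against about 35 µJ for the stored-instruction-only mapping; the point is the mechanism, not the magnitude. Fixing the assignment once, rather than switching models per operation, also mirrors the template's preference for static configuration without multiplexing.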
Two examples, a voice coder processor and a CDMA baseband processor, were used to demonstrate how effectively multi-granularity reduces energy dissipation. The voice coder processor, built on the reconfigurable dataflow model, achieved an energy efficiency of 5.7 GOPS/W at a maximum clock rate of 30 MHz. For the CDMA baseband processor, in contrast, the high performance requirements of its correlator units made it impossible to construct the processor from functional units; instead, and more appropriately, the reconfigurable datapath model was used to implement related computational sub-units such as modulators and filters.
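An efficiency quoted in GOPS/W is the reciprocal of energy per operation, since one watt is one joule per second. The short Python conversion below turns the reported 5.7 GOPS/W into picojoules per operation; it involves no assumptions beyond the unit conversion.

```python
GOPS_PER_WATT = 5.7  # reported efficiency of the voice coder processor

# 1 W = 1 J/s, so GOPS/W is giga-operations per joule.
ops_per_joule = GOPS_PER_WATT * 1e9
energy_per_op_pj = 1e12 / ops_per_joule  # picojoules per operation

print(f"{energy_per_op_pj:.1f} pJ per operation")
```

This gives roughly 175 pJ per operation, a figure that is easier to compare across designs than a clock rate. The reported 30 MHz maximum clock does not by itself fix the absolute power, which also depends on how many operations complete per cycle.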
4. Evaluation and Recommendations
The design methodology presented in the paper supports adaptive computing systems (ACS), which provide a platform for ubiquitous computing by adapting to the environment or to user requirements. ACS allow systems to be reconfigured at the gate level and deliver higher performance than traditional computing systems. Interesting applications continue to emerge in this field, and its future looks promising for the computer architecture community.
Earlier energy-efficient computation schemes, such as Graham and Nelson's DSP-RL hybrid, combined dedicated hardware with specialized processing elements. Although those authors pointed out that large performance improvements can be realized for specific computational kernels, this performance must be balanced against the resulting increase in area, power dissipation and programming difficulty. How achievable this balance is depends primarily on two factors: the composition (or granularity) of the building blocks, both electrical and structural, and the complexity of the desired computation. Although the four granularity models were illustrated well with diagrams, the accompanying text was not sufficient. Each model could have been treated individually and in detail, since the proposed energy-saving technique rests on selecting the model that architecturally best matches the computational model of the desired task.
The very fast bit-level operations described for the reconfigurable datapath model may not be feasible with programmable logic such as FPGAs, because FPGAs are poorly suited to high-precision arithmetic and complex control. A dedicated co-processor might then be needed alongside the rest of the programmable design to handle such high-precision and complex computation; a similar hybrid of a DSP and programmable logic was proposed by Graham and Nelson. In real-time applications that require 100% system uptime, a programmable solution that reconfigures itself dynamically at runtime may not provide adequate availability, because of the delay involved in reconfiguring the device.
A typical example is a multi-standard base station. Every time the base station reconfigures itself to support a particular standard, there will be a delay, however short, while the computational structures are reconfigured to fit that standard, and during this time users with ongoing calls might experience dropped calls. The examples given in Rabaey's paper to illustrate power saving in programmable DSPs also do not seem sufficient to cover different scenarios and application environments, and no quantitative analysis was provided to support the very important correlation between performance level and power dissipation. Furthermore, neither of the two examples cited, the voice coder processor and the CDMA baseband processor, truly used the multi-granularity architecture: each had its processing unit constructed from a single computational model rather than from the combination of all four models proposed for the multi-granularity architecture.
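The cost of reconfiguration delay can be made concrete with a simple downtime budget. The Python sketch below assumes, hypothetically, that processing halts completely during each reconfiguration and that reconfiguration is the only source of outage; the 200 standard switches per day and 50 ms per reconfiguration are illustrative values, not measurements.

```python
SECONDS_PER_DAY = 86_400


def availability(reconfigs_per_day, reconfig_time_s):
    """Fraction of a day in service, assuming processing halts during every
    reconfiguration and reconfiguration is the only source of outage."""
    downtime_s = reconfigs_per_day * reconfig_time_s
    if downtime_s > SECONDS_PER_DAY:
        raise ValueError("total reconfiguration time exceeds one day")
    return 1.0 - downtime_s / SECONDS_PER_DAY


# Hypothetical load: 200 standard switches per day, 50 ms each.
RECONFIGS_PER_DAY = 200
RECONFIG_TIME_S = 0.050

a = availability(RECONFIGS_PER_DAY, RECONFIG_TIME_S)
downtime = RECONFIGS_PER_DAY * RECONFIG_TIME_S
print(f"availability = {a:.6f}, downtime = {downtime:.1f} s per day")
```

Even with these modest assumed values the system is out of service for about 10 seconds per day, so it cannot meet a strict 100% uptime requirement, and any call whose processing falls inside one of those windows is at risk.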
Making an application configurable usually carries a large energy overhead: the more application-specific the implementation of a task, the less energy it consumes. To minimize energy, applications must use the right model of computation only for as long as it is required. More research is needed on keeping energy dissipation low without sacrificing performance.
Jan M. Rabaey, “Reconfigurable Processing: The Solution to Low-Power Programmable DSP”
Paul Graham and Brent Nelson, “Reconfigurable Processors for High Performance, Embedded Digital Signal Processing”
Mark Jones, Luke Scharf, Jonathan Scott, Chris Twaddle, Matthew Yaconis, Kuan Yao, “Implementing an API for Distributed Adaptive Computing System”
“Reconfigurable Computing for Digital Signal Processing: A Survey”, Journal of VLSI Signal Processing, Kluwer Academic Publishers, The Netherlands, 7–