Abstract: As embedded systems have become wide reaching technology the need for better performance has been persisting. For that researchers have made great effort to come up with new techniques in order to obtain a better performing system that consumes less power. From here, it comes the idea of developing a new scheduling algorithm that manages to achieve components, scheduling in order to reduce dynamic power consumption. The main purpose of that algorithm is to increase the latency of some components whenever it is possible without negatively influencing the dependency constraint. One of the main solutions to increase the latency of components is through decreasing frequency of their clocks. This decrease of the clock frequency brings about reduction in power consumption. To prove its efficiency, the new proposed algorithm has been tested at both levels: Simulation and physical one. In fact, design results have shown significant reduction in dynamic power.
INTRODUCTION
As portable and embedded systems have become the worldwide leading technologies and the need for better performance has been persisting, the creation of a new efficient type of digital circuits has become a critical issue. Because low power dissipation is one of the most important parameters that improve the capabilities and the performance of these systems, the need of new ways to preserve power has become a vital necessity. Designers have to come up with many solutions that preserve power. In fact, some research groups have focused on the designing of batteries with extended life cycles (http://www.lbl.gov/) or the creation of new low power architectures in terms of materiel (i.e., silicon) (http://www.silabs.com/Pages/default.aspx). Unfortunately, several research groups dont have the necessary resources to both design batteries and lower power architecture. To overcome this dilemma, there exist other methods such as design technique that is based on scheduling between the components of a target system to minimize its dynamic power. From here comes the idea of developing a new algorithm that manages to bring us a less consuming system. That algorithm meant to increase the latency of some components without influencing the dependency constraint. There exist different ways to increase the latency of a component, one of them is the decrease of frequency value of its clock. It is widely acknowledged that a decrease in frequency is followed by a respective decrease in dynamic power. So, given a multi-component system, if any possibility of increasing component latency occurs then the proposed technique will definitely provide a gain in power. For instance, given a multi-component system with n components S (C1, C2...Ci...Cn) that are running with a main master clock; assuming that they use the same frequency F1 which is the frequency of the master clock, the dynamic power of a multi-component system before and after the scheduling will be as follows:
Knowing that, the dynamic power Pd of a system can be written as follows (Wang et al., 2013):
(1) |
where, A concerns the percentage of active logic gates; that are charged dynamically, C represents the total capacitance load, V is the supply voltage and F refers to the execution frequency. Since all components are using the master clock, the dynamic power before scheduling is as follows:
(2) |
For a simplification reason we presume that:
(3) |
(4) |
Assuming that after scheduling at least one component latency can be increased, the dynamic power after scheduling has been calculated by the following way. As mentioned before one solution to increase the latency of a component is to decrease its clock frequency to be F2. F2 equals φ F1 where φ is a real <1
(5) |
Based on Eq. 4 and 5, the gain in power:
(6) |
Since φ<1, then G>0.
Thus, the purpose of this study is to propose a scheduling algorithm for multi-component system that aims to optimize its dynamic power through the increase of the latency of some components whenever it is possible.
MATERIALS AND METHODS
As mentioned previously, the aim of the study is to come up with a new scheduling technique that manages to optimize the dynamic power of a global circuit which contains many components, through the decrease of clock frequency of some components whenever it is possible while ensuring that such a decrease does not influence the dependency constraint. Figure 1 summarizes and clarifies the proposed approach.
Proposed algorithm: The input of the proposed algorithm is a system S = (C1, C2, C3, C4 Cn) where, C1, C2, C3, C4 and Cn present its components that run with a master clock having a frequency value equal to F1. Each block is characterized by its latency in terms of cycle numbers. The algorithm is divided into 4 steps:
Step 1: | In this step the global system has been transformed into a graph with nodes represented by components and edges referring to the dependencies between them |
Step 2: | In this step, the as soon and as late scheduling has been applied on the graph above, respectively |
Step 3: | The start time Tstartmin and Tstartmax of each node has been determinate according to the as soon and to the as late scheduling, respectively |
Step 4: | In this step, for each node (ni) if Tstartmax(ni)>Tstartmin(ni) then it will be attributed to (ni) a new clock running with a frequency Fi such as: |
(7) |
where, L(ni) is the latency of the node (ni) in terms of number of cycle.
Pseudo-code:
Fig. 1: | Proposed approach |
Fig. 2: | Input system |
Fig. 3: | Graph of input system |
Fig. 4(a-b): | Scheduling (a) As soon as possible and (b) As late as possible |
Table 1: | Components latencies |
Illustrative example: To further clarify the algorithm, it has been applied on the below system shown in Fig. 2. Assuming that the master clock has a frequency equal to F1.
Table 1 represents the latencies in terms of number of cycles of each component.
First, the system, shows in Fig. 2 has been transformed to a graph as represents Fig. 3.
Second, the as soon and as late scheduling algorithms have been applied as shows Fig. 4.
Third, the Tstartmin and Tstartmax of each node have been calculated to obtain:
Fourth, Tstartmax (C2)>Tstartmin (C2) and Tstartmax (C3)>Tstartmin (C3), for that, the Eq. 7 will be applied to calculate the new clocks frequencies of both nodes C2 and C3, respectively:
F2= (0+4/3+4)×F1 = (4/7) F1
F3= (4+2/7+2)×F1 = (2/3) F1
According to the results above, the clock frequency of C2 and C3 have been decreased to (4/7)F1 and (2/3)F1, respectively. That decrease will be automatically followed by an increase in clock period which will lead to a simultaneous decrease in power consumption.
To prove the efficiency of the proposed technique concretely, the gain of dynamic power before and after the secluding algorithm has been calculated.
Dynamic power before secluding: Since all components are running with the master clock, dynamic power is as follows:
(8) |
Dynamic power after secluding: After scheduling only C1, C4 and C5 still run with the master clock. However, C2 and C3 use a new clocks frequencies which are F2 = (4/7) F1, F3 = (1/3) F1, respectively. Thus, the dynamic power is:
(9) |
Based on Eq. 8 and 9 we deduce that the proposed approach managed to bring the consumed power into a lower level attaining a decrease of 16% compared to the classical technique. This interesting gain proves the effectiveness of the new technique compared to the other one in terms of dynamic power.
Experiment: In this part, the proposed algorithm has been applied on an application that converts RGB color to HMMD in order to carry out a comparative study about dynamic power dissipating by the system with applying the proposed technique and compared to a system in which we dont use it. The study will cover three main fields which are power, response time and resources. It is meant to prove the efficacy of the presented technique and to illustrate the gain in dynamic power. For that, the following tools consisting of nexys 2 based on Spartan 3E the electronic board, the synthesis tool Xilinx ISE (Version 12.3) and the simulation tool ModelSim (Version 6.5) have been used. The application target is to convert an RGB to HMMD color. In fact, the color structure descriptor which is widely used in the area of image processing and compression, must be defined using the HMMD color space. Hence, the color pixels of any other color spaces like RGB must be converted to the HMMD color space.
Table 2: | Frequencies of each component after and before scheduling |
Fig. 5: | Block diagram presenting conversion of RGB color space to HMMD color space |
From here comes the necessity of the suggested application which convert the RGB color space to HMMD color space. Figure 5 represents the block diagram.
In this part, we are interested in the red block which is the converter block. That block consists of eight blocks as shown in Fig. 6. It consists of many inputs such as "St" which is meant to activate the application, when it is valid the application starts its treatment. The end of the treatment is presented when the "End" input is valid.
Frequencies before and after scheduling: After applying the new scheduling algorithm on the Fig. 6, the frequencies of Adder2, Devider2 and synchronous block have been changed while the other components have kept the same old frequency. Table 2 indicates the old and new frequencies of each component.
RESULTS
Design results at simulation level: In this part, the simulation results have been compared in terms of functionality, response time and power dissipating.
Comparison in terms of functionality: Figure 7 and 8 represents the simulation results of the system before and after scheduling, respectively.
According to the Fig. 7 and 8, the same functionality of the target system has been kept.
Comparison in terms of response time: The response time is the time elapsed from the moment where the application starts its execution until it gives final response. Practically speaking, from the moment where St is active until End is valid, the Fig. 9 and 10 represents the response time of the application before and after scheduling, respectively.
Fig. 6: | Target application |
Fig. 7: | Outputs of the system before scheduling |
Fig. 8: | Outputs of the system after scheduling |
Fig. 9: | Before scheduling |
According to Fig. 9 and 10 we figure out that while the response time of the target application before scheduling is 266 nsec, the given response after scheduling is 282 nsec. Unfortunately, the proposed technique has produced a loss of 6% in response time but that loss may seem to be unimportant if it makes us to meet power optimization.
Comparison in terms of dynamic power: Dynamic power dissipated before scheduling:
(10) |
Dynamic power dissipated after scheduling:
(11) |
Gain in dynamic power:
(12) |
Fig. 10: | After scheduling |
Table 3: | Design result |
Table 4: | Power design result |
Table 5: | Resources design results |
Based on Eq. 10-12, the described technique provides a low power consuming system. In fact it offers a gain of 18.75% in dynamic power compared to a system on which we have not applied our technique. To recapitalize Table 3 has been presented.
Design results at physical level: To further approve the efficiency of the proposed technique, a physical comparison in terms of power and resources between a classical systems on which we have not applied our technique and another one using our technique, has been applied.
Power design results: Table 4 summarizes the power consumed by the two systems, before and after scheduling, notably the gain obtained using the proposed algorithm. Noting that, in this application the execution frequency equal to 50 MHz.
A further indicator of the efficacy of the proposed technique is the fact that dissipated power before scheduling is as high as 4.13 MW whereas, in the system on which we have used the proposed technique there is a decrease with the dissipated power reaching a level of only 3.14 MW. In other word, our technique provides an interesting gain of dynamic power which is 23.79%. Therefore, researchers and designers looking for less power consumption should opt for this more efficient technique.
Resources design results: It is evident that to optimize important parameters like power, other parameters such as area will be influenced. Table 5 presents the obtained loss.
In Table 5, there is a loss of 15.26% in number of slices and 17.22% in number of slice Flip Flop. These lost values become insignificant compared to the gain obtained in power consumption.
DISCUSSION
To preserve power, there exist many solutions like designing batteries with longer life cycles or coming up with new architectures. As most of research groups dont have the necessary sources to do that they have chosen two ways in order to attain their aim which is power minimization. One of the most commonly used techniques is to use scheduling algorithms.
In fact, Fan et al. (2013) have proposed a dual-threshold voltage assignment algorithm in order to reduce the static power of the circuit. This algorithm that uses static timing analysis gets the timing information of each node by double traverse. In addition, it assigns the threshold voltage of all nodes through primary optimization and accurate one. In other words, this algorithm is meant to replace maximum number of high-Vt gates from low-Vt and reduce the static power as high as possible. As demands on cost and performance in modern digital system design have been hugely increasing, Cong et al. (2011) have proposed a new algorithm that handles general array access beyond affine array references. That algorithm offers a technique of an automatic memory partitioning in order to improve throughput and minimize energy consumption of pipelined loop kernels for platform requirements and given throughput constraints. So, the scheduling algorithm is a widely used method that provides a gain in power. For that, many other authors have opted for that technique. For instance, Babu and Shunmugalatha (2013) have presented a self adaptive firefly algorithm in order to solve the economic load dispatch problem. This heuristic numeric optimization algorithm which is inspired by the behavior of fireflies has appeared to be a reliable and a robust technique. Additionally, Wang et al. (2010) have developed a scheduling heuristics and presented an application experience that manages to reduce power consumption of parallel tasks in a cluster with the Dynamic Voltage Frequency Scaling technique. Practically speaking, they have studied the slack time for non-critical jobs then extended their execution time to reduce energy consumption without influencing the whole tasks execution time. Besides, Kuhn and Mashaly (2013) have used the self adapting algorithms as well to minimize power consumption. In fact, they have introduced a novel methodology that optimizes the system parameters all along with respecting the thresholds for the indicate delay. In addition, Choudhury and Pradhan (2012) who have used the genetic algorithm have proposed a probabilistic power model of power-gated design of FSM to solve the dilemma of encoding and bi-partitioning in order to optimize power consumption. So using scheduling algorithms, certain power minimization can be attained. One may, nevertheless, wonder about the advantage of our algorithm compared to other existing algorithms. Actually, each work has its own benefits; some works have focused only on power optimization while other works have focused on the optimization of the response time or the area constraint. Our algorithm is characterized by its simplicity and efficacy. In other word, it is unsophisticated and easy to use and guaranty a less consuming system.
CONCLUSION
Power optimization has become a high priority in electronic world. For that reason, researchers and designers have augmented their efforts to find new solution in order to reduce power. One of the most important ways is the use of scheduling algorithms. For that purpose, we have proposed a new scheduling algorithm that achieves components scheduling in order to bring a reduction in power consumption. In fact, power design results have proved the efficiency of the novel technique which should be used by designers in order to obtain less consuming systems.