Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems
Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, Peter Marwedel

Indian Institute of Technology, Delhi 110 016, India
{banakar, mbala}@cse.iitd.ernet.in

University of Dortmund, Dept. of Computer Science, Otto-Hahn-Strasse 16, 44221 Dortmund, Germany
{steinke, lee, marwedel}@ls12.cs.uni-dortmund.de

Article, May 2002. DOI: 10.1145/774789.774805

Abstract

In this paper we consider the problem of choosing on-chip memory for computationally intensive applications, and we propose scratchpad memory as an alternative to cache. Area and energy for various scratchpad and cache sizes are calculated using the CACTI tool, while performance is evaluated using simulator traces. The AT91M40400 was chosen as the target processor for evaluation. The results clearly establish scratchpad memory as a low-power alternative in most situations, with an average energy reduction of 40%. In addition, replacing the cache with a scratchpad reduced the area-time product by 46% on average.¹

¹ This project is supported under project number DST-DAAD.

1 Introduction

A distinctive feature of portable devices is their light weight and low power consumption. Applications in multimedia, video processing, speech processing, DSP and wireless communication require an efficient memory design, because on-chip memory occupies more than 50% of the total chip area [1]. Reducing this area typically also reduces the energy consumption of the memory unit, since a smaller area implies a smaller total switched capacitance. On-chip caches built from static RAM consume 25% to 45% of the total chip power [2]. Recently, interest has focused on on-chip scratchpad memory as a way to reduce power consumption and improve performance. However, scratchpads can only replace caches if they are supported by an efficient compiler. Modern embedded processors, especially in multimedia applications and graphics controllers, have on-chip scratchpad memory. In cache-based systems the mapping of program elements is performed at run time. In scratchpad-based systems it is done either by the user or automatically by the compiler using a suitable algorithm.

Although previous studies have examined scratchpad memory behavior in embedded systems, they did not consider the impact on area. We develop a systematic framework for evaluating the area and performance of cache- and scratchpad-based systems, and we compare cache and scratchpad area models along with their energy models. In particular, we address the following issues:

1. To support the comparison of memory systems, we create area models for different cache and scratchpad sizes.
2. We calculate the energy consumed per access to the cache and the scratchpad for different sizes.
3. Finally, we report performance and energy consumption for various cache and scratchpad sizes and various applications. We include main memory energy in order to study the energy consumption of the entire system.

The rest of the paper is organized as follows. In Section 2 we explain the scratchpad area and energy models. In Section 3 we present the cache used in our work. Section 4 describes the methodology and experimental setup, and Section 5 contains the results. In Section 6 we draw conclusions and outline future work.
2 Scratchpad memory

The scratchpad is a memory array with decoding and column circuitry logic. The model assumes that memory objects are mapped to the scratchpad in the last stage of the compiler. It also assumes that the scratchpad occupies one distinct part of the memory address space and that main memory occupies the rest. As a result, we do not need to check whether data or instructions are present in the scratchpad. This removes the tag comparator and the miss/hit acknowledge circuitry, and so reduces both energy and area.

The scratchpad memory array is shown in Figure 1(a) and a memory cell in Figure 1(b). Figure 1(c) shows a six-transistor static RAM cell. The cell has one R/W port. Each cell has two bit lines (bit and bit_bar) and one word line.

Figure 1: Scratchpad memory array: (a) memory array, (b) memory cell, (c) six-transistor static RAM cell with word select, bit and bit_bar lines

The complete organization of the scratchpad is shown in Figure 2. From this organization, the area of the scratchpad is the sum of the areas occupied by the decoder, the data array and the column circuitry. Let $A_s$ be the area of the scratchpad memory:

$$A_s = A_{sde} + A_{sda} + A_{sco} + A_{spr} + A_{sse} + A_{sou} \qquad (1)$$

where $A_{sde}$, $A_{sda}$, $A_{sco}$, $A_{spr}$, $A_{sse}$ and $A_{sou}$ are the areas of the data decoder, data array, column multiplexer, precharge, data sense amplifiers and output driver blocks, respectively.

Figure 2: Scratchpad memory organization (decoder, memory array, and column circuitry: sense amplifiers, column multiplexer, output drivers, precharge logic)

The energy consumption of the scratchpad can be estimated from the energy consumption of its components, the decoder ($E_{decoder}$) and the memory columns ($E_{memcol}$):

$$E_{scratchpad} = E_{decoder} + E_{memcol} \qquad (2)$$

The energy in the memory array consists of three parts: the energy consumed in the sense amplifiers, the column multiplexers and the output driver circuits, and the energy consumed in the memory cells through the word line, precharge and bit line circuitry. Most of the energy is consumed in the memory array unit. The CACTI tool estimates energy by first calculating the capacitances of each unit and then evaluating the energy. As an example, we describe only the energy calculation for the memory array. A similar analysis is performed for the decoder circuit, taking into account the different switching activity at the inputs of each stage.

Consider the energy dissipation $E_{memcol}$. It consists of the energy dissipated in the memory cells:

$$E_{memcol} = C_{memcol} \cdot V_{dd}^{2} \cdot P_{0 \to 1} \qquad (3)$$

Here $C_{memcol}$ is the capacitance of the memory array unit, and $P_{0 \to 1}$, taken as 0.5, is the probability of a bit transition. $C_{memcol}$ is calculated from equation (4) as the sum of the capacitances due to precharge and read/write access:

$$C_{memcol} = n_{cols} \cdot \left( C_{pre} + C_{readwrite} \right) \qquad (4)$$

where $C_{pre}$ is the effective load capacitance of the bit lines during precharge, $C_{readwrite}$ is the effective load capacitance when a cell is read or written, and $n_{cols}$ is the number of columns in the memory.

In preparation for an access, the bit lines are precharged. During the actual read or write, one side of the bit lines is pulled down. Energy is therefore dissipated in the bit lines by both precharge and access. When the scratchpad is accessed, the address decoder first decodes the address bits to find the desired row. A transition in the address bits charges and discharges the capacitances in the decoder path, which dissipates energy there. The transition in the last stage, the word-line driver stage, triggers switching on the word lines. Regardless of how many address bits change, only two word lines are switched: one goes to logic 0 and the other to logic 1. The equations are derived based on [4].
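The following minimal Python sketch makes equations (2) to (4) concrete by computing the per-access energy. The capacitance, supply-voltage and decoder-energy values in the example are hypothetical placeholders chosen only for illustration; they are not CACTI outputs for the technology used in this paper. The decoder energy is passed in as a number rather than modelled.

```python
# Sketch of the scratchpad energy-per-access model, equations (2)-(4).
# Example values are hypothetical placeholders, not CACTI results.


def memcol_capacitance(n_cols: int, c_pre: float, c_readwrite: float) -> float:
    """Equation (4): C_memcol = n_cols * (C_pre + C_readwrite), in farads."""
    return n_cols * (c_pre + c_readwrite)


def memcol_energy(c_memcol: float, vdd: float, p01: float = 0.5) -> float:
    """Equation (3): E_memcol = C_memcol * Vdd^2 * P_0->1, in joules."""
    return c_memcol * vdd ** 2 * p01


def scratchpad_energy(e_decoder: float, e_memcol: float) -> float:
    """Equation (2): E_scratchpad = E_decoder + E_memcol."""
    return e_decoder + e_memcol


if __name__ == "__main__":
    # Hypothetical: 64 columns, 50 fF precharge load and 30 fF read/write
    # load per column, Vdd = 3.3 V, decoder energy of 0.2 nJ per access.
    c_memcol = memcol_capacitance(64, 50e-15, 30e-15)
    e_memcol = memcol_energy(c_memcol, vdd=3.3)
    e_access = scratchpad_energy(0.2e-9, e_memcol)
    print(f"C_memcol = {c_memcol * 1e12:.2f} pF")
    print(f"E_scratchpad = {e_access * 1e9:.3f} nJ per access")
```

The functions map one-to-one onto the equations. To reproduce the paper's numbers, the capacitances would have to come from a CACTI-style circuit analysis.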
The total energy spent in the scratchpad is

$$E_{sptotal} = SP_{access} \cdot E_{scratchpad} \qquad (5)$$

where $E_{sptotal}$ is the total energy consumed in the scratchpad memory and $SP_{access}$ is the number of accesses to the scratchpad. $E_{scratchpad}$ is the energy per access obtained from our analytical scratchpad model. Unlike a cache, a scratchpad has no read-miss or write-miss events; the only possible case is a read or write access.

3 Cache

Caches are mainly used to exploit the temporal and spatial locality of memory accesses. The basic organization of the cache is taken from [4] and is shown in Figure 3.

Figure 3: Cache organization [4]

From the organization shown in Figure 3, the cache area $A_c$ is the sum of the area occupied by the tag array ($A_{tag}$) and the data array ($A_{data}$):

$$A_c = A_{tag} + A_{data} \qquad (6)$$

$$A_{tag} = A_{dt} + A_{ta} + A_{co} + A_{pr} + A_{se} + A_{com} + A_{mu} \qquad (7)$$

where $A_{dt}$, $A_{ta}$, $A_{co}$, $A_{pr}$, $A_{se}$, $A_{com}$ and $A_{mu}$ are the areas of the tag decoder, tag array, column multiplexer, precharge, sense amplifiers, tag comparators and multiplexer driver blocks, respectively.

$$A_{data} = A_{de} + A_{da} + A_{col} + A_{pre} + A_{sen} + A_{out} \qquad (8)$$

where $A_{de}$, $A_{da}$, $A_{col}$, $A_{pre}$, $A_{sen}$ and $A_{out}$ are the areas of the data decoder, data array, column multiplexer, precharge, data sense amplifiers and output driver blocks, respectively.

The area model used in our work is based on the number of transistors in the circuit. All transistor counts are calculated from the circuit designs.

Power estimation can be done at different levels, from the transistor level to the architectural level […]. CACTI [4] performs power estimation at the transistor level. The energy consumption per cache access is the sum of the energy consumed by all of the components listed above. The analysis is similar to the one described for the scratchpad.
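The area bookkeeping in equations (1) and (6) to (8) amounts to summing transistor counts per block. The Python sketch below illustrates why dropping the tag path (tag array, comparators, multiplexer drivers) makes the on-chip memory smaller. The block names and transistor counts are hypothetical and not taken from the paper. The example also makes a simplifying assumption: the cache's data path is identical to the scratchpad's.

```python
from typing import Mapping


def block_area(components: Mapping[str, int]) -> int:
    """Sum of per-block transistor counts, as in equations (1), (7) and (8)."""
    return sum(components.values())


def cache_area(tag_blocks: Mapping[str, int], data_blocks: Mapping[str, int]) -> int:
    """Equation (6): A_c = A_tag + A_data."""
    return block_area(tag_blocks) + block_area(data_blocks)


def area_reduction(a_scratchpad: int, a_cache: int) -> float:
    """Fraction of cache area saved by a scratchpad of the same size."""
    return 1.0 - a_scratchpad / a_cache


if __name__ == "__main__":
    # Hypothetical transistor counts, for illustration only.
    # 98_304 = 2048 bytes * 8 bits * 6 transistors per SRAM cell.
    data_path = {"decoder": 2_000, "array": 98_304, "column_mux": 1_000,
                 "precharge": 500, "sense_amps": 800, "output_drivers": 600}
    tag_path = {"decoder": 1_500, "array": 20_000, "column_mux": 400,
                "precharge": 200, "sense_amps": 300, "comparators": 1_000,
                "mux_drivers": 800}
    a_s = block_area(data_path)            # scratchpad: data path only, eq. (1)
    a_c = cache_area(tag_path, data_path)  # cache: tag path plus data path
    print(a_s, a_c, round(area_reduction(a_s, a_c), 2))
```

In the paper, the counts come from the actual circuit designs, and the cache data path is not necessarily identical to the scratchpad's.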
4 Overview of our methodology

Clock cycles are estimated from the ARMulator trace output for the cache or scratchpad configuration. We take the cycle count as a direct measure of performance: the larger the number of clock cycles, the lower the performance. We assume that changing the on-chip memory configuration (cache or scratchpad, and its size) does not change the clock period. This assumption is restrictive, but it does not bias our results. We always compare a cache and a scratchpad of the same size, implemented in the same technology, and the cache latency will always be higher. The performance improvement predicted for the scratchpad can therefore only increase if the effect of both on the clock period is taken into account. Important data structures were identified and assigned to the scratchpad using the packing algorithm briefly described in Section 4.3.

4.1 Scratchpad accesses

Performance can be evaluated from the trace file. Since the scratchpad is assumed to occupy a part of the total memory address space, the trace analyzer classifies each access as a scratchpad or a main memory access based on its address. It then adds the corresponding latency to the total program latency. A scratchpad read or write access takes one cycle. A 16-bit access to main memory takes one cycle plus 1 wait state, and a 32-bit access takes one cycle plus 3 wait states (see Table 1). The total time in clock cycles is used to determine performance. The scratchpad energy consumption is the number of accesses multiplied by the energy per access, as described in equation (5).

Table 1: Memory access cycles

Access | Number of cycles
Cache | Using Table 2
Scratchpad | 1 cycle
Main memory, 16 bit | 1 cycle + 1 wait state
Main memory, 32 bit | 1 cycle + 3 wait states

4.2 Cache accesses

From the trace file we obtain the number of cache read hits, read misses, write hits and write misses. From this data we calculate the number of cache accesses, using Table 2 for the accesses each type of event requires. The cache is write-through. Our model considers four cases of cache access:

Cache read hit: when the processor needs some data, the cache tag array is accessed. On a read hit, the data is read from the cache. No cache writes are made, and main memory is neither read nor written.

Cache read miss: the data is not in the cache, so the line must be brought from main memory into the cache. A cache read is followed by L words being written to the cache, where L is the line size. This causes a main memory read of size L, with no main memory write.

Cache write hit: a cache write is followed by a main memory write.

Cache write miss: a cache tag read (to establish the miss) is followed by a main memory write. There is no cache update in this case.

Table 2: Cache interaction model

Access type | Cache reads | Cache writes | Main memory reads | Main memory writes
Read hit | 1 | 0 | 0 | 0
Read miss | 1 | L | L | 0
Write hit | 0 | 1 | 0 | 1
Write miss | 1 | 0 | 0 | 1

Using this model, we obtain the cache energy equation

$$E_{cache} = \left( N_{cread} + N_{cwrite} \right) \cdot E \qquad (9)$$

where $E_{cache}$ is the energy spent in the cache, $N_{cread}$ is the number of cache reads and $N_{cwrite}$ is the number of cache writes. The energy per access $E$ is calculated as in equation (3), taking into account the corresponding load.
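The trace analysis of Sections 4.1 and 4.2 can be sketched as follows. This is a minimal illustration, not the authors' trace analyzer. The trace format (address, access width in bits), the scratchpad base address `SP_BASE` and the event names are all hypothetical. As in equation (9), a single per-access energy is used for cache reads and writes. The example energies of 1.53 nJ and 4.57 nJ are the 2 KB scratchpad and cache values from Table 4.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical memory map: the scratchpad is one contiguous address range and
# every other address is treated as main memory (assumed values).
SP_BASE = 0x0030_0000
SP_SIZE = 2048  # bytes


def in_scratchpad(addr: int) -> bool:
    return SP_BASE <= addr < SP_BASE + SP_SIZE


def access_cycles(addr: int, width_bits: int) -> int:
    """Cycles for one access in the scratchpad configuration (Table 1)."""
    if in_scratchpad(addr):
        return 1            # scratchpad read or write: 1 cycle
    if width_bits == 16:
        return 1 + 1        # main memory, 16 bit: 1 cycle + 1 wait state
    if width_bits == 32:
        return 1 + 3        # main memory, 32 bit: 1 cycle + 3 wait states
    raise ValueError(f"unsupported access width: {width_bits}")


def analyze_scratchpad_trace(trace: Iterable[Tuple[int, int]], e_sp: float):
    """Return (total cycles, scratchpad accesses, E_sptotal from equation (5))."""
    cycles = sp_accesses = 0
    for addr, width_bits in trace:
        cycles += access_cycles(addr, width_bits)
        if in_scratchpad(addr):
            sp_accesses += 1
    return cycles, sp_accesses, sp_accesses * e_sp


def cache_events(event: str, line_words: int) -> Tuple[int, int, int, int]:
    """Table 2 row: (cache reads, cache writes, main memory reads, main memory writes)."""
    table = {
        "read_hit": (1, 0, 0, 0),
        "read_miss": (1, line_words, line_words, 0),
        "write_hit": (0, 1, 0, 1),
        "write_miss": (1, 0, 0, 1),
    }
    return table[event]


def cache_energy(events: Counter, line_words: int, e_access: float) -> float:
    """Equation (9): E_cache = (N_cread + N_cwrite) * E."""
    n_cread = n_cwrite = 0
    for event, count in events.items():
        c_read, c_write, _, _ = cache_events(event, line_words)
        n_cread += c_read * count
        n_cwrite += c_write * count
    return (n_cread + n_cwrite) * e_access


def main_memory_accesses(events: Counter, line_words: int) -> Tuple[int, int]:
    """Main memory (reads, writes) implied by the cache events (Table 2)."""
    reads = writes = 0
    for event, count in events.items():
        _, _, mm_read, mm_write = cache_events(event, line_words)
        reads += mm_read * count
        writes += mm_write * count
    return reads, writes


if __name__ == "__main__":
    trace = [(SP_BASE + 4, 32), (0x1000, 16), (0x2000, 32)]  # hypothetical trace
    print(analyze_scratchpad_trace(trace, e_sp=1.53e-9))
    events = Counter({"read_hit": 900, "read_miss": 20,
                      "write_hit": 60, "write_miss": 20})
    print(cache_energy(events, line_words=4, e_access=4.57e-9))
    print(main_memory_accesses(events, line_words=4))
```

The main memory read and write counts are kept separate so that they can later be weighted with the measured main memory energies.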
4.3 Experimental setup and block diagram

In this subsection we describe the experimental setup and the block diagram we use to compare scratchpad and cache memory. We use the AT91M40400 as our target architecture. The AT91M40400 is a member of the ATMEL AT91 family of 16/32-bit microcontrollers based on the embedded ARM7TDMI processor. The ARM7TDMI is a high-performance RISC processor with very low power consumption, a 32-bit data path and two instruction sets. The AT91M40400 has 4 KB of on-chip SRAM. Figure 4 shows the block diagram.

The Energy Aware C compiler (encc) [7] generates code for the ARM7 core. It is a research compiler used to study design and new optimization techniques, and its input is an application benchmark written in C. As an option after the encc pass, a special packing algorithm, known as the knapsack algorithm [5], assigns blocks of code and data to the scratchpad. The algorithm identifies the frequently accessed blocks of code and data that are likely to provide the largest energy savings and assigns them to the scratchpad. It also accounts for the additional jumps introduced when consecutive blocks are split between the scratchpad and main memory. The compiler output is an ARM binary, which ARMulator simulates to create a trace file. For the on-chip cache configuration, ARMulator accepts the cache size as a parameter and reports performance as a number of cycles. In the trace analyzer, we model the cache as described above and use it in our performance and energy estimates.

Figure 4: Experimental block diagram (blocks: C benchmark, Energy Aware Compiler with mapping algorithm, ARMulator traces, trace analysis with cycle counts, CACTI analytical model with cache/scratchpad area and energy estimates)
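The packing step above is described as a knapsack problem. The following textbook 0/1 knapsack dynamic program in Python shows the core idea under simplifying assumptions. Each candidate block has a size in bytes and an estimated energy saving, for example its access count times the difference between the main memory and scratchpad energy per access. All block names and numbers here are hypothetical. The extra jump instructions that the real algorithm accounts for are ignored, so this is not the encc implementation.

```python
from typing import List, Tuple

Block = Tuple[str, int, float]  # (name, size in bytes, estimated saving)


def knapsack_allocate(blocks: List[Block], capacity: int) -> Tuple[float, List[str]]:
    """0/1 knapsack: pick blocks maximising total saving with total size <= capacity."""
    n = len(blocks)
    # best[i][c]: best saving using only the first i blocks within c bytes.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, saving) in enumerate(blocks, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]            # block i stays in main memory
            if size <= c:
                take = best[i - 1][c - size] + saving
                if take > best[i][c]:
                    best[i][c] = take              # block i goes to the scratchpad
    # Walk back through the table to recover which blocks were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = blocks[i - 1]
            chosen.append(name)
            c -= size
    return best[n][capacity], list(reversed(chosen))


if __name__ == "__main__":
    # Hypothetical candidate blocks for a 256-byte scratchpad.
    blocks = [("loop_body", 120, 40.0), ("lookup_table", 256, 55.0),
              ("init_code", 200, 5.0), ("filter_coeffs", 64, 30.0)]
    print(knapsack_allocate(blocks, 256))
```

With these numbers, the two small, frequently used blocks together beat the single large table. Exact dynamic programming is practical here because scratchpad capacities are small.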
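5 Results

To demonstrate the benefits of using scratchpad memory rather than cache, we ran a series of experiments for both configurations. Trace analysis for the scratchpad and the cache was performed in the design flow after the compilation step. We use a 2-way set-associative cache configuration for comparison. Area is expressed as a number of transistors, obtained from the cache and scratchpad organizations. Figure 5 compares cache and scratchpad area for different sizes. On average, the area occupied by the scratchpad is 34% less than that of the cache.

Figure 5: Comparison of cache and scratchpad area (y axis: area in number of transistors; x axis: cache/scratchpad size in bytes, 64 to 2048)

Table 3 shows the trade-off between area and performance. Column 1 is the size of the scratchpad or cache in bytes. Columns 2 and 3 give the cache and scratchpad area in transistors. Columns 4 and 5 give the number of CPU cycles (in thousands) for the cache- and scratchpad-based memory systems, respectively. Column 6 gives the area reduction from replacing the cache with a scratchpad, and column 7 the reduction in the number of cycles. Column 8 gives the improvement in the area-time product AT, assuming constant cycle time. AT is calculated as

$$AT = \frac{A_s \cdot N_s}{A_c \cdot N_c} \qquad (10)$$

where $N_s$ and $N_c$ are the CPU cycle counts of the scratchpad- and cache-based systems. The improvement reported in column 8 is $1 - AT$. For this example, the average reductions in area, time and AT product are 34%, 18% and 46%, respectively; indeed, $1 - 0.66 \times 0.82 \approx 0.46$.

Table 3: Area/performance ratio for bubble sort

Size (bytes) | Cache area (transistors) | Scratchpad area (transistors) | Area reduction
64 | 6744 | 4032 | 0.40
128 | 11238 | 7104 | 0.37
256 | 21586 | 14306 | 0.34
512 | 38630 | 26722 | 0.31
1024 | 74680 | 53444 | 0.28
2048 | 142224 | 102852 | 0.28

The energy per access for the various devices is given in Table 4. The cache and scratchpad values for a size of 2048 bytes were obtained from the models in Sections 2 and 3, based on the CACTI model [4] for 0.5 µm technology. The main memory values were obtained from real measurements on the ATMEL evaluation board [5].

Table 4: Energy per access for various devices

Access | Energy
Cache access (2 KB) | 4.57 nJ
Scratchpad access (2 KB) | 1.53 nJ
Main memory read access, 2 bytes | 24.00 nJ
Main memory read access, 4 bytes | 49.30 nJ
Main memory write access, 4 bytes | 41.10 nJ

To compare energy consumption, we must also take the energy of the main memory into account. Figure 6 shows the energy consumption for the biquad, matrixmult and quicksort examples for both cache and scratchpad. In all cases except quicksort with a cache size of 256 bytes, the scratchpad consumes less energy than a cache of the same size. On average, using the scratchpad reduces energy consumption by 40%.

Figure 6: Energy consumed by the memory system (energy in nJ for biquad, matrixmult and quicksort, cache vs. scratchpad)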
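The system-level comparison in this section combines per-access energies with access counts, and equation (10) combines areas with cycle counts. The Python sketch below uses the per-access energies from Table 4. The access counts in the example are hypothetical and do not reproduce the paper's benchmarks. The last line only checks that average area and time reductions of 34% and 18% combine to an AT improvement of about 46%.

```python
from typing import Mapping

# Energy per access in nJ, as listed in Table 4.
ENERGY_NJ = {
    "cache_2kb": 4.57,
    "scratchpad_2kb": 1.53,
    "mm_read_2b": 24.00,
    "mm_read_4b": 49.30,
    "mm_write_4b": 41.10,
}


def memory_system_energy(counts: Mapping[str, int]) -> float:
    """Total energy in nJ: sum of (number of accesses x energy per access)."""
    return sum(n * ENERGY_NJ[device] for device, n in counts.items())


def at_ratio(a_s: float, n_s: float, a_c: float, n_c: float) -> float:
    """Equation (10): AT = (A_s * N_s) / (A_c * N_c)."""
    return (a_s * n_s) / (a_c * n_c)


def reduction(new: float, old: float) -> float:
    """Relative reduction of `new` versus `old`; 0.40 means 40% lower."""
    return 1.0 - new / old


if __name__ == "__main__":
    # Hypothetical access counts for one run; not the paper's measurements.
    sp_run = {"scratchpad_2kb": 10_000, "mm_read_4b": 500, "mm_write_4b": 200}
    cache_run = {"cache_2kb": 10_000, "mm_read_4b": 400, "mm_write_4b": 200}
    saving = reduction(memory_system_energy(sp_run), memory_system_energy(cache_run))
    print(f"energy reduction: {saving:.2f}")
    # Relative area 0.66 and relative time 0.82 (34% and 18% reductions):
    print(f"AT improvement: {1 - at_ratio(0.66, 0.82, 1.0, 1.0):.2f}")  # -> AT improvement: 0.46
```

Including the main memory terms matters because a single main memory access costs roughly 5 to 30 times as much energy as an on-chip access in Table 4.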
6 Conclusion and future work

In this paper we presented an approach for selecting on-chip memory configurations, together with a comprehensive methodology for calculating area, energy and performance for various sizes of cache and scratchpad memory. Our results show that scratchpad-based compile-time memory outperforms cache-based run-time memory on almost all counts. Replacing the cache with a scratchpad reduces the area-time product (AT) by 46% on average. For the applications considered, the average energy reduction was 40%. Memory bandwidth and on-chip memory capacity are limiting factors for many applications, so the energy consumption of the main memory must also be taken into account. We therefore include the energy of the main memory as well as that of the on-chip memory. We found that scratchpad-based systems consume less energy than cache-based systems for most applications and memory configurations.

In future work we will study comparisons with DRAM-based memory and configurations that place a cache next to the scratchpad. The energy values predicted by our cache and scratchpad models should also be confirmed by real measurements.

References

[1] Doris Keitel-Schulz and Norbert Wehn: Embedded DRAM Development: Technology, Physical Design, and Application Issues, IEEE Design and Test of Computers, Vol. 18, No. 3, pp. 7-15, May/June 2001.
[2] Preeti Ranjan Panda, Nikil Dutt, and Alexandru Nicolau: Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration, Kluwer Academic Publishers, 1999.
[3] V. Zivojnovic, J. Velarde, and C. Schläger: DSPStone: A DSP-Oriented Benchmarking Methodology, Proceedings of the 5th International Conference on Signal Processing Applications and Technology, October 1994.
[4] S. Wilton and N. Jouppi: CACTI: An Enhanced Cache Access and Cycle Time Model, IEEE Journal of Solid-State Circuits, May 1996.
[5] Rajeshwari Banakar, S. Steinke, B. S. Lee, M. Balakrishnan, and P. Marwedel: Comparison of Cache- and Scratch-Pad-based Memory Systems with respect to Performance, Area and Energy Consumption, Technical Report 762, University of Dortmund, September 2001.
[6] Rajeshwari M. Banakar, Ranjan Bose, and M. Balakrishnan: Low Power Design: Abstraction Levels and RTL Design Techniques, VLSI Design and Test Workshop (VDAT 2001), Bangalore, August 2001.
[7] ls12-www.cs.uni-dortmund.de/research/encc
[8] Luca Benini, Alberto Macii, Enrico Macii, and Massimo Poncino: Synthesis of Application-Specific Memories for Power Optimization in Embedded Systems, DAC 2000, Los Angeles, CA, pp. 300-303.
[9] J. Kin, M. Gupta, and W. H. Mangione-Smith: The Filter Cache: An Energy Efficient Memory Structure, IEEE Micro-30, December 1997.
[10] T. Ishihara and H. Yasuura: A Power Reduction Technique with Object Code Merging for Application Specific Embedded Processors, Design, Automation and Test in Europe (DATE 2000), March 2000.