DOI: 10.1145/774789.774805 Source: CiteSeer
Article, May 2002
https://www.researchgate.net/publication/2589679

Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems

Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, Peter Marwedel

banakar | mbala@cse.iitd.ernet.in
Indian Institute of Technology, Delhi 110 016

steinke | lee | marwedel@ls12.cs.uni-dortmund.de
University of Dortmund, Dept. of Computer Science, Otto-Hahn-Strasse 16, 44221 Dortmund, Germany

Abstract

In this paper we address the problem of on-chip memory selection for computationally intensive applications by proposing scratchpad memory as an alternative to cache. Area and energy for different scratchpad and cache sizes are computed using the CACTI tool, while performance is evaluated using the trace results of the simulator. The target processor chosen for evaluation was the AT91M40400. The results clearly establish scratchpad memory as a low-power alternative in most situations, with an average energy reduction of 40%. In addition, the average area-time reduction for the scratchpad memory was 46% of the cache memory. (1)

(1) This project is supported by project number DST-DAAD. ISS 216

1 Introduction

The salient feature of portable devices is light weight and low power consumption. Applications in multimedia, video processing, speech processing, DSP and wireless communication require efficient memory design, because on-chip memory occupies more than 50% of the total chip area [1]. A smaller memory will typically reduce the energy consumption of the memory unit, since less area implies a reduction in the total switched capacitance. On-chip caches using static RAM consume power in the range of 25% to 45% of the total chip power [2]. Recently, interest has focused on on-chip scratchpad memory to reduce power consumption and improve performance. On the other hand, scratchpads can replace caches only if they are supported by an effective compiler. Current embedded processors, particularly in the area of multimedia applications and graphics controllers, have on-chip scratchpad memories. In systems with cache memory, the mapping of program elements is done during runtime, whereas in systems with scratchpad memory it is done either by the user or automatically by the compiler using a suitable algorithm.

Although prior studies of scratchpad memory behavior for embedded systems have been done, the impact on area has not been addressed. This paper compares cache and scratchpad area models along with their energy models. In particular, we address the following issues:

1. To support the comparison of memory systems, we generate area models for different cache and scratchpad sizes. In addition, the energy consumed per access for cache and scratchpad is computed for different cache and scratchpad sizes.
2. We develop a systematic framework to evaluate the area-performance tradeoff of cache-based and scratchpad-based systems. The experimental environment requires the use of a packing algorithm (which is a compiler support) to map the memory elements onto the scratchpad.
3. Finally, we report performance and energy consumption for different cache and scratchpad sizes, for the various applications. We include the energy consumption of the main memory to study the energy consumption of the complete system.

The rest of the paper is organized as follows. In Section 2 we explain the scratchpad memory area and energy models. In Section 3 we present the cache memory used in our work. Section 4 describes the methodology and the experimental setup, and Section 5 contains the results. In Section 6 we conclude and outline future work.

2 Scratchpad memory

The scratchpad is a memory array with decoding and column circuitry logic. This model is designed keeping in view that the memory objects are mapped to the scratchpad in the last stage of the compiler. The assumption here is that the scratchpad memory occupies one distinct part of the memory address space, with the rest of the space occupied by main memory. Thus, we need not check for the availability of the data or instruction in the scratchpad. This removes the comparator and the circuitry that signals a miss or hit, which contributes to the reduction of both energy and area.

The cell of the scratchpad memory array is shown in Fig. 1(a) and the memory cell in Fig. 1(b). The six-transistor static RAM cell is shown in Fig. 1(c). The cell has one R/W port. Each cell has two bit lines, bit and bit_bar, and one word line. The complete scratchpad organization is shown in Fig. 2.

Figure 1: Scratchpad memory array. Panels: (a) memory array, (b) memory cell, (c) six-transistor static RAM cell. Labels: word select, bit, bit_bar, Vdd.

Figure 2: Scratchpad memory organization. Blocks: decoder, memory array, column circuitry (sense amplifiers, column multiplexers, output drivers, precharge logic).

From the organization shown in Fig. 2, the area of the scratchpad is the sum of the areas occupied by the decoder, the data array and the column circuit. Let A_s be the area of the scratchpad memory.

\[ A_s = A_{sde} + A_{sda} + A_{sco} + A_{spr} + A_{sse} + A_{sou} \tag{1} \]

where A_{sde}, A_{sda}, A_{sco}, A_{spr}, A_{sse} and A_{sou} are the areas of the data decoder, data array, column multiplexer, precharge circuit, data sense amplifiers and output driver units, respectively.

The scratchpad memory energy consumption can be estimated from the energy consumption of its components, i.e. the decoder, E_{decoder}, and the memory columns, E_{memcol}.

\[ E_{scratchpad} = E_{decoder} + E_{memcol} \tag{2} \]

The energy in the memory array consists of the energy consumed in the sense amplifiers, column multiplexers, output driver circuitry, and the memory cells due to the word line, precharge circuit and bit line circuitry. The major energy consumption is due to the memory array unit. The procedure used in the CACTI tool to estimate the energy consumption is to first compute the capacitances for each unit. Then the energy is estimated. As an example, we only describe the energy computation for the memory array. A similar analysis is performed for the decoder circuit, taking into account the different switching activity at the inputs of each stage.

Consider the energy dissipation E_{memcol}. It consists of the energy dissipated in the memory cell. Thus

\[ E_{memcol} = C_{memcol} \cdot V_{dd}^{2} \cdot P_{0 \to 1} \tag{3} \]

C_{memcol} in equation (3) is the capacitance of the memory array unit. P_{0 \to 1}, the probability of a bit toggle, is taken as 0.5.

\[ C_{memcol} = ncols \cdot (C_{pre} + C_{readwrite}) \tag{4} \]

C_{memcol} is computed from equation (4). It is the sum of the capacitances due to precharge and read access to the scratchpad memory. C_{pre} is the effective load capacitance of the bit lines during precharging, and C_{readwrite} is the effective load capacitance of a cell read/write. ncols is the number of columns in the memory.

In preparation for an access, the bit lines are precharged, and during the actual read/write one side of the bit lines is pulled down. Energy is therefore dissipated in the bit lines due to precharging and the read/write access. When the scratchpad memory is accessed, the address decoder first decodes the address bits to find the desired row. A transition in the address bits causes the charging and discharging of capacitances in the decoder path. This leads to energy dissipation in the decoder path. The transition in the last stage, i.e. the word line driver stage, triggers the switching of the word line. Regardless of how many address bits change, only two word lines among all will be switched.
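Equations (2) to (4) can be evaluated directly once the capacitances are known. The following is a minimal sketch, not the CACTI implementation: the class and function names are hypothetical, and the capacitance, supply voltage and decoder energy values in the usage note are placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryArray:
    """Parameters of the memory array unit (names are illustrative)."""
    ncols: int           # number of columns in the memory
    c_pre: float         # effective bit-line load capacitance during precharge (F)
    c_readwrite: float   # effective load capacitance of a cell read/write (F)


def c_memcol(array: MemoryArray) -> float:
    """Equation (4): capacitance of the memory array unit."""
    return array.ncols * (array.c_pre + array.c_readwrite)


def e_memcol(array: MemoryArray, vdd: float, p01: float = 0.5) -> float:
    """Equation (3): energy dissipated in the memory columns per access (J)."""
    return c_memcol(array) * vdd ** 2 * p01


def e_scratchpad(e_decoder: float, array: MemoryArray, vdd: float,
                 p01: float = 0.5) -> float:
    """Equation (2): decoder energy plus memory column energy (J)."""
    return e_decoder + e_memcol(array, vdd, p01)


if __name__ == "__main__":
    # Placeholder values, not taken from the paper.
    array = MemoryArray(ncols=32, c_pre=100e-15, c_readwrite=50e-15)
    print(e_scratchpad(e_decoder=0.2e-9, array=array, vdd=3.3))
```

Keeping the decoder energy as a plain input mirrors the text, which describes only the memory array computation in detail.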

Of the two word lines that switch, one goes to logic 0 and the other to logic 1. The equations are derived based on [4].

\[ E_{sptotal} = SP_{access} \cdot E_{scratchpad} \tag{5} \]

E_{sptotal} is the total energy spent in the scratchpad memory. In the case of a scratchpad, unlike the cache, we do not have write miss and read miss events. The only possible case is a read or write access. SP_{access} is the number of accesses to the scratchpad memory. E_{scratchpad} is the energy per access obtained from our analytical scratchpad model.

3 Cache memory

Caches are mainly used to exploit the temporal and spatial locality of memory accesses. The basic organization of the cache is taken from [4] and is shown in Fig. 3.

Figure 3: Cache memory organization [4]

The area model that we use in our work is based on the number of transistors in the circuit. All transistor counts are computed from the circuit designs. From the organization shown in Fig. 3, the area of the cache (A_c) is the sum of the areas occupied by the tag array (A_{tag}) and the data array (A_{data}).

\[ A_c = A_{tag} + A_{data} \tag{6} \]

A_{tag} and A_{data} are computed from the areas of their components.

\[ A_{tag} = A_{dt} + A_{ta} + A_{co} + A_{pr} + A_{se} + A_{com} + A_{mu} \tag{7} \]

where A_{dt}, A_{ta}, A_{co}, A_{pr}, A_{se}, A_{com} and A_{mu} are the areas of the tag decoder unit, tag array, column multiplexer, precharge, sense amplifiers, tag comparators and multiplexer driver units, respectively.

\[ A_{data} = A_{de} + A_{da} + A_{col} + A_{pre} + A_{sen} + A_{out} \tag{8} \]

where A_{de}, A_{da}, A_{col}, A_{pre}, A_{sen} and A_{out} are the areas of the data decoder unit, data array, column multiplexer, precharge, data sense amplifiers and output driver units, respectively.

Power estimation can be done at different levels, from the transistor level to the architectural level [
CACTI [4] performs power estimation at the transistor level. The energy consumption per cache access is the sum of the energy consumption of all the components identified above. The analysis is similar to the one described for the scratchpad memory.

4 Overview of our methodology

Clock cycle estimation is based on the ARMulator trace output for cache or scratchpad memory. This is assumed to directly reflect performance, i.e. the larger the number of clock cycles, the lower the performance. This holds under the assumption that a change in the on-chip memory configuration (cache or scratchpad memory and its size) does not change the clock period. This assumption, although restrictive, does not affect our results. This is because we always compare a cache with a scratchpad memory of the same size, and the delay of a cache implemented in the same technology will always be higher. So the performance improvement predicted for the scratchpad memory can only increase if both of them affect the clock period.

Identification and assignment of critical data structures to the scratchpad was based on the packing algorithm briefly described in 4.3.

4.1 Scratchpad memory accesses

From the trace file it is possible to do the performance estimation. Since the scratchpad is assumed to occupy a portion of the total memory address space, the address values obtained by the trace analyzer are used to classify each access as a scratchpad or main memory access, and the corresponding latency is added to the total program delay. One cycle is assumed for a scratchpad read or write access. A 16-bit main memory access is taken as one cycle plus 1 wait state (see Table 1). A 32-bit main memory access is taken as one cycle plus 3 wait states. The total time in clock cycles is used to determine performance. The scratchpad energy consumption is the number of accesses multiplied by the energy per access, as described in equation (5).

4.2 Cache memory accesses

From the trace file it is possible to obtain the number of cache read hits, read misses, write hits and write misses.
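The scratchpad accounting of Section 4.1 and equation (5) can be sketched as follows. This is only an illustration under stated assumptions: the trace is modeled as a list of hypothetical (address, access width in bits) pairs, the scratchpad is assumed to occupy one contiguous address range, each wait state is assumed to cost one additional clock cycle, and the per-access energy in the usage note is a placeholder.

```python
SCRATCHPAD_CYCLES = 1        # scratchpad read or write access
MAIN_16BIT_CYCLES = 1 + 1    # one cycle plus 1 wait state (assumed 1 cycle each)
MAIN_32BIT_CYCLES = 1 + 3    # one cycle plus 3 wait states


def account_trace(trace, sp_base, sp_size, e_scratchpad):
    """Return (total cycles, total scratchpad energy) for a memory trace.

    trace        -- iterable of (address, width_bits) pairs (hypothetical format)
    sp_base      -- first address mapped to the scratchpad
    sp_size      -- scratchpad size in bytes
    e_scratchpad -- energy per scratchpad access in joules
    """
    cycles = 0
    sp_accesses = 0
    for address, width_bits in trace:
        if sp_base <= address < sp_base + sp_size:
            # Access falls into the scratchpad part of the address space.
            cycles += SCRATCHPAD_CYCLES
            sp_accesses += 1
        elif width_bits == 16:
            cycles += MAIN_16BIT_CYCLES
        elif width_bits == 32:
            cycles += MAIN_32BIT_CYCLES
        else:
            raise ValueError(f"unsupported access width: {width_bits}")
    # Equation (5): number of scratchpad accesses times energy per access.
    return cycles, sp_accesses * e_scratchpad


if __name__ == "__main__":
    trace = [(0x0100, 32), (0x0104, 32), (0x2000, 16), (0x3000, 32)]
    print(account_trace(trace, sp_base=0x0000, sp_size=0x1000,
                        e_scratchpad=1.5e-9))
```

Main memory energy is deliberately left out of this sketch; it would be added per access in the same loop.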

From this data, we compute the number of accesses to the cache based on Table 2, with the number of cycles required for each type of access shown in Table 1. The cache is write-through. In our model, we consider four cases of cache access.

Cache read hit: When the processor needs some data, the tag array of the cache is accessed. If there is a cache read hit, the data is read from the cache. No cache write is done, and main memory is not accessed for reading or writing.

Cache read miss: A cache read miss means that the data is not in the cache and the line has to be brought from main memory into the cache. In this case, we have a cache read operation followed by L words to be written to the cache, where L is the line size. Hence there will be a main memory read event of size L with no main memory write.

Cache write hit: If there is a cache write hit, we have a cache write followed by a main memory write.

Cache write miss: In the case of a cache write miss, a cache tag read (to establish the miss) is followed by the main memory write. There is no cache update in this case.

Using this model, we derive the cache energy equation as

\[ E_{cache} = (N_{cread} + N_{cwrite}) \cdot E \tag{9} \]

where E_{cache} is the energy spent in the cache, N_{cread} is the number of cache read accesses and N_{cwrite} is the number of cache write accesses. The energy E is computed as in equation (3), taking into account the corresponding load and the number of cycles. In the trace analyzer, we model the cache as described above and use it in our performance and energy estimates.

4.3 Experimental setup and flow diagram

In this subsection, we explain the experimental setup and the flow diagram used in our work to compare on-chip scratchpad memory with on-chip cache memory. We use the AT91M40400 as our target architecture. The AT91M40400 is a member of the ATMEL AT91 family of 16/32-bit microcontrollers based on the embedded ARM7TDMI processor. This processor is a high-performance RISC processor with very low power consumption. It has 4 KB of on-chip scratchpad memory. The ARM7TDMI comes with a 32-bit data path and two instruction sets.

Fig. 4 shows the flow diagram. The energy-aware compiler (encc) [7] generates code for the ARM7 core. It is a research compiler used to explore design and new optimization techniques. The input to this compiler is an application benchmark written in C. As a post-pass option, encc uses a special packing algorithm, known as the knapsack algorithm [5], to assign blocks of code and data to the scratchpad memory. This algorithm identifies frequently accessed blocks of data and instructions and maps them to the scratchpad address space. The cost of additional jumps introduced by mapping consecutive blocks to scratchpad and main memory is accounted for by the algorithm. As a result, blocks of instructions and data that are frequently accessed, and are likely to provide the most energy savings, are assigned to the scratchpad.

The compiler output is an ARM binary, which can be simulated by the ARMulator to produce a trace file. For the on-chip cache configuration, the ARMulator accepts the cache size as a parameter and generates the performance as the number of cycles.

Figure 4: Experimental flow diagram. Blocks: C benchmark, energy-aware compiler (encc) with compiler support (mapping algorithm), ARMulator trace, trace analysis, analytical model and CACTI for cache/scratchpad, yielding number of cycles, energy estimates and area estimate.

Table 1: Memory access cycles

Access                   Number of cycles
Cache                    Using Table 2
Scratchpad               1 cycle
Main memory, 16 bit      1 cycle + 1 wait state
Main memory, 32 bit      1 cycle + 3 wait states

Table 2: Cache memory interaction model

Access type    Ca_read    Ca_write    Mm_read    Mm_write
Read hit       1          0           0          0
Read miss      1          L           L          0
Write hit      0          1           0          1
Write miss     1          0           0          1
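The cache interaction model of Table 2 and the cache energy equation (9) can be sketched as a small accounting routine. The function names and the dictionary layout are hypothetical; the per-event counts are the ones described for the four access cases in Section 4.2, with L the line size in words.

```python
def cache_events(counts, line_words):
    """Expand hit/miss counts into cache and main memory events (Table 2).

    counts     -- dict with keys "read_hit", "read_miss", "write_hit",
                  "write_miss" mapping to the number of such accesses
    line_words -- cache line size L in words
    """
    L = line_words
    # Per access type: (cache reads, cache writes, main memory reads,
    # main memory writes).
    per_access = {
        "read_hit":   (1, 0, 0, 0),
        "read_miss":  (1, L, L, 0),
        "write_hit":  (0, 1, 0, 1),
        "write_miss": (1, 0, 0, 1),
    }
    totals = [0, 0, 0, 0]
    for kind, n in counts.items():
        for i, events in enumerate(per_access[kind]):
            totals[i] += n * events
    return dict(zip(("ca_read", "ca_write", "mm_read", "mm_write"), totals))


def cache_energy(events, e_access):
    """Equation (9): (cache reads + cache writes) times energy per access."""
    return (events["ca_read"] + events["ca_write"]) * e_access


if __name__ == "__main__":
    counts = {"read_hit": 100, "read_miss": 10, "write_hit": 20, "write_miss": 5}
    events = cache_events(counts, line_words=4)
    print(events)
    print(cache_energy(events, e_access=2.0))
```

The main memory read and write totals returned here would feed the main memory energy and cycle counts, which this sketch does not compute.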
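The packing step described in Section 4.3 is a knapsack problem: choose the blocks that fit into the scratchpad and give the largest total energy saving. The sketch below is a textbook 0/1 knapsack by dynamic programming, not the encc implementation. The block names, sizes and saving values are made up, and the cost of the additional jumps mentioned in the text is ignored.

```python
def pack_scratchpad(blocks, capacity):
    """Select blocks for the scratchpad with a 0/1 knapsack.

    blocks   -- list of (name, size_bytes, energy_saving) tuples, where
                size_bytes is a positive integer (hypothetical inputs)
    capacity -- scratchpad size in bytes
    Returns (total_saving, tuple_of_selected_names).
    """
    # best[c] holds the best (saving, selection) using at most c bytes.
    best = [(0, ())] * (capacity + 1)
    for name, size, saving in blocks:
        # Iterate capacities downwards so each block is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[capacity]


if __name__ == "__main__":
    blocks = [
        ("loop", 64, 900),
        ("table", 96, 1000),
        ("init", 32, 100),
        ("isr", 64, 700),
    ]
    print(pack_scratchpad(blocks, capacity=128))
```

With a 128-byte scratchpad the two 64-byte blocks win over the single large block plus the small one, which shows why a greedy choice by saving alone would not be enough.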

The predicted area and energy are based on the CACTI model [4] for 0.5 µm technology. The models themselves are described in sections 2 and 3.

5 Results

To demonstrate the merits of using on-chip scratchpad memory and on-chip caches, we conducted a series of experiments for both of these configurations. The trace analysis for the scratchpad and the cache was done in the design flow after the compilation step. We use a two-way set-associative cache configuration for comparison. The area is expressed as the number of transistors, obtained from the cache and scratchpad organizations. Fig. 5 compares the area of cache and scratchpad memory for different sizes. On average, the area occupied by the scratchpad is 34% smaller than that of the cache memory.

Table 3 shows the tradeoff between area and performance. Column 1 is the size of the scratchpad or cache in bytes. Columns 2 and 3 give the cache area and the scratchpad area in transistors. Columns 4 and 5 give the number of CPU cycles, in thousands, for memory systems based on cache and scratchpad, respectively. Column 6 shows the reduction in area due to replacing the cache with the scratchpad memory, and column 7 corresponds to the reduction in the number of cycles. Column 8 indicates the improvement in the area-time product AT (under the assumption of constant cycle time). The area-time product AT is computed using

\[ AT = \frac{A_s N_s}{A_c N_c} \tag{10} \]

where A_s and N_s are the area and cycle count of the scratchpad system and A_c and N_c those of the cache system. The average reductions in area, time and AT product are 34%, 18% and 46% respectively for this example.

Our cycle considerations in the performance evaluation are based on the static RAM chips installed on the ATMEL evaluation board. To compare energy consumption, the energy consumption of the main memory must also be taken into account. Thus, we account for the energy consumption of the main memory as well as that of the on-chip memory. The energy required per access by the various devices is listed in Table 4. The cache and scratchpad values for a size of 2048 bytes were obtained from the models in sections 2 and 3; the main memory values were obtained from actual measurements on the ATMEL board [5].

Fig. 6 shows the energy consumption for the biquad, matrix-mult and quicksort examples for both cache and scratchpad. In all cases we observe that the scratchpad consumes less energy than a cache of the same size, except for quicksort with a cache size of 256 bytes. On average, we found that energy consumption is reduced by 40% when using scratchpad memory.

Table 4: Energy per access for various devices

Access                                 Energy
Cache access (2 KB)                    4.57 nJ
Scratchpad access (2 KB)               1.53 nJ
Main memory read access, 2 bytes       24.00 nJ
Main memory read access, 4 bytes       49.30 nJ
Main memory write access, 4 bytes      41.10 nJ

Figure 5: Comparison of cache and scratchpad memory area. Axes: cache/scratchpad size in bytes (64 to 2048) versus area in number of transistors.

Figure 6: Energy consumed by the memory system. Panels: biquad, matrix-mult, quicksort. Axes: size in bytes versus energy in nJ, for cache and scratchpad.

6 Conclusions and future work

In this paper, we presented an approach to selecting on-chip memory configurations. The paper presents a comprehensive methodology for computing area, energy and performance for various sizes of cache and scratchpad memory. The results indicate that scratchpad-based compile-time memory outperforms cache-based runtime memory on almost all counts. We observe that the area-time product (AT) can be reduced by 46% (on average) by replacing the cache with scratchpad memory.

We found that, as is typical for most applications and memory configurations, the energy consumption of scratchpad-based systems is less than that of cache-based systems. In the application considered, the average reduction was 40%. Because memory bandwidth and on-chip memory capacity are the limiting factors for many applications, a comparison with DRAM-based memory should be studied. The energy models for cache and scratchpad memory should be validated by real measurements.

References

[1] Doris Keitel-Schulz and Norbert Wehn: Embedded DRAM Development: Technology, Physical Design, and Application Issues, IEEE Design and Test of Computers, Vol. 18, No. 3, pp. 7-15, May/June 2001.
[2] Preeti Ranjan Panda, Nikil Dutt, Alexandru Nicolau: Memory Issues in Embedded Systems-on-Chip: Optimization and Exploration, Kluwer Academic Publishers, 1999.
[3] V. Zivojnovic, J. Velarde and C. Schlager: DSPStone: A DSP-oriented Benchmarking Methodology, Proceedings of the 5th International Conference on Signal Processing Applications and Technology, October 1994.
[4] S. Wilton and Norm Jouppi: CACTI: An Enhanced Access and Cycle Time Model, IEEE Journal of Solid-State Circuits, May 1996.
[5] Rajeshwari Banakar, S. Steinke, B. S. Lee, M. Balakrishnan, and P. Marwedel: Comparison of Cache- and Scratchpad-based Memory Systems with respect to Performance, Area and Energy Consumption, Technical Report 762, University of Dortmund, September 2001.
[6] Rajeshwari M. Banakar, Ranjan Bose, M. Balakrishnan: Low Power Design: Abstraction Levels and RTL Design Techniques, VLSI Design and Test Workshop, VDAT 2001, Bangalore, August 2001.
[7] ls12-www.cs.uni-dortmund.de/research/encc
[8] Luca Benini, Alberto Macii, Enrico Macii, Massimo Poncino: Synthesis of Application-Specific Memories for Power Optimization in Embedded Systems, DAC 2000, Los Angeles, CA, pp. 300-303.
[9] J. Kin, M. Gupta, and W. H. Mangione-Smith: The Filter Cache: An Energy Efficient Memory Structure, IEEE Micro-30, December 1997.
[10] T. Ishihara and H. Yasuura: A Power Reduction Technique with Object Code Merging for Application Specific Embedded Processors, Proceedings of Design, Automation and Test in Europe Conference (DATE 2000), March 2000.

Table 3: Area/performance ratios for bubble sort

Size in bytes    Cache area (A_c)    Scratchpad area (A_s)    Area reduction
64               6744                4032                     0.40
128              11238               7104                     0.37
256              21586               14306                    0.34
512              38630               26722                    0.31
1024             74680               53444                    0.28
2048             142224              102852                   0.28
Average                                                       0.33

Remaining columns (row alignment not recoverable from the source):
CPU cycles in thousands, cache (N_c) and scratchpad (N_s): 241.5, 237.9, 264.0, 347.5, 192.0, 242.6, 239.9, 481.9, 192.0, 241.7, 237.9, 302.4
Time reduction and area-time product: 0.20, 0.28, 0.61, 0.10, 0.44, 0.55, 0.10, 0.18, 0.51, 0.57, 0.21, 0.54, 0.55, 0.21
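The area reduction column of Table 3 and the area-time product of equation (10) can be checked with a few lines. This is a sketch under assumptions: the area reduction is taken to be 1 - A_s / A_c, equation (10) is read as the ratio of the scratchpad area-time product to the cache area-time product, and the cycle counts used for the AT example are hypothetical because the extracted cycle columns cannot be matched to rows.

```python
# (size in bytes, cache area A_c, scratchpad area A_s), areas in transistors,
# taken from the area columns of Table 3.
AREAS = [
    (64, 6744, 4032),
    (128, 11238, 7104),
    (256, 21586, 14306),
    (512, 38630, 26722),
    (1024, 74680, 53444),
    (2048, 142224, 102852),
]


def area_reduction(cache_area, scratchpad_area):
    """Fraction of cache area saved by using a scratchpad of the same size."""
    return 1.0 - scratchpad_area / cache_area


def at_ratio(scratchpad_area, scratchpad_cycles, cache_area, cache_cycles):
    """Equation (10), read as (A_s * N_s) / (A_c * N_c)."""
    return (scratchpad_area * scratchpad_cycles) / (cache_area * cache_cycles)


if __name__ == "__main__":
    for size, a_c, a_s in AREAS:
        print(size, round(area_reduction(a_c, a_s), 2))
    # Hypothetical cycle counts (in thousands), for illustration only.
    print(at_ratio(4032, 300.0, 6744, 400.0))
```

A ratio below 1 means the scratchpad system has the smaller area-time product; 1 minus the ratio is the corresponding reduction.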