
Multicore Cache
There are many accepted reasons that support a move to multicore design
in portable devices: scalability, specialty cores, increased
performance, and reduced power consumption.
1. Description
2. Why
3. How
4. Future Trends
5. Related Links
Description
As technology moves forward, innovative advancements will rely on
researchers at the architecture, compiler, and software levels becoming
familiar with the bottlenecks of silicon scaling. Two trends are of
particular note:
- In many microprocessor families, increasing amounts of silicon area are being devoted to caches, and
- Technology variability is causing stability concerns for our fundamental cache building block – six-transistor SRAM.
For each of these concerns, there are aggravating factors present in many proposed multicore designs:
- Where die size is kept relatively equal to that of prior generations, inclusion of multiple cores, even if lighter-weight, often leaves less silicon area available for caches; and
- Many proposals suggest complex multi-voltage management schemes to deal with power consumption, thus aggravating and complicating circuit-level challenges in the face of increasing transistor-level variability.
Why
The benefits of a shared cache system are many:
- Reduce cache under-utilization
- Reduce cache coherency complexity
- Reduce false sharing penalty
- Reduce data storage redundancy at the L2 cache level: the same data only needs to be stored once in L2 cache.
- Reduce front-side bus traffic: effective data sharing between cores allows data requests to be resolved at the shared cache level instead of going all the way to the system memory.
- Provide new opportunities and flexibility to designers
- Faster data sharing between the cores than going through system memory
- One core can pre/post-process data for the other core (application partitioning, pipelining)
- Alternative communication mechanisms between cores by using the shared cache
The usage models for migrating single-core applications to multicore can be grouped into two categories:
- To replace multiple single-core PCs with a single multi-core system, in which case users will likely deploy each core just like an individual PC.
- To combine the power of multiple cores in order to get a
performance boost for a single application, in which case each core
does part of the task in order to achieve the overall performance gain.
How to achieve
Scaling from a uni-core design to a multicore design is not nearly as easy as advocates say, since one almost always uncovers race conditions, timing problems, and a host of additional issues that don’t normally manifest on uni-core designs.
Another commonly cited reason to move to multicore is the use of specialty cores. An application processor can perform digital signal processing in a cell phone. It does, however, require tens, if not hundreds or thousands, of cycles to do what a DSP can do in one or a few cycles. Specialty cores like DSPs can therefore run much faster and use less power while achieving the same goal as general-purpose application processors.
Cache partitioning and sharing is critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation, which often has several limitations, such as excessive simulation time, absence of OS activities, and proneness to simulation inaccuracy. To address these issues, an efficient software approach is being taken that supports both static and dynamic cache partitioning in the OS through memory address mapping. The approach is used to evaluate several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS).
One problem that may manifest in poorly designed AMP (asymmetric multiprocessing) systems, or may occur due to load balancing in SMP (symmetric multiprocessing) systems, is the “ping-pong” effect. This occurs when two or more processes cause massive cache invalidations due to the order or frequency in which they access the cache. The ping-pong effect not only decreases performance and increases power consumption, but also impacts real-time determinism in the system, because the memory access needed to refetch the invalidated data is itself a shared, contested resource. The potential for this problem is easily avoided in a well-designed AMP multicore system. The reason is simple:
- In SMP systems, tasks migrate from core to core as the scheduler balances load.
- In AMP systems, the system architect assigns tasks to cores statically.
- Additionally, the problem can be easily detected by profiling during testing and corrected.
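The contrast between migrating and statically placed tasks can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not a model of any real coherence protocol: it tracks a single cache line under a simple write-invalidate rule, and the function names (count_invalidations, smp_trace, amp_trace) are hypothetical. A task that the scheduler moves between two cores leaves its data in the old core's cache, so each migration costs an invalidation; a task fixed to one core costs none.

```python
def count_invalidations(write_trace):
    """Count invalidations of one cache line under a toy write-invalidate rule.

    write_trace is a sequence of core ids, one per write to the same line.
    Whenever a core other than the current holder writes, the holder's
    copy is invalidated and ownership moves to the writer.
    """
    owner = None
    invalidations = 0
    for core in write_trace:
        if owner is not None and owner != core:
            invalidations += 1
        owner = core
    return invalidations


def smp_trace(n_writes, migrate_every, n_cores=2):
    """Writes from a task that a load balancer moves every `migrate_every` writes."""
    return [(i // migrate_every) % n_cores for i in range(n_writes)]


def amp_trace(n_writes, core=0):
    """Writes from a task that is statically assigned to a single core."""
    return [core] * n_writes


if __name__ == "__main__":
    print("AMP, static placement:  ", count_invalidations(amp_trace(1000)))
    print("SMP, migrate every 10:  ", count_invalidations(smp_trace(1000, 10)))
    print("SMP, migrate every write:", count_invalidations(smp_trace(1000, 1)))
```

In this model the invalidation count grows with the migration frequency, which is the behavior a profiler would surface during testing. Real hardware tracks many lines and more states per line (for example under MESI), so the numbers are illustrative only.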
This condition is harder to detect in SMP systems due to the
random nature of the load-balancing scheduler, and it may not manifest
until after the device has been deployed into real-world conditions.
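Returning to OS-level cache partitioning through memory address mapping: the text does not name the exact mechanism, so the sketch below assumes the common technique usually called page coloring, for a physically indexed cache. Physical pages that map to the same group of cache sets share a "color", and the OS confines a task to part of the cache by giving it only page frames of certain colors. The cache geometry, the 4 KiB page size, and all function names here are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    """Number of page colors: bytes in one cache way divided by the page size."""
    return cache_size // (associativity * page_size)


def page_color(phys_addr, cache_size, associativity, page_size=PAGE_SIZE):
    """Color of the physical page that contains phys_addr."""
    frame = phys_addr // page_size
    return frame % num_colors(cache_size, associativity, page_size)


def allocate_frames(free_frames, allowed_colors, count, n_colors):
    """Pick `count` free frame numbers whose color is in allowed_colors."""
    chosen = [f for f in free_frames if f % n_colors in allowed_colors][:count]
    if len(chosen) < count:
        raise MemoryError("not enough free frames of the allowed colors")
    return chosen


if __name__ == "__main__":
    # Assumed example geometry: 4 MiB, 16-way shared L2 cache.
    cache_size, ways = 4 * 1024 * 1024, 16
    n = num_colors(cache_size, ways)

    # Static partition: task A gets the lower half of the colors, task B the rest.
    colors_a = set(range(n // 2))
    colors_b = set(range(n // 2, n))

    free = range(256)  # hypothetical pool of free physical frame numbers
    print("colors:", n)
    print("task A frames:", allocate_frames(free, colors_a, 4, n))
    print("task B frames:", allocate_frames(free, colors_b, 4, n))
```

Because the two tasks never receive frames of the same color, their data cannot evict each other from the shared cache. A dynamic scheme would change the color sets at run time, which in practice implies moving pages that already sit in the wrong color.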
Future Trends
Scalability is often cited as a key reason to move to multicore: if
one core is not fast enough, then one can add another. If this were
true, then we’d have to accept the fact that we cannot find a
reasonably faster processor within that processor’s architectural
family. For most devices, there are faster chips and chips with greater
throughput within the same family. Not only that, but Moore’s law isn’t
dead yet, despite what the press says. So we do have the technology to
double frequency and create more powerful processors within a uni-core
chip family.
Migrating to multicore devices is a wonderful way to increase
performance, or better yet, to reduce overall power consumption by
using every watt more efficiently. It does, however, require, as does
all software development, thorough planning, design, and tuning to
maximize performance and reduce the overall power budget of the device.
Keywords
Quality of service (QoS), SMP systems, load-balancing, Multicore Cache
Partitioning, Divide-and-Conquer Algorithms, Hyper-Threading
Related Links
- Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms
- Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
- Gaining Insights into Multicore Cache Partitioning
- Hierarchical cache coherence protocol verification one level at a time through assume guarantee
- Effective Use of the Shared Cache in Multi-core Architectures
- Cache Blocking Technique on Hyper-Threading Technology Enabled Processors

