Multicore Cache

There are many accepted reasons that support a move to multicore design in portable devices: scalability, specialty cores, increased performance, and reduced power consumption.

1. Description

2. Why

3. How

4. Future Trends

5. Related Links

Useful Links: bus, SMP, AMP

Description

As technology moves forward, innovative advancements will rely on researchers at the architecture, compiler, and software levels becoming familiar with the bottlenecks of silicon scaling. Two trends are of particular note:

  • In many microprocessor families, increasing amounts of silicon area are being devoted to caches, and
  • Technology variability is causing stability concerns for our fundamental cache building block – six-transistor SRAM. 

For each of these concerns, there are aggravating factors present in many proposed multicore designs:

  • Where die size is kept relatively equal to that of prior generations, inclusion of multiple cores, even if lighter-weight, often leaves less silicon area available for caches; and
  • Many proposals suggest complex multi-voltage management schemes to deal with power consumption, which aggravates and complicates circuit-level challenges in the face of increasing transistor-level variability.

Why 

The benefits of a shared cache system are many: 

  • Reduce cache under-utilization
  • Reduce cache coherency complexity
  • Reduce false sharing penalty
  • Reduce data storage redundancy at the L2 cache level: the same data only needs to be stored once in L2 cache.
  • Reduce front-side bus traffic: effective data sharing between cores allows data requests to be resolved at the shared cache level instead of going all the way to the system memory.
  • Provide new opportunities and flexibility to designers
  • Share data between the cores faster than through system memory
  • Let one core pre-process or post-process data for the other core (application partitioning, pipelining)
  • Offer alternative communication mechanisms between cores through the shared cache

The usage models for migrating single-core applications to multi-core can be grouped into two categories:

  • To replace multiple single-core PCs with a single multi-core system, in which case users will likely deploy each core just like an individual PC. 
  • To combine the power of multiple cores in order to get a performance boost for a single application, in which case each core does part of the task in order to achieve the overall performance gain.
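
The L2 redundancy point in the list of benefits above can be illustrated with a small counting model. This is a sketch, not a cache simulator: the function name `l2_lines_stored` and the two working sets are hypothetical, and capacity limits and replacement policy are ignored. It only counts how many cache-line copies are held when two cores touch overlapping data.

```python
def l2_lines_stored(core_working_sets, shared):
    """Count cache lines held in L2 for a list of per-core working sets.

    core_working_sets: list of sets of cache-line addresses (made-up data).
    shared: True for one L2 shared by all cores, False for a private L2 per core.
    Capacity limits and replacement are ignored; this only counts copies.
    """
    if shared:
        # A shared L2 keeps a single copy of each distinct line.
        return len(set().union(*core_working_sets))
    # Private L2s each keep their own copy, so common lines are duplicated.
    return sum(len(ws) for ws in core_working_sets)


# Hypothetical working sets: two lines (0x1080, 0x10C0) are used by both cores.
core0 = {0x1000, 0x1040, 0x1080, 0x10C0}
core1 = {0x1080, 0x10C0, 0x1100}

private_lines = l2_lines_stored([core0, core1], shared=False)
shared_lines = l2_lines_stored([core0, core1], shared=True)
print(private_lines, shared_lines)  # 7 5
```

In this toy case the two lines used by both cores are stored twice with private caches and once with a shared cache. The same overlap is what lets a request from one core be resolved at the shared cache level when the other core has already brought the data in.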

How to achieve 

Scaling from a uni-core design to a multicore design is not nearly as easy as advocates say, since one almost always uncovers race conditions, timing problems, and a host of additional issues that don’t normally manifest on uni-core designs.

Another commonly cited reason to move to multicore is the use of specialty cores. An application processor can perform digital signal processing in a cell phone. It may, however, require tens, if not hundreds or thousands, of cycles to do what a DSP can do in one or a few cycles. Specialty cores such as DSPs can therefore run much faster and use less power while achieving the same goal as a general-purpose application processor.

Cache partitioning and sharing is critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation, which often has several limitations, such as excessive simulation time, absence of OS activity, and proneness to simulation inaccuracy. To address these issues, an efficient software approach is being taken that supports both static and dynamic cache partitioning in the OS through memory address mapping. With this approach, several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS), can be evaluated.

One problem that may manifest in poorly designed AMP systems, or may occur due to load balancing in SMP-based systems, is the “ping-pong” effect. This occurs when two or more processes cause massive cache invalidations due to the order or frequency in which they access the cache. The ping-pong effect not only decreases performance and increases power consumption, but also impacts real-time determinism in the system, because the memory access needed to refetch the invalidated data uses a shared, and therefore contested, resource. The potential for this problem is easily avoided in a well-designed AMP multicore system. The reason is simple:

  • In SMP systems, load balancing causes tasks to migrate from core to core.
  • In AMP systems, the system architect assigns tasks to cores statically.
  • Additionally, the problem can be easily detected by profiling during testing and corrected. 
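
The contrast between migrating and statically placed tasks can be sketched with a toy model. Everything here is an assumption made for illustration: the function `count_invalidations`, the task names, the cache-line numbers, and the coherence model (private per-core caches with write-invalidate, so a written line is held by only one core at a time). Real schedulers and coherence protocols are considerably more involved.

```python
def count_invalidations(schedule, task_lines):
    """Count write-invalidations for a sequence of (task, core) time slices.

    schedule: list of (task_name, core_id) pairs, one per time slice.
    task_lines: dict mapping task_name to the set of cache lines it writes.
    Model (an assumption): private per-core caches with write-invalidate
    coherence, so a line lives in at most one core's cache after a write.
    """
    owner = {}          # cache line -> core that currently holds it
    invalidations = 0
    for task, core in schedule:
        for line in task_lines[task]:
            previous = owner.get(line)
            if previous is not None and previous != core:
                invalidations += 1   # the copy in the other core is invalidated
            owner[line] = core
    return invalidations


# Hypothetical tasks, each writing four cache lines of its own.
task_lines = {"A": {0, 1, 2, 3}, "B": {4, 5, 6, 7}}

# AMP-style static placement: A always runs on core 0, B always on core 1.
static = [("A", 0), ("B", 1)] * 4

# SMP-style load balancing, modelled here as the tasks swapping cores each round.
migrating = [("A", 0), ("B", 1), ("A", 1), ("B", 0)] * 2

print(count_invalidations(static, task_lines))     # 0
print(count_invalidations(migrating, task_lines))  # 24
```

With static placement each task's lines stay in one core's cache and nothing is invalidated. In the migrating schedule every move drags the task's whole working set to the other core, which is the ping-pong pattern; a profiler that counts invalidations or cache misses per task would show the same difference.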

The ping-pong effect is harder to detect in SMP systems because of the random nature of the load-balancing scheduler, and it may not manifest until after the device has been deployed into real-world conditions.
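
The OS-level cache partitioning through memory address mapping mentioned earlier in this section is commonly implemented as page coloring. The following is a minimal sketch of that idea, not the method of any particular OS. The cache geometry (4 MiB, 16-way, physically indexed), the 4 KiB page size, and the helper names `page_color` and `frames_for_partition` are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096                  # bytes per page (assumed)
CACHE_SIZE = 4 * 1024 * 1024      # bytes in the shared L2 (assumed)
WAYS = 16                         # associativity (assumed)

# Pages whose frame numbers are congruent modulo NUM_COLORS map to the same
# group of cache sets, so they compete for the same part of the cache.
NUM_COLORS = CACHE_SIZE // WAYS // PAGE_SIZE


def page_color(phys_addr):
    """Return the cache color of the page containing a physical address."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS


def frames_for_partition(frames, colors):
    """Keep only the page frame numbers whose color is in the given set."""
    return [f for f in frames if f % NUM_COLORS in colors]


# Static two-way split: each core may only be given pages from its own colors.
core0_colors = set(range(0, NUM_COLORS // 2))
core1_colors = set(range(NUM_COLORS // 2, NUM_COLORS))

free_frames = range(0, 256)       # hypothetical pool of free page frames
core0_frames = frames_for_partition(free_frames, core0_colors)
core1_frames = frames_for_partition(free_frames, core1_colors)

print(NUM_COLORS)                            # 64
print(page_color(0x12345000))                # 5
print(len(core0_frames), len(core1_frames))  # 128 128
```

Because the two color sets are disjoint, pages handed to one core can never evict the other core's lines from the shared cache. A dynamic scheme would change the color sets at run time and remap pages accordingly, which this sketch does not attempt.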

Future Trends

Scalability is often cited as a key reason to move to multicore: if one core is not fast enough, one can add another. If this were true, then we’d have to accept that we cannot find a reasonably faster processor within that processor’s architectural family. For most devices, there are faster chips and chips with greater throughput within the same family. Not only that, but Moore’s law isn’t dead yet, despite what the press says. So we do have the technology to double frequency and create more powerful processors within a uni-core chip family.

Migrating to multicore devices is a wonderful way to increase performance or, better yet, to reduce overall power consumption by using every watt more efficiently. It does, however, require, as does all software development, thorough planning, design, and tuning to maximize performance and reduce the overall power budget of the device.

Keywords

Quality of service (QoS), SMP systems, load-balancing, Multicore Cache Partitioning,  Divide-and-Conquer Algorithms, Hyper-Threading

Related Articles

Related Links