Future embedded applications will require high performance processors integrating fast and low-power cache. Dynamic Non-Uniform Cache Architectures (D-NUCA) have been proposed to overcome the performance limit introduced by wire delays when designing large cache. In this paper, we propose an alternative design of D-NUCA cache, namely Triangular D-NUCA Cache, to reduce power consumption and silicon area occupancy of D-NUCA cache. We compare the performances of Triangular D-NUCA cache with the ones achieved by conventional rectangular organization. Results show that our approach is particular useful in the embedded application domain, as it permits the utilization of half-sized NUCA cache with performance improvements.