Since the Keccak algorithm was selected by the US National Institute of Standards and Technology (NIST) as the standard SHA-3 hash algorithm for replacing the currently used SHA-2 algorithm in 2015, various optimization methods have been studied in parallel and hardware environments. However, in a software environment, the SHA-3 algorithm is much slower than the existing SHA-2 family; therefore, the use of the SHA-3 algorithm is low in a limited environment using embedded devices such as a Wireless Sensor Networks (WSN) enviornment. In this article, we propose a software optimization method that can be used generally to break through the speed limit of SHA-3. We combine the θ, π, and ρ processes into one, reducing memory access to the internal state more efficiently than conventional software methods. In addition, we present a new SHA-3 implementation for the proposed method in the most constrained environment, the 8-bit AVR microcontroller. This new implementation method, which we call the chaining optimization methodology, implicitly performs the π process of the f-function while minimizing memory access to the internal state of SHA-3. Through this, it achieves up to 26.1% performance improvement compared to the previous implementation in an AVR microcontroller and reduces the performance gap with the SHA-2 family to the maximum. Finally, we apply our SHA-3 implementation in Hash_Deterministic Random Bit Generator (Hash_DRBG), one of the upper algorithms of a hash function, to prove the applicability of our chaining optimization methodology on 8-bit AVR MCUs.