This paper proposes a series of approximate square root circuit designs that target high accuracy, low latency, small area, and low power dissipation. Each design is built from an array of controlled add–subtract cells, which are available in both exact and approximate versions. The utility of the proposed designs is evaluated in an example image contrast enhancement application, where they produce satisfactory results with large peak signal-to-noise ratio and structural similarity values. The accuracy and hardware characteristics of the proposed square root designs are also analyzed and compared with previously proposed state-of-the-art approximate square root designs. When applied to a 16-bit radicand (the number under the square root symbol), the proposed designs achieve the lowest error rates, normalized mean error distances, and mean relative error distances, improving on all previous methods that use the same number of approximate cells by a factor of at least 1.8. When synthesized using Synopsys Design Compiler with a 28 nm bulk CMOS process, the proposed designs outperform all previous designs in delay, area, power, and power-delay product in all but a few cases. These results demonstrate that the proposed designs span a flexible range of accuracy and hardware overhead trade-offs, from which a suitable design can be selected based on the user's design requirements.
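The three accuracy metrics named above (error rate, normalized mean error distance, and mean relative error distance) can be illustrated with a minimal software sketch that sweeps every 16-bit radicand. Everything here is an assumption made for illustration: `approx_sqrt` is a hypothetical stand-in that simply clears low-order radicand bits, not one of the proposed circuits, and the normalization choices (dividing the mean error distance by the largest exact root, and skipping a zero exact root in the relative error) follow common practice in approximate arithmetic and may differ from the definitions used in the paper.

```python
import math


def exact_sqrt(radicand: int) -> int:
    """Reference result: floor of the exact square root."""
    return math.isqrt(radicand)


def approx_sqrt(radicand: int, dropped_bits: int = 4) -> int:
    """Hypothetical stand-in approximation (NOT a circuit from the paper):
    clear the low `dropped_bits` bits of the radicand, then take the root."""
    mask = ~((1 << dropped_bits) - 1)
    return math.isqrt(radicand & mask)


def error_metrics(width: int = 16, dropped_bits: int = 4) -> dict:
    """Exhaustively compare approx_sqrt against exact_sqrt for all
    `width`-bit radicands and return ER, NMED, and MRED."""
    total = 1 << width
    max_exact = exact_sqrt(total - 1)  # assumed normalization constant for NMED

    wrong = 0        # inputs whose approximate root differs from the exact root
    sum_ed = 0       # accumulated error distance |approx - exact|
    sum_red = 0.0    # accumulated relative error distance

    for radicand in range(total):
        exact = exact_sqrt(radicand)
        approx = approx_sqrt(radicand, dropped_bits)
        ed = abs(approx - exact)
        if ed:
            wrong += 1
        sum_ed += ed
        if exact:  # assumption: skip the zero root to avoid dividing by zero
            sum_red += ed / exact

    return {
        "ER": wrong / total,
        "NMED": (sum_ed / total) / max_exact,
        "MRED": sum_red / total,
    }


if __name__ == "__main__":
    for bits in (0, 2, 4, 8):
        print(bits, error_metrics(width=16, dropped_bits=bits))
```

Sweeping `dropped_bits` shows the kind of accuracy trade-off the abstract describes: a more aggressive approximation raises all three metrics, while `dropped_bits=0` reproduces the exact result.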