Abstract

Natural proteins represent numerous but tiny structure/function islands in a vast ocean of possible protein sequences, most of which has not been explored by either biological evolution or research. Recent studies have suggested that this uncharted sequence space possesses surprisingly high structural propensity, but a systematic high-throughput approach to understanding this phenomenon has been lacking.

Here, we designed, prepared, and characterized two combinatorial protein libraries consisting of randomized proteins, each 105 residues in length. The first library constructed proteins from the entire canonical alphabet of 20 amino acids. The second library used a subset of only 10 residues (A, S, D, G, L, I, P, T, E, V) that represents a consensus view of the amino acids plausibly available through prebiotic chemistry. Our study shows that compact structure occurrence (i) is abundant (up to 40%) in random sequence space, (ii) is independent of general Hsp70 chaperone system activity, and (iii) is not granted solely by “late” and complex amino acid additions. The Hsp70 chaperone system effectively increases the solubility and stability of the canonical alphabet library but has only a minor impact on the “early” library. The early alphabet proteins are inherently more stable and soluble, possibly assisted by salts and cofactors in the cell-like environment in which these assays were performed.

Our work indicates that natural protein space may have been selected to some extent by chance rather than for unique structural characteristics.
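
As an illustration of the two library designs described above, the following minimal Python sketch draws one random 105-residue sequence from each alphabet and computes the size of the corresponding sequence space. It assumes uniform, independent sampling of residues, which is a simplification and not necessarily how the libraries were actually encoded or prepared; the names `FULL_ALPHABET`, `EARLY_ALPHABET`, `random_protein`, and `sequence_space_size` are hypothetical and introduced here only for illustration.

```python
import random

# One-letter codes for the 20 canonical amino acids.
FULL_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# The 10-residue "early" subset named in the abstract.
EARLY_ALPHABET = "ASDGLIPTEV"
# Length of each randomized protein, as stated in the abstract.
LENGTH = 105


def random_protein(alphabet, length=LENGTH, rng=None):
    """Return one random sequence over `alphabet`.

    Assumption: every residue is drawn uniformly and independently.
    """
    rng = rng or random.Random()
    return "".join(rng.choice(alphabet) for _ in range(length))


def sequence_space_size(alphabet, length=LENGTH):
    """Number of distinct sequences of `length` over `alphabet`."""
    return len(set(alphabet)) ** length


rng = random.Random(0)  # fixed seed so the sketch is reproducible
full_seq = random_protein(FULL_ALPHABET, rng=rng)
early_seq = random_protein(EARLY_ALPHABET, rng=rng)

print(full_seq)
print(early_seq)
print(sequence_space_size(FULL_ALPHABET))   # 20**105
print(sequence_space_size(EARLY_ALPHABET))  # 10**105
```

Under this simplification, the full alphabet spans 20^105 possible sequences and the early alphabet 10^105, which gives a sense of how small a fraction of either space any experimental library or natural proteome can sample.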