Together with language, music is perhaps our most distinctive behavioral trait. Following the lead of paleolinguistic research, different hypotheses have been proposed to explain why only humans perform music and how this ability might have evolved in the species. In this paper, we advance a new model of music evolution that builds on the theory of self-domestication, according to which the human phenotype is, at least in part, the outcome of a process similar to mammal domestication, triggered by a progressive reduction in reactive aggression levels in response to environmental changes. Specifically, we argue that changes in aggression management over the course of human cultural evolution can account for the behaviors conducive to the emergence and evolution of music. We hypothesize four stages in the evolutionary development of music under the influence of environmental changes and the evolution of social organization: starting from musilanguage, proto-music gave rise to personal and private forms of timbre-oriented music, then to small-group ensembles of pitch-oriented music, at first of indefinite and then of definite pitch, and finally to collective (tonal) music. These stages parallel what has been hypothesized for languages and encompass the diversity of music types and genres described worldwide. Overall, music complexity emerges gradually under the effects of enhanced abilities for cultural niche construction. These abilities result from the stable trend of reduction in reactive aggression towards the end of the Pleistocene, which led to the rise of hospitality codes and was succeeded by an increase in proactive aggression from the beginning of the Holocene onward.
This paper addresses numerous controversies in the literature on the evolution of music by providing a clear structural definition of music, identifying the structural features that distinguish it from oral language, and summarizing the typology of the operational functions of music and the formats of its transmission. The proposed structural approach to music equips researchers with the means to identify and comparatively analyze different schemes of tonal organization of music, placing them in the context of human social and cultural evolution. An especially valuable contribution to the understanding of the transition from animal communication to human music and language is the theory of the so-called “personal song”, described and analyzed here from ethological, social, cultural, cognitive, and musicological perspectives. The emergence of personal song and its development into a social institution are interlinked with the evolution of kinship and placed into the timeline of cultural evolution, based on the totality of ethnographic, archaeological, anthropological, genetic, and paleoclimatic data.