Most language learners have difficulties acquiring the phonemes of a second language (L2). Unfortunately, they are often judged on their L2 pronunciation, and segmental inaccuracies contribute to miscommunication. Therefore, we aim to determine how to facilitate phoneme acquisition. Given the close relationship between speech and co-speech gesture, previous work unsurprisingly reports that gestures can benefit language acquisition, e.g., in (L2) word learning. However, gesture studies on L2 phoneme acquisition present contradictory results, implying that both specific properties of gestures and phonemes used in training, and their combination, may be relevant. We investigated the effect of phoneme and gesture complexity on L2 phoneme acquisition. In a production study, Dutch natives received instruction on the pronunciation of two Spanish phonemes, /u/ and /θ/. Both are typically difficult to produce for Dutch natives because their orthographic representation differs between both languages. Moreover, /θ/ is considered more complex than /u/, since the Dutch phoneme inventory contains /u/ but not /θ/. The instruction participants received contained Spanish examples presented either via audio-only, audio-visually without gesture, audio-visually with a simple, pointing gesture, or audio-visually with a more complex, iconic gesture representing the relevant speech articulator(s). Preceding and following training, participants read aloud Spanish sentences containing the target phonemes. In a perception study, Spanish natives rated the target words from the production study on accentedness and comprehensibility. Our results show that combining gesture and speech in L2 phoneme training can lead to significant improvement in L2 phoneme production, but both gesture and phoneme complexity affect successful learning: Significant learning only occurred for the less complex phoneme /u/ after seeing the more complex iconic gesture, whereas for the more complex phoneme /θ/, seeing the more complex gesture actually hindered acquisition. The perception results confirm the production findings and show that items containing /θ/ produced after receiving training with a less complex pointing gesture are considered less foreign-accented and more easily comprehensible as compared to the same items after audio-only training. This shows that gesture can facilitate task performance in L2 phonology acquisition, yet complexity affects whether certain gestures work better for certain phonemes than others.