This paper addresses the problem of finding a set of Hidden Markov Models that can be trained to model context dependencies with good statistical accuracy, given a fixed amount of training data. Two aspects are investigated: the clustering of intra-word context-dependent units with similar contexts on the basis of different similarity measures, and the definition of inter-word coarticulation units. A dynamic programming procedure is presented that clusters a large set of context-dependent units into a given number of units while optimizing a global cost measure. Inter-word units were found to provide better phonetic representations of word junctures and to increase recognition accuracy, though by less than has been reported for English.
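To make the idea of clustering into a given number of units under a global cost concrete, the following is a minimal sketch and not the procedure used in this work. It assumes, purely for illustration, that each context-dependent unit is summarized by a single scalar feature, so that units can be ordered and grouped into contiguous clusters, and that the global cost is the total within-cluster sum of squared deviations. The names `dp_cluster` and `segment_cost` are hypothetical.

```python
from typing import List, Tuple


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for sorted items i..j-1."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def dp_cluster(values: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Split scalar features into k contiguous clusters with minimal total cost.

    Assumption (not from the paper): units are reduced to scalars and the
    global cost is the total within-cluster squared deviation.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of items")

    # Prefix sums let each candidate cluster be scored in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    inf = float("inf")
    # best[c][j]: minimal cost of grouping the first j items into c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    back[c][j] = i

    # Follow the back-pointers from the last item to recover the clusters.
    clusters: List[List[float]] = []
    j = n
    for c in range(k, 0, -1):
        i = back[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters


if __name__ == "__main__":
    cost, groups = dp_cluster([5.2, 1.0, 9.0, 1.1, 5.0], 3)
    print(cost, groups)
```

Unlike a greedy pairwise merge, this recursion evaluates every admissible split point for each cluster count, so under the stated assumptions the returned partition is globally optimal for the chosen cost. The sketch runs in O(kn²) time for n units; a realistic similarity measure between HMM units would replace `segment_cost`.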