Machine learning minimizes bounds on the expected risk E[h] of a classifier h, computed over an unknown distribution p(x,y). Unlabeled data describe p(x), while scientific prior knowledge can describe p(y). This talk will discuss the use of unlabeled data to estimate p(x), and of articulatory phonology to model p(y), for acoustic modeling and pronunciation modeling in automatic speech recognition. We will demonstrate that, if either p(x) or p(y) is known, it is possible to substantially reduce the VC dimension of the function space, thereby substantially reducing the expected risk of the classifier. As a speculative example, we will show that the model of p(y) can be improved, relative to standard ASR methods, using finite-state machines based on articulatory phonology; preliminary results will be reviewed. As a more fully developed example, we will show that the VC dimension of a maximum mutual information (MMI) speech recognizer can be bounded by the conditional entropy of y given x; the resulting training criterion maximizes mutual information over labeled data while minimizing the conditional label entropy over unlabeled data. Algorithms and experimental results will be presented for isolated phone recognition and for retraining using N-best lists.
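
As a concrete sketch of the combined criterion (a minimal reading consistent with entropy-regularization formulations; the parametric posterior p_theta(y|x), the labeled set L, the unlabeled set U, and the trade-off weight lambda are notation introduced here for illustration, not taken from the talk):

\[
\max_{\theta} \;\; \sum_{(x_i, y_i) \in \mathcal{L}} \log p_{\theta}(y_i \mid x_i) \;-\; \lambda \sum_{x_j \in \mathcal{U}} \hat{H}_{\theta}(Y \mid X = x_j),
\qquad
\hat{H}_{\theta}(Y \mid X = x_j) = -\sum_{y} p_{\theta}(y \mid x_j) \log p_{\theta}(y \mid x_j).
\]

The first term is the conditional-likelihood form of MMI on labeled data; the second penalizes the model's label uncertainty on unlabeled data, matching the "MMI over labeled data, minus conditional label entropy of unlabeled data" criterion stated above.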