In machine learning, there are two approaches:
Discriminative Approach - In this approach, we try to model $p(y|x)$ directly.
Generative Approach - In this approach, we model the joint distribution $p(x,y)$, and then infer $p(y|x)$ from it via Bayes' rule: $p(y|x) = p(x,y)/p(x)$, where $p(x) = \sum_y p(x,y)$ is obtained by marginalising the joint distribution over $y$.
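As a minimal sketch of the generative route, assume a toy problem with two classes and one discrete feature taking three values. The probabilities below are made up for illustration, and the function name `posterior_from_generative` is hypothetical. The code builds the joint from $p(x|y)$ and $p(y)$, marginalises over $y$ to get $p(x)$, and divides to obtain $p(y|x)$.

```python
import numpy as np

# Hypothetical toy numbers: classes y in {0, 1}, one feature x in {0, 1, 2}.
prior = np.array([0.7, 0.3])                 # p(y)
likelihood = np.array([[0.5, 0.4, 0.1],      # p(x | y=0)
                       [0.1, 0.3, 0.6]])     # p(x | y=1)


def posterior_from_generative(prior, likelihood):
    """Return p(y | x) as an array of shape (num_classes, num_x_values)."""
    joint = likelihood * prior[:, None]      # p(x, y) = p(x | y) p(y)
    evidence = joint.sum(axis=0)             # p(x) = sum over y of p(x, y)
    return joint / evidence                  # p(y | x) = p(x, y) / p(x)


posterior = posterior_from_generative(prior, likelihood)
print(posterior)
```

Each column of `posterior` is a distribution over $y$ for one value of $x$, so the columns sum to 1. A discriminative model would try to produce this table directly, without going through `likelihood` and `prior`.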
In the generative approach, the joint can be computed as $p(x,y) = p(x|y)p(y)$. If we are able to model $p(x|y)$ really well, then the generative approach can be more accurate. In practice, however, we need many more samples in order to model a high-dimensional $x$. The two directions also differ in difficulty. The mapping from $x$ to $y$ can be many-to-one: in face detection, for example, images of many different people all map to the single label "face", so $p(y|x)$ is a comparatively easy thing to model. Going the other way, given that the label is "face", computing the probability distribution over all possible face images ($x$) is infeasible.
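One rough way to see why a high-dimensional $x$ is costly is to count free parameters. The sketch below assumes $d$ binary features and compares an unrestricted table for $p(x|y)p(y)$ with a logistic regression for $p(y|x)$. Logistic regression is chosen here only as a simple example of a discriminative model, and both function names are hypothetical. Parameter count is a proxy for the number of samples needed, not an exact measure of it.

```python
def full_generative_table_params(d, num_classes=2):
    """Free parameters of an unrestricted p(x | y) p(y) over d binary features."""
    per_class = 2 ** d - 1                   # a distribution over 2^d outcomes
    prior = num_classes - 1                  # p(y) sums to 1
    return num_classes * per_class + prior


def logistic_regression_params(d):
    """Free parameters of binary logistic regression: d weights plus a bias."""
    return d + 1


for d in (1, 10, 20):
    print(d, full_generative_table_params(d), logistic_regression_params(d))
```

The table grows exponentially in $d$ while the discriminative model grows linearly. Practical generative models avoid the full table by making structural assumptions, such as conditional independence of the features, at the cost of a less faithful $p(x|y)$.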
Thus, although generative models should do well in theory, in practice it is usually not possible to model $p(x|y)$ accurately, and discriminative models are preferred.