Look at the definition of conditional probability. We are mostly interested in the continuous case, though the discrete case is *really* clearer than the continuous.
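It’s a ratio of one integral (or, in the discrete case, one sum) over another: Pr(A | B) = Pr(A and B) / Pr(B). Example: Pr(poker card is below 3, given it’s not JQK) is defined as Pr(card is below 3 and not JQK) divided by Pr(card is not JQK).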
The way I see it, the numerator integral is being magnified, or scaled up, by the factor 1/denominator. This happens whenever the denominator is smaller than 1, i.e. whenever the given event is not certain.
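The card example can be checked by brute-force counting. This is only a minimal sketch: it assumes a standard 52-card deck with the ace counted as rank 1 (so "below 3" means ace or 2) and J, Q, K as ranks 11 to 13. The names `deck`, `prob`, `below_3` and `not_jqk` are made up for illustration.

```python
from fractions import Fraction

# Assumption: ranks 1..13 with ace = 1 and J/Q/K = 11/12/13, four suits.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

def prob(event):
    """Probability of an event under a uniform draw from the deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def below_3(card):
    return card[0] < 3

def not_jqk(card):
    return card[0] <= 10

p_joint = prob(lambda c: below_3(c) and not_jqk(c))  # numerator: 8 of 52 cards
p_given = prob(not_jqk)                              # denominator: 40 of 52 cards
p_cond = p_joint / p_given                           # the conditional probability
scale = 1 / p_given                                  # factor that magnifies the numerator

print(p_joint, p_given, p_cond, scale)
```

Under these assumptions the numerator 2/13 gets scaled up by 13/10 to give 1/5, which is the same as counting 8 favourable cards among the 40 that are not J, Q or K.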
In the important bivariate case, there’s a 3D pdf surface. Volume under the entire surface = 1.0. If we cut vertically at y=3.3, on the cross-section view we get a curve of z vs x, where z is the vertical axis. This curve looks like a density function. We would like the total area under this curve to be 1.0, but in general it isn’t, since the curve is just one thin slice of the surface.
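To get 1.0, we need to scale the curve by something like 1/Pr(Y=3.3). This is correct in the discrete case, but in the continuous case Pr(Y=3.3) is always 0. What we use instead is 1/f_Y(3.3), where f_Y is the marginal density function of Y, evaluated at y=3.3. That value is exactly the area under the cross-section curve, so dividing the curve by it brings the area to 1.0. The rescaled curve is the conditional density f(x | y=3.3) = f(x, 3.3) / f_Y(3.3).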
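A quick numerical check of the cross-section argument, as a rough sketch only. The joint density here is an assumed bivariate normal with made-up parameters (means 0 and 3, unit standard deviations, correlation 0.5); any other joint pdf would do. The helper names `joint_pdf`, `marginal_y` and `trapezoid` are hypothetical, and the integral is a plain trapezoid rule over a wide x range.

```python
import math

# Assumed, illustrative parameters of a bivariate normal density.
MU_X, MU_Y = 0.0, 3.0
SIGMA_X, SIGMA_Y = 1.0, 1.0
RHO = 0.5

def joint_pdf(x, y):
    """Height z of the 3D pdf surface at the point (x, y)."""
    zx = (x - MU_X) / SIGMA_X
    zy = (y - MU_Y) / SIGMA_Y
    q = (zx * zx - 2 * RHO * zx * zy + zy * zy) / (1 - RHO * RHO)
    norm = 2 * math.pi * SIGMA_X * SIGMA_Y * math.sqrt(1 - RHO * RHO)
    return math.exp(-0.5 * q) / norm

def marginal_y(y):
    """Marginal density f_Y(y), known in closed form for the normal case."""
    z = (y - MU_Y) / SIGMA_Y
    return math.exp(-0.5 * z * z) / (SIGMA_Y * math.sqrt(2 * math.pi))

def trapezoid(f, lo, hi, n=4000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

Y0 = 3.3
# Area under the raw cross-section curve z = f(x, 3.3): not 1.0.
slice_area = trapezoid(lambda x: joint_pdf(x, Y0), -10.0, 10.0)
# Area after scaling the curve by 1 / f_Y(3.3): this one is 1.0.
cond_area = trapezoid(lambda x: joint_pdf(x, Y0) / marginal_y(Y0), -10.0, 10.0)

print(slice_area, marginal_y(Y0), cond_area)
```

With these assumed parameters the raw slice has area of roughly 0.38, matching f_Y(3.3), and the rescaled curve integrates to 1.0 up to numerical error.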