It’s worthwhile to get an intuitive feel for the choice of words in this jargon.

* With discrete probabilities, there’s the concept of “probability mass function”.

* With a continuous probability space, the corresponding concept is “density function”.

Density is defined as mass per unit space.
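To make the mass vs. density wording concrete, here is a minimal sketch. The fair die and the uniform RV on [0, 0.5] are my own illustrative assumptions, not examples from any particular textbook.

```python
from fractions import Fraction

# Discrete: a fair six-sided die. Each outcome carries a lump of mass.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_mass = sum(pmf.values())  # all the lumps add up to 1

# Continuous (assumed example): a uniform RV on the interval [0, 0.5].
# The same total mass of 1 is smeared evenly over a length of 0.5.
length = Fraction(1, 2)
density = 1 / length  # mass per unit length
mass_in_slice = density * Fraction(1, 10)  # mass sitting on a slice of length 0.1

print(total_mass, density, mass_in_slice)
```

Note that the density comes out larger than 1. That is fine, because a density is not a probability; only density times length is.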

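For a 1D probability space, the unit space is length. Example – the width of a nose is an RV with a continuous distro. Say the mean is 2.51cm. If the distro is symmetric and single-peaked (like a bell curve), the probability density is highest at this width. That density is measured in probability per cm, so it is not itself a probability.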
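A rough sketch of the 1D case, using Python's standard `statistics.NormalDist`. The normal shape and the standard deviation of 0.3cm are assumptions made up for illustration; only the 2.51cm mean comes from the example above.

```python
from statistics import NormalDist

# Assumed for illustration only: nose width ~ Normal(mean=2.51 cm, sd=0.3 cm).
nose_width = NormalDist(mu=2.51, sigma=0.3)

density_at_mean = nose_width.pdf(2.51)  # units: probability per cm
density_off_mean = nose_width.pdf(2.0)  # lower, because 2.0 is away from the peak

# Mass in a thin slice is roughly density * length of the slice.
slice_width = 0.01  # cm
approx_mass = density_at_mean * slice_width
exact_mass = (nose_width.cdf(2.51 + slice_width / 2)
              - nose_width.cdf(2.51 - slice_width / 2))

print(density_at_mean, density_off_mean)
print(approx_mass, exact_mass)
```

The two numbers on the last line should nearly agree, which is the "mass = density × length" intuition at work for a thin enough slice.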

For a 2D probability space, the unit space is an area. Example – the width of a nose and the temperature inside it are two RVs, forming a bivariate distro. You can plot the density function as a dome. Total volume under the dome = 1.0 by definition. Density at (x=2.51cm, y=36.01 C) is the height of the dome at that point.

If the dome at this location is twice as high as at another location like (x=2.4cm, y=36 C), then the concentration of “mass” here is twice the concentration there.
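The dome picture can be checked numerically with a small sketch. Everything about the shape here is assumed for illustration: the two RVs are taken to be independent normals (so the joint density is just a product), with made-up standard deviations of 0.3cm and 0.4 C. The ratio this prints depends entirely on those assumptions and is not meant to match any figure quoted above.

```python
from statistics import NormalDist

# Assumed for illustration: independent normal RVs, so the joint
# density is the product of the two 1D densities.
width = NormalDist(mu=2.51, sigma=0.3)  # cm
temp = NormalDist(mu=36.01, sigma=0.4)  # degrees C

def joint_density(x, y):
    """Height of the dome at (x, y), in probability per (cm * degree C)."""
    return width.pdf(x) * temp.pdf(y)

def volume_under_dome(steps=200, spread=6.0):
    """Midpoint Riemann sum over +/- `spread` standard deviations."""
    dx = 2 * spread * width.stdev / steps
    dy = 2 * spread * temp.stdev / steps
    x0 = width.mean - spread * width.stdev
    y0 = temp.mean - spread * temp.stdev
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * dx
        for j in range(steps):
            y = y0 + (j + 0.5) * dy
            total += joint_density(x, y) * dx * dy  # height * tiny area
    return total

volume = volume_under_dome()
# Ratio of dome heights = ratio of "mass" concentrations at the two points.
ratio = joint_density(2.51, 36.01) / joint_density(2.4, 36.0)
print(volume, ratio)
```

The volume should come out very close to 1.0, and the ratio compares how concentrated the mass is at the two locations under these assumed parameters.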