You are absolutely correct.
The reduction in light level is due entirely to the wider spread of the light projected by the lens, which in turn comes from moving the lens away from the film. The inverse square law describes it.
One could argue that the light source (the exit pupil) isn't small enough, compared to the distances involved and to the film format, to be treated as a point, which complicates things a bit and would call for a slightly more complex formula to describe what happens.
But unnecessarily so. The inverse square law works perfectly.
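To put numbers on it, here is a rough sketch of the inverse square law applied to lens extension. It assumes the exit pupil sits one focal length from the film when the lens is focused at infinity (roughly true for a symmetrical lens, less so for telephoto or retrofocus designs). The function names and the 150 mm / 300 mm figures are made up for illustration only.

```python
import math

def exposure_factor(pupil_to_film_mm, focal_length_mm):
    """Exposure increase needed relative to infinity focus.

    Assumption: at infinity focus the exit pupil is one focal length
    from the film, so the light level falls as (distance / focal length)^2.
    """
    return (pupil_to_film_mm / focal_length_mm) ** 2

def factor_to_stops(factor):
    """Convert an exposure factor to stops (each stop is a doubling)."""
    return math.log2(factor)

# Illustrative case: a 150 mm lens racked out to 300 mm (life-size, 1:1).
factor = exposure_factor(300.0, 150.0)
print(factor, factor_to_stops(factor))  # prints: 4.0 2.0
```

Doubling the distance spreads the same light over four times the area, so the film gets a quarter of the light and needs two stops more exposure.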