I based my thoughts on the photo alone, at small size, so I could easily be wrong...
But a small aperture would require a long exposure. My initial thought was three staged photos combined (foreground, parked cart on the right, background), but taking Occam's razor to it, maybe it's two shots (foreground and background, shot without moving the camera except to refocus it).
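To put a rough number on the aperture point: for the same light and the same film speed, exposure time grows with the square of the f-number. Here is a minimal sketch of that arithmetic. The values (1/100 s at f/8, stopped down to f/32) are made up for illustration and are not taken from the photo.

```python
def scaled_exposure(base_time_s, base_f_number, new_f_number):
    """Exposure time needed at new_f_number to match the exposure
    obtained at base_f_number, assuming identical light and film speed.

    Light reaching the film is proportional to 1 / N**2, so the time
    scales with (new_N / base_N) ** 2.
    """
    return base_time_s * (new_f_number / base_f_number) ** 2


if __name__ == "__main__":
    # Hypothetical values, not measured from the photograph.
    t = scaled_exposure(base_time_s=1 / 100, base_f_number=8, new_f_number=32)
    print(f"{t:.2f} s")
```

With these assumed values the exposure comes out at about 0.16 s, sixteen times longer than at f/8, which is long enough for anyone moving to blur.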
The biggest problem I see with making such a composite image is how to direct the people.
I think he took the first shot to capture the foreground, carefully arranged to leave visible street pavement (room to cut and paste). Then he'd only need to shoo some people out of the foreground on the street side. This way he has a natural place to cut his outline. The fruit boxes don't move, so he can take the outline there. The street side is the only place where there would be a challenge, and that's nothing a little negative retouching couldn't remedy in case a blurry head stuck out.
Of course, even the simplest explanation means he had to have an assistant with a bullhorn yelling at people, telling them to look natural and stand still.
The photographer clearly wanted to demonstrate mastery of taking pictures of large groups of people. He probably had some darkroom mastery as well. He wanted a photograph nobody else could easily copy.