You know all those people playing GeoGuessr? Or the late 00s / early 10s, when 4chan would occasionally (and notably) dox an anonymous criminal? How people can use one photo, their memory, their intuition, and maybe Google / Zillow listings / etc. to go from "photo" to "here is where and roughly when the photo was taken"?
Now, here's a thought experiment. If you had a super big computer, and you were able to suck in a stream of every video and photo taken by humanity, you would be able to build a "geospatialtemporal model". That is, every photo and video would make up one piece of a big jigsaw puzzle that could be pieced together across time and space.
Specifically, placing a piece would mean placing the photographer in the world (the camera pose) and projecting the points from the photo into 3D space. Most of those points would sit on or near Earth's surface, which could give you a useful prior for your model.
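To make "projecting the points from the photo into 3D space" a bit more concrete, here's a minimal sketch, assuming a plain pinhole camera and a locally flat ground plane at z = 0 (a stand-in for "Earth's surface" over a small area). The function name `pixel_to_ground` and all of its parameters are made up for illustration; a real pipeline would also have to deal with lens distortion, terrain, and noisy poses.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, cam_pos, R):
    """Back-project pixel (u, v) onto the ground plane z = 0.

    Assumptions (all hypothetical, for illustration only):
      - pinhole camera with focal lengths fx, fy and principal point (cx, cy)
      - camera frame: x right, y down, z forward
      - R is a 3x3 camera-to-world rotation, given as a list of rows
      - cam_pos is the camera center in world coordinates, with z = height
    Returns the (x, y, z) world point, or None if the ray misses the ground.
    """
    # Direction of the viewing ray in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate that direction into the world frame.
    d_world = tuple(sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    # A ray parallel to the ground never hits it.
    if abs(d_world[2]) < 1e-12:
        return None
    # Solve cam_pos.z + t * d_world.z = 0 for the ray parameter t.
    t = -cam_pos[2] / d_world[2]
    if t <= 0:
        return None  # the ground is behind the camera
    return tuple(cam_pos[i] + t * d_world[i] for i in range(3))


# Camera 10 units up, looking straight down.
LOOK_DOWN = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
center = pixel_to_ground(640, 360, 1000, 1000, 640, 360, (0, 0, 10), LOOK_DOWN)
```

The ground-plane constraint is the "prior" doing the work here: a single pixel only gives you a ray, and assuming the point lies on the surface is what pins it to one spot.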
This thing is called "manifold learning", and it's an academic term for "desperate attempt to learn the true shape of a thing." Kind of like poetry.
With such a supercomputer and such a model, you could place an unseen photo into that model, across time or space, provided there are enough outside clues.
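As a toy illustration of "placing an unseen photo", here's a rough sketch that treats the model as nothing more than a lookup table of feature vectors tagged with a place and a year, and places a new photo by nearest match. Everything here is an assumption for the sake of the example: the `INDEX` entries are invented, the vectors would in practice come from some image embedding, and `min_score` is a made-up stand-in for "enough outside clues".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def place_photo(query, index, min_score=0.8):
    """Return (place, year, score) for the best match, or None if too weak."""
    best = max(index, key=lambda entry: cosine(query, entry["vec"]), default=None)
    if best is None:
        return None
    score = cosine(query, best["vec"])
    # Not enough evidence: refuse to guess rather than return a bad placement.
    if score < min_score:
        return None
    return best["place"], best["year"], score


# Invented toy index; real entries would be learned, not typed in by hand.
INDEX = [
    {"vec": [0.9, 0.1, 0.0], "place": "harbor", "year": 2012},
    {"vec": [0.0, 0.8, 0.6], "place": "market", "year": 2019},
]

match = place_photo([1.0, 0.1, 0.0], INDEX)
```

A photo that looks like nothing in the index falls below the threshold and comes back as `None`, which is the "not enough clues" case.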
Anyways, Niantic announced what they're doing with all their Pokémon GO data, and it falls short of this thing: rather than one giant model matching photos and videos across Earth and across all time, they've trained 50 million separate spatial networks (not temporal!) for separate locations. They have the pose from the camera to start with, which is very useful for making this model.
There's no point to this post; I just wanted to point out where we are along the torment nexus. The "giant geospatialtemporal model" is a natural and possibly achievable endpoint (albeit not from strictly neural methods; neurosymbolic methods are obviously the future of this kind of thing. Think: instead of 200 weirdos on 4chan using Google, you have a neural network querying a symbolic inference engine and a database).
P.S. These Niantic losers used AI-generated images for their diagrams, lmao