Just treat it as a pipeline and work through it step by step, because the output of each step often determines what to do next.
Start by trying to get raw bitmaps out of the camera. In rough order of difficulty, you can use an API (easiest), go through the Linux driver (harder), work out how the GPU communicates with the camera (hard), or look at which chips are inside the camera, find out who produced them, and figure out how they work (very hard).
The output of that step will probably decide the next step to take, and along the way you will have learned how it works.
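As a rough illustration of how the output of the first step can steer the next one: suppose you managed to dump one raw frame as a plain blob of bytes. A minimal sketch (the resolution and pixel-format tables are my own guesses for illustration, not something your camera is known to use) is to check which width, height and bytes-per-pixel combinations match the blob size, then wrap the most likely one in a PGM header so any image viewer can show whether the guess was right.

```python
# Hypothetical candidate tables; replace them with whatever your camera's
# documentation or driver output suggests.
CANDIDATE_RESOLUTIONS = [(320, 240), (640, 480), (1280, 720), (1920, 1080)]
CANDIDATE_BYTES_PER_PIXEL = {"8-bit": 1, "16-bit": 2, "24-bit": 3}


def guess_layouts(num_bytes):
    """Return every (width, height, format) whose frame size equals num_bytes."""
    matches = []
    for width, height in CANDIDATE_RESOLUTIONS:
        for name, bytes_per_pixel in CANDIDATE_BYTES_PER_PIXEL.items():
            if width * height * bytes_per_pixel == num_bytes:
                matches.append((width, height, name))
    return matches


def to_pgm(width, height, pixels):
    """Wrap 8-bit grayscale pixel bytes in a binary PGM (P5) header."""
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match width * height")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


if __name__ == "__main__":
    # A 921600-byte dump is ambiguous: it fits two of the candidates.
    for layout in guess_layouts(921600):
        print(layout)
```

If more than one layout matches, write each guess out as an image and look at it: the wrong one usually shows up as skewed or striped garbage, and that tells you which decoding step to try next.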