Hello everyone,
I’m currently working on a project that involves recognizing tree trunks, and potentially other parts of trees, from a rover’s live video stream. The rover can complete a mission and get close to a tree, but the robotic arm used for spraying will need to be remote-controlled (or one day automated) to work with higher precision. I want to augment the low-bandwidth video stream with an edge-based system that recognizes which part of the tree the sprayer is pointing at.
Should I use a Raspberry Pi, OpenMV, or Jetson Orin Nano? Assuming I have hardware on the edge that can run pretty much anything, what would be the most straightforward way to develop this system with minimal annotation?

The traditional route would be to build a (large) training set of trees under various environmental conditions: different lighting, weather like rain or snow, and possibly even seasonal changes. The training set should also cover different viewing angles from the same camera I’ll be using. Then I’d need a custom GUI to annotate those images/streams. It’s a lot of work. Given that we’re now in an era of foundation models and fine-tuning, I wonder if there’s an easier/better way to do this (few-shot learning?). The output format is also custom: it isn’t just a class and a bounding box (Hugging Face seems to cover tasks like that), it also needs an orientation.
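To make the question concrete, here is the rough shape of the zero-shot route I’m imagining. I haven’t validated any of it, and the model names (an OWL-ViT checkpoint through the Hugging Face `transformers` zero-shot object detection pipeline, SAM from `segment-anything` for masks), the text prompts, and the thresholds are just placeholders:

```python
# Sketch only, not tested end to end. Assumes: transformers (zero-shot object
# detection with an OWL-ViT checkpoint), segment-anything with a locally
# downloaded SAM checkpoint, and OpenCV for the orientation estimate.
import cv2
import numpy as np
from PIL import Image
from transformers import pipeline
from segment_anything import sam_model_registry, SamPredictor

# 1) Zero-shot detection: no training set, just text prompts for the parts I care about.
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

# 2) SAM turns each detected box into a mask I can derive an orientation from.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # local checkpoint path (assumption)
predictor = SamPredictor(sam)

def detect_tree_parts(frame_bgr, labels=("tree trunk", "tree branch"), threshold=0.2):
    """Return label, score, axis-aligned box and oriented box for each detection in one frame."""
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = detector(Image.fromarray(frame_rgb), candidate_labels=list(labels))
    predictor.set_image(frame_rgb)

    results = []
    for det in detections:
        if det["score"] < threshold:
            continue
        box = det["box"]  # dict with xmin/ymin/xmax/ymax
        xyxy = np.array([box["xmin"], box["ymin"], box["xmax"], box["ymax"]], dtype=np.float32)

        # Mask prompted by the detection box; a rotated rectangle fitted to the mask
        # pixels gives a centre, size, and angle, i.e. the orientation part of my output.
        masks, _, _ = predictor.predict(box=xyxy[None, :], multimask_output=False)
        points = np.column_stack(np.nonzero(masks[0])[::-1]).astype(np.float32)  # (x, y) pixels
        if len(points) < 5:
            continue
        (cx, cy), (w, h), angle = cv2.minAreaRect(points)

        results.append({
            "label": det["label"],
            "score": det["score"],
            "box_xyxy": xyxy.tolist(),
            "oriented_box": {"center": (cx, cy), "size": (w, h), "angle_deg": angle},
        })
    return results

# Per video frame (e.g. from cv2.VideoCapture on the rover's stream):
# frame = ...; print(detect_tree_parts(frame))
```

That’s the direction I’m picturing, but I don’t know whether it’s realistic at interactive frame rates on a Jetson-class device, or whether I’d be better off using something like this only to pre-label a small set of frames and then fine-tune a lighter detector.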
How would you approach this project?