About Yodo Labs
Yodo Labs is an applied AI research and engineering company based in Tokyo and Osaka. We build visual AI systems for field measurement, inspection, and safety verification -- the kind that runs on the phones and edge devices your crews already carry. Rail, utilities, construction, manufacturing. Offline. Deployed in weeks.
About the Role
You have trained models that work beautifully on a benchmark and fail on the factory floor. You have dealt with the dataset that has ten thousand normal examples and three defective ones. You know that the gap between a research prototype and a system a maintenance crew actually uses is not a deployment problem -- it is a modelling problem.
At Yodo Labs, we build vision systems that make accurate visual judgments in the field. Our models need to detect defects that barely exist in any dataset, measure dimensions from a single smartphone image, and verify safety-critical connections before a crew energises a line. They need to do this in 10-50MB, fully offline, on hardware that was not designed for AI.
As a Computer Vision Engineer, you will own the pipeline from data to device. You will generate synthetic training data for defects our clients have seen three times in ten years. You will fine-tune vision-language models on tasks where the training set is a handful of reference images and a conversation with an inspector. You will distil the result into something that runs on a phone in a tunnel with no connectivity.
This is not a research role and it is not a pure engineering role. It is both, simultaneously, because the problems demand it.
What You Will Do
- Design and train vision models for field inspection, measurement, and safety verification -- each deployment is a new domain with minimal existing data
- Build and maintain synthetic data generation pipelines: take a handful of real examples and produce the thousands of training samples needed to cover the long tail of rare cases
- Fine-tune vision-language models to learn new inspection tasks from few-shot examples and natural language task descriptions
- Distil large models into compact on-device models (10-50MB) that run fully offline on smartphones and edge hardware
- Deploy, test, and iterate models in real field conditions -- your work ships to construction sites, rail corridors, and substations, not just a dashboard
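The distillation step above usually means training the compact student to match a temperature-softened teacher distribution alongside the usual hard-label loss. A minimal sketch of that soft-target loss, written in NumPy for clarity (function names, the temperature value, and the standalone KL form are illustrative assumptions, not a description of our actual pipeline):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student outputs.

    In practice this term is mixed with a cross-entropy loss on the hard
    labels; the mixing weight and temperature are tuning knobs.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 so gradients keep a
    # comparable magnitude as the temperature changes
    return temperature ** 2 * np.sum(
        p_teacher * np.log(p_teacher / p_student)
    )
```

The loss is zero when the student already matches the teacher and grows as their softened predictions diverge, which is what drives the 2B-to-30MB compression described in the projects below.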
Representative projects
- Building a synthetic data pipeline for rail overhead equipment wear detection, starting from twelve reference images provided by the client and generating a training set covering crack patterns, corrosion stages, and environmental conditions the client has never photographed
- Fine-tuning a vision-language model to perform live-line connection verification for a utility company, where the task definition comes from a two-hour conversation with a senior inspector rather than a labelled dataset
- Distilling a 2B-parameter vision model into a 30MB on-device model that maintains 95%+ accuracy for surface defect detection on a manufacturing line, running inference in under 200ms on a standard smartphone
- Designing an active learning loop where field crews flag uncertain predictions, feeding real-world edge cases back into the training pipeline without requiring ML expertise from the crew
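One simple way to decide which predictions a field crew should flag, as in the active learning loop above, is an entropy threshold on the model's softmax output: confident predictions pass through, uncertain ones are routed back for labelling. A minimal pure-Python sketch (the function names and the threshold value are hypothetical tuning choices, not part of our actual system):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a softmax output; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_for_review(probs, threshold=0.5):
    """Return True when a prediction is uncertain enough to send back
    into the training pipeline. The threshold would be calibrated per
    deployment against real field data."""
    return predictive_entropy(probs) > threshold
```

A crisp prediction like `[0.98, 0.01, 0.01]` falls well under the threshold, while a hedged one like `[0.4, 0.3, 0.3]` gets flagged -- no ML expertise needed from the crew, just a tap on an uncertain result.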
You may be a good fit if you have
- Strong experience in computer vision: object detection, segmentation, image classification, or visual inspection -- you have trained models and evaluated them honestly
- Hands-on experience with model compression: distillation, quantisation, pruning, or on-device deployment (ONNX, CoreML, TFLite, or similar)
- Familiarity with vision-language models (CLIP, LLaVA, Florence, or similar) and how to adapt them to new domains with limited data
- A pragmatic approach to data scarcity -- you reach for synthetic data, augmentation, and few-shot learning before asking for ten thousand labelled examples
- Experience with PyTorch and comfort working across the full pipeline from training to deployment
- The instinct to test in the real world -- you know a model is not done when the loss curve flattens; it is done when it works on site
- Business-level English; Japanese is helpful but not required
Candidates need not have
- Publications in top-tier vision conferences -- we care about what you have built and shipped, not your citation count
- Experience with our specific industries -- if you can learn what a rail clearance check looks like, the domain knowledge follows
- A PhD -- strong engineering skills and genuine curiosity about vision problems matter more than credentials
How to Apply
Send an email to team@yodolabs.jp with the subject line "Computer Vision Engineer (Full-time) -- [Your Name]". Please include:
- Your CV or resume
- A brief description of a vision model you have built that you are proud of -- we want to understand the problem, your approach, and what you learned
- Links to your GitHub profile or relevant project repositories
Logistics
- Location: Tokyo / Osaka, Japan -- remote-friendly
- Work style: Flexible hours; occasional site visits to client facilities
- Visa sponsorship: We sponsor work visas for Japan where possible
We encourage you to apply even if you do not meet every qualification listed.