About Yodo Labs
Yodo Labs is an applied AI research and engineering company based in Tokyo and Osaka. We build visual AI systems for field measurement, inspection, and safety verification -- the kind that runs on the phones and edge devices your crews already carry. Rail, utilities, construction, manufacturing. Offline. Deployed in weeks.
About the Role
You have trained models that work beautifully on a benchmark and fail on the factory floor. You have dealt with the dataset that has ten thousand normal examples and three defective ones. You know that the gap between a research prototype and a system a maintenance crew actually uses is not a deployment problem -- it is a modelling problem.
At Yodo Labs, we build vision systems that make accurate visual judgments in the field. Our models need to detect defects that barely exist in any dataset, measure dimensions from a single smartphone image, and verify safety-critical connections before a crew energises a line. They need to do this in 10-50MB, fully offline, on hardware that was not designed for AI.
As a Computer Vision Engineer, you will own the pipeline from data to device. You will generate synthetic training data for defects our clients have seen three times in ten years. You will fine-tune vision-language models on tasks where the training set is a handful of reference images and a conversation with an inspector. You will distil the result into something that runs on a phone in a tunnel with no connectivity.
This is not a research role and it is not a pure engineering role. It is both, simultaneously, because the problems demand it.
What You Will Do
- Design and train vision models for field inspection, measurement, and safety verification -- each deployment is a new domain with minimal existing data
- Build and maintain synthetic data generation pipelines: take a handful of real examples and produce the thousands of training samples needed to cover the long tail of rare cases
- Fine-tune vision-language models to learn new inspection tasks from few-shot examples and natural language task descriptions
- Distil large models into compact on-device models (10-50MB) that run fully offline on smartphones and edge hardware
- Deploy, test, and iterate models in real field conditions -- your work ships to construction sites, rail corridors, and substations, not just a dashboard
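The distillation step above usually means training the compact student to match a temperature-softened teacher distribution alongside the usual hard-label loss. A minimal sketch of that soft-target loss, written in NumPy for clarity (function names, the temperature value, and the standalone KL form are illustrative assumptions, not a description of our actual pipeline):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student outputs.

    In practice this term is mixed with a cross-entropy loss on the hard
    labels; the mixing weight and temperature are tuning knobs.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 so gradients keep a
    # comparable magnitude as the temperature changes
    return temperature ** 2 * np.sum(
        p_teacher * np.log(p_teacher / p_student)
    )
```

The loss is zero when the student already matches the teacher and grows as their softened predictions diverge, which is what drives the 2B-to-30MB compression described in the projects below.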
Representative projects
- Building a synthetic data pipeline for rail overhead equipment wear detection, starting from twelve reference images provided by the client and generating a training set covering crack patterns, corrosion stages, and environmental conditions the client has never photographed
- Fine-tuning a vision-language model to perform live-line connection verification for a utility company, where the task definition comes from a two-hour conversation with a senior inspector rather than a labelled dataset
- Distilling a 2B-parameter vision model into a 30MB on-device model that maintains 95%+ accuracy for surface defect detection on a manufacturing line, running inference in under 200ms on a standard smartphone
- Designing an active learning loop where field crews flag uncertain predictions, feeding real-world edge cases back into the training pipeline without requiring ML expertise from the crew
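One simple way to decide which predictions a field crew should flag, as in the active learning loop above, is an entropy threshold on the model's softmax output: confident predictions pass through, uncertain ones are routed back for labelling. A minimal pure-Python sketch (the function names and the threshold value are hypothetical tuning choices, not part of our actual system):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a softmax output; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_for_review(probs, threshold=0.5):
    """Return True when a prediction is uncertain enough to send back
    into the training pipeline. The threshold would be calibrated per
    deployment against real field data."""
    return predictive_entropy(probs) > threshold
```

A crisp prediction like `[0.98, 0.01, 0.01]` falls well under the threshold, while a hedged one like `[0.4, 0.3, 0.3]` gets flagged -- no ML expertise needed from the crew, just a tap on an uncertain result.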
You may be a good fit if you have
- Strong experience in computer vision: object detection, segmentation, image classification, or visual inspection -- you have trained models and evaluated them honestly
- Hands-on experience with model compression: distillation, quantisation, pruning, or on-device deployment (ONNX, CoreML, TFLite, or similar)
- Familiarity with vision-language models (CLIP, LLaVA, Florence, or similar) and how to adapt them to new domains with limited data
- A pragmatic approach to data scarcity -- you reach for synthetic data, augmentation, and few-shot learning before asking for ten thousand labelled examples
- Experience with PyTorch and comfort working across the full pipeline from training to deployment
- The instinct to test in the real world -- you know a model is not done when the loss curve flattens; it is done when it works on site
- Business-level English; Japanese is helpful but not required
Candidates need not have
- Publications in top-tier vision conferences -- we care about what you have built and shipped, not your citation count
- Experience with our specific industries -- if you can learn what a rail clearance check looks like, the domain knowledge follows
- A PhD -- strong engineering skills and genuine curiosity about vision problems matter more than credentials
How to Apply
Send an email to team@yodolabs.jp with the subject line "Computer Vision Engineer (Full-time) -- [Your Name]". Please include:
- Your CV or resume
- A brief description of a vision model you have built that you are proud of -- we want to understand the problem, your approach, and what you learned
- Links to your GitHub profile or relevant project repositories
Logistics
- Location: Tokyo / Osaka, Japan -- remote-friendly
- Work style: Flexible hours; occasional site visits to client facilities
- Visa sponsorship: We sponsor work visas for Japan where possible
We encourage you to apply even if you do not meet every qualification listed.