Client Background
A global semiconductor manufacturer running highly automated fabrication and advanced-packaging lines. Inspection automation had advanced as far as the available hardware allowed, and then stopped. Several critical inspection points sat inside dense tooling clusters and on the end-effectors of high-speed robotic arms. Conventional cameras couldn't physically fit, earlier miniature-camera trials had failed, and inspection at these stations stayed manual.
Challenge
Three constraints kept conventional cameras out.
Space. At the most constrained stations on this line, the clearance between tools is under a millimeter along the camera's optical axis. A traditional camera is mostly lens stack, and refractive lenses need a minimum focal distance. No mechanical refinement closes that gap.
Heat. Earlier miniature-camera trials had failed thermally. Focus actuators and their control electronics ran hot inside the tight enclosure, the sensor noise floor rose, and image quality degraded past what automated inspection could rely on.
Confidentiality. A conventional camera puts human-interpretable imagery onto the production network. Encrypting the transport doesn't change that. The content on either end of the link is still a recognizable picture of proprietary tooling. The customer's security team treated every in-line camera as a serious leakage surface, and they were right to.
The answer wasn't "find a smaller camera." A camera-first inspection workflow was the wrong fit for this deployment from the start.
What We Deployed
The Contextual Agentic Vision Platform offers a lensless sensor option for exactly this kind of envelope. Instead of a lens stack focusing light onto a sensor, a thin coded mask sits directly in front of the CMOS. The whole module fits in under a millimeter, with no focus motor, no zoom, and no active electronics in the optical path. Light passing through the mask is modulated so that each point in the scene casts a pseudo-random pattern (the system's point spread function) across the entire sensor. The raw output looks like noise to the human eye.
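For intuition, here is a minimal numpy sketch of a mask-based forward model. The mask pattern, dimensions, and noise level are illustrative assumptions, not the deployed design, and real systems involve cropping and calibration steps this sketch omits.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

# Toy scene: a bright square on a dark background (stand-in for a die feature).
scene = np.zeros((128, 128))
scene[48:80, 48:80] = 1.0

# Coded mask: each scene point casts a shifted copy of the mask's
# point spread function (PSF) onto the sensor. Illustrative 50% binary mask.
psf = (rng.random((128, 128)) < 0.5).astype(float)
psf /= psf.sum()  # normalize so total light is conserved

# Forward model: the sensor reading is (approximately) the scene convolved
# with the PSF, plus sensor noise. No lens, no focus distance required.
measurement = fftconvolve(scene, psf, mode="same")
measurement += rng.normal(0, 1e-4, measurement.shape)

# The measurement is spread across the whole sensor and looks like noise,
# yet it still encodes the scene: downstream models consume it directly.
print(measurement.shape, measurement.std())
```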
The optical design comes from prior research by Yodo Labs' founder, published in Optics Letters [1] and Optics Express [2], with coverage in WIRED, Nikkei, Phys.org, EurekAlert, and Tokyo Tech News [3]. The work has received Tokyo Tech's Tejima Memorial Research Award and the 38th Telecommunication Advancement Foundation Award [4].
The point that matters operationally is this: the Platform's specialist CV models analyze the encoded pattern directly. They don't need a reconstructed image first. A naive lensless setup would do two passes: reconstruct the image, then classify defects on the reconstruction. Two model calls, double the latency, an intermediate the pipeline doesn't actually need. On the Platform, the scouts are trained on encoded patterns and produce findings against them. Reconstruction is reserved for a separate path that runs only when a human needs to look at the image.
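To make the two paths concrete, here is a minimal PyTorch sketch. The architecture and names are hypothetical stand-ins; the Platform's actual scout models are not public. The point is structural: the hot path is one forward pass on the encoded frame.

```python
import torch
import torch.nn as nn

class EncodedDefectClassifier(nn.Module):
    """Hypothetical scout: classifies defects directly from the encoded frame."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, encoded_frame):
        return self.net(encoded_frame)

encoded_frame = torch.randn(1, 1, 128, 128)  # raw encoded measurement

# Single-stage hot path: one model call on the encoded pattern.
classifier = EncodedDefectClassifier()
logits = classifier(encoded_frame)

# The naive alternative would be:
#   image  = reconstructor(encoded_frame)   # pass 1: invert the optics
#   logits = image_classifier(image)        # pass 2: classify the image
# Two model calls, double the latency, and an intermediate image the
# inspection pipeline never needs unless a human asks to see it.
print(logits.softmax(dim=-1))
```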
Alongside the sensor, the customer's own engineering knowledge sits in the Platform's context pack and is consulted on every inspection: the per-station process spec sheets, the dimensional tolerance tables pulled from the relevant drawings, the lot-level inspection history on this and adjacent stations, and the operator-review policy that decides which flagged findings need human verification before action. None of this is generic prior. It is the customer's own documented rules, accessed live. A flagged chip-out only means something against the tolerance row for this station and this product family, and that is the row the Platform's reasoning anchors against.
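A minimal sketch of that anchoring step, with a hypothetical schema (the real context pack is the customer's own documents, accessed live):

```python
from dataclasses import dataclass

@dataclass
class ToleranceRow:
    station: str
    product_family: str
    defect_type: str
    limit_um: float

# Hypothetical rows; the real tables come from the customer's drawings.
TOLERANCES = [
    ToleranceRow("station-7", "LX", "chip-out", 12.0),
    ToleranceRow("station-7", "LX", "scratch", 8.0),
]

def evaluate(station: str, family: str, defect: str, measured_um: float) -> str:
    # A finding only means something against the row for this station and family.
    row = next(r for r in TOLERANCES
               if (r.station, r.product_family, r.defect_type)
               == (station, family, defect))
    return "flag" if measured_um > row.limit_um else "pass"

# Matches the trace below: a ~14 µm chip-out against a 12 µm limit is flagged.
print(evaluate("station-7", "LX", "chip-out", 14.0))  # -> flag
```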
A typical inspection cycle at one of these stations, shown as the Platform's internal trace:
```
vlm inspect station-7 lot LX-0418 frame#0418
delegate → lensless-scout · id-scout · history-lookup
collect ← encoded-frame · lot-id LX-0418-2 · prior 3 defects
classify direct-on-encoded · candidate: chip-out · conf 0.82
context tolerance 12µm · prior defects on same lot
decide flag · chip-out · ~14µm · escalate to operator
```
Every step runs on the encoded pattern. The defect classifier was trained on encoded patterns; so was the localization head. There is no reconstruction in this hot path.
When a flagged finding escalates to a human operator, and only then, a transformer-based reconstruction model [1] runs on an authorized workstation to render a viewable image. The reconstruction stays on the operator's local machine. The default data flowing on the production network is the encoded stream.
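A sketch of that routing rule, with hypothetical names (`reconstruct()` stands in for the transformer-based model of [1]):

```python
class OperatorWorkstation:
    """Stand-in for an authorized local machine; reconstructions stay here."""

    def reconstruct(self, encoded_frame):
        # Placeholder for the transformer-based reconstruction model of [1];
        # in deployment this runs locally, never on the production network.
        return encoded_frame

    def review(self, finding, image):
        print(f"operator review: {finding['defect']} ~{finding['size_um']}µm")

def handle_finding(finding, encoded_frame, workstation):
    # Hot path ends here for anything not flagged: only the encoded
    # stream has touched the production network.
    if finding["decision"] != "flag":
        return
    # Escalation path: render a viewable image on the operator's machine only.
    image = workstation.reconstruct(encoded_frame)
    workstation.review(finding, image)

handle_finding({"decision": "flag", "defect": "chip-out", "size_um": 14},
               encoded_frame=None, workstation=OperatorWorkstation())
```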
Why This Configuration Works
Passive optics, no thermal budget to manage. With no actuators in the module, the thermal envelope that defeated earlier miniature-camera attempts simply doesn't apply. The module dissipates only what the sensor itself dissipates, and that is a budget the line can plan around.
Enough signal at sub-millimeter form factor. A miniaturized lens collects light through a correspondingly small aperture, and signal-to-noise drops as the aperture shrinks. The lensless mask modulates incoming light across the entire sensor area in parallel. Even at sub-millimeter form factor, the system collects enough signal for the downstream classifier. The form factor became an option rather than a compromise.
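A back-of-envelope comparison of collecting area, with illustrative numbers (neither figure describes the deployed sensor, and raw collecting area is not the whole SNR story once multiplexing noise is accounted for):

```python
import math

sensor_side_mm = 4.0             # hypothetical CMOS die side
lens_aperture_diameter_mm = 0.4  # what a sub-millimeter lens might allow

lens_area = math.pi * (lens_aperture_diameter_mm / 2) ** 2
mask_area = 0.5 * sensor_side_mm ** 2  # ~50%-open binary mask over full sensor

print(f"lens aperture area: {lens_area:.3f} mm^2")
print(f"mask open area:     {mask_area:.3f} mm^2")
print(f"ratio:              {mask_area / lens_area:.0f}x more collecting area")
```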
No human-interpretable imagery on the network. The encoded pattern is not a cryptographic guarantee. An attacker who obtained both the decoding model and the system parameters could reconstruct images. But it materially shrinks the casual-leakage surface that the customer's security team had flagged on conventional cameras. The default on the production network is encoded; reconstruction is restricted to authorized local workstations.
Direct-on-encoded perception is the right path on this sensor. Reconstructing first and classifying second would introduce an intermediate whose errors compound into the downstream classifier. The single-stage path operating on the encoded representation avoids that propagation, and given how the scout-layer models were trained, it was the higher-confidence path for this deployment.
Results
Inspection points that had been manual-only, or sampled at low rate because robot-mounted automation wasn't feasible, came under continuous automated inspection. The sensor form factor was the change that unlocked it. Everything else on the Platform stayed the same as the customer's existing automated stations: the VLM orchestrator above the scouts, the judge layer, the context pack of drawings, tolerances, and inspection history.
The passive optical path eliminated the thermal-throttling failures that had ended earlier miniature-camera trials. The modules now run continuously on the line without temperature-driven degradation.
The default data on the production network is the encoded sensor stream. No human-interpretable imagery in transit. The leakage surface that the customer's security team had previously flagged on conventional cameras is materially reduced.
The same configuration is available to any manufacturer running inspection on the Platform under similar form-factor or confidentiality constraints: the lensless sensor selected for the physical envelope, encoded-input scouts enabled, the customer's drawings and tolerances loaded as inner context, reconstruction restricted to authorized local workstations. The Platform brings the orchestration and the sensor. The customer brings the drawings, the tolerances, and the operator workflow.
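As a sketch of what that configuration amounts to (key names are illustrative, not the Platform's actual schema):

```python
# Mirrors the split described above: the Platform supplies orchestration and
# the sensor; the customer supplies drawings, tolerances, and operator workflow.
deployment = {
    "sensor": {"type": "lensless-coded-mask", "reason": "sub-mm envelope"},
    "scouts": {"input": "encoded", "reconstruction_in_hot_path": False},
    "context_pack": [
        "process-spec-sheets/",       # per-station process specs
        "tolerance-tables/",          # dimensional tolerances from drawings
        "inspection-history/",        # lot-level history, this + adjacent stations
        "operator-review-policy.md",  # which findings need human verification
    ],
    "reconstruction": {
        "allowed_hosts": ["authorized-workstations"],
        "network_default": "encoded-stream",
    },
}
```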
References
- [1] Pan, X., Chen, X., Takeyama, S., Yamaguchi, M. "Image reconstruction with Transformer for mask-based lensless imaging." Optics Letters 47(7), 1843–1846 (2022). DOI: 10.1364/OL.455378. Code: github.com/BobPXX/Lensless_Imaging_Transformer.
- [2] Pan, X., Chen, X., Nakamura, T., Yamaguchi, M. "Incoherent reconstruction-free object recognition with mask-based lensless optics and Transformer." Optics Express 29(23), 37962–37978 (2021). DOI: 10.1364/OE.443181. Code: github.com/BobPXX/LLI_Transformer.
- [3] Media coverage: WIRED Japan, Nikkei, Phys.org, EurekAlert!, Tokyo Tech News.
- [4] Awards: Tokyo Tech Tejima Memorial Research Award, 2022; 38th Telecommunication Advancement Foundation Award, 2022.
