Senior Computer Vision Engineer
Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
Table of Contents
- Quick Start
- Core Expertise
- Tech Stack
- Workflow 1: Object Detection Pipeline
- Workflow 2: Model Optimization and Deployment
- Workflow 3: Custom Dataset Preparation
- Architecture Selection Guide
- Reference Documentation
- Common Commands
Quick Start
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8
# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark
# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
Core Expertise
This skill provides guidance on:
- Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
- Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
- Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
- Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
- Video Analysis: Object tracking (ByteTrack, SORT), action recognition
- 3D Vision: Depth estimation, point cloud processing, NeRF
- Production Deployment: ONNX, TensorRT, OpenVINO, CoreML
Tech Stack
| Category | Technologies | |----------|--------------| | Frameworks | PyTorch, torchvision, timm | | Detection | Ultralytics (YOLO), Detectron2, MMDetection | | Segmentation | segment-anything, mmsegmentation | | Optimization | ONNX, TensorRT, OpenVINO, torch.compile | | Image Processing | OpenCV, Pillow, albumentations | | Annotation | CVAT, Label Studio, Roboflow | | Experiment Tracking | MLflow, Weights & Biases | | Serving | Triton Inference Server, TorchServe |
Workflow 1: Object Detection Pipeline
Use this workflow when building an object detection system from scratch.
Step 1: Define Detection Requirements
Analyze the detection task requirements:
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]
Step 2: Select Detection Architecture
Choose architecture based on requirements:
| Requirement | Recommended Architecture | Why | |-------------|-------------------------|-----| | Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed | | High accuracy | Faster R-CNN, DINO | Two-stage, better localization | | Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection | | Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures | | Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
Step 3: Prepare Dataset
Convert annotations to required format:
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
--annotations data/labels/ \
--format coco \
--split 0.8 0.1 0.1 \
--output data/coco/
# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
Step 4: Configure Training
Generate training configuration:
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
--task detection \
--arch yolov8m \
--epochs 100 \
--batch 16 \
--imgsz 640 \
--output configs/
# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
--task detection \
--arch faster_rcnn_R_50_FPN \
--framework detectron2 \
--output configs/
Step 5: Train and Validate
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640
# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1
# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
Step 6: Evaluate Results
Key metrics to analyze:
| Metric | Target | Description | |--------|--------|-------------| | mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 | | mAP@50:95 | >0.5 | COCO primary metric | | Precision | >0.8 | Low false positives | | Recall | >0.8 | Low missed detections | | Inference time | <33ms | For 30 FPS real-time |
Workflow 2: Model Optimization and Deployment
Use this workflow when preparing a trained model for production deployment.
Step 1: Benchmark Baseline Performance
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
--benchmark \
--input-size 640 640 \
--batch-sizes 1 4 8 16 \
--warmup 10 \
--iterations 100
Expected output:
Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M
Step 2: Select Optimization Strategy
| Deployment Target | Optimization Path | |-------------------|-------------------| | NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 | | NVIDIA GPU (edge) | PyTorch → TensorRT INT8 | | Intel CPU | PyTorch → ONNX → OpenVINO | | Apple Silicon | PyTorch → CoreML | | Generic CPU | PyTorch → ONNX Runtime | | Mobile | PyTorch → TFLite or ONNX Mobile |
Step 3: Export to ONNX
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
--export onnx \
--input-size 640 640 \
--dynamic-batch \
--simplify \
--output model.onnx
# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
Step 4: Apply Quantization (Optional)
For INT8 quantization with calibration:
# Generate calibration dataset
python scripts/inference_optimizer.py model.onnx \
--quantize int8 \
--calibration-data data/calibration/ \
--calibration-samples 500 \
--output model_int8.onnx
Quantization impact analysis:
| Precision | Size | Speed | Accuracy Drop | |-----------|------|-------|---------------| | FP32 | 100% | 1x | 0% | | FP16 | 50% | 1.5-2x | <0.5% | | INT8 | 25% | 2-4x | 1-3% |
Step 5: Convert to Target Runtime
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/
# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
Step 6: Benchmark Optimized Model
python scripts/inference_optimizer.py model.engine \
--benchmark \
--runtime tensorrt \
--compare model.pt
Expected speedup:
Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP
Workflow 3: Custom Dataset Preparation
Use this workflow when preparing a computer vision dataset for training.
Step 1: Audit Raw Data
# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
--analyze \
--output analysis/
Analysis report includes:
Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs
Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234
Step 2: Clean and Validate
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
--clean \
--remove-corrupted \
--remove-duplicates \
--output data/cleaned/
Step 3: Convert Annotation Format
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
--annotations data/annotations/ \
--input-format voc \
--output-format coco \
--output data/coco/
Supported format conversions:
| From | To | |------|-----| | Pascal VOC XML | COCO JSON | | YOLO TXT | COCO JSON | | COCO JSON | YOLO TXT | | LabelMe JSON | COCO JSON | | CVAT XML | COCO JSON |
Step 4: Apply Augmentations
# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
--augment \
--aug-config configs/augmentation.yaml \
--output data/augmented/
Recommended augmentations for detection:
# configs/augmentation.yaml
augmentations:
geometric:
- horizontal_flip: { p: 0.5 }
- vertical_flip: { p: 0.1 } # Only if orientation invariant
- rotate: { limit: 15, p: 0.3 }
- scale: { scale_limit: 0.2, p: 0.5 }
color:
- brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
- hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
- blur: { blur_limit: 3, p: 0.1 }
advanced:
- mosaic: { p: 0.5 } # YOLO-style mosaic
- mixup: { p: 0.1 } # Image mixing
- cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
Step 5: Create Train/Val/Test Splits
python scripts/dataset_pipeline_builder.py data/augmented/ \
--split 0.8 0.1 0.1 \
--stratify \
--seed 42 \
--output data/final/
Split strategy guidelines:
| Dataset Size | Train | Val | Test | |--------------|-------|-----|------| | <1,000 images | 70% | 15% | 15% | | 1,000-10,000 | 80% | 10% | 10% | | >10,000 | 90% | 5% | 5% |
Step 6: Generate Dataset Configuration
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
--generate-config yolo \
--output data.yaml
# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
--generate-config detectron2 \
--output detectron2_config.py
Architecture Selection Guide
Object Detection Architectures
| Architecture | Speed | Accuracy | Best For | |--------------|-------|----------|----------| | YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time | | YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy | | YOLOv8m | 4.2ms | 50.2 mAP | General purpose | | YOLOv8l | 6.8ms | 52.9 mAP | High accuracy | | YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy | | RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS | | Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality | | DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
Segmentation Architectures
| Architecture | Type | Speed | Best For | |--------------|------|-------|----------| | YOLOv8-seg | Instance | 4.5ms | Real-time instance seg | | Mask R-CNN | Instance | 67ms | High-quality masks | | SAM | Promptable | 50ms | Zero-shot segmentation | | DeepLabV3+ | Semantic | 25ms | Scene parsing | | SegFormer | Semantic | 15ms | Efficient semantic seg |
CNN vs Vision Transformer Trade-offs
| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) | |--------|-------------------|------------------| | Training data needed | 1K-10K images | 10K-100K+ images | | Training time | Fast | Slow (needs more epochs) | | Inference speed | Faster | Slower | | Small objects | Good with FPN | Needs multi-scale | | Global context | Limited | Excellent | | Positional encoding | Implicit | Explicit |
Reference Documentation
1. Computer Vision Architectures
See references/computer_vision_architectures.md for:
- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
- Vision Transformer variants (ViT, DeiT, Swin)
- Detection heads (anchor-based vs anchor-free)
- Feature Pyramid Networks (FPN, BiFPN, PANet)
- Neck architectures for multi-scale detection
2. Object Detection Optimization
See references/object_detection_optimization.md for:
- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
- Anchor optimization and anchor-free alternatives
- Loss function design (focal loss, GIoU, CIoU, DIoU)
- Training strategies (warmup, cosine annealing, EMA)
- Data augmentation for detection (mosaic, mixup, copy-paste)
3. Production Vision Systems
See references/production_vision_systems.md for:
- ONNX export and optimization
- TensorRT deployment pipeline
- Batch inference optimization
- Edge device deployment (Jetson, Intel NCS)
- Model serving with Triton
- Video processing pipelines
Common Commands
Ultralytics YOLO
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640
# Validation
yolo detect val model=best.pt data=coco.yaml
# Inference
yolo detect predict model=best.pt source=images/ save=True
# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True
Detectron2
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
--num-gpus 1 OUTPUT_DIR ./output
# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
MODEL.WEIGHTS output/model_final.pth
# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
--input images/*.jpg --output results/ \
--opts MODEL.WEIGHTS output/model_final.pth
MMDetection
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox
# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
Model Optimization
# ONNX export and simplify
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx
# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096
# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
Performance Targets
| Metric | Real-time | High Accuracy | Edge | |--------|-----------|---------------|------| | FPS | >30 | >10 | >15 | | mAP@50 | >0.6 | >0.8 | >0.5 | | Latency P99 | <50ms | <150ms | <100ms | | GPU Memory | <4GB | <8GB | <2GB | | Model Size | <50MB | <200MB | <20MB |
Resources
- Architecture Guide:
references/computer_vision_architectures.md - Optimization Guide:
references/object_detection_optimization.md - Deployment Guide:
references/production_vision_systems.md - Scripts:
scripts/directory for automation tools