Reddit - r/MachineLearning

Update on CVIL: the free CV interview prep checklist after landing my internship... just added Segmentation, OCR, and VLM sections [D]

Math Foundations

Start with linear algebra, calculus, probability, and statistics. Key topics include matrix operations, eigenvalues/eigenvectors, gradients, optimization, Bayes' theorem, and distributions.
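For a quick self-check on the probability side, here is a minimal sketch of Bayes' theorem as a worked example. The function name and the numbers (1% prior, 90% sensitivity, 5% false-positive rate) are made up for illustration and are not from the checklist.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(H | positive) using Bayes' theorem.

    P(H | +) = P(+ | H) P(H) / [P(+ | H) P(H) + P(+ | not H) P(not H)]
    """
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Illustrative numbers only: a rare event with a fairly accurate detector.
p = posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
print(f"P(H | positive) = {p:.3f}")
```

The point interviewers usually probe is that a low prior keeps the posterior small even when the detector looks accurate.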

CNNs

Understand convolutional layers, pooling, activation functions, batch normalization, and architectures like ResNet, VGG, and EfficientNet.
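One thing that comes up a lot with convolution and pooling layers is computing the output spatial size by hand. A small sketch of the standard formula, floor((n + 2p - k) / s) + 1, assuming square inputs and no dilation (the helper name is hypothetical):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


# A 7x7 conv with stride 2 and padding 3 on a 224x224 input.
stem = conv_output_size(224, kernel=7, stride=2, padding=3)
# A 3x3 conv with stride 1 and padding 1 keeps the size unchanged.
same = conv_output_size(224, kernel=3, stride=1, padding=1)
# A 2x2 pooling layer with stride 2 halves the size.
pooled = conv_output_size(stem, kernel=2, stride=2)
print(stem, same, pooled)
```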

Vision Transformers (ViTs)

Learn about self-attention, multi-head attention, patch embedding, and transformer architectures adapted for vision tasks.
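To make patch embedding and self-attention concrete, here is a rough pure-Python sketch of single-head scaled dot-product attention plus a patch count helper. It skips the learned projections, multi-head splitting, and positional embeddings, so treat it as a study aid rather than how any particular ViT implements it. All function names are my own.

```python
import math


def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    return (image_size // patch_size) ** 2


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


# A 224x224 image with 16x16 patches gives a 14x14 grid of tokens.
print(num_patches(224, 16))
# Two identical tokens attend uniformly, so each output is the mean of V.
print(attention([[1.0, 0.0], [1.0, 0.0]],
                [[1.0, 0.0], [1.0, 0.0]],
                [[1.0, 2.0], [3.0, 4.0]]))
```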

Detection

Study object detection frameworks (YOLO, Faster R-CNN, SSD) and key concepts like anchor boxes, non-maximum suppression (NMS), and detection loss functions.
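NMS is a common "code it on the whiteboard" question, so here is a minimal sketch of IoU and greedy NMS. It assumes boxes are (x1, y1, x2, y2) tuples and uses a 0.5 IoU threshold as an arbitrary default. Real pipelines usually run this per class and in vectorized form.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # indices of the boxes that survive
```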

Tracking

Cover object tracking algorithms (SORT, DeepSORT) and motion prediction models such as the Kalman filter.
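For the motion prediction part, a toy sketch of a constant-velocity prediction step. This is only the state-propagation half of a Kalman filter (no covariance, no measurement update), and the state layout (cx, cy, vx, vy) is an assumption for illustration, not the exact state any specific tracker uses.

```python
def predict(state, dt=1.0):
    """Propagate a (cx, cy, vx, vy) state forward by dt frames.

    Constant-velocity model: position moves by velocity * dt,
    velocity itself is left unchanged.
    """
    cx, cy, vx, vy = state
    return (cx + vx * dt, cy + vy * dt, vx, vy)


# A box centre at (10, 20) moving 2 px right and 1 px up per frame.
state = (10.0, 20.0, 2.0, -1.0)
for _ in range(3):
    state = predict(state)
print(state)
```

In a full tracker the predicted box is then matched against new detections (for example by IoU) before the state is corrected.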

Specialization Tracks

Pick based on the role you're targeting:

  • Segmentation (new): Semantic, instance, and panoptic segmentation with U-Net, Mask R-CNN, and SAM.
  • OCR (new): Text detection and recognition pipelines, CRNN, attention-based decoders.
  • VLMs (new): Vision-language models like CLIP, BLIP, and multimodal fusion.
  • ReID: Person re-identification with metric learning and triplet loss.
  • Deployment: Model optimization, quantization, ONNX, TensorRT, and edge deployment.
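For the ReID track, the triplet loss is worth being able to write from memory. A minimal sketch, assuming Euclidean distance on plain embedding vectors and a margin of 0.3 (both choices are illustrative, and the function name is hypothetical):

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.3):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_ap = math.dist(anchor, positive)  # same identity, should be small
    d_an = math.dist(anchor, negative)  # different identity, should be large
    return max(0.0, d_ap - d_an + margin)


# Easy triplet: the negative is already far away, so the loss is zero.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
# Hard triplet: the negative is closer than the positive, so the loss is positive.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (0.5, 0.0)))
```

The loss only pushes on triplets that violate the margin, which is why hard-example mining tends to come up in the same conversation.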

Structure Updates

Cleaned up the overall organization and added proper contributing guidelines. Tracks open for contribution include 3D vision and pose estimation, among others.

GitHub

Repository: https://github.com/David-Magdy/CVIL

Feedback and PRs welcome, especially if something is outdated or miscategorized. And remember to keep it CVIL!
