Update on CVIL, the free CV interview prep checklist I put together after landing my internship: just added Segmentation, OCR, and VLM sections [D]
Math Foundations
Start with linear algebra, calculus, probability, and statistics. Key topics include matrix operations, eigenvalues/eigenvectors, gradients, optimization, Bayes' theorem, and distributions.
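Bayes' theorem is a common interview warm-up, and it often comes framed as a detector question. Here is a minimal sketch. The numbers in the usage line (1% prevalence, 90% recall, 5% false-positive rate) are made up for illustration.

```python
def posterior(prior: float, p_flag_given_pos: float, p_flag_given_neg: float) -> float:
    """P(positive | flagged) via Bayes' theorem.

    prior            -- P(positive), e.g. the fraction of images that contain a defect
    p_flag_given_pos -- P(flagged | positive), i.e. the detector's recall
    p_flag_given_neg -- P(flagged | negative), i.e. the false-positive rate
    """
    # Law of total probability: P(flagged) summed over both hypotheses.
    evidence = p_flag_given_pos * prior + p_flag_given_neg * (1.0 - prior)
    return p_flag_given_pos * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: 1% prevalence, 90% recall, 5% false-positive rate.
    print(round(posterior(0.01, 0.90, 0.05), 4))  # prints 0.1538
```

The takeaway interviewers usually look for: with a rare positive class, most flags are false alarms even when the detector looks accurate.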
CNNs
Understand convolutional layers, pooling, activation functions, batch normalization, and architectures like ResNet, VGG, and EfficientNet.
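A question that comes up constantly is "what is the output shape of this layer?" The sketch below gives the standard output-size formula plus a naive single-channel convolution that uses it. It is written for clarity, not speed, and like most deep learning frameworks it computes cross-correlation (no kernel flip).

```python
def conv_output_size(n: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Naive single-channel 2D convolution (cross-correlation) on nested lists."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the input on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = conv_output_size(h, kh, stride, padding)
    out_w = conv_output_size(w, kw, stride, padding)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the window whose top-left corner is (i*stride, j*stride).
            out[i][j] = sum(
                kernel[a][b] * padded[i * stride + a][j * stride + b]
                for a in range(kh)
                for b in range(kw)
            )
    return out
```

As a sanity check, the ResNet stem (7x7 conv, stride 2, padding 3, then 3x3 max pool, stride 2, padding 1) takes a 224x224 input to 112x112 and then 56x56: `conv_output_size(224, 7, 2, 3)` gives 112 and `conv_output_size(112, 3, 2, 1)` gives 56.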
Vision Transformers (ViTs)
Learn about self-attention, multi-head attention, patch embedding, and transformer architectures adapted for vision tasks.
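If you can write single-head scaled dot-product attention from scratch, softmax(QK^T / sqrt(d_k))V, multi-head attention is mostly bookkeeping: several heads on projected subspaces, concatenated. This is a plain-Python sketch with no learned projections. The `num_patches` helper shows how many tokens a ViT sees when it splits an image into non-overlapping square patches.

```python
import math


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) for a square image."""
    return (image_size // patch_size) ** 2


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one head; q, k, v are lists of row vectors."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    # Each query row gets a probability distribution over the keys.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

For example, a 224x224 image with 16x16 patches yields `num_patches(224, 16)` = 196 tokens. When all keys are identical the weights are uniform, so each output row is simply the mean of the value rows.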
Detection
Study object detection frameworks (YOLO, Faster R-CNN, SSD) and key concepts such as anchor boxes, non-maximum suppression (NMS), and detection loss functions.
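IoU and greedy NMS are frequent "implement it on the whiteboard" questions. This is a minimal single-class sketch using boxes as `(x1, y1, x2, y2)` tuples. In practice NMS is usually run per class, and libraries ship vectorized versions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top-scoring box, drop boxes overlapping it above the threshold, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

With boxes `(0, 0, 10, 10)`, `(1, 1, 11, 11)`, `(20, 20, 30, 30)` and scores 0.9, 0.8, 0.7, the first two overlap with IoU 81/119 (about 0.68), so `nms` returns `[0, 2]`.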
Tracking
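Cover object tracking algorithms such as SORT and DeepSORT, along with motion prediction models like the Kalman filter that SORT uses to predict where each track will be in the next frame.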
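The core of a SORT-style tracker is "predict each track forward, then match predictions to new detections by IoU." The sketch below is a simplification, not SORT itself: it swaps the Kalman filter for a plain constant-velocity shift, and it swaps the Hungarian algorithm for a brute-force search over assignments that only scales to a handful of objects. The track format (a dict with hypothetical `box` and `vel` keys) is made up for this example.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def predict(track):
    """Constant-velocity prediction (stand-in for a Kalman filter): shift the box by (dx, dy)."""
    x1, y1, x2, y2 = track["box"]  # hypothetical track format
    dx, dy = track["vel"]
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def associate(tracks, detections, iou_threshold=0.3):
    """Return (track_idx, det_idx) pairs maximizing total IoU, minus matches below the threshold.

    Brute-forces every one-to-one assignment; SORT solves the same problem with the Hungarian algorithm.
    """
    predicted = [predict(t) for t in tracks]
    n, m = len(predicted), len(detections)
    if n == 0 or m == 0:
        return []
    if n <= m:
        candidates = (list(zip(range(n), perm)) for perm in permutations(range(m), n))
    else:
        candidates = (list(zip(perm, range(m))) for perm in permutations(range(n), m))
    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(iou(predicted[t], detections[d]) for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    # Reject weak matches; unmatched detections would typically start new tracks.
    return sorted((t, d) for t, d in best_pairs if iou(predicted[t], detections[d]) >= iou_threshold)
```

DeepSORT extends this matching step with an appearance (ReID) embedding distance, which is why the tracking and ReID tracks pair well together.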
Specialization Tracks
Pick based on the role you're targeting:
- Segmentation (new): Semantic, instance, and panoptic segmentation with U-Net, Mask R-CNN, and SAM.
- OCR (new): Text detection and recognition pipelines, CRNN, attention-based decoders.
- VLMs (new): Vision-language models such as CLIP and BLIP, plus multimodal fusion techniques.
- ReID: Person re-identification with metric learning and triplet loss.
- Deployment: Model optimization, quantization, ONNX, TensorRT, and edge deployment.
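For the ReID track, triplet loss is worth being able to write out: L = max(d(a, p) - d(a, n) + margin, 0). The anchor should be closer to a positive (same identity) than to a negative (different identity) by at least the margin. This is a minimal sketch with Euclidean distance. The default margin of 1.0 is an arbitrary choice here, not a recommended value.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0) for one (anchor, positive, negative) triplet."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)
```

With anchor `(0, 0)` and positive `(1, 0)`, a negative at `(3, 0)` already satisfies the margin, so the loss is 0. A negative at `(1.5, 0)` does not, and the loss is 1.0 - 1.5 + 1.0 = 0.5. In practice, which triplets you mine (e.g. hard negatives within a batch) matters as much as the loss formula itself.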
Structure Updates
Cleaned up the overall organization and added proper contributing guidelines. Open tracks for contribution include 3D vision, pose estimation, and more.
GitHub
Repository: https://github.com/David-Magdy/CVIL
Feedback and PRs welcome, especially if something is outdated or miscategorized. And remember to keep it CVIL!