Multimodal Foundation Models for Video: A Practical Guide | CrowdCore