This study examines how multimodal large language models evaluate hate speech. Larger models can make context-sensitive decisions aligned with human judgement. However, pervasive demographic and ...