Microsoft’s Bing team published a blog post on April 7 announcing the open-source release of Harrier AI, an industry-leading series of text embedding models built to meet the high information-processing standards of modern AI agent systems. The series ranked first on the multilingual MTEB-v2 benchmark.
Note: An embedding model converts high-dimensional data such as text and images into lower-dimensional vector representations that capture the data’s semantic features, so that similar content sits closer together in the vector space.
In AI, embedding models are a fundamental component that search engines, recommendation systems, and intelligent agents rely on for information retrieval, semantic understanding, and knowledge reasoning. They directly determine the quality and efficiency of a system’s information processing.
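To make "similar content sits closer together" concrete, here is a minimal sketch that ranks texts by cosine similarity. The three-dimensional vectors are hand-written for illustration and are not output from any Harrier model; the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: a real model would produce
# vectors with hundreds or thousands of dimensions).
toy_embeddings = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "steps to recover account login": [0.8, 0.2, 0.1],
    "best pasta recipes": [0.0, 0.2, 0.9],
}

def rank_by_similarity(query, embeddings):
    """Return (score, text) pairs for every other text, most similar first."""
    q = embeddings[query]
    scored = [
        (cosine_similarity(q, vec), text)
        for text, vec in embeddings.items()
        if text != query
    ]
    return sorted(scored, reverse=True)

ranking = rank_by_similarity("how to reset a password", toy_embeddings)
for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

In this toy setup the account-recovery text scores far higher than the recipe text, which is the behavior a retrieval system relies on when it looks up the nearest vectors to a query.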
To address the “grounding” challenge that arises as AI systems move from simple question answering to carrying out actions, Microsoft’s Harrier series further improves embedding quality. According to the team, this significantly raises the factual accuracy of first-pass retrieval, lowers system latency and cost, and reduces model hallucinations, which in turn increases user trust.
Harrier AI Models
The newly released Harrier series includes three versions: Harrier-OSS-v1-27B, Harrier-OSS-v1-0.6B, and Harrier-OSS-v1-270M. All models support over 100 languages, feature a 32k context window, and can generate fixed-size embedding vectors for any input.
On the technical side, the team built a scalable data pipeline and used GPT-5 to generate more than 2 billion weakly supervised samples for contrastive pre-training, plus more than 10 million high-quality samples for fine-tuning.
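The article does not say which loss the team used. As a rough illustration of contrastive pre-training in general, the sketch below computes an InfoNCE-style loss with in-batch negatives, a common choice for embedding models. The temperature value, the function names, and the two-dimensional unit vectors are all assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch acts as a negative. Vectors are assumed to be
    unit-normalized, so the dot product equals cosine similarity.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the correct passage.
        total += log_sum_exp - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own passage
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the wrong passage

loss_aligned = info_nce_loss(queries, aligned)
loss_swapped = info_nce_loss(queries, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each query embedding lies closest to its paired passage and large when it lies closer to another passage in the batch, so minimizing it pulls matching pairs together and pushes mismatched pairs apart.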
On the training side, the team built on earlier work such as E5 and GritLM. To meet the deployment requirements of low-end devices, it produced two lightweight versions, Harrier-OSS-v1-0.6B and Harrier-OSS-v1-270M, through knowledge distillation after the flagship model had been trained.
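The distillation objective is not described in the article. One possible form, shown here purely as an assumption, trains the student so that its pairwise similarity scores match the teacher's. This works even when the two models output vectors of different sizes. All names and vectors below are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_matrix(embeddings):
    """Pairwise cosine similarities between all texts in a batch."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]

def distillation_loss(teacher_embs, student_embs):
    """Mean squared error between teacher and student similarity matrices."""
    t = similarity_matrix(teacher_embs)
    s = similarity_matrix(student_embs)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

# Toy batch of three texts: the first two are near-duplicates, the third is unrelated.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
good_student = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # preserves the teacher's structure
poor_student = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # groups the wrong texts together

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(loss_good < loss_poor)
```

A student that keeps the same texts close together and far apart as the teacher gets a low loss, even with far fewer dimensions, which is the general idea behind shrinking a large embedding model into a smaller one.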
On the widely used multilingual MTEB-v2 benchmark, the Harrier model outperformed Google’s Gemini Embedding 2 and ranked first.
Compared with its competitors, the Harrier model not only performs better but is also fully open source. Developers can use the model without licensing restrictions to improve the retrieval quality and semantic understanding of their AI applications.


Building on Harrier’s technological expertise, Microsoft is developing a brand-new search service. This service will offer superior search quality, enhanced semantic understanding, and more robust contextual selection, and will initially be applied to Bing search to improve the user experience.
