
Improving Search Quality for Non-English Queries with Fine-tuned Multilingual CLIP Models

  1. OpenAI
    We’re going to look at a CLIP model (OpenAI’s contrastive image-text architecture) trained on a broad multilingual dataset: xlm-roberta-base-ViT-B-32, which pairs the ViT-B/32 image encoder with the XLM-RoBERTa multilingual language model. Both of these encoders are pre-trained before being combined (a minimal sketch of loading and querying the combined model follows this list).


  2. Common Crawl
    The two encoders were then co-trained on the multilingual LAION-5B dataset, which contains 5.85 billion image-text pairs: 2.2 billion of these pairs are labelled in 100+ non-English languages, and the rest are in English or contain text that can’t be attributed to any single language (such as place names or other proper nouns). The pairs are sampled from images and their HTML alt-text in the Common Crawl web archive (a schematic of the contrastive co-training objective follows below).

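As a concrete illustration of how these two pre-trained encoders come together, here is a minimal sketch of loading the model through the open_clip library and scoring queries in several languages against one image. The model and checkpoint names are the ones registered in OpenCLIP; the image path and query strings are placeholders, and the printed scores are plain cosine similarities rather than anything search-engine specific.

```python
import torch
from PIL import Image
import open_clip

# Multilingual CLIP: ViT-B/32 image tower + XLM-RoBERTa text tower,
# co-trained on LAION-5B (checkpoint tag as registered in OpenCLIP).
model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"
)
tokenizer = open_clip.get_tokenizer("xlm-roberta-base-ViT-B-32")
model.eval()

# Placeholder inputs: one product image, queries in German, Japanese and English.
image = preprocess(Image.open("running_shoes.jpg")).unsqueeze(0)
queries = ["rote Laufschuhe", "赤いランニングシューズ", "red running shoes"]
text = tokenizer(queries)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalise so that dot products are cosine similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (text_features @ image_features.T).squeeze(-1)

for query, score in zip(queries, scores.tolist()):
    print(f"{query}: {score:.3f}")
```

If the model behaves as intended, the three queries above should score similarly, since they describe the same image in different languages.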

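The co-training mentioned above follows CLIP’s standard contrastive recipe: each encoder maps its half of an (image, alt-text) pair into a shared embedding space, matching pairs are pulled together, and the other pairings in the batch are pushed apart. The function below is a simplified, illustrative version of that symmetric objective, not LAION’s actual training code; the function name and temperature value are chosen here for clarity.

```python
import torch
import torch.nn.functional as F


def contrastive_clip_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Row i of both tensors is assumed to come from the same (image, alt-text)
    pair, e.g. one sampled from LAION-5B.
    """
    # Cosine similarity between every image and every caption in the batch.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.T / temperature  # shape (B, B)

    # The matching caption for image i sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_img_to_txt = F.cross_entropy(logits, targets)
    loss_txt_to_img = F.cross_entropy(logits.T, targets)
    return (loss_img_to_txt + loss_txt_to_img) / 2
```

Because a large share of the alt-text in those batches is non-English, the text encoder learns to place queries from 100+ languages into the same embedding space as the images, which is what makes the model useful for non-English search.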
