2023-02-09

Datasets for Embeddings Performance Evaluation

Text search

Dataset: BEIR (ArguAna, ClimateFEVER, DBPedia, FEVER, FiQA2018, HotpotQA, NFCorpus, QuoraRetrieval, SciFact, TRECCOVID, Touche2020)

Code search

Dataset: CodeSearchNet

Sentence similarity

Dataset: SentEval (STS 2012–2016)
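
For readers who have not worked with STS data, here is a minimal sketch of how a sentence-similarity benchmark is commonly scored: embed both sentences of each pair, take the cosine similarity, and measure the Spearman rank correlation against the human-annotated scores. The source does not say which metric was used, so the choice of Spearman is an assumption here. The function names (`cosine_similarity`, `spearman`, `sts_score`) and the toy vectors and gold scores are made up for illustration, not taken from any dataset or library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(embedding_pairs, gold_scores):
    """Correlate predicted cosine similarities with human scores."""
    predicted = [cosine_similarity(a, b) for a, b in embedding_pairs]
    return spearman(predicted, gold_scores)


# Toy data (hypothetical): three sentence pairs as 2-d "embeddings",
# with made-up human similarity scores on a 0-5 scale.
toy_pairs = [([1.0, 0.0], [1.0, 0.0]),
             ([1.0, 0.0], [1.0, 1.0]),
             ([1.0, 0.0], [0.0, 1.0])]
toy_gold = [5.0, 3.0, 0.5]
print(sts_score(toy_pairs, toy_gold))
```

In practice the vectors would come from the embedding model under test, and a library routine such as `scipy.stats.spearmanr` could replace the hand-written correlation.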

Text classification

Dataset: SentEval (MR, CR, SUBJ, MPQA, SST, TREC, MRPC)

Source: "New and Improved Embedding Model" (OpenAI blog post)

See also: Vectorview, a tool that analyzes data and user queries to give actionable insights into how well a model fits user needs