2023-02-09

Datasets for Embeddings Performance Evaluation

Text search

Dataset: BEIR (ArguAna, ClimateFEVER, DBPedia, FEVER, FiQA2018, HotpotQA, NFCorpus, QuoraRetrieval, SciFact, TRECCOVID, Touche2020)
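
BEIR results are commonly reported as nDCG@10, a ranking metric that rewards placing relevant documents near the top of the result list. The following is a minimal sketch of that metric, not the benchmark's own evaluation code; the function names and the toy document IDs are hypothetical and chosen only for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, relevance_by_doc, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order the embedding search returned them.
    relevance_by_doc: hypothetical mapping of doc ID -> graded relevance (0 = irrelevant).
    """
    gains = [relevance_by_doc.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance_by_doc.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents are known for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Toy judgments (made up): d1 and d3 are relevant, d2 is not.
judgments = {"d1": 1, "d3": 1}
perfect = ndcg_at_k(["d1", "d3", "d2"], judgments)
imperfect = ndcg_at_k(["d1", "d2", "d3"], judgments)
```

Averaging this per-query value over all queries in a dataset gives a single score for comparing embedding models.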

Code search

Dataset: CodeSearchNet

Sentence similarity

Dataset: SentEval (STS 2012–2016)
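
STS-style tasks are typically scored by comparing the cosine similarity of two sentence embeddings against human similarity ratings, using a rank correlation such as Spearman's. The sketch below illustrates that idea with the standard library only; the helper names and the toy vectors and ratings are assumptions for illustration, not data from SentEval.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sts_score(embedding_pairs, gold_scores):
    """Correlate predicted cosine similarities with human ratings."""
    predicted = [cosine_similarity(a, b) for a, b in embedding_pairs]
    return spearman(predicted, gold_scores)


# Toy 2-d "embeddings" and made-up human ratings on a 0-5 scale.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 3.0, 0.5]
score = sts_score(pairs, gold)
```

In practice the vectors would come from the embedding model under evaluation, and a library routine such as `scipy.stats.spearmanr` could replace the hand-written correlation.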

Text classification

Dataset: SentEval (MR, CR, SUBJ, MPQA, SST, TREC, MRPC)

Source: New and Improved Embedding Model

See also: Vectorview, which analyzes data and user queries to provide actionable insights for a better fit between model and user needs