Name | Description |
---|---|
ChatGPT | Conversational AI driven by text prompts |
T5 | Prompt-based text synthesis |
SciSpace | Automatic literature review and data extraction from PDFs |
Consensus AI | Literature search, synthesis, and Q&A |
Elicit | Literature search, synthesis, and Q&A |
Falcon | LLM whose pretraining corpus includes extensive scientific literature |
LLM4SD | LLM for scientific discovery in physiology, biophysics, physical chemistry, and quantum mechanics |
ChatPDF | Chatbot with PDFs |
Claude | Chatbot with PDFs |
Perplexity.ai | Chatbot that draws on Wikipedia and other sources, as well as your own PDFs |

LLMs trained on scientific publications (updated regularly)

Name | Description |
---|---|
BLOOM | BLOOM (BigScience Large Open-science Open-access Multilingual Language Model): a 176-billion-parameter multilingual model from the BigScience project, released in 2022. |
SciBERT | A BERT model pretrained on scientific text, using papers from the semanticscholar.org corpus. |
GALACTICA by Meta | LLM trained on over 48 million papers, textbooks, reference material, compounds, proteins and other sources of scientific knowledge. Its public demo was taken offline over misinformation concerns. |
IBM-NASA Models | Trained on 60 billion tokens from a corpus of astrophysics, planetary science, earth science, heliophysics, and biological and physical sciences data. |
Mozi | Described by its authors as the first large-scale language model for the scientific paper domain, covering tasks such as question answering and emotional support. Combining a large language model with the evidence retrieval model SciDPR, Mozi generates concise and accurate responses to users' questions about specific papers and provides emotional support for academic researchers. |
Awesome Scientific Language Models | A curated list of pre-trained language models in scientific domains (e.g., mathematics, physics, chemistry, materials science, biology, medicine, geoscience). |

Selected readings

- Cai, H., Cai, X., Chang, J., Li, S., Yao, L., Wang, C., ... & Ke, G. (2024). SciAssess: Benchmarking LLM proficiency in scientific literature analysis. arXiv preprint arXiv:2403.01976.
- Ma, Y., Gou, Z., Hao, J., Xu, R., Wang, S., Pan, L., ... & Sun, A. (2024). SciAgent: Tool-augmented Language Models for Scientific Reasoning. arXiv preprint arXiv:2402.11451.
- Shojaee, P., Meidani, K., Gupta, S., Farimani, A. B., & Reddy, C. K. (2024). LLM-SR: Scientific equation discovery via programming with large language models. arXiv preprint arXiv:2404.18400.
- Wang, Z., Cao, L., Danek, B., Zhang, Y., Jin, Q., Lu, Z., & Sun, J. (2024). Accelerating Clinical Evidence Synthesis with Large Language Models. arXiv preprint arXiv:2406.17755.
- Zheng, Y., Koh, H. Y., Ju, J., Nguyen, A. T., May, L. T., Webb, G. I., & Pan, S. (2023). Large language models for scientific synthesis, inference and explanation. arXiv preprint arXiv:2310.07984.