Global OpenLabs for Performance-Enhancement Analytics and Knowledge System

Machine reading tools (updated regularly)

Name	Description
ChatGPT	Conversational AI driven by text prompts
T5	Prompt-based text synthesis
SciSpace	Automatic literature review and data extraction from PDFs
Consensus AI	Literature search, synthesis, and Q&A
Elicit	Literature search, synthesis, and Q&A
Falcon	LLM whose pretraining data includes extensive scientific literature
LLM4SD	LLM for scientific discovery in physiology, biophysics, physical chemistry, and quantum mechanics
ChatPDF	Chatbot with PDFs
Claude	Chatbot with PDFs
Perplexity.ai	Chatbot that draws on Wikipedia and other sources, as well as your own PDFs

LLMs trained on scientific publications (updated regularly)

Name	Description
BLOOM	BLOOM (BigScience Large Open-science Open-access Multilingual Language Model): the 176-billion-parameter BigScience model, released in 2022.
SciBERT	A BERT model trained on scientific text, specifically papers from the semanticscholar.org corpus.
GALACTICA by Meta	LLM trained on over 48 million papers, textbooks, reference materials, compounds, proteins, and other sources of scientific knowledge. It was taken offline due to misinformation concerns.
IBM-NASA Models	Trained on 60 billion tokens from a corpus of astrophysics, planetary science, earth science, heliophysics, and biological and physical sciences data.
Mozi	Described as the first large-scale language model for the scientific paper domain, covering tasks such as question answering and emotional support. Using a large-scale language model together with the evidence retrieval model SciDPR, Mozi generates concise and accurate responses to users' questions about specific papers and provides emotional support for academic researchers.
Awesome Scientific Language Models	A curated list of pre-trained language models in scientific domains (e.g., mathematics, physics, chemistry, materials science, biology, medicine, geoscience).

Selected readings

Cai, H., Cai, X., Chang, J., Li, S., Yao, L., Wang, C., ... & Ke, G. (2024). SciAssess: Benchmarking LLM proficiency in scientific literature analysis. arXiv preprint arXiv:2403.01976.
Ma, Y., Gou, Z., Hao, J., Xu, R., Wang, S., Pan, L., ... & Sun, A. (2024). SciAgent: Tool-augmented Language Models for Scientific Reasoning. arXiv preprint arXiv:2402.11451.
Shojaee, P., Meidani, K., Gupta, S., Farimani, A. B., & Reddy, C. K. (2024). LLM-SR: Scientific equation discovery via programming with large language models. arXiv preprint arXiv:2404.18400.
Wang, Z., Cao, L., Danek, B., Zhang, Y., Jin, Q., Lu, Z., & Sun, J. (2024). Accelerating Clinical Evidence Synthesis with Large Language Models. arXiv preprint arXiv:2406.17755.
Zheng, Y., Koh, H. Y., Ju, J., Nguyen, A. T., May, L. T., Webb, G. I., & Pan, S. (2023). Large language models for scientific synthesis, inference and explanation. arXiv preprint arXiv:2310.07984.