Towards Automated Understanding of Scientific Software
Description
Data science projects depend on software that changes rapidly. As a result, scientists spend hours reading lengthy documentation and manuals instead of advancing their scientific fields. In this project, we aim to automatically extract the relevant aspects of scientific software (e.g., what it does, how to install it, how to use it, and how to cite it) from its documentation and code using machine learning techniques. The students will build on an existing baseline of classifiers and try to improve on its results.
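To make the task concrete, here is a minimal sketch of how documentation excerpts might be assigned to one of the four aspects named above. It is not the project's existing baseline, which is not described here. The training excerpts, the label names, and the helpers `build_classifier` and `classify_excerpt` are all hypothetical, and the choice of TF-IDF features with logistic regression is an assumption made only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-written excerpts; a real corpus would come from
# README files and manuals of actual scientific software.
TRAINING_EXCERPTS = [
    ("This library provides tools for analyzing climate simulation data.", "description"),
    ("The software computes gene expression statistics from sequencing reads.", "description"),
    ("Install the package with pip install mytool.", "installation"),
    ("To install, clone the repository and run the setup script.", "installation"),
    ("Run the command line tool with the input file as an argument.", "usage"),
    ("Call the fit function and then call predict on new samples.", "usage"),
    ("If you use this software, please cite our paper.", "citation"),
    ("Please cite the following publication when referencing this work.", "citation"),
]


def build_classifier():
    """Fit a TF-IDF + logistic regression pipeline on the toy excerpts."""
    texts = [text for text, _ in TRAINING_EXCERPTS]
    labels = [label for _, label in TRAINING_EXCERPTS]
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


def classify_excerpt(model, text):
    """Return the predicted aspect label for a single documentation excerpt."""
    return str(model.predict([text])[0])


if __name__ == "__main__":
    classifier = build_classifier()
    for excerpt in ["pip install the package", "please cite this paper"]:
        print(excerpt, "->", classify_excerpt(classifier, excerpt))
```

With only eight training excerpts this shows the shape of the pipeline, not realistic accuracy. Improving on a baseline would typically involve a much larger labeled corpus and a comparison of several feature sets and classifiers.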
Awards
Best Data Science Open and Sharing Practices
Best Project Presentation
Highlighted Project
Students
Advisors
Skills Required by the Team
- Python
- Knowledge Graphs
- Machine Learning
- Sklearn