Ensembl manual: structure and style guide
Brief explanation
Ensembl aims to release a new website in 2020 for exploring genomic data. With it, we’d like to produce a regularly updated manual that is available alongside the website, describing the data available and how to access it. This project aims to identify the technologies needed for such a manual, produce a logical structure for the manual, and create a style guide.
Ensembl holds large amounts of genomic data of many types, with several different methods of access. The documentation should cover what the data is, where it comes from and how we process it, plus how to use the different tools to work with it. This needs to be organised into a logical structure so that users can find what they need easily.
We release new software and data several times a year, so this manual would need to be dynamic. Writers need to be alerted when changes occur that need to be reflected in the documentation. Previous suggestions for this include using a ghost browser to take screenshots, then producing alerts when image analysis software indicates differences between old and new screenshots.
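The screenshot comparison idea can be sketched roughly as follows. This is only an illustration of the principle, not an existing Ensembl tool: the function names, the threshold value and the representation of a screenshot as a grid of RGB tuples are all assumptions made for the example. A real implementation would load images captured by the browser using an imaging library.

```python
from typing import List, Tuple

# Hypothetical representation: a screenshot is a grid of (R, G, B) pixels.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def changed_fraction(old: Image, new: Image) -> float:
    """Return the fraction of pixels that differ between two screenshots."""
    same_shape = len(old) == len(new) and all(
        len(old_row) == len(new_row) for old_row, new_row in zip(old, new)
    )
    if not same_shape:
        # A change in page dimensions is treated as a complete change.
        return 1.0
    total = sum(len(row) for row in old)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for old_row, new_row in zip(old, new)
        for old_px, new_px in zip(old_row, new_row)
        if old_px != new_px
    )
    return differing / total


def needs_review(old: Image, new: Image, threshold: float = 0.01) -> bool:
    """Flag a page for the writers when more than `threshold` of it changed.

    The 1% default is an arbitrary value chosen for illustration.
    """
    return changed_fraction(old, new) > threshold
```

A threshold is used rather than an exact match so that trivial rendering differences do not alert the writers on every release.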
The documentation will be written by many individuals within and outside the Ensembl project, with varying technical ability. The structure will guide the writers to ensure they are writing relevant and complete documentation, while the style guide will ensure consistency of voice throughout the documentation.
Expected results
- Identification of suitable technologies for storing, presenting and updating the documentation.
- A general structure listing the sections of the manual and noting their approximate content.
- A style guide for the documentation, including how to use highlighting for different types of elements, diagram labelling and spelling.
- Sample documentation on topics that will be unchanged in the new website.
Commitment
Full-time
Mentors
Emily Perry, Andy Yates and Andrea Winterbottom
Manual gene annotation documentation
Brief explanation
Ensembl genes are annotated onto the genome by a combination of automatic annotation using a pipeline and manual annotation by skilled annotators. Current documentation describing the process of manual gene annotation is very limited and needs significant expansion.
Manual gene annotation is a very involved process, which uses data from a variety of sources, along with the annotators’ expert knowledge of gene structures, to determine the position of genes. Annotators work from their own guidelines, which may form the basis of any documentation. They carry out the task using specialised software that is not available to the public.
This project would aim to produce the documentation that will allow Ensembl users to understand where their genes came from. The source of genes and the reason for any changes in our gene models are some of the most popular topics on our email helpdesk. The documentation would not help people to annotate genes in Ensembl themselves.
Expected results
- Documentation pages describing the process of manual gene annotation.
- Images illustrating the process of manual gene annotation.
Commitment
Full-time
Mentors
Jonathan Mudge and Jane Loveland
eHive manual update
Brief explanation
eHive is a system to define and execute workflows. It operates by spawning autonomous workers which carry out parallel tasks in order to complete the larger pipeline. It was originally created for running in-house Ensembl workflows, but is now shipped separately and can be used to run any computational workflow. It is widely used in Ensembl and by other projects: we are aware of several hundred active workflows, totalling thousands of CPU years on several compute clusters. eHive is written in Perl, with Python and Java plugins. It supports various job schedulers and has beta support for Docker clouds.
The existing eHive manual is hosted on Read the Docs; however, there are improvements we wish to make to the documentation, including the addition of a cheat sheet. The overall goal is to make eHive more accessible: to make the first steps easier for new users, whilst allowing users to reach intermediate and advanced levels more quickly.
Expected results
- Better documentation of advanced features.
- Cheat sheet.
- Other miscellaneous improvements to the existing documentation.
Commitment
Part-time
Mentors
Matthieu Muffato and Brandon Walts
Ensembl production manual
Brief explanation
The Ensembl teams have developed many automated pipelines to process and analyse genomics data. These are used to produce the Ensembl databases, and include our gene annotation, gene tree, regulatory build and variation annotation pipelines. Many people who work with species not included in Ensembl, or with confidential data, would like to run our pipelines on their own data and produce their own Ensembl-like databases.
All of the code needed to run an Ensembl pipeline is open source, either from a public provider or written by us and distributed on GitHub. However, getting started involves setting up a complex production environment, and several pipelines require domain-specific knowledge.
Expected results
- A manual detailing how to set up the production environment and run an Ensembl pipeline.
- A framework to host and document pipeline-specific information.
- A manual for each pipeline, written with the help and knowledge of the relevant teams.
Commitment
Full-time
Mentors
Helen Schuilenburg and Nishadi De Silva
VEP documentation
Brief explanation
The Ensembl Variant Effect Predictor (VEP) annotates lists of known and novel genetic variants with the genes they affect. This is available as an online tool, an offline package and REST API endpoints.
The offline package is the most flexible and powerful method of annotating variants. However, the many options are difficult to navigate, and the installation relies on a number of external packages which can be difficult to install.
We would like to rewrite our documentation to make it easier for people to navigate, and improve the installation documentation to include troubleshooting advice.
Expected results
- Restructure of the VEP documentation.
- Installation troubleshooting guide.
Commitment
Part-time