Known bugs in Ensembl | |
---|---|
Inconsistent transcript counts between exported GFF3 and GTF files |
|
Affects: Live site | Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105 |
Description: A bug report showed that, in particular cases, the GFF3 and GTF files available on our FTP site can be inconsistent with each other.
Depending on the data underlying our dumps, the number of transcripts retrieved for the same species may differ between the two files. The main difference between GTF and GFF3 dumping is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice): https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199 This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene. We plan to fix this from Ensembl 106 onwards. |
|
Workaround: None, other than using the most up-to-date datasets. |
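To check whether a given pair of dumps is affected, the transcript IDs in the two files can be compared. The following is a minimal sketch, not part of the Ensembl tooling. It assumes that GTF transcript rows carry a `transcript_id "..."` attribute and that GFF3 transcript rows carry an `ID=transcript:...` attribute; the function names are hypothetical.

```python
import re


def gtf_transcript_ids(lines):
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines):
    """Collect transcript IDs from GFF3 rows with an 'ID=transcript:...' attribute."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = re.search(r"(?:^|;)ID=transcript:([^;]+)", cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def compare_dumps(gtf_lines, gff3_lines):
    """Return (IDs only in the GTF, IDs only in the GFF3), each sorted."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    gff3_ids = gff3_transcript_ids(gff3_lines)
    return sorted(gtf_ids - gff3_ids), sorted(gff3_ids - gtf_ids)
```

For compressed dumps, the line iterables can come from `gzip.open(path, "rt")`. A non-empty first list would indicate transcripts that were written to the GTF but not to the GFF3.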
GRCh37 REST VEP – Conservation Parameter |
|
Affects: Live site | Versions: Ensembl 101, Ensembl 102 |
Description: The parameter ‘Conservation’ on the VEP endpoint of [http://grch37.rest.ensembl.org/] does not provide data as expected. | |
Workaround: This will be fixed for release 103. |
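For reference, the affected kind of request can be reproduced as follows. This is a minimal sketch that only builds the request URL. It assumes the `vep/human/region/:region/:allele` path layout of the REST VEP endpoint; the helper name and the example region and allele are illustrative only.

```python
from urllib.parse import quote, urlencode

GRCH37_REST = "http://grch37.rest.ensembl.org"


def vep_region_url(region, allele, conservation=True, server=GRCH37_REST):
    """Build a VEP region URL, optionally switching on the Conservation parameter."""
    path = "/vep/human/region/{}/{}".format(
        quote(region, safe=":-"), quote(allele, safe="")
    )
    params = {"content-type": "application/json"}
    if conservation:
        params["Conservation"] = "1"
    return server + path + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative region and allele, not taken from the bug report.
    print(vep_region_url("9:22125503-22125503:1", "C"))
```

On the affected releases, the response to such a request is expected to lack the conservation data.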
Missing RefSeq data in homo_sapiens otherfeatures 102 |
|
Affects: Live site | Versions: Ensembl 100, Ensembl 101, Ensembl 102 |
Description: There are a number of RefSeq genes missing in the homo_sapiens_otherfeatures_102_38 database.
This will also affect VEP queries using the RefSeq transcript set. |
|
Workaround: The Ensembl transcript set is unaffected, but there is no work-around for VEP queries using the RefSeq or merged transcript set.
This will be fixed in Ensembl 103. |
Missing variant pathogenicity predictions for REVEL, MetaLR and MutationAssessor |
|
Affects: Live site, Mirrors | Versions: Ensembl 102 |
Description: We are missing variant pathogenicity predictions from REVEL, MetaLR and MutationAssessor on the Variant page > Genes and regulation view and on the Transcript page > Variant table view. This only affects human GRCh38 views. Predictions for CADD, SIFT and PolyPhen-2 are still available. This problem does not impact Ensembl VEP. |
|
Workaround: The scores can still be retrieved:
|
Missing human chrY gene in release 102 |
|
Affects: Live site, Mirrors | Versions: Ensembl 102 |
Description: The lncRNA gene XGY2, on human chrY, is missing from Ensembl release 102. It will be reinstated for release 103. | |
Workaround: The gene can be accessed in the Ensembl release 101 archive. |
Missing data in mouse for 3D Protein Viewer |
|
Affects: Live site, Archives | Versions: Ensembl 100, Ensembl 101, Ensembl 102 |
Description: Mappings between Ensembl translations and PDBe protein structures are not available.
They are missing from the ‘Protein Summary’ view on our transcript pages. These data are also used to drive our interactive views showing variants on 3D PDB models. The ‘3D Protein Model’ views on the variant and transcript pages currently return no data. Views of novel variants on 3D structures are also missing in the VEP web interface. |
|
Workaround: The PDBe mappings and variant locations on 3D structures on transcript and variant pages can be viewed in Ensembl version 99. |
Genomes have been over-masked |
|
Affects: Live site, Mirrors | Versions: Ensembl 102 |
Description: Repeatmasked genomes have been masked using Repeatmodeler libraries for some species – we are not confident that this is not masking gene families and so will remove this masking, i.e. only mask the genomes using Repbase libraries. | |
Workaround: For the time being, masked genomes have been masked using the Repeatmodeler libraries. |
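To gauge how heavily a given genome is masked, the proportion of soft-masked bases in a FASTA file can be measured. This is a minimal sketch, assuming a soft-masked FASTA in which masked bases are written in lowercase; the function name is hypothetical.

```python
def soft_masked_fraction(fasta_lines):
    """Return the fraction of sequence characters that are lowercase (soft-masked)."""
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip FASTA header lines
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0
```

Comparing this fraction for the same assembly across releases gives a rough indication of how much additional sequence was masked.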
Broken/missing links for transcripts with biotypes “tRNA” and “IG” for RefSeq tracks |
|
Affects: Live site | Versions: Ensembl 101, Ensembl 102 |
Description: When viewing the RefSeq track, the links to NCBI for transcripts with biotypes “tRNA” and “IG” are broken or incorrect. | |
Workaround: This will be fixed in an upcoming Ensembl release; in the meantime, the links will be disabled. |
Compara ncRNA trees stats not described accurately |
|
Affects: Live site | Versions: Ensembl 100, Ensembl 101, Ensembl 102 |
Description: The stats computed in ncRNA trees under the names {{nb_genes_in_tree}} and {{nb_orphaned_genes}} do not actually refer to the final trees but to the unfiltered clusters (an earlier stage). | |
Workaround: In Ensembl 103 we have corrected this problem, so the stats will match their names, but their values will decrease significantly in at least 50% of the species reported. |
Some protein coding genes turned into non_translating_CDS |
|
Affects: Live site | Versions: Ensembl 101, Ensembl 102 |
Description: A user spotted that the peptide FASTA files are considerably shorter for pachysolen_tannophilus_nrrl_y_2460_gca_001661245 (fungus). This is because in release 42 many of its protein-coding genes were marked as nontranslating_CDS (although the underlying data and annotation have not changed). | |
Workaround: No workaround. |
Incorrect display ids/labels captured for UCSC external references in mouse |
|
Affects: Live site | Versions: Ensembl 100, Ensembl 101, Ensembl 102 |
Description: Ensembl identifiers (ENS ids) are displayed as UCSC external references for Homo sapiens. | |
Workaround: The linking out to UCSC website works correctly. |
rfam_genes have wrong strand when loaded with ensembl-genomeloader |
|
Affects: Live site | Versions: Ensembl 96, Ensembl 97, Ensembl 98, Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102 |
Description: The [https://github.com/Ensembl/ensembl-genomeloader] (GL) is used by NV divisions to load genomes and their annotations from the ENA. It also used to annotate non-coding genes matching RFAM HMMs, but apparently in some cases the assigned strand is the template strand. This affects some microbial and plant genomes loaded with the GL. | |
Workaround: We will remove the rfam_genes from the affected genomes and run the RNA features pipeline instead. |
GRCh37 – COSMIC insertion coordinates off by +1 |
|
Affects: Live site | Versions: Ensembl 100, Ensembl 101, Ensembl 102 |
Description: The coordinates for insertions imported for COSMIC source are off by +1.
For GRCh37 e100, e101 and e102, 2.66% (253,428 / 9,511,409) of COSMIC variants are affected. |
|
Workaround: Release 99 can be used for GRCh37:
http://grch37-archive.ensembl.org/index.html http://ftp.ensembl.org/pub/grch37/release-99/variation/ This contained 4,478,854 COSMIC mutations, compared to 9,511,409 in the current database. |
Drosophila melanogaster RNA gene cross-reference links do not work |
|
Affects: Live site, Mirrors | Versions: Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102 |
Description: Rfam and miRBase cross-reference links do not work, because they use the FlyBase ID instead of the RNA gene. | |
Workaround: Search for the Rfam or miRBase ID on the respective websites. |