RepeatClassifier using wrong Libraries directory when RepeatMasker configured with -libdir #303

@TobyBaril

Describe the issue

I have been testing a few runs and noticed a bug in which Libraries directory RepeatClassifier uses when RepeatMasker has been configured with a non-default library directory.

When RepeatMasker is configured with a library directory in a different location (e.g. /home/databases/RepeatMasker/Libraries/) and ALL Dfam partitions, RepeatMasker behaves as expected. However, when running RepeatModeler (configured after RepeatMasker), RepeatClassifier uses the Libraries directory inside the RepeatMasker installation, which does not appear to be complete, resulting in the error below:

# Protein Library: /data/share/RepeatMasker/Libraries/RepeatPeps.lib
#   - 18011 proteins
# Consensi Library: /data/share/RepeatMasker/Libraries/RepeatMasker.lib
#   - 237 consensus sequences
# !!!! WARNING !!!! -- The curated TE consensus library does not appear to be complete.
#                      This may result in poor clasification performance.  Please
#                      ensure that you have the latest version of RepeatMasker
#                      installed and that you have installed the complete set of Dfam
#                      partitions.
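
To confirm that the two directories really differ, the sequence counts can be compared directly. The following is a minimal sketch, assuming both libraries are plain FASTA files and that counting `>` header lines approximates what RepeatClassifier reports; the function names `count_fasta_records` and `compare_libdirs` are hypothetical helpers, not part of RepeatMasker or RepeatModeler.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two RepeatMasker Libraries directories.

# Count FASTA records (lines starting with '>') in a file.
# Prints 0 if the file is missing or unreadable.
count_fasta_records() {
    if [ -r "$1" ]; then
        # grep -c prints 0 but exits non-zero when nothing matches.
        grep -c '^>' "$1" || true
    else
        echo 0
    fi
}

# Print one tab-separated line per library:
#   name, count in the first directory, count in the second directory.
compare_libdirs() {
    bundled=$1
    configured=$2
    for lib in RepeatMasker.lib RepeatPeps.lib; do
        printf '%s\t%s\t%s\n' "$lib" \
            "$(count_fasta_records "$bundled/$lib")" \
            "$(count_fasta_records "$configured/$lib")"
    done
}
```

With the paths from this report, it would be called as `compare_libdirs /data/share/RepeatMasker/Libraries /home/databases/RepeatMasker/Libraries`. A much larger count in the second column than in the first would support the idea that RepeatClassifier is reading the bundled directory.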

Reproduction steps

  1. Install RepeatMasker and configure with:
perl ./configure -libdir /home/databases/RepeatMasker/Libraries/

Where /home/databases/RepeatMasker/Libraries/ looks like:

├── Artefacts.embl
├── CONS-Dfam_withRBRM_3.9
│   ├── fungi
│   │   ├── refineableHash.dat
│   │   ├── rmblastdb.log
│   │   ├── specieslib
│   │   ├── specieslib.ndb
│   │   ├── specieslib.nhr
│   │   ├── specieslib.nin
│   │   ├── specieslib.njs
│   │   ├── specieslib.not
│   │   ├── specieslib.nsq
│   │   ├── specieslib.ntf
│   │   ├── specieslib.nto
│   │   └── speciesMeta.pm
│   ├── general
│   │   ├── is.lib
│   │   ├── is.lib.ndb
│   │   ├── is.lib.nhr
│   │   ├── is.lib.nin
│   │   ├── is.lib.njs
│   │   ├── is.lib.not
│   │   ├── is.lib.nsq
│   │   ├── is.lib.ntf
│   │   ├── is.lib.nto
│   │   └── rmblastdb.log
│   ├── homo_sapiens
│   │   ├── cutlib
│   │   ├── cutlib.ndb
│   │   ├── cutlib.nhr
│   │   ├── cutlib.nin
│   │   ├── cutlib.njs
│   │   ├── cutlib.not
│   │   ├── cutlib.nsq
│   │   ├── cutlib.ntf
│   │   ├── cutlib.nto
│   │   ├── longlib
│   │   ├── longlib.ndb
│   │   ├── longlib.nhr
│   │   ├── longlib.nin
│   │   ├── longlib.njs
│   │   ├── longlib.not
│   │   ├── longlib.nsq
│   │   ├── longlib.ntf
│   │   ├── longlib.nto
│   │   ├── mirlib
│   │   ├── mirlib.ndb
│   │   ├── mirlib.nhr
│   │   ├── mirlib.nin
│   │   ├── mirlib.njs
│   │   ├── mirlib.not
│   │   ├── mirlib.nsq
│   │   ├── mirlib.ntf
│   │   ├── mirlib.nto
│   │   ├── mirslib
│   │   ├── mirslib.ndb
│   │   ├── mirslib.nhr
│   │   ├── mirslib.nin
│   │   ├── mirslib.njs
│   │   ├── mirslib.not
│   │   ├── mirslib.nsq
│   │   ├── mirslib.ntf
│   │   ├── mirslib.nto
│   │   ├── refineableHash.dat
│   │   ├── refinelib
│   │   ├── refinelib.ndb
│   │   ├── refinelib.nhr
│   │   ├── refinelib.nin
│   │   ├── refinelib.njs
│   │   ├── refinelib.not
│   │   ├── refinelib.nsq
│   │   ├── refinelib.ntf
│   │   ├── refinelib.nto
│   │   ├── retrolib
│   │   ├── retrolib.ndb
│   │   ├── retrolib.nhr
│   │   ├── retrolib.nin
│   │   ├── retrolib.njs
│   │   ├── retrolib.not
│   │   ├── retrolib.nsq
│   │   ├── retrolib.ntf
│   │   ├── retrolib.nto
│   │   ├── rmblastdb.log
│   │   ├── shortcutlib
│   │   ├── shortcutlib.ndb
│   │   ├── shortcutlib.nhr
│   │   ├── shortcutlib.nin
│   │   ├── shortcutlib.njs
│   │   ├── shortcutlib.not
│   │   ├── shortcutlib.nsq
│   │   ├── shortcutlib.ntf
│   │   ├── shortcutlib.nto
│   │   ├── shortlib
│   │   ├── shortlib.ndb
│   │   ├── shortlib.nhr
│   │   ├── shortlib.nin
│   │   ├── shortlib.njs
│   │   ├── shortlib.not
│   │   ├── shortlib.nsq
│   │   ├── shortlib.ntf
│   │   ├── shortlib.nto
│   │   ├── sinecutlib
│   │   ├── sinecutlib.ndb
│   │   ├── sinecutlib.nhr
│   │   ├── sinecutlib.nin
│   │   ├── sinecutlib.njs
│   │   ├── sinecutlib.not
│   │   ├── sinecutlib.nsq
│   │   ├── sinecutlib.ntf
│   │   ├── sinecutlib.nto
│   │   └── speciesMeta.pm
│   └── rmwritetest.deleteme
├── Dfam.h5
├── famdb
│   ├── dfam39_full.0.h5
│   ├── dfam39_full.10.h5
│   ├── dfam39_full.11.h5
│   ├── dfam39_full.12.h5
│   ├── dfam39_full.13.h5
│   ├── dfam39_full.14.h5
│   ├── dfam39_full.15.h5
│   ├── dfam39_full.16.h5
│   ├── dfam39_full.1.h5
│   ├── dfam39_full.2.h5
│   ├── dfam39_full.3.h5
│   ├── dfam39_full.4.h5
│   ├── dfam39_full.5.h5
│   ├── dfam39_full.6.h5
│   ├── dfam39_full.7.h5
│   ├── dfam39_full.8.h5
│   ├── dfam39_full.9.h5
│   └── rmlib.config
├── general
│   ├── is.lib
│   ├── is.lib.ndb
│   ├── is.lib.nhr
│   ├── is.lib.nin
│   ├── is.lib.njs
│   ├── is.lib.not
│   ├── is.lib.nsq
│   ├── is.lib.ntf
│   ├── is.lib.nto
│   └── rmblastdb.log
├── README.meta
├── README.RMRBSeqs
├── RepeatAnnotationData.pm
├── RepeatMasker.lib
├── RepeatMasker.lib.ndb
├── RepeatMasker.lib.nhr
├── RepeatMasker.lib.nin
├── RepeatMasker.lib.njs
├── RepeatMasker.lib.not
├── RepeatMasker.lib.nsq
├── RepeatMasker.lib.ntf
├── RepeatMasker.lib.nto
├── RepeatPeps.lib
├── RepeatPeps.lib.pdb
├── RepeatPeps.lib.phr
├── RepeatPeps.lib.pin
├── RepeatPeps.lib.pjs
├── RepeatPeps.lib.pot
├── RepeatPeps.lib.psq
├── RepeatPeps.lib.ptf
├── RepeatPeps.lib.pto
├── RepeatPeps.readme
├── RMRB.embl
├── RMRBMeta.embl
├── RMRBSeqs.embl
├── RMRB_spec_to_tax.json
└── taxonomy.dat

7 directories, 164 files
  2. Re-configure RepeatModeler (pointing to the RepeatMasker install at /data/share/RepeatMasker/ when asked):
perl ./configure
  3. Run RepeatModeler

  4. See warning for incomplete database when using RepeatClassifier

# Protein Library: /data/share/RepeatMasker/Libraries/RepeatPeps.lib
#   - 18011 proteins
# Consensi Library: /data/share/RepeatMasker/Libraries/RepeatMasker.lib
#   - 237 consensus sequences
# !!!! WARNING !!!! -- The curated TE consensus library does not appear to be complete.
#                      This may result in poor clasification performance.  Please
#                      ensure that you have the latest version of RepeatMasker
#                      installed and that you have installed the complete set of Dfam
#                      partitions.
  • Which version of RepeatMasker is this RepeatModeler installation using? Have you installed RepBase RepeatMasker Edition for RepeatMasker, or the full Dfam database?

v4.2.1 with RepBase RepeatMasker Edition and full Dfam 3.9 (all partitions).

I'm not sure if this is due to an issue with the configuration, or whether there is something I missed in the configuration that would resolve this, but I did not find an obvious solution.

EDIT:

I can work around this by deleting /data/share/RepeatMasker/Libraries/ and symlinking my specified libdir in its place. Perhaps this could be done during configuration to solve the issue?

# delete Libraries directory shipped with RepeatMasker
rm -r /data/share/RepeatMasker/Libraries/

# symlink the configured Libraries directory generated after RepeatMasker configuration 
ln -s /home/databases/RepeatMasker/Libraries/ /data/share/RepeatMasker/
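
A slightly more cautious variant of the same workaround keeps the bundled directory as a backup instead of deleting it. This is only a sketch: the function name `link_libdir` and the `.bundled.bak` suffix are made up for illustration, and it assumes nothing else depends on the bundled directory staying in place.

```shell
#!/bin/sh
# Hypothetical helper: replace <install_dir>/Libraries with a symlink to the
# configured library directory, renaming the bundled copy rather than removing it.
link_libdir() {
    install_dir=$1   # e.g. /data/share/RepeatMasker
    configured=$2    # e.g. /home/databases/RepeatMasker/Libraries
    target="$install_dir/Libraries"
    backup="$target.bundled.bak"

    # Refuse to link to a directory that does not exist.
    if [ ! -d "$configured" ]; then
        echo "configured libdir not found: $configured" >&2
        return 1
    fi

    if [ -L "$target" ]; then
        # Already a symlink: drop it so it can be re-pointed.
        rm "$target"
    elif [ -e "$target" ]; then
        # Real directory: keep it as a backup, but never overwrite an older backup.
        if [ -e "$backup" ]; then
            echo "backup already exists: $backup" >&2
            return 1
        fi
        mv "$target" "$backup"
    fi

    ln -s "$configured" "$target"
}
```

For the setup described here, the call would be `link_libdir /data/share/RepeatMasker /home/databases/RepeatMasker/Libraries`. Running it a second time only re-points the existing symlink.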
