Warning: Columns 'name', 'value' are not present in the database and have been removed #128

Open
lgatto opened this issue Jan 7, 2022 · 6 comments


@lgatto
Contributor

lgatto commented Jan 7, 2022

> library(ensembldb)
*** output flushed ***
> library(AnnotationHub)
*** output flushed ***
> ah <- AnnotationHub()
snapshotDate(): 2021-12-20
> edb105 <- ah[["AH98047"]]
loading from cache
> ensembldb:::cleanColumns(edb105, listColumns(edb105))
 [1] "seq_name"              "seq_length"            "is_circular"          
 [4] "gene_id"               "entrezid"              "exon_id"              
 [7] "exon_seq_start"        "exon_seq_end"          "gene_name"            
[10] "gene_biotype"          "gene_seq_start"        "gene_seq_end"         
[13] "seq_strand"            "seq_coord_system"      "description"          
[16] "gene_id_version"       "canonical_transcript"  "symbol"               
[19] "tx_id"                 "protein_id"            "protein_sequence"     
[22] "protein_domain_id"     "protein_domain_source" "interpro_accession"   
[25] "prot_dom_start"        "prot_dom_end"          "tx_biotype"           
[28] "tx_seq_start"          "tx_seq_end"            "tx_cds_seq_start"     
[31] "tx_cds_seq_end"        "tx_support_level"      "tx_id_version"        
[34] "gc_content"            "tx_external_name"      "tx_is_canonical"      
[37] "tx_name"               "exon_idx"              "uniprot_id"           
[40] "uniprot_db"            "uniprot_mapping_type" 
Warning message:
In ensembldb:::cleanColumns(edb105, listColumns(edb105)) :
  Columns 'name', 'value' are not present in the database and have been removed

with

> sessionInfo()
R Under development (unstable) (2021-11-10 r81172)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] AnnotationHub_3.3.7     BiocFileCache_2.3.3     dbplyr_2.1.1           
 [4] ensembldb_2.19.6        AnnotationFilter_1.19.0 GenomicFeatures_1.47.5 
 [7] AnnotationDbi_1.57.1    Biobase_2.55.0          GenomicRanges_1.47.5   
[10] GenomeInfoDb_1.31.1     IRanges_2.29.1          S4Vectors_0.33.8       
[13] BiocGenerics_0.41.2    

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.7.0          httr_1.4.2                   
 [3] bit64_4.0.5                   shiny_1.7.1                  
 [5] assertthat_0.2.1              interactiveDisplayBase_1.33.0
 [7] BiocManager_1.30.16           blob_1.2.2                   
 [9] GenomeInfoDbData_1.2.7        Rsamtools_2.11.0             
[11] yaml_2.2.1                    progress_1.2.2               
[13] BiocVersion_3.15.0            pillar_1.6.4                 
[15] RSQLite_2.2.9                 lattice_0.20-45              
[17] glue_1.6.0                    digest_0.6.29                
[19] promises_1.2.0.1              XVector_0.35.0               
[21] httpuv_1.6.4                  htmltools_0.5.2              
[23] Matrix_1.3-4                  XML_3.99-0.8                 
[25] pkgconfig_2.0.3               biomaRt_2.51.1               
[27] zlibbioc_1.41.0               xtable_1.8-4                 
[29] purrr_0.3.4                   later_1.3.0                  
[31] BiocParallel_1.29.8           tibble_3.1.6                 
[33] KEGGREST_1.35.0               generics_0.1.1               
[35] ellipsis_0.3.2                withr_2.4.3                  
[37] cachem_1.0.6                  SummarizedExperiment_1.25.3  
[39] lazyeval_0.2.2                mime_0.12                    
[41] magrittr_2.0.1                crayon_1.4.2                 
[43] memoise_2.0.1                 fansi_0.5.0                  
[45] xml2_1.3.3                    tools_4.2.0                  
[47] prettyunits_1.1.1             hms_1.1.1                    
[49] BiocIO_1.5.0                  lifecycle_1.0.1              
[51] matrixStats_0.61.0            stringr_1.4.0                
[53] DelayedArray_0.21.2           Biostrings_2.63.0            
[55] compiler_4.2.0                rlang_0.4.12                 
[57] grid_4.2.0                    RCurl_1.98-1.5               
[59] rjson_0.2.20                  rappdirs_0.3.3               
[61] bitops_1.0-7                  restfulr_0.0.13              
[63] DBI_1.1.2                     curl_4.3.2                   
[65] R6_2.5.1                      GenomicAlignments_1.31.2     
[67] dplyr_1.0.7                   rtracklayer_1.55.3           
[69] fastmap_1.1.0                 bit_4.0.4                    
[71] utf8_1.2.2                    filelock_1.0.2               
[73] ProtGenerics_1.27.1           stringi_1.7.6                
[75] parallel_4.2.0                Rcpp_1.0.7                   
[77] vctrs_0.3.8                   png_0.1-7                    
[79] tidyselect_1.1.1             
@jorainer
Owner

The columns "name" and "value" come from the metadata database table, which is generally not used in any query. Where did you get this warning message? (I assume you did not call cleanColumns directly when you first saw it.)

@lgatto
Contributor Author

lgatto commented Jan 11, 2022

I suppose I wanted a way to get all possible columns and was surprised to get a warning when using clear (although un-exported) functions. Is there another way to do that?

@jorainer
Owner

No, listColumns is actually the correct function to list all available database columns - maybe listTables might be even better because it tells which columns are in which table. I could fix the listColumns to not list columns from the metadata table because that table will usually not be queried anyway.
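
A minimal sketch of the workaround this suggests, assuming `listTables` returns a named list with one character vector of column names per table: drop the metadata table before collecting the columns. `query_columns` is a hypothetical helper, not part of ensembldb, and the small `tables` list only stands in for the real `listTables(edb105)` result.

```r
# Sketch: collect the queryable columns from a listTables()-style named list,
# skipping the "metadata" table (which holds the columns 'name' and 'value').
# `query_columns` is a hypothetical helper, not an ensembldb function.
query_columns <- function(tables, skip = "metadata") {
    keep <- setdiff(names(tables), skip)
    unique(unlist(tables[keep], use.names = FALSE))
}

# Toy stand-in for listTables(edb105); the real list has more tables and columns.
tables <- list(
    gene = c("gene_id", "gene_name", "seq_name"),
    tx = c("tx_id", "gene_id"),
    metadata = c("name", "value")
)
cols <- query_columns(tables)

# With the real database (assumes the session from the first comment):
# cols <- query_columns(listTables(edb105))
```

Because the helper works on table names rather than column names, it does not need to know which columns the metadata table contains.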

@lgatto
Contributor Author

lgatto commented Jan 11, 2022

Yes, that sounds reasonable. Or at least not throw a warning but a simple message, if you think that's warranted.

@lgatto
Contributor Author

lgatto commented Jan 11, 2022

On a similar note:

> ens <- proteins(edb105, listColumns(edb105))
Warning messages:
1: In cleanColumns(object, unique(c(columns, "protein_id"))) :
  Columns 'name', 'value' are not present in the database and have been removed
2: In .local(object, ...) :
  Exon specific columns are not allowed for proteins. Columns 'exon_id', 'exon_seq_start', 'exon_seq_end', 'exon_idx' have been removed.

Is the second warning warranted? As above, I should be able to get all columns for proteins without triggering a warning.

jorainer added a commit that referenced this issue Jan 12, 2022
- `listColumns` no longer reports column names from the metadata database
  table. This fixes issue #128.
@jorainer
Owner

I think the second warning is OK. listColumns lists all database columns, but for protein annotations it makes no sense to also return exon coordinates - that would blow up the results (and in addition the join query would be rather complex and the query would eventually take very long).
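
Going by the two warnings quoted above, one way to request every column that `proteins` accepts without triggering either warning is to filter the column vector first. This is only a sketch: `protein_query_columns` is a hypothetical helper, the excluded names are copied from the warning messages, and `all_cols` is a short stand-in for `listColumns(edb105)`.

```r
# Column names taken verbatim from the two warning messages in this thread.
exon_cols <- c("exon_id", "exon_seq_start", "exon_seq_end", "exon_idx")
metadata_cols <- c("name", "value")

# Hypothetical helper (not part of ensembldb): drop the columns that
# proteins() would remove with a warning, keeping the original order.
protein_query_columns <- function(columns) {
    setdiff(columns, c(exon_cols, metadata_cols))
}

# Short stand-in for listColumns(edb105).
all_cols <- c("gene_id", "exon_id", "exon_idx", "protein_id", "name", "value")
cols <- protein_query_columns(all_cols)

# With the real database (assumes the session from the first comment):
# ens <- proteins(edb105, columns = protein_query_columns(listColumns(edb105)))
```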
