
Commit e97364e

Merge pull request #13 from ivoa-std/linetap-species-table
Linetap species table
2 parents dcd7c26 + 21fcbf7 commit e97364e


4 files changed, +195 -36 lines changed


LineTAP.tex

Lines changed: 103 additions & 26 deletions
@@ -104,8 +104,16 @@ \section{Introduction}
 section~\ref{sect:quantities}, while the mapping between our columns and the
 VAMDC-XSAMS Data Model is given in section~\ref{sect:mapping}.
 
+During the development of the standard, a major problem in molecular
+spectroscopy turned out to be species nomenclature. The core LineTAP
+table sidesteps this problem by identifying species using IUPAC standard
+InChIs, a choice unpopular with many practitioners. To facilitate the
+use of colloquial species designations (``ethyl alcohol''), this
+specification also defines a \textit{species table} associating common
+names and sum formulas with InChIs in section \ref{sect:speciestable}.
+
 When accessed using the Table Access Protocol TAP
-\citep{2019ivoa.spec.0927D}, the table can be queried using the
+\citep{2019ivoa.spec.0927D}, the tables can be queried using the
 expressive SQL-derived query language ADQL, while query results are
 available in the VOTable format, easily readable by VO client
 applications. Line databases accessible in this way can be registered
@@ -220,6 +228,13 @@ \subsection{Credit}
 repository of line data, it should be as simple as possible for users to
 give credit to the contributors of line data.
 
+\subsection{Resolution of Molecule Designation}
+\label{uc:resolution}
+
+A researcher wants to find lines for a molecule that they have long been
+calling ``Methyl Mercaptan'' or designating by a pseudo-structural
+formula like \verb|CH3SHv=0|.
+
 
 \subsection{Non-Use Cases}
 
@@ -235,6 +250,7 @@ \subsection{Non-Use Cases}
 \end{itemize}
 
 
+
 \begin{table}[hpt]
 \hskip -0.05\linewidth
 \begin{tabular}{p{0.43\linewidth}cp{0.5\linewidth}}
@@ -280,7 +296,7 @@ \subsection{Non-Use Cases}
 \end{table}
 
 
-\section{Spectral Line Data}\label{sect:quantities}
+\section{Spectral Lines Table}\label{sect:quantities}
 
 Table~\ref{tab:ltcols} gives the columns that make up the LineTAP
 relational model. Implementations MUST have all columns given in this
@@ -379,12 +395,53 @@ \section{Spectral Line Data}\label{sect:quantities}
 
 \end{itemize}
 
+\section{Species Table}\label{sect:speciestable}
+\label{ref:speciestable}
+
+The species table is used to facilitate the referencing of molecules. As
+there are many sum formulas and colloquial molecule names for common
+species (and more than one species may correspond to a given sum
+formula, or even to a colloquial name), the resolution of such identifiers
+to InChIs is generally non-trivial.
 
-\section{Protocol}
-\label{sect:protocol}
-\subsection{Queries: LineTAP}
+LineTAP's species table maps common names and sum formulas to
+InChIs. Data providers publishing molecular data should populate it
+to the best of their knowledge. It is
+explicitly possible to associate multiple names with a single InChI.
+There is no explicit relationship between a species table and the LineTAP
+line tables on a given service, i.e., the presence of a species in the
+species table does not guarantee that data on it is available from any
+table in the service.
+
+In most cases, the InChIKey alone is enough to reference a molecule. The
+InChI column is included so that users can confirm that a returned
+molecule is the one they are searching for.
+
+\begin{table}[hpt]
+\hskip -0.05\linewidth
+\begin{tabular}{p{0.43\linewidth}cp{0.5\linewidth}}
+\sptablerule
+\textbf{Name [Unit]} \ucd{UCD}&\textbf{Type}&\textbf{Description}\\
+\sptablerule
+% GENERATED: python3 make-species-table.py
+\texttt{inchikey} \hfil\break\ucd{} & text & \raggedright InChIKey of this species\tabularnewline
+\rowsep
+\texttt{inchi} \hfil\break\ucd{} & text & \raggedright InChI of this species\tabularnewline
+\rowsep
+\texttt{name} \hfil\break\ucd{} & text & \raggedright A common name of this species\tabularnewline
+\rowsep
+\texttt{formula} \hfil\break\ucd{} & text & \raggedright Chemical formula of this species in some free-ish notation\tabularnewline
+\rowsep
+\texttt{source\_id} \hfil\break\ucd{} & text & \raggedright VAMDC identifier of the origin of this mapping\tabularnewline
 
-\subsection{User-defined functions}
+% /GENERATED
+\sptablerule
+\end{tabular}
+\caption{The columns that make up the Species Table.}
+\label{tab:spcols}
+\end{table}
+
+\section{ADQL User-defined functions}
 \label{sect:udfs}
 
 LineTAP services MUST implement the \texttt{ivo\_specconv} user defined
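The species table section notes that one InChI may carry several names and that one sum formula may belong to several species. Below is a minimal client-side sketch that resolves a designation against species-table rows already fetched into memory. The class and function names are hypothetical. The InChIKey and InChI values are placeholders, not real identifiers. Names are compared case-insensitively, but formulas are compared exactly, because case distinguishes elements (CO is not Co).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesEntry:
    """One row of a LineTAP species table (columns as in tab:spcols)."""
    inchikey: str
    inchi: str
    name: str
    formula: str
    source_id: str


def candidate_inchikeys(entries, designation):
    """Return the sorted InChIKeys whose name or formula matches designation.

    Names match case-insensitively; formulas must match exactly because
    case is significant in them. More than one key in the result means
    the designation is ambiguous and the user has to choose.
    """
    wanted_name = designation.casefold()
    return sorted({
        entry.inchikey for entry in entries
        if entry.name.casefold() == wanted_name
        or entry.formula == designation})


if __name__ == "__main__":
    # Placeholder keys and InChIs; only the names and formulas are real.
    rows = [
        SpeciesEntry("KEY-ETHANOL", "InChI-placeholder-1", "ethanol", "C2H6O", "demo"),
        SpeciesEntry("KEY-ETHANOL", "InChI-placeholder-1", "ethyl alcohol", "C2H5OH", "demo"),
        SpeciesEntry("KEY-DME", "InChI-placeholder-2", "dimethyl ether", "C2H6O", "demo"),
    ]
    print(candidate_inchikeys(rows, "Ethyl Alcohol"))  # ['KEY-ETHANOL']
    print(candidate_inchikeys(rows, "C2H6O"))          # ['KEY-DME', 'KEY-ETHANOL']
```

Ethanol and dimethyl ether share the sum formula C2H6O, which is why a formula alone can return several candidates.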
@@ -541,6 +598,24 @@ \subsubsection{Characterising a Service's Data Holdings}
 GROUP BY inchi
 \end{lstlisting}
 
+\subsubsection{Searching With Trivial Molecule Names}
+
+Searching with trivial names as discussed in use
+case~\ref{uc:resolution} would often be a two-step process in which clients
+ask the researcher which InChI corresponds to the species they
+are looking for. In simple cases, however, a single joined query
+suffices:
+
+% please-run-a-test
+\begin{lstlisting}[language=SQL]
+SELECT
+*
+FROM casa_lines.line_tap
+JOIN species.main as s USING (inchikey)
+WHERE s.name='Methylidynium'
+\end{lstlisting}
+
+
 \section{Mapping from VAMDCXSAMS}
 \label{sect:mapping}
 
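In the two-step variant, a client first lists the candidate species and then queries lines only for the InChIKeys the researcher confirms. The following sketch builds both ADQL statements in Python. The function names are hypothetical. The table names come from the example query above. Table names are assumed to come from trusted service metadata. Only user-supplied values are quoted, by doubling embedded single quotes as in SQL character literals.

```python
def adql_string(value):
    """Quote value as an ADQL character literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def species_lookup_query(species_table, name):
    """Step 1: list the species a trivial name may refer to."""
    return ("SELECT inchikey, inchi, name, formula"
            f" FROM {species_table}"
            f" WHERE name = {adql_string(name)}")


def lines_query(lines_table, inchikeys):
    """Step 2: fetch lines for the InChIKeys the researcher confirmed."""
    if not inchikeys:
        raise ValueError("at least one InChIKey is required")
    keys = ", ".join(adql_string(key) for key in inchikeys)
    return f"SELECT * FROM {lines_table} WHERE inchikey IN ({keys})"


if __name__ == "__main__":
    print(species_lookup_query("species.main", "Methyl Mercaptan"))
    # Suppose the user picked one candidate; the key here is a placeholder.
    print(lines_query("casa_lines.line_tap", ["PLACEHOLDER-INCHIKEY"]))
```

Unlike the single joined query, the second step cannot return lines of an unrelated species that happens to share the name.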
@@ -665,16 +740,13 @@ \section{LineTAP and the VO Registry}
 
 \subsection{Registering LineTAP-conforming Tables}
 
-LineTAP tables are registered using VODataService \citep{2021ivoa.spec.1102D}
+LineTAP line tables are registered using VODataService \citep{2021ivoa.spec.1102D}
 tablesets, where the table utype is set to
-$$\hbox{\verb|ivo://ivoa.net/std/linetap#table-1.0|}.$$
+$$\hbox{\verb|ivo://ivoa.net/std/linetap#lines-1.0|}.$$
 
-The tableset is normally contained in a VODataService \xmlel{CatalogService}
-record with a TAP capability, and this capability normally is an auxiliary
-capability as per DDC \citep{2019ivoa.spec.0520D}. For one-table
-services a full TAPRegExt \citep{2012ivoa.spec.0827D} capability is also
-allowed; other resource types can be used for registration as
-appropriate.
+The tableset is contained in a VODataService \xmlel{CatalogResource}
+record with a TAP auxiliary capability
+as per DDC \citep{2019ivoa.spec.0520D}.
 
 Further capabilities, for instance for full VAMDC or legacy SLAP
 services, may be given in the same record.
@@ -714,7 +786,7 @@ \subsection{Registering LineTAP-conforming Tables}
 <name>toss.ivoa_lines</name>
 <title>TOSS</title>
 <description> The LineTAP version of...</description>
-<utype>ivo://ivoa.net/std/linetap#table-1.0</utype>
+<utype>ivo://ivoa.net/std/linetap#lines-1.0</utype>
 ...
 </table>
 \end{lstlisting}
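The following sketch generates the table fragment shown in the registration example with Python's standard xml.etree.ElementTree, for either kind of LineTAP table. The helper and dictionary names are hypothetical. The output is only a fragment and still has to be embedded in a complete VODataService tableset within a CatalogResource record.

```python
import xml.etree.ElementTree as ET

# Table utypes as given in the registration section of this specification.
LINETAP_UTYPES = {
    "lines": "ivo://ivoa.net/std/linetap#lines-1.0",
    "species": "ivo://ivoa.net/std/linetap#species-1.0",
}


def table_element(kind, name, title, description):
    """Build a VODataService <table> fragment declaring a LineTAP table.

    kind is "lines" or "species"; an unknown kind raises KeyError.
    """
    table = ET.Element("table")
    for tag, text in (("name", name), ("title", title),
                      ("description", description),
                      ("utype", LINETAP_UTYPES[kind])):
        ET.SubElement(table, tag).text = text
    return table


if __name__ == "__main__":
    element = table_element("lines", "toss.ivoa_lines", "TOSS",
                            "The LineTAP version of...")
    print(ET.tostring(element, encoding="unicode"))
```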
@@ -726,6 +798,12 @@ \subsection{Registering LineTAP-conforming Tables}
 and is thus to be expected in most registrations of this type. Clients
 are advised to use the resource description for full text searches.
 
+Species tables are registered in exactly the same way, except that their
+utype is
+$$\hbox{\verb|ivo://ivoa.net/std/linetap#species-1.0|}.$$
+Data providers should register line and species tables in a single
+resource record only if the species table really has the same metadata
+(description, author, source, etc.) as the line table.
 
 \subsection{Discovering LineTAP services}
 
@@ -738,35 +816,34 @@ \subsection{Discovering LineTAP services}
 would return TAP access URLs and the table names:
 
 \begin{lstlisting}[language=SQL]
-SELECT DISTINCT table_name, access_url
+SELECT table_name, access_url
 FROM rr.res_table
 NATURAL JOIN rr.capability
 NATURAL JOIN rr.interface
 WHERE
-table_utype LIKE 'ivo://ivoa.net/std/linetap#table-1.%'
+table_utype LIKE 'ivo://ivoa.net/std/linetap#lines-1.%'
 AND standard_id LIKE 'ivo://ivoa.net/std/tap%'
 AND intf_role='std'
+AND res_type='vs:catalogresource'
 \end{lstlisting}
 
-The \texttt{DISTINCT} in the main query is a rough filter that removes
-entries duplicated because their tables are registred both in the main
-TAP record and in an auxiliary capability.
-
 The regular expression in the utype match is to make sure minor version
 increments do not prevent service discovery; by IVOA versioning rules,
 all LineTAP services of minor version 1 can be operated by all LineTAP
 clients of version 1. We do not constrain the version of the TAP
 service. Clients may want to adapt the TAP discovery pattern to match
 their specific needs.
 
-
+With the utype pattern adapted accordingly, the same query finds species tables.
 
 \appendix
-\section{Changes from Previous Versions}
+\section{Changes from WD-2023-03-23}
 
-No previous versions yet.
-% these would be subsections "Changes from v. WD-..."
-% Use itemize environments.
+\begin{itemize}
+\item Added the species table.
+\item Changed the line table utype from \dots table-1.0 to \dots
+lines-1.0.
+\end{itemize}
 
 
 \bibliography{ivoatex/ivoabib,ivoatex/docrepo, localrefs}
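The utype constraint in the discovery query is an SQL LIKE pattern, in which % matches any sequence of characters. As a result, lines-1.% accepts lines-1.0 and any later 1.x utype, but neither lines-2.0 nor the superseded table-1.0. The sketch below uses hypothetical function names. It builds the discovery query for either table kind and emulates LIKE locally to check which utypes a pattern admits. The emulation is simplified: it has no ESCAPE clause and matches case-sensitively.

```python
import re


def discovery_query(kind, major=1):
    """Return the RegTAP query for LineTAP tables of one kind.

    kind is "lines" or "species"; major is the LineTAP major version.
    """
    if kind not in ("lines", "species"):
        raise ValueError(f"unknown LineTAP table kind: {kind!r}")
    return f"""SELECT table_name, access_url
FROM rr.res_table
NATURAL JOIN rr.capability
NATURAL JOIN rr.interface
WHERE
  table_utype LIKE 'ivo://ivoa.net/std/linetap#{kind}-{major}.%'
  AND standard_id LIKE 'ivo://ivoa.net/std/tap%'
  AND intf_role='std'
  AND res_type='vs:catalogresource'"""


def like_matches(pattern, value):
    """Emulate SQL LIKE: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(discovery_query("species"))
    pattern = "ivo://ivoa.net/std/linetap#lines-1.%"
    for utype in ("ivo://ivoa.net/std/linetap#lines-1.0",
                  "ivo://ivoa.net/std/linetap#lines-1.1",
                  "ivo://ivoa.net/std/linetap#lines-2.0",
                  "ivo://ivoa.net/std/linetap#table-1.0"):
        print(utype, like_matches(pattern, utype))  # True, True, False, False
```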

Makefile

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ DOCNAME = LineTAP
 DOCVERSION = 1.0
 
 # Publication date, ISO format; update manually for "releases"
-DOCDATE = 2023-03-23
+DOCDATE = 2024-09-18
 
 # What is it you're writing: NOTE, WD, PR, REC, PEN, or EN
 DOCTYPE = WD

linetap.vor

Lines changed: 15 additions & 9 deletions
@@ -1,11 +1,11 @@
-<ri:Resource
-xsi:type="vstd:Standard"
-created="2020-10-26T11:44:00"
+<ri:Resource
+xsi:type="vstd:Standard"
+created="2020-10-26T11:44:00"
 updated="2020-10-26T11:44:00"
 status="active"
-xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0"
-xmlns:vstd="http://www.ivoa.net/xml/StandardsRegExt/v1.0"
-xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+xmlns:vr="http://www.ivoa.net/xml/VOResource/v1.0"
+xmlns:vstd="http://www.ivoa.net/xml/StandardsRegExt/v1.0"
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
 xsi:schemaLocation="http://www.ivoa.net/xml/VOResource/v1.0
 http://www.ivoa.net/xml/VOResource/v1.0
@@ -16,7 +16,7 @@
 
 <title>IVOA Relational model for Spectral Lines (LineTAP)</title>
 <shortName>linetap</shortName>
-<identifier>ivo://ivoa.net/std/linetap</identifier>
+<identifier>ivo://ivoa.net/std/linetap</identifier>
 <curation>
 <publisher>IVOA</publisher>
 
@@ -61,8 +61,14 @@
 <endorsedVersion status="wd">1.0</endorsedVersion>
 
 <key>
-<name>table-1.0</name>
-<description>The LineTAP table schema as of version 1.0.
+<name>lines-1.0</name>
+<description>The LineTAP lines table schema as of version 1.0.
+</description>
+</key>
+
+<key>
+<name>species-1.0</name>
+<description>The LineTAP species table schema as of version 1.0.
 </description>
 </key>
 
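The keys added to linetap.vor define the fragments of the utypes used in the registration section. Appending # and a key's name to the standard's identifier gives, for instance, ivo://ivoa.net/std/linetap#species-1.0. The sketch below derives these utypes from a record with the standard library's ElementTree. The excerpt embedded in the code is a trimmed, hand-written stand-in for linetap.vor, not the file itself. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written excerpt modelled on linetap.vor; the real record
# has many more elements (curation, content, endorsedVersion, ...).
VOR_EXCERPT = """<ri:Resource xsi:type="vstd:Standard"
    xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
    xmlns:vstd="http://www.ivoa.net/xml/StandardsRegExt/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <identifier>ivo://ivoa.net/std/linetap</identifier>
  <key><name>lines-1.0</name>
    <description>The LineTAP lines table schema as of version 1.0.</description></key>
  <key><name>species-1.0</name>
    <description>The LineTAP species table schema as of version 1.0.</description></key>
</ri:Resource>"""


def standard_key_utypes(vor_text):
    """Map each StandardKey name to the utype <identifier>#<name>."""
    root = ET.fromstring(vor_text)
    # Child elements of the record are unqualified, so plain tag names work.
    ivoid = root.findtext("identifier").strip()
    utypes = {}
    for key in root.iter("key"):
        name = key.findtext("name").strip()
        utypes[name] = f"{ivoid}#{name}"
    return utypes


if __name__ == "__main__":
    for name, utype in standard_key_utypes(VOR_EXCERPT).items():
        print(name, utype)
```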
make-species-table.py

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+#!/usr/bin/python3
+"""
+This writes LaTeX for the rows of our table of species table columns.
+Technically, this obtains the info from the standard columns of an
+operational (and hopefully validated) table at dc.g-vo.org.
+
+Dependency: python3-pyvo (and hence astropy).
+"""
+
+import pyvo
+
+NON_NULL_COLUMNS = {'title', 'vacuum_wavelength'}
+TYPE_MAP = {
+    ("char", "*"): "text",
+    ("unicodeChar", "*"): "text",
+    ("int", ""): "integer",
+    ("double", ""): "float",}
+
+
+def e(tx):
+    """returns tx with TeX's standard active (and other magic) characters
+    escaped.
+    """
+    return tx.replace("\\", "$\\backslash$"
+        ).replace("&", "\\&"
+        ).replace("#", "\\#"
+        ).replace("%", "\\%"
+        ).replace("_", "\\_"
+        ).replace("}", "\\}"
+        ).replace("{", "\\{"
+        ).replace('"', '{"}')
+
+
+def get_type(datatype, arraysize, nonnull):
+    """returns a simple type identifier for a VOTable datatype/arraysize.
+
+    Well, this really only knows what people have manually entered into
+    TYPE_MAP above...
+    """
+    res = e(TYPE_MAP[datatype, arraysize])
+    if nonnull:
+        res = f"\\textbf{{{res}}}"
+    return res
+
+
+def main():
+    svc = pyvo.tap.TAPService("http://dc.g-vo.org/tap")
+    rows = []
+
+    for row in svc.run_sync("""
+            select column_name, description, unit, ucd, datatype, arraysize
+            from tap_schema.columns
+            where
+                table_name='species.main'
+            order by column_index"""):
+        parts = [r"\texttt{{{}}}".format(e(row["column_name"]))]
+        if row["unit"]:
+            parts.append(e("["+row["unit"].replace("Angstrom", "Å")+"]"))
+        parts.append(r"\hfil\break\ucd{{{}}}".format(e(row["ucd"])))
+
+        parts.append("&")
+        parts.append(get_type(
+            row["datatype"],
+            row["arraysize"],
+            row["column_name"] in NON_NULL_COLUMNS))
+
+        parts.append("&")
+        parts.append(r"\raggedright "+e(row["description"]))
+
+        rows.append(" ".join(parts)+r"\tabularnewline")
+
+    print("\n\\rowsep\n".join(rows))
+
+
+if __name__=="__main__":
+    main()
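The script needs network access to dc.g-vo.org. To check the LaTeX it produces offline, the row formatting in main() can be factored into a pure function. The format_row function below is a hypothetical refactoring of that kind. The e() function and TYPE_MAP are copied from the script, and a plain dictionary stands in for a tap_schema.columns row. Unlike the script, this sketch treats a missing unit or UCD as empty.

```python
# Copied from make-species-table.py so that this sketch runs on its own.
TYPE_MAP = {
    ("char", "*"): "text",
    ("unicodeChar", "*"): "text",
    ("int", ""): "integer",
    ("double", ""): "float",
}


def e(tx):
    """Return tx with TeX's active (and other magic) characters escaped."""
    return (tx.replace("\\", "$\\backslash$")
            .replace("&", "\\&")
            .replace("#", "\\#")
            .replace("%", "\\%")
            .replace("_", "\\_")
            .replace("}", "\\}")
            .replace("{", "\\{")
            .replace('"', '{"}'))


def format_row(row, non_null=frozenset()):
    """Render one tap_schema.columns-like dict the way main() does.

    A missing unit or UCD is treated as empty.
    """
    parts = [r"\texttt{{{}}}".format(e(row["column_name"]))]
    if row.get("unit"):
        parts.append(e("[" + row["unit"].replace("Angstrom", "Å") + "]"))
    parts.append(r"\hfil\break\ucd{{{}}}".format(e(row.get("ucd") or "")))
    col_type = e(TYPE_MAP[row["datatype"], row["arraysize"]])
    if row["column_name"] in non_null:
        col_type = f"\\textbf{{{col_type}}}"
    parts += ["&", col_type, "&", r"\raggedright " + e(row["description"])]
    return " ".join(parts) + r"\tabularnewline"


if __name__ == "__main__":
    # Hypothetical metadata for the source_id column (datatype assumed).
    print(format_row({
        "column_name": "source_id",
        "description": "VAMDC identifier of the origin of this mapping",
        "unit": "", "ucd": "", "datatype": "char", "arraysize": "*"}))
```

With the assumed char/* datatype, the printed row is identical to the source_id row of the generated table in the LineTAP.tex diff.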
