Parsing glycan with UND structure #17

Open
bobaoai opened this issue Sep 10, 2019 · 5 comments

Comments

@bobaoai

bobaoai commented Sep 10, 2019

Hey Joshua,

I ran into a problem with a glycan that has an ambiguous structure, as shown below.
image
The GlycoCT string I generated with GlycanBuilder is attached.
From reading the code, it looks like the writer can generate a GlycoCT string containing 'SubtreeLinkageID1', but the reader cannot load it. Is that correct? If so, what should I change in the string, or how else can I load or build a Glycan object with an ambiguous structure? Is it also possible to call Glycan.fragments on this kind of structure?

from glypy.io import glycoct

glycoct.loads(
"""RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:a-dman-HEX-1:5
8b:a-lgal-HEX-1:5|6:d
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:5o(6+1)7d
7:1o(6+1)8d
UND
UND1:100.0:100.0
ParentIDs:1|3|5|6|7|8
SubtreeLinkageID1:u(2+1)u
RES
9b:b-dglc-HEX-1:5
10s:n-acetyl
11b:b-dgal-HEX-1:5
LIN
8:9d(2+1)10n
9:9o(3+1)11d
""")
---------------------------------------------------------------------------
GlycoCTError                              Traceback (most recent call last)
<ipython-input-10-b5e3c0a66dfc> in <module>
     38 10:12d(2+1)13n
     39 11:12o(3+1)14d
---> 40 """)

/anaconda3/lib/python3.7/site-packages/glypy/io/glycoct.py in loads(text, structure_class, allow_repeats, allow_multiple)
   1330 
   1331     text_buffer = StringIO(text)
-> 1332     return load(text_buffer, structure_class, allow_repeats, allow_multiple)
   1333 
   1334 

/anaconda3/lib/python3.7/site-packages/glypy/io/glycoct.py in load(stream, structure_class, allow_repeats, allow_multiple)
   1299     """
   1300     g = GlycoCTReader(stream, structure_class=structure_class, allow_repeats=allow_repeats)
-> 1301     first = next(g)
   1302     if not allow_multiple:
   1303         return first

/anaconda3/lib/python3.7/site-packages/glypy/io/glycoct.py in next(self)
    888         if self._iter is None:
    889             iter(self)
--> 890         return next(self._iter)
    891 
    892     #: Alias for next. Supports Py3 Iterator interface

/anaconda3/lib/python3.7/site-packages/glypy/io/glycoct.py in parse(self)
   1251                 self.handle_repeat_inner(line)
   1252             elif line.strip()[:3] == UND:
-> 1253                 self.handle_und_inner(line)
   1254             elif ALT == line.strip():
   1255                 raise GlycoCTSectionUnsupported(ALT)

/anaconda3/lib/python3.7/site-packages/glypy/io/glycoct.py in handle_und_inner(self, line)
   1151         if match is None:
   1152             raise GlycoCTError("Could not interpret UND SubtreeLinkage %r at line %d" % (
-> 1153                 subtree_linkage_line, self._source_line))
   1154         else:
   1155             link_dict = match.groupdict()

GlycoCTError: Could not interpret UND SubtreeLinkage 'SubtreeLinkageID1:u(2+1)u' at line 21

Thanks for your help in advance!

@mobiusklein
Owner

No, the GlycoCTReader can read underdetermined glycans. The problem here is that the GlycoCT string you got doesn't match the specification from the GlycoCT manual (Page 17).
image

A linkage type may be specified using any of the letters o, d, h, n, x, r, or s. glypy can interpret all but r and s, because I have never seen either of those prochiral loss linkages in practice. If I had to guess, you or GlycanBuilder intended the linkage around the UND component to be unknown (going by the "u")? The appropriate way to denote that is with an x.
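To see which part of a SubtreeLinkageID line falls outside the letters listed above, a small standalone check can help. This is only a sketch: the regular expression and the function name `check_subtree_linkage` are hypothetical and are not glypy's internal parser.

```python
import re

# Linkage type letters listed in this thread. r and s are valid GlycoCT
# but, per the comment above, are not interpreted by glypy.
SPEC_LETTERS = set("odhnxrs")
GLYPY_UNSUPPORTED = set("rs")

# Hypothetical pattern for illustration; not glypy's own regex.
SUBTREE_LINKAGE = re.compile(
    r"^SubtreeLinkageID(?P<id>\d+):"
    r"(?P<parent_type>[a-z])"
    r"\((?P<parent_pos>[^+]+)\+(?P<child_pos>[^)]+)\)"
    r"(?P<child_type>[a-z])$"
)


def check_subtree_linkage(line):
    """Return (fields, problems) for one SubtreeLinkageID line."""
    match = SUBTREE_LINKAGE.match(line.strip())
    if match is None:
        return None, ["line does not look like a SubtreeLinkageID entry"]
    fields = match.groupdict()
    problems = []
    for key in ("parent_type", "child_type"):
        letter = fields[key]
        if letter not in SPEC_LETTERS:
            problems.append("%s %r is not one of odhnxrs" % (key, letter))
        elif letter in GLYPY_UNSUPPORTED:
            problems.append("%s %r is valid but not interpreted by glypy" % (key, letter))
    return fields, problems


fields, problems = check_subtree_linkage("SubtreeLinkageID1:u(2+1)u")
```

For the line from the traceback, `problems` holds one entry for each "u", while `fields` still exposes the parent and child positions.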

I can patch the parser in the next few days to accept "u" in those positions and translate it to "x", if that is indeed the expected behavior.
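Until the parser accepts "u", one possible workaround is to rewrite the text before handing it to `glycoct.loads`. The sketch below assumes that "u" really does mean "unknown" (x), which is the guess made in this thread and not something the GlycoCT manual confirms. The helper name `translate_u_to_x` is hypothetical.

```python
import re

# Matches "SubtreeLinkageID<n>:<type>(<pos>+<pos>)<type>" on its own line.
# Hypothetical helper pattern, not part of glypy.
_SUBTREE = re.compile(
    r"^([ \t]*SubtreeLinkageID\d+:)([a-z])(\([^)\n]*\))([a-z])([ \t\r]*)$",
    re.MULTILINE,
)


def translate_u_to_x(glycoct_text):
    """Replace 'u' linkage types with 'x' on SubtreeLinkageID lines only."""
    def fix(match):
        prefix, parent_type, positions, child_type, trailing = match.groups()
        if parent_type == "u":
            parent_type = "x"
        if child_type == "u":
            child_type = "x"
        return prefix + parent_type + positions + child_type + trailing
    return _SUBTREE.sub(fix, glycoct_text)


sample = (
    "LIN\n"
    "1:1d(2+1)2n\n"
    "UND\n"
    "UND1:100.0:100.0\n"
    "ParentIDs:1|3|5|6|7|8\n"
    "SubtreeLinkageID1:u(2+1)u\n"
    "RES\n"
)
patched = translate_u_to_x(sample)
```

Only SubtreeLinkageID lines are touched, so ordinary LIN entries pass through unchanged. The patched string can then be given to `glycoct.loads`.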

@bobaoai
Author

bobaoai commented Sep 10, 2019

Thanks for the reply. Yeah, it is kind of strange, but I have seen a lot of 'u' and have been treating 'u' as 'x' for a while by locally modifying the glypy code. I will let you know if I find any discussion regarding the use of 'u'.

However, when I manually changed the u to o and d, the parser worked, but the output glycan is not the same as the input: the UND structure is attached to the first possible parent node. Is this the expected behavior? Thanks for the help :)

a_glycan = glycoct.loads(
"""RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:a-dman-HEX-1:5
8b:a-lgal-HEX-1:5|6:d
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:5o(6+1)7d
7:1o(6+1)8d
UND
UND1:100.0:100.0
ParentIDs:7|8
SubtreeLinkageID1:o(2+1)d
RES
9b:b-dglc-HEX-1:5
10s:n-acetyl
11b:b-dgal-HEX-1:5
LIN
8:9d(2+1)10n
9:9o(3+1)11d
""")
a_glycan

RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:b-dglc-HEX-1:5
8s:n-acetyl
9b:b-dgal-HEX-1:5
10b:a-dman-HEX-1:5
11b:a-lgal-HEX-1:5|6:d
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:6o(2+1)7d
7:7d(2+1)8n
8:7o(3+1)9d
9:5o(6+1)10d
10:1o(6+1)11d

@mobiusklein
Owner

Ah, right. glypy.io.glycoct supports reading UND sections, but doesn't know how to write them back out most of the time. It wasn't high on my priority list to support this at the time. The sub-tree linkage is created using an AmbiguousLink instead of a Link.

AmbiguousLink objects have a list of possible parents, parent positions, children, and child positions to choose from. When a Glycan has an undefined linkage or an ambiguous attachment site, you can iterate over the possible states using glycan.iterconfigurations(). See the iterconfigurations docstring for usage.
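As a rough picture of what "possible states" means here, the sketch below enumerates every combination of the four option lists in plain Python. It is not glypy's AmbiguousLink or iterconfigurations, whose return values may look different. `LinkChoice` and `enumerate_link_states` are hypothetical names, and the values are taken from the second GlycoCT string in this thread (ParentIDs 7|8, linkage (2+1), subtree root residue 9).

```python
from collections import namedtuple
from itertools import product

# Hypothetical stand-in for illustration only; not glypy's AmbiguousLink.
LinkChoice = namedtuple(
    "LinkChoice", "parent parent_position child child_position"
)


def enumerate_link_states(parents, parent_positions, children, child_positions):
    """Yield one LinkChoice per combination of the candidate lists."""
    for combo in product(parents, parent_positions, children, child_positions):
        yield LinkChoice(*combo)


# Candidates from "ParentIDs:7|8" and "SubtreeLinkageID1:o(2+1)d",
# with the UND subtree rooted at residue 9.
states = list(
    enumerate_link_states(
        parents=[7, 8],
        parent_positions=[2],
        children=[9],
        child_positions=[1],
    )
)
```

With two candidate parents and a single choice for everything else, `states` holds two entries, one per possible attachment site. A parsed glycan that shows the subtree on only one parent corresponds to just one of these states.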

@bobaoai
Author

bobaoai commented Sep 12, 2019 via email

@mobiusklein
Owner

You're welcome.

Right now, my main concern with glypy is to improve the documentation. My inexperience with Sphinx when I first set it up may mean substantial re-organization there. Eventually, I may do some performance tuning, but I do not have any specific plans for that at this time.

If you would like to contribute, I'd be happy to review pull requests and discuss ideas and applications you might have.
