Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attributes dict accessor #107

Merged
merged 5 commits into from
Jun 8, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions parsel/selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,16 @@ def extract_first(self, default=None):
return default
get = extract_first

@property
def attrib(self):
"""Return the attributes dictionary for the first element.
If the list is empty, return an empty dict.
"""
for x in self:
return x.attrib
else:
return {}


class Selector(object):
"""
Expand Down Expand Up @@ -324,6 +334,12 @@ def remove_namespaces(self):
# remove namespace declarations
etree.cleanup_namespaces(self.root)

@property
def attrib(self):
"""Return the attributes dictionary for underlying element.
"""
return self.root.attrib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns an lxml.etree._Attrib instance, which can lead to unforseen consequences:

  • changes in this object are reflected in a tree;
  • this object may keep a reference to a tree, preventing response GC;
  • isinstance(sel.attrib, dict) is False

What do you think about returning dict(self.root.attrib) instead? It will ensure a copy, and make output a regular Python dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oohh, good catch!
Sounds good, I'll do that in a jiffy!


def __bool__(self):
"""
Return ``True`` if there is any real content selected or ``False``
Expand Down
28 changes: 28 additions & 0 deletions tests/test_selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,34 @@ def test_simple_selection_with_variables_escape_friendly(self):
lng=lt)],
[u'a'])

def test_accessing_attributes(self):
body = u"""
<html lang="en" version="1.0">
<body>
<ul id="some-list" class="list-cls" class="list-cls">
<li class="item-cls" id="list-item-1">
<li class="item-cls active" id="list-item-2">
<li class="item-cls" id="list-item-3">
</ul>
</body>
</html>
"""
sel = self.sscls(text=body)
self.assertEquals({'lang': 'en', 'version': '1.0'}, sel.attrib)
self.assertEquals({'id': 'some-list', 'class': 'list-cls'}, sel.css('ul')[0].attrib)

# for a SelectorList, bring the attributes of first-element only
self.assertEquals({'id': 'some-list', 'class': 'list-cls'}, sel.css('ul').attrib)
self.assertEquals({'class': 'item-cls', 'id': 'list-item-1'}, sel.css('li').attrib)
self.assertEquals({}, sel.css('body').attrib)
self.assertEquals({}, sel.css('non-existing-element').attrib)

self.assertEquals(
[{'class': 'item-cls', 'id': 'list-item-1'},
{'class': 'item-cls active', 'id': 'list-item-2'},
{'class': 'item-cls', 'id': 'list-item-3'}],
[e.attrib for e in sel.css('li')])

def test_representation_slice(self):
body = u"<p><input name='{}' value='\xa9'/></p>".format(50 * 'b')
sel = self.sscls(text=body)
Expand Down