
Collect Data

Naibo Wang edited this page Feb 15, 2023 · 7 revisions

The "Collect Data" operation is the core operation when designing web crawler tasks: it collects/extracts data from the web page and saves it to a data store, such as a .csv file or a database.

It is very easy to use EasySpider to collect data via point-and-click.
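Conceptually, collecting data means locating elements on a page, reading their values, and writing them out as rows. The following is a minimal sketch of that idea in plain Python, not EasySpider's actual implementation: the page fragment, the field names `title` and `price`, and the helpers `collect_rows` and `rows_to_csv` are all hypothetical, and the standard-library XML parser stands in for a real browser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html><body>
  <div class="item"><h2>Red mug</h2><span class="price">4.50</span></div>
  <div class="item"><h2>Blue mug</h2><span class="price">5.25</span></div>
</body></html>
""".strip()


def collect_rows(page_source):
    """Extract one row per item using simple XPath expressions."""
    root = ET.fromstring(page_source)
    rows = []
    for item in root.findall(".//div[@class='item']"):
        rows.append({
            "title": item.findtext("h2", default="").strip(),
            "price": item.findtext("span[@class='price']", default="").strip(),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the collected rows as CSV text (header line first)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(collect_rows(PAGE)))
```

In EasySpider itself none of this code is written by hand; the same choices (which element, which fields) are made by pointing and clicking.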

Define Operation (at the Task Design stage)

Steps to define the collect/extract data operation are:

  1. Select the element we want to collect/extract by right-clicking it or pressing F7.

  2. Select the "Extract element's text" option or the "Collect Inner/Outer Html of this element" option, based on your requirements.


  3. An example of the parameter will be shown in the Operation Toolbox. If needed, we can delete unused text by clicking the × mark in the "Delete" field.


  4. Click the "Confirm Collect" option to confirm.


Then the "Collect Data" operation will be added to the Workflow Manager.
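To make the difference between the text, inner HTML, and outer HTML options concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not EasySpider's code: the snippet and the helper names `element_text`, `inner_html`, and `outer_html` are hypothetical, and the standard-library XML parser is used in place of a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical element used only for illustration.
SNIPPET = '<div id="card"><b>Price:</b> 4.50 <i>USD</i></div>'


def element_text(el):
    """Text only: all text nodes inside the element, with tags dropped."""
    return "".join(el.itertext())


def inner_html(el):
    """Markup inside the element, excluding the element's own tag."""
    parts = [el.text or ""]
    # Serializing a child also includes the text that follows it (its tail).
    parts.extend(ET.tostring(child, encoding="unicode") for child in el)
    return "".join(parts)


def outer_html(el):
    """Markup of the element itself, including its own tag."""
    return ET.tostring(el, encoding="unicode")


if __name__ == "__main__":
    el = ET.fromstring(SNIPPET)
    print(element_text(el))
    print(inner_html(el))
    print(outer_html(el))
```

Choosing the text option is usually enough for tabular data; the HTML options are useful when links or formatting inside the element must be preserved.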


Operation Properties

This section shows the available properties of the "Collect Data" operation in the Workflow Manager.

  • Option Name: the option name. After modifying it, click the "Confirm" button to refresh the name.

  • Use text inside the Loop: whether to locate the element with the XPath set in the "Loop" operation instead of the "XPath" defined in this operation. This option only appears when the "Collect Data" operation is inside a "Loop" operation. E.g., we can use this option to repeatedly click the "Next Page" button and collect product information from many pages at ebay.com.

  • XPath: XPath of the element, generated by EasySpider; the user can modify it freely.

  • After executed, whether scroll down: Yes or No. Determines whether to scroll the web page to the bottom once the new web page is loaded after the operation. This is used when a web page requires the user to scroll down to load all available contents, such as Twitter, where users view posts one by one by scrolling down.

  • Scroll Times: when "Yes" is set for the scroll down option above, the number of times EasySpider automatically scrolls down to the bottom. On some web pages we need to scroll many times before all contents are loaded.

  • Seconds after executed: how long EasySpider should wait after the operation is executed.
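The scroll and wait properties above can be pictured with the following sketch. It is only an illustration under assumptions: `scroll_after_execution` and its parameters are hypothetical names, the order (scroll first, then wait) is assumed rather than taken from EasySpider, and `driver` is assumed to be any object with a Selenium-style `execute_script` method.

```python
import time

# JavaScript that scrolls the current page to the bottom.
SCROLL_TO_BOTTOM_JS = "window.scrollTo(0, document.body.scrollHeight);"


def scroll_after_execution(driver, scroll_down=True, scroll_times=1,
                           wait_seconds=0.0, sleep=time.sleep):
    """Scroll to the bottom `scroll_times` times, then wait.

    `driver` only needs an `execute_script(js)` method (assumption:
    a Selenium-style WebDriver). Returns the number of scrolls performed.
    """
    performed = 0
    if scroll_down:
        for _ in range(scroll_times):
            driver.execute_script(SCROLL_TO_BOTTOM_JS)
            performed += 1
    if wait_seconds > 0:
        # Give the page time to load content triggered by the scrolling.
        sleep(wait_seconds)
    return performed
```

With a real Selenium WebDriver, a call such as `scroll_after_execution(driver, scroll_times=3, wait_seconds=2)` would correspond to setting the scroll down option to "Yes", "Scroll Times" to 3, and "Seconds after executed" to 2.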

Execute Operation (at the Task Invocation stage)

When executing tasks, the data will be automatically collected wherever we set the "Collect Data" operation.
