Add Makefile to uniformly manage the compilation process #601

Open: wants to merge 8 commits into master
92 changes: 92 additions & 0 deletions ElectronJS/MAKEFILE_README.md
@@ -0,0 +1,92 @@
[中文](#环境构建)
[English](#environment-setup)

## 环境构建
确保处于EasySpider根目录下,且已经安装了Node.js、npm、chrome、对应版本的chromedriver和python。

## 参数说明

### 程序参数
测试参数默认为Ubuntu系统,若使用其他系统,请根据实际情况修改。
1. `ROOT_DIR`:项目根目录,默认为当前项目目录,即EasySpider下
2. `CHROME_DIR`: chrome安装目录
3. `SYSTEM`: 系统类型,默认为`linux64`,可选: `linux64`, `mac64`
4. `CHROMEDRIVER_PATH`: chromedriver可执行文件的路径,无默认值,必须指定
5. `CHROMEDRIVER_SUFFIX`: chromedriver后缀名,默认为`linux64`,可选:`linux64`, `mac64`

### make参数
1. `dependency`: 安装Ubuntu依赖
2. `extension`:只编译浏览器扩展
3. `chrome`:复制chrome浏览器到指定目录
4. `chromedriver`:复制chromedriver到指定目录
5. `electron`:只编译ElectronJS主程序
6. `clean_extension`:清理浏览器扩展npm依赖
7. `clean_electron`: 清除ElectronJS主程序依赖
8. `all`: 包括1-5步骤
9. `dev`: 先执行6-7步骤,再执行2-5步骤,即移除npm依赖后重新编译,开发时常用
10. `clean`: 6-7步骤, 清理浏览器扩展和ElectronJS主程序

## 运行
在Ubuntu下, 构建好环境后:
```shell
make CHROMEDRIVER_PATH=~/Downloads/chromedriver
```
完整命令(显式写出所有默认值)为:
```shell
make CHROME_DIR=/opt/google/chrome SYSTEM=linux64 CHROMEDRIVER_PATH=~/Downloads/chromedriver CHROMEDRIVER_SUFFIX=linux64
```
清除npm依赖(如需清除后重新编译,请使用`make dev`):
```shell
make clean
```
开发过程多用:
```shell
make dev CHROMEDRIVER_PATH=~/Downloads/chromedriver
```

## Environment Setup
Make sure you are in the EasySpider root directory and that Node.js, npm, Chrome, the matching version of Chromedriver, and Python are installed.
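
Since the build simply copies whichever binaries it is given, a Chrome/Chromedriver version mismatch only shows up when a task runs. The Python sketch below compares the major versions printed by `--version`. It assumes the usual output formats (`Google Chrome 124.0.6367.60`, `ChromeDriver 124.0.6367.60 (...)`) and the common rule of thumb that both major versions should match; the helper names are hypothetical and not part of EasySpider.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from a `--version` line such as 'Google Chrome 124.0.6367.60'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """Assumption: Chromedriver should share Chrome's major version."""
    return major_version(chrome_output) == major_version(driver_output)


def check_binaries(chrome_binary: str, driver_binary: str) -> bool:
    """Run both binaries with --version and compare their major versions."""
    chrome_out = subprocess.run([chrome_binary, "--version"],
                                capture_output=True, text=True, check=True).stdout
    driver_out = subprocess.run([driver_binary, "--version"],
                                capture_output=True, text=True, check=True).stdout
    return versions_match(chrome_out, driver_out)
```

For example, calling `check_binaries` with the Chrome binary inside `CHROME_DIR` (assumed to be `/opt/google/chrome/chrome` on Ubuntu) and the Chromedriver file you plan to pass to `make` should return `True` before you build.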

## Parameter Description

### Program Parameters
The default values target Ubuntu. On other systems, adjust them to match your setup.
1. `ROOT_DIR`: The project root directory. Defaults to the current working directory, which should be the EasySpider root.
2. `CHROME_DIR`: The Chrome installation directory. Defaults to `/opt/google/chrome`.
3. `SYSTEM`: The system type, used in the `ElectronJS/chrome_<SYSTEM>` folder name. Defaults to `linux64`. Options: `linux64`, `mac64`.
4. `CHROMEDRIVER_PATH`: The path to the Chromedriver executable. It has no default value and must be set.
5. `CHROMEDRIVER_SUFFIX`: The suffix of the copied Chromedriver file name (`chromedriver_<suffix>`). Defaults to `linux64`. Options: `linux64`, `mac64`.
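
To make the naming concrete, the sketch below rebuilds the destination that the `chromedriver` target copies to, `$(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)/chromedriver_$(CHROMEDRIVER_SUFFIX)`, and checks it against the 64-bit Linux development path that `get_extension_binary_driver_location()` in `ExecuteStage/easyspider_executestage.py` looks for (`../ElectronJS/chrome_linux64/chromedriver_linux64`, relative to `ExecuteStage`). The function names are hypothetical, and only the Linux case is modeled.

```python
from pathlib import PurePosixPath


def chromedriver_destination(root_dir: str, system: str = "linux64",
                             suffix: str = "linux64") -> PurePosixPath:
    """Mirror the Makefile rule $(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)/chromedriver_$(CHROMEDRIVER_SUFFIX)."""
    return PurePosixPath(root_dir) / "ElectronJS" / f"chrome_{system}" / f"chromedriver_{suffix}"


# Dev-mode driver path on 64-bit Linux, expressed relative to the EasySpider root.
EXPECTED_LINUX_DEV_DRIVER = PurePosixPath("ElectronJS/chrome_linux64/chromedriver_linux64")


def matches_execute_stage(root_dir: str, system: str, suffix: str) -> bool:
    """True if the copied driver lands where the execute stage expects it (Linux only)."""
    dest = chromedriver_destination(root_dir, system, suffix)
    return dest.relative_to(root_dir) == EXPECTED_LINUX_DEV_DRIVER
```

With the Linux defaults the two paths agree; changing either `SYSTEM` or `CHROMEDRIVER_SUFFIX` on its own breaks the match.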

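### Make Targets
1. `dependency`: Install the Ubuntu system dependencies (via `apt-get`).
2. `extension`: Only build the browser extension (`npm install` in `Extension/manifest_v3`).
3. `chrome`: Copy the Chrome installation, plus `stealth.min.js` and `execute_<SYSTEM>.sh`, to `ElectronJS/chrome_<SYSTEM>`.
4. `chromedriver`: Copy Chromedriver to `ElectronJS/chrome_<SYSTEM>/chromedriver_<CHROMEDRIVER_SUFFIX>`.
5. `electron`: Only build the ElectronJS main program (`npm install`, then set the owner and mode of `chrome-sandbox`).
6. `clean_extension`: Remove the browser extension's npm dependencies (`node_modules` and `package-lock.json`).
7. `clean_electron`: Remove the ElectronJS main program's npm dependencies (`node_modules` and `package-lock.json`).
8. `all`: Run steps 1-5.
9. `dev`: Run steps 6-7, then steps 2-5, i.e. remove the npm dependencies and rebuild. Mainly used during development.
10. `clean`: Run steps 6-7, removing the npm dependencies of both the browser extension and the ElectronJS main program.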
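
The composite targets only chain prerequisites. As a rough model of a serial `make` run (no `-j`), the Python sketch below expands a target depth-first, left to right, visiting each target once. The `TARGETS` table is transcribed from the Makefile in this PR; `expand` is illustrative only and treats every target as phony, so file timestamps play no role.

```python
# Prerequisites transcribed from the Makefile; targets not listed here are leaves with recipes.
TARGETS = {
    "all": ["dependency", "extension", "chrome", "chromedriver", "electron"],
    "clean": ["clean_extension", "clean_electron"],
    "dev": ["clean", "extension", "chrome", "chromedriver", "electron"],
}


def expand(target, done=None):
    """Return the recipe-bearing targets in the order a serial make run would execute them."""
    if done is None:
        done = set()
    if target in done:  # make considers each target at most once per run
        return []
    done.add(target)
    order = []
    for prereq in TARGETS.get(target, []):
        order.extend(expand(prereq, done))
    if target not in TARGETS:  # composite targets have no recipe of their own
        order.append(target)
    return order
```

`expand("dev")` yields the two clean steps first and then the four build steps, which is the "6-7, then 2-5" order listed above. With `make -j`, prerequisites may run in parallel, so this order is not guaranteed.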

## Usage
After setting up the environment on Ubuntu, run the following command:
```shell
make CHROMEDRIVER_PATH=~/Downloads/chromedriver
```
With every default spelled out, the same command is:
```shell
make CHROME_DIR=/opt/google/chrome SYSTEM=linux64 CHROMEDRIVER_PATH=~/Downloads/chromedriver CHROMEDRIVER_SUFFIX=linux64
```

To remove the npm dependencies (use `make dev` instead to also rebuild):
```shell
make clean
```

For development, use the following command:
```shell
make dev CHROMEDRIVER_PATH=~/Downloads/chromedriver
```

10 changes: 7 additions & 3 deletions ElectronJS/README.md
@@ -2,6 +2,10 @@

[从源代码编译程序并设计运行和调试任务指南(基于Ubuntu24.04)](https://www.bilibili.com/video/BV1VE421P7yj/)

## 在类Unix系统上使用Makefile | Using the Makefile on Unix-like Systems

[Makefile Readme](MAKEFILE_README.md)

# 环境编译说明 | Environment Compilation Instruction

EasySpider分三部分:
@@ -36,10 +40,10 @@ This section covers the compilation instructions for the `main program`.
3. Compile the execution stage program; without it, tasks can only be designed, not executed.

## 注意事项 | Note
> [!Important]
> 请记住,每当EasySpider扩展程序和执行程序更新时,都要更新`EasySpider.crx`和`easyspider_executestage`文件。
> Remember to update the `EasySpider.crx` and `easyspider_executestage` files whenever the EasySpider extension and execution program are updated.

请记住,每当EasySpider扩展程序和执行程序更新时,都要更新`EasySpider.crx`和`easyspider_executestage`文件。

Remember to update the `EasySpider.crx` and `easyspider_executestage` files whenever the EasySpider extension and execution program are updated.

## 环境构建 | Environment Setup

File renamed without changes.
File renamed without changes.
16 changes: 15 additions & 1 deletion ExecuteStage/constants.py
@@ -1,4 +1,4 @@
from enum import unique, IntEnum
from enum import unique, IntEnum, Enum


@unique
@@ -25,3 +25,17 @@ class GraphOption(IntEnum):
    Custom = 5 # 自定义操作|Custom
    Move = 7 # 移动操作|Move
    Loop = 8 # 循环操作|Loop


@unique
class Platform(Enum):
    Windows = 'Windows'
    Linux = 'Linux'
    MacOS = 'Darwin'


@unique
class Architecture(Enum):
    Bit64 = '64bit'
    Bit32 = '32bit'
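
Because the enum values are exactly the strings returned by `platform.system()` and `platform.architecture()[0]`, a raw value can also be turned into a member by calling the enum class. A minimal sketch, with the two classes copied from the diff and a hypothetical `detect()` helper:

```python
import platform
from enum import Enum, unique


@unique
class Platform(Enum):
    Windows = 'Windows'
    Linux = 'Linux'
    MacOS = 'Darwin'


@unique
class Architecture(Enum):
    Bit64 = '64bit'
    Bit32 = '32bit'


def detect(system=None, bits=None):
    """Map raw platform strings to enum members; raises ValueError for unsupported values."""
    system = platform.system() if system is None else system
    bits = platform.architecture()[0] if bits is None else bits
    return Platform(system), Architecture(bits)
```

Matching on members (for example `detect()[0] is Platform.Linux`) would avoid repeating `.value` in every comparison; the PR's `.value` string comparisons behave the same way. Note that `platform.architecture()` reports the bitness of the running Python interpreter, not of the operating system.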

117 changes: 65 additions & 52 deletions ExecuteStage/easyspider_executestage.py
@@ -9,7 +9,7 @@
# import undetected_chromedriver as uc
from utils import detect_optimizable, download_image, extract_text_from_html, get_output_code, isnotnull, lowercase_tags_in_xpath, myMySQL, new_line, \
on_press_creator, on_release_creator, readCode, rename_downloaded_file, replace_field_values, send_email, split_text_by_lines, write_to_csv, write_to_excel, write_to_json
from constants import WriteMode, DataWriteMode, GraphOption
from constants import WriteMode, DataWriteMode, GraphOption, Platform, Architecture
from myChrome import MyChrome
from threading import Thread, Event
from PIL import Image
@@ -2170,9 +2170,61 @@ def getData(self, param, loopElement, isInLoop=True, parentPath="", index=0):
self.maxViewLength, self.outputParametersRecord)
self.OUTPUT.append(line)


def get_extension_binary_driver_location():
    current_system = platform.system()
    current_architecture = platform.architecture()[0]
    pwd = os.getcwd()
    print(f'system info: {current_system}, {current_architecture}')
    if current_system == Platform.MacOS.value and current_architecture == Architecture.Bit64.value:
        extension_path = "EasySpider.app/Contents/Resources/app/XPathHelper.crx"
        binary_location = "EasySpider.app/Contents/Resources/app/chrome_mac64.app/Contents/MacOS/Google Chrome"
        driver_location = "EasySpider.app/Contents/Resources/app/chromedriver_mac64"
    elif os.path.exists(pwd + "/EasySpider/resources"):  # path of the packaged build
        print("Finding chromedriver in EasySpider", pwd + "/EasySpider")
        extension_path = "EasySpider/resources/app/XPathHelper.crx"
        if current_system == Platform.Windows.value and current_architecture == Architecture.Bit32.value:
            binary_location = os.path.join(pwd, "EasySpider/resources/app/chrome_win32/chrome.exe")
            driver_location = os.path.join(pwd, "EasySpider/resources/app/chrome_win32/chromedriver_win32.exe")
        elif current_system == Platform.Windows.value and current_architecture == Architecture.Bit64.value:
            binary_location = os.path.join(pwd, "EasySpider/resources/app/chrome_win64/chrome.exe")
            driver_location = os.path.join(pwd, "EasySpider/resources/app/chrome_win64/chromedriver_win64.exe")
        elif current_system == Platform.Linux.value and current_architecture == Architecture.Bit64.value:
            binary_location = "EasySpider/resources/app/chrome_linux64/chrome"
            driver_location = "EasySpider/resources/app/chrome_linux64/chromedriver_linux64"
        else:
            print("Unsupported platform")
            sys.exit()
    elif os.path.exists(pwd + "/../ElectronJS"):  # development setup
        print("Finding chromedriver in ElectronJS", pwd + "/../ElectronJS")
        extension_path = "../ElectronJS/XPathHelper.crx"
        if current_system == Platform.Windows.value and current_architecture == Architecture.Bit64.value:
            binary_location = "../ElectronJS/chrome_win64/chrome.exe"
            driver_location = "../ElectronJS/chrome_win64/chromedriver_win64.exe"
        elif current_system == Platform.Windows.value and current_architecture == Architecture.Bit32.value:
            binary_location = "../ElectronJS/chrome_win32/chrome.exe"
            driver_location = "../ElectronJS/chrome_win32/chromedriver_win32.exe"
        elif current_system == Platform.Linux.value and current_architecture == Architecture.Bit64.value:
            binary_location = "../ElectronJS/chrome_linux64/chrome"
            driver_location = "../ElectronJS/chrome_linux64/chromedriver_linux64"
        else:
            print("Unsupported platform in dev")
            sys.exit()
    else:
        binary_location = "./chrome.exe"  # specify the Chrome location
        driver_location = "./chromedriver.exe"
        extension_path = "XPathHelper.crx"
    print(f'extension_path: {extension_path}')
    print(f'Chrome location: {binary_location}')
    print(f'Chromedriver location: {driver_location}')
    return extension_path, binary_location, driver_location


if __name__ == '__main__':
# 如果需要调试程序,请在命令行参数中加入--keyboard 0 来禁用键盘监听以提升调试速度
# If you need to debug the program, please add --keyboard 0 in the command line parameters to disable keyboard listening to improve debugging speed
"""
如果需要调试程序,请在命令行参数中加入--keyboard 0 来禁用键盘监听以提升调试速度
If you need to debug the program, please add --keyboard 0 in the command line parameters to disable keyboard listening to improve debugging speed
"""
config = {
"ids": [0],
"saved_file_name": "",
@@ -2191,57 +2243,19 @@ def getData(self, param, loopElement, isInLoop=True, parentPath="", index=0):
print(c)
options = webdriver.ChromeOptions()
driver_path = "chromedriver.exe"
print(sys.platform, platform.architecture())

if not os.path.exists(os.getcwd() + "/Data"):
os.mkdir(os.getcwd() + "/Data")
if sys.platform == "darwin" and platform.architecture()[0] == "64bit":
options.binary_location = "EasySpider.app/Contents/Resources/app/chrome_mac64.app/Contents/MacOS/Google Chrome"
options.add_extension(
"EasySpider.app/Contents/Resources/app/XPathHelper.crx")
driver_path = "EasySpider.app/Contents/Resources/app/chromedriver_mac64"
print(driver_path)
if c.config_folder == "":
c.config_folder = os.path.expanduser(
"~/Library/Application Support/EasySpider/")
elif os.path.exists(os.getcwd() + "/EasySpider/resources"): # path of the packaged build
print("Finding chromedriver in EasySpider",
os.getcwd() + "/EasySpider")
if sys.platform == "win32" and platform.architecture()[0] == "32bit":
options.binary_location = os.path.join(
os.getcwd(), "EasySpider/resources/app/chrome_win32/chrome.exe") # specify the Chrome location
driver_path = os.path.join(
os.getcwd(), "EasySpider/resources/app/chrome_win32/chromedriver_win32.exe")
options.add_extension("EasySpider/resources/app/XPathHelper.crx")
elif sys.platform == "win32" and platform.architecture()[0] == "64bit":
options.binary_location = os.path.join(
os.getcwd(), "EasySpider/resources/app/chrome_win64/chrome.exe")
driver_path = os.path.join(
os.getcwd(), "EasySpider/resources/app/chrome_win64/chromedriver_win64.exe")
options.add_extension("EasySpider/resources/app/XPathHelper.crx")
elif sys.platform == "linux" and platform.architecture()[0] == "64bit":
options.binary_location = "EasySpider/resources/app/chrome_linux64/chrome"
driver_path = "EasySpider/resources/app/chrome_linux64/chromedriver_linux64"
options.add_extension("EasySpider/resources/app/XPathHelper.crx")
else:
print("Unsupported platform")
sys.exit()
print("Chrome location:", options.binary_location)
print("Chromedriver location:", driver_path)
elif os.path.exists(os.getcwd() + "/../ElectronJS"):
# for software development
print("Finding chromedriver in EasySpider",
os.getcwd() + "/ElectronJS")
options.binary_location = "../ElectronJS/chrome_win64/chrome.exe" # specify the Chrome location
driver_path = "../ElectronJS/chrome_win64/chromedriver_win64.exe"
options.add_extension("../ElectronJS/XPathHelper.crx")
else:
options.binary_location = "./chrome.exe" # specify the Chrome location
driver_path = "./chromedriver.exe"
options.add_extension("XPathHelper.crx")

options.add_experimental_option(
'excludeSwitches', ['enable-automation']) # developer mode
extension_location, binary_path, driver_path_location = get_extension_binary_driver_location()
options.add_extension(extension_location)
options.binary_location = binary_path
driver_path = driver_path_location
if platform.system() == Platform.MacOS.value and platform.architecture()[0] == Architecture.Bit64.value and \
c.config_folder == "":
c.config_folder = os.path.expanduser("~/Library/Application Support/EasySpider/")

options.add_experimental_option('excludeSwitches', ['enable-automation']) # developer mode

# Summary:
# 0. Using cookies requires a userdatadir
@@ -2258,8 +2272,7 @@ def getData(self, param, loopElement, isInLoop=True, parentPath="", index=0):
except:
pass

options.add_argument(
"--disable-blink-features=AutomationControlled") # TMALL anti-scraping workaround
options.add_argument("--disable-blink-features=AutomationControlled") # TMALL anti-scraping workaround
# Prevent http -> https redirects
options.add_argument("--disable-features=CrossSiteDocumentBlockingIfIsolating,CrossSiteDocumentBlockingAlways,IsolateOrigins,site-per-process")
options.add_argument("--disable-web-security") # disable the same-origin policy
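
`get_extension_binary_driver_location()` repeats the same platform checks for the packaged and development layouts. One possible table-driven variant is sketched below. It is not part of the PR: it takes the system and bitness as parameters so it can be tested, uses a hypothetical `LAYOUTS` table, and covers only the Windows and Linux entries (the macOS `.app` paths and the bare-directory fallback are left out).

```python
import os

# Hypothetical table: (platform.system(), bitness) -> (browser, driver),
# relative to the directory that contains the chrome_* folders.
LAYOUTS = {
    ("Windows", "32bit"): ("chrome_win32/chrome.exe", "chrome_win32/chromedriver_win32.exe"),
    ("Windows", "64bit"): ("chrome_win64/chrome.exe", "chrome_win64/chromedriver_win64.exe"),
    ("Linux", "64bit"): ("chrome_linux64/chrome", "chrome_linux64/chromedriver_linux64"),
}


def resolve(base_dir, system, bits):
    """Return (extension, binary, driver) under base_dir, or None if the platform is unsupported."""
    layout = LAYOUTS.get((system, bits))
    if layout is None:
        return None
    binary, driver = layout
    return (
        os.path.join(base_dir, "XPathHelper.crx"),
        os.path.join(base_dir, binary),
        os.path.join(base_dir, driver),
    )
```

A caller would pass a base directory such as `EasySpider/resources/app` for the packaged build or `../ElectronJS` in development, and exit with a non-zero status when `resolve()` returns `None`.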
52 changes: 52 additions & 0 deletions Makefile
@@ -0,0 +1,52 @@
.PHONY: all clean dependency extension chrome chromedriver electron clean_extension clean_electron dev

ROOT_DIR ?= $(shell pwd)
CHROME_DIR ?= /opt/google/chrome
SYSTEM ?= linux64
CHROMEDRIVER_SUFFIX ?= linux64

all: dependency extension chrome chromedriver electron
clean: clean_extension clean_electron
dev: clean extension chrome chromedriver electron

dependency:
	@echo "=====> 安装依赖 | Install dependency"
	sudo apt-get install libxcb1 libxcb-xinerama0 libxcb-cursor0 libxkbcommon-x11-0
	@echo "=====> 安装依赖完成 | Dependency installed finish\n\n\n"

extension:
	@echo "=====> 编译浏览器扩展 | Compile the browser extension"
	cd $(ROOT_DIR)/Extension/manifest_v3 && npm install
	@echo "=====> 编译浏览器扩展完成 | Compile the browser extension finish\n\n\n"

chrome:
	@echo "=====> 复制Chrome文件夹到ElectronJS/chrome_xxx | Copy the Chrome folder to ElectronJS/chrome_xxx"
	cp -rfT $(CHROME_DIR) $(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)
	cp -rf $(ROOT_DIR)/ElectronJS/stealth.min.js $(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)
	cp -rf $(ROOT_DIR)/ElectronJS/execute_$(SYSTEM).sh $(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)
	@echo "=====> 复制Chrome文件夹完成 | Copy the Chrome folder finish\n\n\n"

chromedriver:
	@echo "=====> 获取Chromedriver | Get Chromedriver"
	@test -n "$(CHROMEDRIVER_PATH)" || { echo "CHROMEDRIVER_PATH is not set, e.g. make CHROMEDRIVER_PATH=~/Downloads/chromedriver"; exit 1; }
	cp -f $(CHROMEDRIVER_PATH) $(ROOT_DIR)/ElectronJS/chrome_$(SYSTEM)/chromedriver_$(CHROMEDRIVER_SUFFIX)
	@echo "=====> 获取Chromedriver完成 | Get Chromedriver finish\n\n\n"

electron:
	@echo "=====> 编译 ElectronJS | Compile ElectronJS"
	cd $(ROOT_DIR)/ElectronJS && npm install
	cd $(ROOT_DIR)/ElectronJS/node_modules/electron/dist && sudo chown root:root chrome-sandbox && sudo chmod 4755 chrome-sandbox
	@echo "你可以去 $(ROOT_DIR)/ElectronJS 目录下运行\`npm run start_direct\`命令启动主程序"
	@echo "You can go to the $(ROOT_DIR)/ElectronJS directory and run \`npm run start_direct\` to start the main program."

clean_extension:
	@echo "=====> 清理浏览器扩展 | Clean the browser extension"
	rm -rf $(ROOT_DIR)/Extension/manifest_v3/node_modules
	rm -rf $(ROOT_DIR)/Extension/manifest_v3/package-lock.json
	@echo "=====> 清理浏览器扩展完成 | Clean the browser extension finish\n\n\n"

clean_electron:
	@echo "=====> 清理 ElectronJS | Clean the ElectronJS"
	rm -rf $(ROOT_DIR)/ElectronJS/node_modules
	rm -rf $(ROOT_DIR)/ElectronJS/package-lock.json
	@echo "=====> 清理 ElectronJS 完成 | Clean the ElectronJS finish\n\n\n"
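
The `electron` target changes the owner and mode of `chrome-sandbox` because, on Linux, Electron aborts at startup when its setuid sandbox helper is not owned by root with mode 4755. The Python sketch below verifies that state after `make electron`. The function names are hypothetical, and the default path assumes you run it from the EasySpider root.

```python
import os
import stat

# Assumed location, relative to the EasySpider root (matches the Makefile's electron target).
SANDBOX = "ElectronJS/node_modules/electron/dist/chrome-sandbox"


def sandbox_configured(st_mode, st_uid):
    """True if the file is owned by root (uid 0) and its mode is exactly 4755 (setuid + rwxr-xr-x)."""
    return st_uid == 0 and stat.S_IMODE(st_mode) == 0o4755


def check_sandbox(path=SANDBOX):
    """Stat the helper binary and apply the ownership/mode check."""
    info = os.stat(path)
    return sandbox_configured(info.st_mode, info.st_uid)
```

If `check_sandbox()` returns `False`, rerunning the `chown`/`chmod` line from the `electron` target should fix it.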