Merge branch 'main' into company_name
cullenwatson authored Aug 4, 2024
2 parents 50c1007 + 86c5849 commit c338ba3
Showing 1 changed file (README.md) with 57 additions and 40 deletions.
account = LinkedInAccount(
    # ...
    password="mypassword",
    solver_api_key="CAP-6D6A8CE981803A309A0D531F8B4790BC",  # optional, but needed if you hit a captcha
    solver_service=SolverType.CAPSOLVER,
    session_file=str(session_file),  # save login cookies so you only log in once (lasts a week or so)
    log_level=1,  # 0 prints only errors
)
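Because the saved session only lasts about a week, a script that runs on a schedule might want a rough idea of whether it will need a fresh login. The sketch below is a hypothetical helper, not part of StaffSpy: it only looks at the session file's modification time and assumes a 7-day lifetime based on the comment above. StaffSpy itself decides whether the cookies are still accepted. The file name `session.pkl` is also just an example.

```python
import os
import tempfile
import time
from pathlib import Path

# Assumption: treat a session file older than this as likely expired,
# based on the README's "lasts a week or so". StaffSpy makes the real decision.
SESSION_MAX_AGE_DAYS = 7


def session_probably_valid(session_file: Path, max_age_days: float = SESSION_MAX_AGE_DAYS) -> bool:
    """Heuristic: True if the session file exists and was modified recently."""
    if not session_file.is_file():
        return False
    age_seconds = time.time() - session_file.stat().st_mtime
    return age_seconds < max_age_days * 24 * 60 * 60


# Demo against a throwaway file instead of a real session.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "session.pkl"
    missing = session_probably_valid(path)  # no file yet
    path.write_bytes(b"")
    fresh = session_probably_valid(path)  # just written
    ten_days_ago = time.time() - 10 * 24 * 60 * 60
    os.utime(path, (ten_days_ago, ten_days_ago))
    stale = session_probably_valid(path)  # mtime backdated by 10 days
print(missing, fresh, stale)
```

A modification-time check is cheap and needs no network access, but it can only estimate expiry; LinkedIn may invalidate a session earlier.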

If you would rather use a browser to log in, install StaffSpy with the browser add-on:

`pip install staffspy[browser]`

If you do not pass the `username` and `password` parameters, a browser window opens on the first run so you can sign in to LinkedIn. Press Enter after signing in to begin scraping.
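One way to support both sign-in modes from a single script is to include the credentials only when they are available. The helper below is a hypothetical sketch, not part of StaffSpy: the environment variable names `STAFFSPY_USERNAME` and `STAFFSPY_PASSWORD` are assumptions, and the only StaffSpy behaviour it relies on is the one described above (omitting `username` and `password` triggers the browser sign-in).

```python
import os
from collections.abc import Mapping


def linkedin_account_kwargs(session_file: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for LinkedInAccount(); hypothetical helper.

    Credentials are added only when both are set. Otherwise they are left out,
    so StaffSpy falls back to the browser sign-in.
    """
    kwargs = {"session_file": session_file, "log_level": 1}
    username = env.get("STAFFSPY_USERNAME")  # assumed variable name
    password = env.get("STAFFSPY_PASSWORD")  # assumed variable name
    if username and password:
        kwargs["username"] = username
        kwargs["password"] = password
    return kwargs


# Not run here; requires staffspy[browser] when no credentials are set:
# account = LinkedInAccount(**linkedin_account_kwargs("session.pkl"))
print(linkedin_account_kwargs("session.pkl", {}))
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.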

### Output
| profile_id | name | first_name | last_name | location | age | position | followers | connections | premium | company | past_company1 | past_company2 | school | extra_school | skill1 | skill2 | skill3 | is_connection | premium | creator | potential_email | profile_link | profile_photo |
| ---------------- | -------------- | ---------- | --------- | ------------------------------- | --- | ------------------------------- | --------- | ----------- | ------- | ------- | ------------- | ------------- | ---------------------------------- | ------------------------- | -------- | ---------- | ---------- | ------------- | ------- | ------- | ------------------------------------------------ | -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| javiersierra2102 | Javier Sierra | Javier | Sierra | London, England, United Kingdom | 39 | Software Engineer | 735 | 725 | FALSE | OpenAI | Meta | Oculus VR | Hult International Business School | Universidad Simón Bolívar | Java | JavaScript | C++ | FALSE | FALSE | FALSE | [email protected], [email protected] | https://www.linkedin.com/in/javiersierra2102 | https://media.licdn.com/dms/image/C4D03AQHEyUg1kGT08Q/profile-displayphoto-shrink_800_800/0/1516504680512?e=1727913600&v=beta&t=3enCmNDBtJ7LxfbW6j1hDD8qNtHjO2jb2XTONECxUXw |
| dougli | Douglas Li | Douglas | Li | London, England, United Kingdom | 37 | @ OpenAI UK, previously at Meta | 583 | 401 | FALSE | OpenAI | Shift Lab | Facebook | Washington University in St. Louis | | Java | Python | JavaScript | FALSE | TRUE | FALSE | [email protected], [email protected] | https://www.linkedin.com/in/dougli | https://media.licdn.com/dms/image/D4E03AQETmRyb3_GB8A/profile-displayphoto-shrink_800_800/0/1687996628597?e=1727913600&v=beta&t=HRYGJ4RxsTMcPF1YcSikXlbz99hx353csho3PWT6fOQ |
| nkartashov | Nick Kartashov | Nick | Kartashov | London, England, United Kingdom | 33 | Software Engineer | 2186 | 2182 | TRUE | OpenAI | Google | DeepMind | St. Petersburg Academic University | Bioinformatics Institute | Teamwork | Java | Haskell | FALSE | FALSE | FALSE | [email protected], [email protected] | https://www.linkedin.com/in/nkartashov | https://media.licdn.com/dms/image/D4E03AQEjOKxC5UgwWw/profile-displayphoto-shrink_800_800/0/1680706122689?e=1727913600&v=beta&t=m-JnG9nm0zxp1Z7njnInwbCoXyqa3AN-vJZntLfbzQ4 |
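As the sample shows, `potential_email` packs several addresses into one comma-separated field. A minimal sketch, assuming the results were exported to CSV (for example with `to_csv(..., index=False)`) with `profile_id` and `potential_email` columns, expands that field into a list per profile using only the standard-library `csv` module. The sample rows below are made up.

```python
import csv
import io


def emails_by_profile(csv_text: str) -> dict[str, list[str]]:
    """Map profile_id -> list of candidate emails from a StaffSpy-style CSV export.

    Assumes `profile_id` and `potential_email` columns, where potential_email
    is a comma-separated string that may be empty.
    """
    result: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("potential_email") or ""
        # Split on commas, trim whitespace, and drop empty entries.
        result[row["profile_id"]] = [e.strip() for e in raw.split(",") if e.strip()]
    return result


# Made-up sample data; quoting keeps the embedded comma inside one field.
sample = (
    "profile_id,name,potential_email\n"
    'jdoe,Jane Doe,"jane.doe@example.com, jdoe@example.com"\n'
    "asmith,Alex Smith,\n"
)
print(emails_by_profile(sample))
```

For a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass its contents in.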

### Parameters for `LinkedInAccount()`

```plaintext
Optional
├── session_file (str):
| file path to save session cookies, so only one manual login is needed.
| use a different file per account to keep multiple profiles
|
| For automated login
├── username (str):
| LinkedIn account email
├── password (str):
| LinkedIn account password
|
├── solver_service (SolverType):
| service used to solve captchas: CapSolver or 2Captcha (the weaker of the two)
|
├── solver_api_key (str):
| API key for the solver service
├── log_level (int):
| controls the verbosity of the runtime printouts
| (0 prints only errors, 1 prints info, 2 prints all logs; default is 0)
```
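The three `log_level` values line up naturally with Python's standard `logging` levels. The mapping below is an assumption for illustration (StaffSpy's internal mapping may differ); it shows how a script that wraps StaffSpy could apply the same 0/1/2 convention to its own logger. The function name `configure_logger` and the logger name `my_scraper` are hypothetical.

```python
import logging

# Assumed mapping of the README's log_level values to stdlib logging levels.
LOG_LEVELS = {
    0: logging.ERROR,  # 0: errors only (the default)
    1: logging.INFO,   # 1: informational messages
    2: logging.DEBUG,  # 2: all logs
}


def configure_logger(log_level: int = 0, name: str = "my_scraper") -> logging.Logger:
    """Return a logger whose threshold follows the 0/1/2 convention."""
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be 0, 1 or 2, got {log_level!r}")
    logger = logging.getLogger(name)
    logger.setLevel(LOG_LEVELS[log_level])
    return logger


# Attach a root handler so records that pass the threshold are printed.
logging.basicConfig(format="%(levelname)s %(message)s")
log = configure_logger(1)
log.info("passes the INFO threshold, so it is printed")
log.debug("filtered out unless log_level=2")
```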

### Parameters for `scrape_staff()`

```plaintext
Optional
├── company_name (str):
| company identifier on LinkedIn; if no company with that ID exists, StaffSpy searches for the company by name instead
| e.g. openai from https://www.linkedin.com/company/openai
|
├── user_id (str):
| alternative to company_name: a LinkedIn user identifier; StaffSpy finds this user's company and scrapes its staff
| e.g. dougmcmillon from https://www.linkedin.com/in/dougmcmillon
|
├── search_term (str):
| staff title to search for
| e.g. software engineer
|
├── location (str):
| location where the staff reside
| e.g. london
|
├── extra_profile_data (bool):
| fetches educations, experiences, skills, and certifications (default False)
|
├── max_results (int):
| number of staff to fetch; the default and maximum is 1000, a per-search limit imposed by LinkedIn
```
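Since `company_name` and `user_id` are alternatives and LinkedIn caps each search at 1000 results, a caller may want to check its arguments before starting a long scrape. The helper below is hypothetical (it is not part of StaffSpy) and only mirrors the rules stated above. Requiring at least one of `company_name` or `user_id` is an assumption, since both are listed as optional.

```python
from typing import Optional

LINKEDIN_SEARCH_CAP = 1000  # per-search limit imposed by LinkedIn


def staff_query(
    company_name: Optional[str] = None,
    user_id: Optional[str] = None,
    search_term: Optional[str] = None,
    location: Optional[str] = None,
    extra_profile_data: bool = False,
    max_results: int = LINKEDIN_SEARCH_CAP,
) -> dict:
    """Validate and normalise arguments for scrape_staff(); hypothetical helper."""
    if not company_name and not user_id:
        raise ValueError("provide company_name or user_id")
    if max_results < 1:
        raise ValueError("max_results must be positive")
    query = {
        "company_name": company_name,
        "user_id": user_id,
        "search_term": search_term,
        "location": location,
        "extra_profile_data": extra_profile_data,
        # Asking for more than the cap cannot return more rows, so clamp it.
        "max_results": min(max_results, LINKEDIN_SEARCH_CAP),
    }
    # Drop unset optional arguments so the library defaults apply.
    return {k: v for k, v in query.items() if v is not None}


q = staff_query(company_name="openai", search_term="software engineer",
                location="london", max_results=5000)
print(q)
# With a signed-in account (not run here): staff = account.scrape_staff(**q)
```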

### Parameters for `scrape_users()`

```plaintext
├── user_ids (list):
| LinkedIn user IDs of the profiles to scrape
| e.g. dougmcmillon from https://www.linkedin.com/in/dougmcmillon
```
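The identifiers expected by `scrape_staff()` and `scrape_users()` are the path segment after `/company/` or `/in/` in a LinkedIn URL (e.g. `openai`, `dougmcmillon`). A small hypothetical helper built on `urllib.parse` can pull them out of pasted links; it assumes URLs follow the two patterns shown above and is not part of StaffSpy.

```python
from urllib.parse import urlparse


def linkedin_identifier(url: str) -> str:
    """Return the identifier from a linkedin.com/in/<id> or /company/<id> URL."""
    parsed = urlparse(url)
    if not parsed.netloc.endswith("linkedin.com"):
        raise ValueError(f"not a LinkedIn URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]  # ignores trailing slashes
    if len(parts) >= 2 and parts[0] in ("in", "company"):
        return parts[1]
    raise ValueError(f"no /in/ or /company/ identifier in: {url}")


urls = [
    "https://www.linkedin.com/in/dougmcmillon",
    "https://www.linkedin.com/in/dougli/",
]
user_ids = [linkedin_identifier(u) for u in urls]
print(user_ids)
# With a signed-in account (not run here): users = account.scrape_users(user_ids=user_ids)
```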

### LinkedIn notes

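- LinkedIn returns at most 1000 results per search
- extra_profile_data adds per-profile work, so the extra runtime grows linearly (O(n)) with the number of profiles
- if rate-limited, the program stops scraping
- if signing in without the browser, turn off two-factor authentication (2FA)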


### Staff Schema

```plaintext
Staff
├── Personal Information
...
├── school
└── degree
```
---

## Frequently Asked Questions

**Q: Encountering issues with your queries?**
**A:** If problems
persist, [submit an issue](https://github.com/cullenwatson/StaffSpy/issues).

---
