
jo-cho/policy_download


Economic Policy Data Web Crawling

Data Crawling (ver1)

  • (KDI EIEC) Collection of Korean economic policy documents (ID, title, publisher, date, content summary) (link to code)
  • (KDI EIEC) Collection of Korean and international research papers (ID, title, publisher, date, content summary) (link to code)
  • Collection of U.S. Department of the Treasury press releases (ID, title, category, date, content) (link to code)
  • Collection of French Ministère de l'Économie et des Finances press releases (ID, title, category, date, content) (link to code)
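The ver1 crawlers turn listing pages into one record per document with the fields listed above. Below is a minimal sketch of that step using only the Python standard library. It assumes a hypothetical HTML table in which each document is one row whose five cells hold ID, title, publisher, date and summary in that order. The real KDI EIEC, Treasury and Ministère pages use their own markup. `PolicyRecord`, `parse_listing` and `to_csv` are illustrative names, not the repository's code.

```python
from dataclasses import asdict, dataclass, fields
from html.parser import HTMLParser
import csv
import io


@dataclass
class PolicyRecord:
    # Hypothetical record type mirroring the ver1 columns.
    doc_id: str
    title: str
    publisher: str
    date: str
    summary: str


class ListingParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    Assumes a hypothetical listing table; <th> header cells are ignored.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_listing(html: str) -> list[PolicyRecord]:
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    # Keep only rows with exactly the five expected cells.
    return [PolicyRecord(*row) for row in parser.rows if len(row) == 5]


def to_csv(records: list[PolicyRecord]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(PolicyRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()
```

In practice the HTML would come from an HTTP request for each listing page. Only the cell order would need to change per site.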

Data Crawling and Raw PDF Download (ver2)

  • (KDI EIEC) Collection of economic policy documents (ID, title, publisher, date, number of pages, file names 1–5, download links 1–5) (link to code)
    • The original PDF and HWP files can be downloaded
  • (KDI EIEC) Collection of Korean and international research papers (ID, title, publisher, date, number of pages, file names 1–5, download links 1–5) (link to code)
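The ver2 schema stores up to five attachments per document in fixed columns. Below is a sketch of how a list of (file name, URL) pairs could be flattened into those columns and then filtered to PDF/HWP files for download. It makes several assumptions. The column names `file_name_1` … `download_link_5`, the `id` key and the helper names are hypothetical. Files are fetched with the standard library's `urllib.request.urlretrieve`, not necessarily what the repository uses.

```python
from pathlib import Path, PurePosixPath
import re
import urllib.request

MAX_FILES = 5  # the ver2 schema keeps at most five attachments per document
ALLOWED_SUFFIXES = {".pdf", ".hwp"}


def to_wide_row(meta: dict, attachments: list[tuple[str, str]]) -> dict:
    """Flatten (file_name, url) pairs into hypothetical file_name_i / download_link_i columns.

    Attachments beyond the fifth are dropped; empty slots are filled with "".
    """
    row = dict(meta)
    padded = (list(attachments) + [("", "")] * MAX_FILES)[:MAX_FILES]
    for i, (name, url) in enumerate(padded, start=1):
        row[f"file_name_{i}"] = name
        row[f"download_link_{i}"] = url
    return row


def downloadable(row: dict) -> list[tuple[str, str]]:
    """Return (local_name, url) pairs for the PDF/HWP attachments of one row."""
    out = []
    for i in range(1, MAX_FILES + 1):
        name, url = row[f"file_name_{i}"], row[f"download_link_{i}"]
        if not url or PurePosixPath(name).suffix.lower() not in ALLOWED_SUFFIXES:
            continue
        # Replace characters that are invalid in Windows file names, and prefix
        # the document ID and slot so files from different documents cannot collide.
        safe = re.sub(r'[\\/:*?"<>|]', "_", name)
        out.append((f"{row['id']}_{i}_{safe}", url))
    return out


def fetch_all(row: dict, out_dir: Path) -> list[Path]:
    """Download every PDF/HWP attachment of one row into out_dir (network access required)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for local_name, url in downloadable(row):
        target = out_dir / local_name
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Fixed-width columns keep each document on a single spreadsheet row. The trade-off is that any attachments past the fifth are silently dropped.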

Automatic Summarization (in progress)

  • PDF text extraction
  • PDF text tokenization
  • Export of summaries to an Excel file
  • Summarization models (e.g., GPT-4 API) not yet integrated
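Summarization models accept a limited input length, so extracted text is usually tokenized and split into windows before being summarized. The sketch below assumes the text has already been extracted, for example with pypdf's `PdfReader(path).pages[i].extract_text()`. It makes two substitutions:

- It uses a regex tokenizer as a stand-in for a model tokenizer, so its counts will differ from GPT-4's.
- It exports summaries as UTF-8 CSV with a byte-order mark, which lets Excel display Korean text correctly, rather than writing a native .xlsx file.

All function names are hypothetical.

```python
import csv
import io
import re


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (Unicode-aware, so Hangul words stay intact)."""
    return re.findall(r"\w+|[^\w\s]", text)


def chunk_tokens(tokens: list[str], max_tokens: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of at most max_tokens, repeating `overlap` tokens between windows."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("require max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def summaries_to_csv_bytes(rows: list[dict]) -> bytes:
    """Serialize id/title/summary rows as CSV encoded with a UTF-8 byte-order mark."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "title", "summary"])
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM that Excel uses to detect UTF-8.
    return buf.getvalue().encode("utf-8-sig")
```

Each chunk would be sent to the summarization model once one is integrated, and the per-chunk summaries combined before export. The bytes can be written with `open(path, "wb")`.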


About

Web Scraping of Economic Policy Data
