Scraping the MOIS (Ministry of the Interior and Safety) press release site with Python: saving only this month's press release titles to Excel

I wrote code that extracts the titles of the press releases posted this month on the MOIS press release site and downloads each one's attached file.
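"This month" needs a precise definition: checking the month alone would also match posts from the same month of earlier years, so both year and month have to be compared. Below is a minimal sketch of that check. It assumes the `%Y-%m-%d` date format used in the example code, although the real board may print dates differently (for instance `2022.12.01`), and `is_current_month` is a hypothetical helper name.

```python
from datetime import date, datetime


def is_current_month(date_string: str, today: date, fmt: str = "%Y-%m-%d") -> bool:
    """Return True if date_string falls in the same year AND month as today.

    The default format is an assumption taken from the example scraper;
    pass a different fmt if the board prints dates another way.
    """
    posted = datetime.strptime(date_string.strip(), fmt).date()
    return (posted.year, posted.month) == (today.year, today.month)


today = date(2022, 12, 15)
print(is_current_month("2022-12-01", today))                  # True
print(is_current_month("2021-12-01", today))                  # False: same month, previous year
print(is_current_month("2022.12.01", today, fmt="%Y.%m.%d"))  # True
```

In the scraper, `today` would simply be `datetime.date.today()`; passing it in as a parameter keeps the check easy to test.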

I used OpenAI's ChatGPT, which keeps surprising me the more I use it.

https://chat.openai.com/chat

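The code shows the general shape the scraper should take.

The class names in it are placeholders and not necessarily the ones the site actually uses.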
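Because the class names are placeholders, the most likely failure is a selector that matches nothing. In that case `find_all` returns an empty list and the loop silently does nothing, while `find` returns `None` and the following `.text` raises an unhelpful `AttributeError`. A small guard makes the failure explicit. This is a sketch; `ensure_found` is a hypothetical helper, not part of BeautifulSoup.

```python
def ensure_found(found, description):
    """Return `found` unchanged, or raise if a lookup matched nothing.

    Covers both BeautifulSoup cases: find_all() returning an empty
    ResultSet (a list subclass) and find() returning None.
    """
    if found is None or (isinstance(found, list) and not found):
        raise LookupError(
            f"nothing matched {description!r}; check the class name in the page source"
        )
    return found
```

Usage would look like `press_releases = ensure_found(soup.find_all('div', class_='board_list_item'), 'div.board_list_item')`, so a wrong class name stops the script with a clear message instead of producing no output.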

If you take just the parts you need, it seems you could build a useful program quite easily.

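```python
import datetime
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL of the MOIS press release board, and today's date for the year/month check
url = "https://www.mois.go.kr/frt/bbs/type010/commonSelectBoardList.do?bbsId=BBSMSTR_000000000008"
today = datetime.date.today()

# Fetch the list page; fail early on HTTP errors instead of parsing an error page
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Find all press release elements (the class names are placeholders; check the real markup)
press_releases = soup.find_all("div", class_="board_list_item")

titles = []
for press_release in press_releases:
    date_string = press_release.find("span", class_="board_list_date").get_text(strip=True)
    date = datetime.datetime.strptime(date_string, "%Y-%m-%d").date()
    # Compare the year as well as the month, so posts from the same month last year are skipped
    if (date.year, date.month) != (today.year, today.month):
        continue

    # Extract the title and link; the href is usually relative, so resolve it against the list URL
    subject = press_release.find("a", class_="board_list_subject")
    title = subject.get_text(strip=True)
    link = urljoin(url, subject["href"])
    titles.append(title)

    # Open the detail page and download the attached file, if there is one
    detail_response = requests.get(link, timeout=10)
    detail_response.raise_for_status()
    detail_soup = BeautifulSoup(detail_response.text, "html.parser")
    file_anchor = detail_soup.find("a", class_="file")
    if file_anchor is None:
        continue
    file_response = requests.get(urljoin(link, file_anchor["href"]), timeout=10)
    file_response.raise_for_status()

    # Replace characters that are not allowed in file names (\ / : * ? " < > |)
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
    # The .pdf extension is kept from the original; attachments may come in other formats
    with open(safe_title + ".pdf", "wb") as f:
        f.write(file_response.content)

# The titles of this month's press releases
print(titles)
```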
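The title promises saving the titles to Excel, which the scraper does not do yet. A minimal sketch of that last step, swapping in a CSV file rather than a native `.xlsx`: Excel opens CSV directly, and the `utf-8-sig` encoding writes a byte-order mark so Excel recognizes UTF-8 and displays Korean titles correctly. `save_titles_csv` and the sample titles are hypothetical stand-ins for the scraped `titles` list. If a real `.xlsx` file is needed, a library such as openpyxl can write one instead.

```python
import csv
from pathlib import Path


def save_titles_csv(titles, path):
    """Write a 'title' header and one title per row to an Excel-friendly CSV."""
    path = Path(path)
    # newline="" lets the csv module control line endings;
    # utf-8-sig adds the BOM that Excel uses to detect UTF-8.
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        for title in titles:
            writer.writerow([title])  # titles containing commas are quoted automatically
    return path


# Hypothetical sample titles standing in for the scraped ones
titles = ["보도자료 제목 1", "보도자료, 쉼표가 있는 제목"]
out = save_titles_csv(titles, "press_releases.csv")
```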

경제적자유
