strangertomycode/ai_web_scraper

AI Web Scraper

An intelligent web scraping tool that extracts specific information from websites using natural language descriptions. Simply describe what data you want to extract, and the AI will find it for you.

Features

  • Natural Language Parsing: Describe what you want to extract in plain English
  • Interactive Web Interface: Built with Streamlit for easy use
  • AI-Powered Extraction: Uses Ollama with the Gemma3 model for intelligent content parsing
  • Content Preview: View cleaned DOM content before parsing
  • Batch Processing: Handles large websites by chunking content
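
The batch processing idea can be illustrated with a minimal sketch. The function name `split_into_chunks` and the 6000-character default are assumptions made for illustration, not taken from this repository; the actual chunk size and splitting strategy may differ.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters.

    Hypothetical helper: the name and the default size are illustrative only.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Step through the text in fixed-size windows; the last piece may be shorter.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    page_text = "x" * 15000  # stand-in for cleaned DOM content
    for index, chunk in enumerate(split_into_chunks(page_text), start=1):
        print(f"chunk {index}: {len(chunk)} characters")
```

Fixed-size slicing is the simplest option; it can cut a sentence in half, so splitting on paragraph boundaries may give the model cleaner input.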

Prerequisites

  • Python 3.8+
  • Chrome browser installed
  • Ollama installed with the Gemma3 model (you can substitute another Ollama model if you prefer)

Installation

  1. Clone the repository:
git clone <repo_url>
cd ai_web_scraper
  2. Install dependencies:
pip install -r requirements.txt
  3. Install Ollama and pull the Gemma3 model:
# Install Ollama from https://ollama.ai
ollama pull gemma3

Usage

  1. Start the application:
streamlit run ai_scraper.py
  2. Enter a website URL
  3. Click "Scrape Website" to extract content
  4. Describe what information you want to parse (e.g., "all email addresses", "product prices", "contact information")
  5. Click "Parse Content" to get AI-extracted results
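
As a rough sketch of what the parse step might do, the description can be combined with each chunk of scraped content and sent to the model, with the non-empty answers joined together. Everything named here (`PROMPT_TEMPLATE`, `parse_chunks`, `ask_model`) is hypothetical and not taken from this repository; `ask_model` stands in for whatever call reaches the Ollama model, so the example runs without Ollama installed.

```python
from typing import Callable, Iterable, List

# Hypothetical prompt wording; the project's real prompt may differ.
PROMPT_TEMPLATE = (
    "Extract only the information that matches this description: {description}\n"
    "Return an empty string if nothing matches.\n\n"
    "Content:\n{content}"
)


def parse_chunks(
    chunks: Iterable[str],
    description: str,
    ask_model: Callable[[str], str],
) -> str:
    """Send one prompt per chunk and join the non-empty answers."""
    results: List[str] = []
    for chunk in chunks:
        prompt = PROMPT_TEMPLATE.format(description=description, content=chunk)
        answer = ask_model(prompt).strip()
        if answer:  # skip chunks where the model found nothing
            results.append(answer)
    return "\n".join(results)


if __name__ == "__main__":
    # Stub model for demonstration only: it "finds" a fixed address.
    def fake_model(prompt: str) -> str:
        return "alice@example.com" if "alice@example.com" in prompt else ""

    chunks = ["Contact: alice@example.com", "No contact details here."]
    print(parse_chunks(chunks, "all email addresses", fake_model))
```

Passing the model call in as a function keeps the loop independent of any particular client library, so the same code can be exercised with a stub or wired to a real model.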

Example Use Cases

  • Extract contact information from business websites
  • Gather product details from e-commerce sites
  • Collect news headlines and summaries
  • Parse job listings for specific requirements
  • Extract research paper abstracts

Tech Stack

  • Frontend: Streamlit
  • Web Scraping: Selenium, BeautifulSoup
  • AI Processing: LangChain + Ollama (Gemma3)
  • Language: Python

About

An AI-powered web scraper that lets users extract data from websites using plain English instructions. It combines Selenium, LangChain, and any Ollama model with a simple Streamlit interface.
