AdemCE-eng/Content_Inspiration

🌟 Content Inspiration

A Streamlit web application that scrapes articles from the Google AI Blog and summarizes them with a local Ollama model. It also downloads each article's content, metadata, and images, and lets you browse the results in a dashboard.

Python · Streamlit · Ollama · MIT License


✨ Key Features

Smart Collection

Automatically discovers and extracts article links from Google AI Blog with intelligent parsing

AI-Powered Processing

Generates comprehensive summaries using local Ollama models for enhanced content understanding

Interactive Dashboard

Browse, search, and filter content through an intuitive Streamlit interface
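The link discovery feature can be illustrated with the standard library alone. This is a minimal sketch, not the project's actual scraper: the `ArticleLinkParser` class, the `/blog/` path filter, and the sample HTML are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collect unique absolute URLs from <a href> tags that look like articles."""

    def __init__(self, base_url, path_marker="/blog/"):
        super().__init__()
        self.base_url = base_url
        self.path_marker = path_marker  # hypothetical filter; adjust to the real site
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if self.path_marker in url and url not in self.links:
            self.links.append(url)


def extract_article_links(html, base_url):
    """Return article-like links in document order, without duplicates."""
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    return parser.links


sample = (
    '<a href="/blog/post-one/">One</a>'
    '<a href="/about/">About</a>'
    '<a href="/blog/post-one/">Duplicate</a>'
)
links = extract_article_links(sample, "https://example.com")
```

Keeping the filter as a constructor argument means the same parser could be pointed at a different listing page by changing one value.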

🚀 Core Capabilities

  • 📄 Content Processing → Downloads and structures article content with metadata
  • 🖼️ Image Management → Automatically retrieves and organizes article images
  • ⚙️ Flexible Configuration → Easy setup through YAML configuration files
  • 🔍 Advanced Search → Powerful filtering and discovery tools
  • 📊 Content Analytics → Insights into your scraped content library
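As a rough illustration of what "structured content with metadata" and "search" could look like, the sketch below stores one article as a dataclass, serializes it to JSON, and runs a simple text match. The `Article` fields and the `matches` helper are hypothetical; the project's real schema may differ.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Article:
    url: str
    title: str
    published: str = ""                               # date string as found on the page
    paragraphs: list = field(default_factory=list)    # body text, one entry per paragraph
    image_paths: list = field(default_factory=list)   # local paths of downloaded images
    summary: str = ""                                 # filled in by the summarization step


def to_json(article):
    """Serialize an article for storage, e.g. under data/processed/."""
    return json.dumps(asdict(article), ensure_ascii=False, indent=2)


def matches(article, query):
    """Case-insensitive search over the title and body text."""
    q = query.lower()
    return q in article.title.lower() or any(q in p.lower() for p in article.paragraphs)


example = Article(
    url="https://example.com/blog/post-one/",
    title="Scaling Vision Models",
    paragraphs=["We describe a new training recipe."],
)
```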

🏗️ Project Architecture

Content_Inspiration/
│
├── 📋 config/
│   └── config.yaml              # Application configuration
│
├── 📁 data/
│   ├── processed/               # Processed article storage
│   └── raw/                     # Raw scraped links
│
├── 🖼️ images/                   # Downloaded article images
│
├── 📝 logs/                     # Application logs
│
├── 🔧 src/
│   ├── utils/                   # Core utility modules
│   └── websites/                # Scraping logic & main app
│
├── 🚀 main.py                   # Streamlit application entry point
├── 📋 requirements.txt          # Python dependencies
└── 🏃‍♂️ run_app.bat               # Windows launch script
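If the data, image, and log folders are missing on a fresh clone, they can be created up front. The helper below is a sketch under that assumption; `ensure_project_dirs` is a hypothetical name, and the application may already create these folders itself.

```python
from pathlib import Path

# Folder names taken from the tree above, relative to the project root.
PROJECT_DIRS = ["config", "data/processed", "data/raw", "images", "logs"]


def ensure_project_dirs(root):
    """Create any missing project folders and return their paths."""
    root = Path(root)
    paths = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```

Calling `ensure_project_dirs(".")` from the project root is safe to repeat, because existing folders are left untouched.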

🛠️ Installation Guide

Prerequisites

| Component | Requirement | Installation |
|---|---|---|
| 🐍 Python | 3.10+ | Download Here |
| 🦙 Ollama | Latest | Install Guide |
| 🌐 Environment | Variables | Configuration needed |

Quick Setup

1️⃣ Clone Repository

git clone https://github.com/AdemCE-eng/Content_Inspiration.git
cd Content_Inspiration

2️⃣ Environment Setup

# Create virtual environment
python -m venv venv

# Activate environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Environment

Create a .env file in the project root:

USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

Note: Replace with your actual browser's user agent string. You can find this by searching "what is my user agent" in your browser.
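A scraper would typically read this value at startup and send it as the `User-Agent` request header. The sketch below shows one way to do that with the standard library only. It is an illustration, not the project's code: the project may well use a package such as python-dotenv instead, and `parse_env` and `build_headers` are hypothetical helpers.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def build_headers(env):
    """Build request headers, failing early if USER_AGENT is missing."""
    user_agent = env.get("USER_AGENT")
    if not user_agent:
        raise RuntimeError("USER_AGENT is not set; add it to .env")
    return {"User-Agent": user_agent}


sample_env = 'USER_AGENT="Mozilla/5.0 (X11; Linux x86_64)"\n# a comment\n'
headers = build_headers(parse_env(sample_env))
```

Failing early on a missing value gives a clearer error than a blocked request later in the pipeline.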

5️⃣ Setup Ollama

# Install required model
ollama pull mistral

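# The Ollama server must be running to pull models and generate summaries.
# Desktop installs usually start it in the background; otherwise run: ollama serve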
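To confirm from Python that the server is reachable and the model is present, one option is Ollama's local HTTP API, which lists installed models at `/api/tags`. The sketch below assumes the default address `http://localhost:11434`; the helper names are hypothetical and not part of this project.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def installed_models(tags_payload):
    """Return model names from a parsed /api/tags response."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if `wanted` matches an installed model, with or without a tag."""
    for name in installed_models(tags_payload):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Query the running Ollama server; returns None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
```

A startup check could call `fetch_tags()` and, if the result is `None` or `has_model(result, "mistral")` is false, show a hint to start Ollama or run `ollama pull mistral`.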

🎯 Usage Instructions

Launch Application

🖥️ Command Line

streamlit run main.py

🪟 Windows Batch

run_app.bat

📋 Step-by-Step Process

  1. 🌐 Access Interface → Navigate to http://localhost:8501
  2. ⚡ Run Pipeline → Execute the 4-step scraping process from the sidebar
    • Step 1 → Scrape article links from Google AI Blog
    • Step 2 → Download article content and metadata
    • Step 3 → Download and organize images
    • Step 4 → Generate AI summaries
  3. 🔍 Explore Content → Use interactive features for content discovery
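The four steps run in order, with each step consuming the output of the previous one. A minimal sketch of that control flow follows. The step functions are placeholders that stand in for the real scraping logic, and `run_pipeline` is a hypothetical helper, not the application's actual entry point.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order; each receives the previous result.

    Stops at the first exception and reports which step failed.
    """
    result = None
    completed = []
    for name, step in steps:
        try:
            result = step(result)
        except Exception as exc:
            return {"completed": completed, "failed": name, "error": str(exc)}
        completed.append(name)
    return {"completed": completed, "failed": None, "error": None, "result": result}


# Placeholder steps with made-up data.
def scrape_links(_):
    return ["https://example.com/blog/post-one/"]


def download_articles(links):
    return [{"url": url, "text": "article body"} for url in links]


def download_images(articles):
    for article in articles:
        article["images"] = []
    return articles


def summarize(articles):
    for article in articles:
        article["summary"] = article["text"][:50]  # stand-in for the model call
    return articles


STEPS = [
    ("Scrape article links", scrape_links),
    ("Download content and metadata", download_articles),
    ("Download images", download_images),
    ("Generate AI summaries", summarize),
]

report = run_pipeline(STEPS)
```

Reporting the failed step by name makes it easier to rerun only the part of the pipeline that broke.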

⚙️ Configuration

Main Configuration (config/config.yaml)

| Setting Category | Description |
|---|---|
| 📁 Storage Settings | Configure data paths, image storage locations, and log file destinations |
| 🌐 Source URLs | Define target websites (Google AI Blog) and scraping endpoints |
| 🤖 AI Model Settings | Set up Ollama configuration, model selection (mistral default), and processing timeouts |
| 🎨 UI Preferences | Customize articles per page and interface settings |

🔧 Custom Model Configuration

To use a different AI model, modify config/config.yaml:

ollama:
  model: "your-preferred-model"  # Change from default 'mistral'
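For illustration, here is how the `ollama.model` setting could flow into a summarization request. This is a sketch under several assumptions: the config is loaded with PyYAML, the request targets Ollama's `/api/generate` endpoint on the default local address, and the prompt wording and helper names are invented for the example.

```python
import json
import urllib.request

DEFAULT_MODEL = "mistral"


def load_config(path="config/config.yaml"):
    """Read the YAML config; PyYAML is assumed to be installed."""
    import yaml  # imported lazily so the other helpers work without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def ollama_model(config):
    """Return the configured model name, falling back to the default."""
    return (config.get("ollama") or {}).get("model") or DEFAULT_MODEL


def build_generate_request(config, article_text, base_url="http://localhost:11434"):
    """Build (but do not send) a non-streaming request to /api/generate."""
    payload = {
        "model": ollama_model(config),
        "prompt": f"Summarize the following article:\n\n{article_text}",
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` returns a JSON object whose `response` field holds the generated text. Remember to pull the new model first (`ollama pull your-preferred-model`), or the request will fail.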

🚨 Troubleshooting

| Problem | Solution |
|---|---|
| Ollama Connection Failed | Ensure Ollama CLI is installed and model is pulled (ollama pull mistral) |
| User Agent Blocked | Update .env with current browser user agent string |
| File Permission Denied | Check write permissions for data/ and images/ directories |
| Module Import Error | Reinstall dependencies: pip install -r requirements.txt |
| Port Already in Use | Change Streamlit port: streamlit run main.py --server.port 8502 |

🔍 Debug Tips

  • Check logs in logs/ directory
  • Verify that Ollama is reachable and the model is installed: ollama list
  • Test user agent at: httpbin.org/user-agent
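Two of the problems above, a busy port and unwritable directories, can be checked before launching the app. The sketch below uses only the standard library; `port_in_use` and `unwritable_dirs` are hypothetical helpers, not part of this project.

```python
import os
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


def unwritable_dirs(paths):
    """Return the paths that are missing or not writable by the current user."""
    return [p for p in paths if not (os.path.isdir(p) and os.access(p, os.W_OK))]
```

For example, if `port_in_use(8501)` returns true, launch Streamlit with `--server.port 8502`; if `unwritable_dirs(["data", "images"])` is non-empty, fix the permissions on the listed folders.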

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Workflow

# 1. Fork the repository on GitHub

# 2. Clone your fork
git clone https://github.com/YOUR-USERNAME/Content_Inspiration.git

# 3. Create feature branch
git checkout -b feature/amazing-feature

# 4. Make your changes and commit
git commit -m 'Add amazing feature'

# 5. Push to your fork
git push origin feature/amazing-feature

# 6. Create Pull Request

📄 License

This project is licensed under the MIT License. See the LICENSE file for full details.


🙏 Acknowledgments

Special Thanks

🏢 Google AI Blog: for providing excellent technical content

🦙 Ollama Team: for local AI model infrastructure

🎨 Streamlit: for the intuitive web framework

🐍 Python Community: for the amazing ecosystem of libraries


Built with ❤️ and lots of ☕



⭐ Star this repo · 🐛 Report Bug · 💡 Request Feature

About

A Streamlit-based Python app that scrapes and locally summarizes articles from the Google AI Blog using a local Ollama LLM.
