A Streamlit web application that scrapes articles from the Google AI Blog and summarizes them with a local Ollama model, with automated content processing and image management.
- 📄 Content Processing → Downloads and structures article content with metadata
- 🖼️ Image Management → Automatically retrieves and organizes article images
- ⚙️ Flexible Configuration → Easy setup through YAML configuration files
- 🔍 Advanced Search → Powerful filtering and discovery tools
- 📊 Content Analytics → Insights into your scraped content library
content-inspiration/
│
├── 📋 config/
│ └── config.yaml # Application configuration
│
├── 📁 data/
│ ├── processed/ # Processed article storage
│ └── raw/ # Raw scraped links
│
├── 🖼️ images/ # Downloaded article images
│
├── 📝 logs/ # Application logs
│
├── 🔧 src/
│ ├── utils/ # Core utility modules
│ └── websites/ # Scraping logic & main app
│
├── 🚀 main.py # Streamlit application entry point
├── 📋 requirements.txt # Python dependencies
└── 🏃 run_app.bat # Windows launch script
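
The layout above implies that the data, image, and log folders exist before the scraper writes to them. As a minimal sketch, assuming the app does not already create them itself, a hypothetical helper (`ensure_project_dirs` is not part of the project) could set them up:

```python
from pathlib import Path

# Folder names are taken from the layout above; the helper is hypothetical.
PROJECT_DIRS = ("config", "data/processed", "data/raw", "images", "logs")


def ensure_project_dirs(root: Path) -> list[Path]:
    """Create any missing project folders under `root` and return their paths."""
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        # parents=True builds data/ before data/raw; exist_ok avoids errors on reruns
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in ensure_project_dirs(Path(".")):
        print(f"ready: {folder}")
```

Running it from the project root is safe to repeat, since existing folders are left untouched.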
| Component | Requirement | Installation |
|---|---|---|
| 🐍 Python | 3.10+ | Download Here |
| 🦙 Ollama | Latest | Install Guide |
| 🌐 Environment variables | USER_AGENT | Set in a .env file in the project root |
git clone https://github.com/AdemCE-eng/Content_Inspiration.git
cd Content_Inspiration
# Create virtual environment
python -m venv venv
# Activate environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt

Create a .env file in the project root:
USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

Note: Replace this value with your own browser's user agent string. You can find it by searching "what is my user agent" in your browser.
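
To illustrate how the `USER_AGENT` value might reach outgoing requests, here is a minimal standard-library sketch. The function names `load_env_file` and `build_headers` are hypothetical, and the project itself may well rely on a package such as python-dotenv instead of a hand-written parser.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_headers(env: dict[str, str]) -> dict[str, str]:
    """Return HTTP headers that carry the configured user agent."""
    user_agent = env.get("USER_AGENT") or os.environ.get("USER_AGENT")
    if not user_agent:
        raise RuntimeError("USER_AGENT is not set; add it to the .env file")
    return {"User-Agent": user_agent}


if __name__ == "__main__":
    print(build_headers(load_env_file()))
```

The resulting dictionary can be passed as the `headers` argument of whichever HTTP client the scraper uses.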
# Install required model
ollama pull mistral
# Service starts automatically when needed

| Method | Command |
|---|---|
| 🖥️ Command Line | streamlit run main.py |
| 🪟 Windows Batch | run_app.bat |
- 🌐 Access Interface → Navigate to http://localhost:8501
- ⚡ Run Pipeline → Execute the 4-step scraping process via the sidebar
  - Step 1 → Scrape article links from Google AI Blog
  - Step 2 → Download article content and metadata
  - Step 3 → Download and organize images
  - Step 4 → Generate AI summaries
- 🔍 Explore Content → Use interactive features for content discovery
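
The four pipeline steps above run in order, and each one consumes what the previous step produced. The sketch below shows one way such a sequence could be driven. It is an illustration only: `run_pipeline` and the placeholder step functions are hypothetical names, not the functions in `src/`.

```python
import logging
from typing import Callable

logger = logging.getLogger("pipeline")

Step = tuple[str, Callable[[], None]]


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order, stop at the first failure, return the completed names."""
    completed: list[str] = []
    for name, step in steps:
        logger.info("Starting: %s", name)
        try:
            step()
        except Exception:
            # Later steps depend on earlier output, so do not continue
            logger.exception("Step failed: %s", name)
            break
        completed.append(name)
    return completed


# Placeholder steps standing in for the real scraping logic (hypothetical).
def scrape_links() -> None: ...
def download_articles() -> None: ...
def download_images() -> None: ...
def generate_summaries() -> None: ...


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    done = run_pipeline([
        ("Scrape article links", scrape_links),
        ("Download article content", download_articles),
        ("Download images", download_images),
        ("Generate AI summaries", generate_summaries),
    ])
    print(f"Completed {len(done)} of 4 steps")
```

Stopping at the first failure keeps a broken download from feeding empty input into the summary step.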
| Setting Category | Description |
|---|---|
| 📁 Storage Settings | Configure data paths, image storage locations, and log file destinations |
| 🌐 Source URLs | Define target websites (Google AI Blog) and scraping endpoints |
| 🤖 AI Model Settings | Set up Ollama configuration, model selection (mistral default), and processing timeouts |
| 🎨 UI Preferences | Customize articles per page and interface settings |
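
As a rough illustration of how the AI model settings in the table above could feed the summary step, the sketch below overlays an `ollama` config section on defaults and builds a request for Ollama's local HTTP API (`/api/generate` on port 11434). It assumes the YAML file has already been loaded into a dictionary, for example with PyYAML. The `timeout` and `host` keys, the prompt wording, and the function names are assumptions; only the `ollama.model` key and the `mistral` default come from this README.

```python
import json
import urllib.request

# Hypothetical defaults; only the 'mistral' model default comes from this README.
DEFAULTS = {"model": "mistral", "timeout": 120, "host": "http://localhost:11434"}


def ollama_settings(config: dict) -> dict:
    """Overlay the `ollama` section of a loaded config on top of the defaults."""
    return {**DEFAULTS, **(config.get("ollama") or {})}


def build_summary_request(text: str, settings: dict) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": settings["model"],
        "prompt": f"Summarize this article:\n\n{text}",
        "stream": False,  # ask for a single JSON object instead of chunks
    }
    return urllib.request.Request(
        f"{settings['host']}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, settings: dict) -> str:
    """Send the request to a running Ollama service and return the generated text."""
    request = build_summary_request(text, settings)
    with urllib.request.urlopen(request, timeout=settings["timeout"]) as response:
        return json.loads(response.read())["response"]
```

With this shape, switching models only requires changing the `model` value in the config; `summarize` needs the Ollama service to be running and the model to be pulled.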
To use a different AI model, modify config/config.yaml:
ollama:
  model: "your-preferred-model" # Change from default 'mistral'

| ❌ Problem | ✅ Solution |
|---|---|
| Ollama Connection Failed | Ensure Ollama CLI is installed and model is pulled (ollama pull mistral) |
| User Agent Blocked | Update .env with current browser user agent string |
| File Permission Denied | Check write permissions for data/ and images/ directories |
| Module Import Error | Reinstall dependencies: pip install -r requirements.txt |
| Port Already in Use | Change Streamlit port: streamlit run main.py --server.port 8502 |
- Check logs in the logs/ directory
- Verify Ollama service status: ollama list
- Test user agent at: httpbin.org/user-agent
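
Several of the checks above can be scripted. The following is a hypothetical preflight sketch, not part of the project: it tests whether `data/` and `images/` are writable, whether the Streamlit port 8501 is free, and whether anything is listening on Ollama's default port 11434. A listening port only suggests that Ollama is up; `ollama list` remains the more reliable check.

```python
import socket
import tempfile
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """True if `path` is an existing directory in which a temp file can be created."""
    if not path.is_dir():
        return False
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def preflight(root: str = ".") -> dict[str, bool]:
    """Collect the checks into a name -> passed mapping."""
    root_path = Path(root)
    return {
        "data/ writable": is_writable_dir(root_path / "data"),
        "images/ writable": is_writable_dir(root_path / "images"),
        "port 8501 free": not is_port_in_use(8501),
        "port 11434 listening (Ollama default)": is_port_in_use(11434),
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"{'OK  ' if passed else 'FAIL'} {check}")
```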
We welcome contributions! Here's how to get started:
# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR-USERNAME/Content_Inspiration.git
# 3. Create feature branch
git checkout -b feature/amazing-feature
# 4. Make your changes and commit
git commit -m 'Add amazing feature'
# 5. Push to your fork
git push origin feature/amazing-feature
# 6. Create Pull Request

This project is licensed under the MIT License. See the LICENSE file for full details.
🏢 Google AI Blog → For providing excellent technical content
🦙 Ollama Team → For local AI model infrastructure
🎨 Streamlit → For the intuitive web framework
🐍 Python Community → For the amazing ecosystem of libraries