The Intelligent Audio Engine for the Semantic Web
AudiText is a next-generation AI-Powered Audio Reader that transforms the static web into immersive, high-fidelity audio experiences. By leveraging advanced Natural Language Processing (NLP) and state-of-the-art Text-to-Speech (TTS), it doesn't just read text—it understands context, declutters noise, and delivers a studio-quality listening experience.
Key Features • AI Capabilities • Tech Stack • Security • Quick Start
In an era of information overload, AudiText serves as your intelligent filter. Unlike standard screen readers that blindly recite metadata and ads, AudiText uses a bespoke Smart Polish Layer to semantically analyze content structure. It identifies the core narrative, strips away "hashtag spam" and repetitive headers, and synthesizes the remaining essence into fluid, human-like speech.
Whether you're commuting with a long-form article or multitasking with a Twitter thread, AudiText ensures you consume knowledge, not noise.
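As a rough illustration of the cleanup described above, the following TypeScript sketch strips hashtags and drops repeated lines. The function name `polishText` and its rules are assumptions made for illustration; they are not taken from the AudiText source, and the actual Smart Polish Layer may work quite differently.

```typescript
// Hypothetical sketch of a rule-based polish pass; not AudiText's actual code.
function polishText(raw: string): string {
  const seen = new Set<string>();
  const kept: string[] = [];

  for (const line of raw.split(/\r?\n/)) {
    // Remove ASCII hashtags such as "#AI", then collapse leftover whitespace.
    const cleaned = line
      .replace(/(^|\s)#\w+/g, "$1")
      .replace(/\s+/g, " ")
      .trim();

    // Skip lines that were blank or consisted only of hashtags.
    if (cleaned === "") continue;

    // Treat lines that differ only by case as duplicates (e.g. repeated headers).
    const key = cleaned.toLowerCase();
    if (seen.has(key)) continue;

    seen.add(key);
    kept.push(cleaned);
  }

  return kept.join("\n");
}
```

A rule-based pass like this is deterministic and cheap, which is why it suits a fallback role; it cannot judge what counts as the "core narrative" the way a language model might.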
- Semantic Text Extraction: Automatically parses complex DOM structures from X (Twitter), Medium, Substack, and more.
- AI-Driven Polish Layer:
- Contextual Cleanup: Eliminates "clickbait" hooks, hashtags, and repetitive boilerplate.
- Smart Intro Generation: Synthesizes professional intros ("Title, by Author") even when metadata is sparse.
- Deduplication Engine: Detects and suppresses redundant information for seamless flow.
- Native Neural TTS: Leverages the browser's built-in Web Speech API for unlimited, offline-capable speech synthesis without API quotas.
- Clean Player Interface: Minimalist, bottom-aligned controls optimized for one-handed mobile use.
- Dynamic Speed Control: Variable playback rates (0.5x - 2.5x) with pitch correction.
- Deep Linking & Sharing: Share articles with `?share=URL` parameters for instant playback.
- Row Level Security (RLS): Database policies strictly enforce data sovereignty—users can only access their own library items.
- Input Hardening: Advanced sanitization prevents SQL/Command injection and XSS attacks via URL inputs.
- Auth Integrity: Robust localized authentication handling via Supabase Auth.
- "Reactive Noir" Aesthetic: A cohesive design language featuring glassmorphism, adaptive film grain (noise), and procedural gradients.
- Mobile-First Progressive Web App (PWA): Touch-optimized scrub bars, haptic feedback integration, and 60fps animations on mobile devices.
- Interactive DotGrid Background: GPU-accelerated particle effect with click-to-ripple interaction.
- React.memo Optimization: All heavy components (DotGrid, Noise, SwipeableItem) are memoized to prevent unnecessary re-renders.
- GPU-Accelerated Animations: CSS animations use `transform: translateZ(0)` and `will-change` hints for buttery 60fps performance.
- Spatial Partitioning: DotGrid uses O(1) spatial grid lookup instead of O(n) for efficient hover detection.
- Optimized Bundle: ~656 KB total (148 KB gzipped) with vendor chunk splitting.
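The spatial partitioning idea mentioned above can be sketched as a uniform grid that buckets dots by cell, so a hover query inspects only the cells near the pointer instead of every dot. This is a minimal sketch under assumed names (`SpatialGrid`, `Dot`, `cellSize`), not the DotGrid implementation itself; lookup cost stays constant only while each cell holds a bounded number of dots.

```typescript
// Hypothetical uniform-grid index; names and structure are illustrative only.
interface Dot {
  x: number;
  y: number;
}

class SpatialGrid {
  private cells = new Map<string, Dot[]>();
  private cellSize: number;

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  private key(cx: number, cy: number): string {
    return `${cx},${cy}`;
  }

  // Place a dot into the bucket for the cell that contains it.
  insert(dot: Dot): void {
    const k = this.key(
      Math.floor(dot.x / this.cellSize),
      Math.floor(dot.y / this.cellSize)
    );
    const bucket = this.cells.get(k);
    if (bucket) bucket.push(dot);
    else this.cells.set(k, [dot]);
  }

  // Return dots within `radius` of (x, y), visiting only overlapping cells.
  queryNear(x: number, y: number, radius: number): Dot[] {
    const minCx = Math.floor((x - radius) / this.cellSize);
    const maxCx = Math.floor((x + radius) / this.cellSize);
    const minCy = Math.floor((y - radius) / this.cellSize);
    const maxCy = Math.floor((y + radius) / this.cellSize);
    const found: Dot[] = [];

    for (let cx = minCx; cx <= maxCx; cx++) {
      for (let cy = minCy; cy <= maxCy; cy++) {
        for (const dot of this.cells.get(this.key(cx, cy)) ?? []) {
          const dx = dot.x - x;
          const dy = dot.y - y;
          if (dx * dx + dy * dy <= radius * radius) found.push(dot);
        }
      }
    }
    return found;
  }
}
```

Choosing a `cellSize` close to the hover radius keeps each query to a handful of cells; a much smaller cell size increases the number of cells visited, while a much larger one puts more dots in each bucket.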
The system uses a Dual-Layer Extraction Pipeline to ensure reliability even when AI credits are exhausted.
graph TD
Design[Figma Design] -.-> |"AI Generation (90% Fidelity)"| Components
User[User / PWA] -->|1. Paste URL| Edge[Supabase Edge Function]
User -->|Listen| BrowserTTS[Browser Native TTS]
User -->|Sync| DB[(Supabase Database)]
subgraph Backend [Edge Function: extract-content]
Edge -->|Fetch Raw HTML| Jina[Jina AI Reader]
Edge -->|Clean Text| AI_Logic{Has Credits?}
AI_Logic -->|Yes| Gemini[Google Gemini 2.0]
AI_Logic -->|No| Manual[Robust Regex Cleaner]
end
subgraph Frontend [React + Vite + Framer Motion]
Components[React Components]
Store[Local Storage] <-->|Cache| State[Audio Context]
Components -.-> State
State -->|Audio Data| Visuals
subgraph Visuals [Visual Engine]
Bits[react-bits / DotGrid]
Shimmer[GPU-Accelerated Shimmer]
end
end
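The fallback path in the diagram can be sketched as follows. This is an illustrative sketch, not the `extract-content` function itself: `cleanWithFallback`, `Cleaners`, and the `hasCredits` flag are hypothetical names, and the AI and regex cleaners are passed in as plain functions rather than reproduced here. The sketch also falls back to the regex cleaner when the AI call throws, which is an assumption beyond what the diagram states.

```typescript
// Hypothetical types for the two cleaning layers; not AudiText's actual API.
type AiCleaner = (text: string) => Promise<string>;
type RegexCleaner = (text: string) => string;

interface Cleaners {
  hasCredits: boolean; // assumed flag: whether AI credits remain
  aiClean: AiCleaner; // e.g. a wrapper around a Gemini request
  regexClean: RegexCleaner; // deterministic fallback layer
}

interface CleanResult {
  text: string;
  layer: "ai" | "regex"; // which layer produced the text
}

async function cleanWithFallback(raw: string, cleaners: Cleaners): Promise<CleanResult> {
  if (cleaners.hasCredits) {
    try {
      return { text: await cleaners.aiClean(raw), layer: "ai" };
    } catch {
      // Assumption: any AI failure degrades to the regex layer
      // instead of failing the whole extraction request.
    }
  }
  return { text: cleaners.regexClean(raw), layer: "regex" };
}
```

Returning the `layer` alongside the text lets a caller log or display which path was taken, which can help when diagnosing why an article was cleaned less thoroughly than expected.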
| Technology | Role |
|---|---|
| React 19 | UI Library with modern hooks architecture |
| TypeScript | Strict static typing for robustness |
| Vite | Next-gen frontend tooling and bundling |
| Technology | Role |
|---|---|
| Framer Motion | Physics-based UI animations |
| GSAP (GreenSock) | Commercial-grade transitions for DotGrid |
| Custom Canvas | GPU-accelerated DotGrid with spatial partitioning |
| Lucide React | Consistent, lightweight iconography |
| Technology | Role |
|---|---|
| Supabase (PostgreSQL) | Relational database with real-time subscriptions |
| Supabase Auth | User management and secure session handling |
| Supabase Edge Functions | Serverless content extraction |
| Row Level Security (RLS) | Database-level access control policies |
| Technology | Role |
|---|---|
| Jina AI Reader | URL to clean Markdown extraction |
| Google Gemini 2.0 | Content cleaning and formatting |
- Node.js 18+
- npm or yarn
- Clone the Repository: `git clone https://github.com/nabrahma/AudiText.git`, then `cd AudiText`.
- Environment Setup: Create a `.env` file in the root directory with `cp .env.example .env`, then populate it with your credentials:
  - `VITE_SUPABASE_URL=your_supabase_url`
  - `VITE_SUPABASE_ANON_KEY=your_supabase_anon_key`
- Install Dependencies: `npm install`
- Launch Development Server: `npm run dev`
We welcome contributions from the community, whether that means enhancing the AI parsing logic or adding new visual effects.
- Fork the repository.
- Create a feature branch (`git checkout -b feature/EnhancedTTS`).
- Commit your changes with clear messages (`git commit -m 'feat: Add voice selection'`).
- Push to the branch (`git push origin feature/EnhancedTTS`).
- Open a Pull Request.
The core AudiText experience (Content Extraction + Native Browser TTS) requires minimal setup. However, the backend infrastructure supports advanced capabilities if you wish to enable them.
| Variable Name | Service | Status | Purpose |
|---|---|---|---|
| `JINA_API_KEY` | Jina.ai | Required | Essential for converting raw URLs into clean Markdown. |
| `GEMINI_API_KEY` | Google Gemini | Recommended | Greatly improves article cleaning and formatting. |
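
One way to honor the Required and Recommended statuses above is to fail fast when `JINA_API_KEY` is missing and to disable AI cleaning when `GEMINI_API_KEY` is absent. The sketch below assumes a hypothetical `resolveBackendConfig` helper that receives the environment as a plain object; how the actual Edge Function reads its secrets is not shown in this README.

```typescript
// Hypothetical config helper; the names below are illustrative, not AudiText's code.
interface BackendConfig {
  jinaApiKey: string;
  geminiApiKey: string | null;
  aiCleaningEnabled: boolean;
}

function resolveBackendConfig(env: Record<string, string | undefined>): BackendConfig {
  const jinaApiKey = env["JINA_API_KEY"]?.trim();
  if (!jinaApiKey) {
    // Required: extraction cannot run without this key.
    throw new Error("JINA_API_KEY is required for content extraction");
  }

  // Recommended: without this key, cleaning relies on the regex layer only.
  const geminiApiKey = env["GEMINI_API_KEY"]?.trim() || null;

  return { jinaApiKey, geminiApiKey, aiCleaningEnabled: geminiApiKey !== null };
}
```

In a Deno-based Edge Function, the helper could be called with `Deno.env.toObject()`; passing the environment in as an argument also keeps the function easy to test with plain objects.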
Distributed under the MIT License. See LICENSE for more information.
Built with 🧠 + ❤️ by Nabaskar


