From de32fc1f82f617ec60f4ef998a092e1335610d7e Mon Sep 17 00:00:00 2001
From: lokesh-univest
Date: Wed, 17 Dec 2025 12:18:24 +0530
Subject: [PATCH 1/2] Add docs

---
 docs/README.md                  | 29 ++++++++++
 docs/architecture/flows.md      | 98 +++++++++++++++++++++++++++++++++
 docs/architecture/overview.md   | 66 ++++++++++++++++++++++
 docs/modules/configuration.md   | 29 ++++++++++
 docs/modules/core_logic.md      | 41 ++++++++++++++
 docs/modules/data_models.md     | 33 +++++++++++
 docs/modules/llm_integration.md | 38 +++++++++++++
 docs/modules/utils.md           | 17 ++++++
 8 files changed, 351 insertions(+)
 create mode 100644 docs/README.md
 create mode 100644 docs/architecture/flows.md
 create mode 100644 docs/architecture/overview.md
 create mode 100644 docs/modules/configuration.md
 create mode 100644 docs/modules/core_logic.md
 create mode 100644 docs/modules/data_models.md
 create mode 100644 docs/modules/llm_integration.md
 create mode 100644 docs/modules/utils.md

diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 00000000..557e67a2
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,29 @@
+# AIHawk - Technical Documentation
+
+Welcome to the developer documentation for **Jobs_Applier_AI_Agent_AIHawk**. This documentation is designed to help developers understand the architecture, core modules, and workflows of the application.
+
+## 📚 Table of Contents
+
+### [Architecture & Flows](architecture/overview.md)
+- **[System Overview](architecture/overview.md)**: High-level architecture, tech stack, and component interactions.
+- **[Application Flows](architecture/flows.md)**: Visual diagrams (Mermaid) of startup, resume generation, and parsing flows.
+
+### [Module Documentation](modules/core_logic.md)
+- **[Core Logic](modules/core_logic.md)**: Entry points, orchestration, and main application logic.
+- **[LLM Integration](modules/llm_integration.md)**: AI model adapters, prompt engineering, and LLM management.
+- **[Data Models](modules/data_models.md)**: Resume structures, profile schemas, and data validation.
+- **[Configuration](modules/configuration.md)**: Configuration handling, validation, and secrets management.
+- **[Utilities](modules/utils.md)**: Shared utility functions and helpers.
+
+## 🚀 Quick Start
+
+Ensure you have the required dependencies and configuration files set up as per the main project [README](../README.md).
+
+```bash
+# Run the application
+python main.py
+```
+
+## 🤝 Contribution
+
+Please refer to [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines on how to contribute to this project.

diff --git a/docs/architecture/flows.md b/docs/architecture/flows.md
new file mode 100644
index 00000000..4eb2dc22
--- /dev/null
+++ b/docs/architecture/flows.md
@@ -0,0 +1,98 @@
+# Application Flows
+
+This document details the key workflows within the AIHawk application using Mermaid diagrams.
+
+## 1. App Startup Flow
+
+The application initialization process ensures all configurations and dependencies are ready before user interaction.
+
+```mermaid
+graph TD
+    Start([Start main.py]) --> ValidateData[Validate Data Folder & Files]
+    ValidateData -->|Check| Secrets[secrets.yaml]
+    ValidateData -->|Check| Config[config.yaml]
+    ValidateData -->|Check| Resume[plain_text_resume.yaml]
+
+    Secrets -->|Validate| LoadSecrets[Load API Keys]
+    Config -->|Validate| LoadConfig[Load User Preferences]
+
+    LoadSecrets --> PromptUser[Prompt User for Action]
+    LoadConfig --> PromptUser
+
+    PromptUser -->|Select Action| HandleInquiries[Handle Inquiries]
+```
+
+## 2. Resume Parsing & Tailoring Flow
+
+How the system takes a specific job URL and tailors a resume for it.
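The parse-then-tailor pipeline can be reduced to a few function calls. The following is a minimal, illustrative Python sketch — `JobData`, `parse_job_page`, and `tailor_resume` are hypothetical stand-ins for the project's `LLMJobParser` and `ResumeGenerator`, and the trivial string handling here fakes the extraction that the real code delegates to the LLM:

```python
from dataclasses import dataclass

@dataclass
class JobData:
    role: str
    company: str
    description: str

def parse_job_page(page_text: str) -> JobData:
    # Stand-in for the LLM-backed parser: a real implementation would
    # send the page text to the model and receive structured fields back.
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    return JobData(role=lines[0], company=lines[1], description=" ".join(lines[2:]))

def tailor_resume(resume_text: str, job: JobData) -> str:
    # Stand-in for the generator: prepend a job-specific summary line.
    return f"Targeting: {job.role} at {job.company}\n\n{resume_text}"

job = parse_job_page("Backend Engineer\nAcme Corp\nBuild and run Python APIs.")
print(tailor_resume("EXPERIENCE: ...", job).splitlines()[0])
# -> Targeting: Backend Engineer at Acme Corp
```

The real flow additionally drives a browser to fetch the page HTML and renders the result to PDF, as the diagram shows.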
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant Facade as ResumeFacade
+    participant Browser as Selenium Browser
+    participant Parser as LLMJobParser
+    participant LLM as LLM Service
+    participant Generator as ResumeGenerator
+
+    User->>Facade: Select "Tailor Resume"
+    Facade->>User: Request Job URL
+    User->>Facade: Provide URL
+
+    Facade->>Browser: Navigate to Job URL
+    Browser->>Facade: Return Page HTML
+
+    Facade->>Parser: Parse HTML
+    Parser->>LLM: Extract Role, Company, Description
+    LLM-->>Parser: Structured Job Data
+
+    Facade->>Generator: Generate Tailored Resume
+    Generator->>LLM: Compare Resume vs Job Desc
+    LLM-->>Generator: Contextual Suggestions
+
+    Generator->>Browser: Render HTML Template
+    Browser->>Facade: Return PDF Bytes
+    Facade->>User: Save PDF to Output
+```
+
+## 3. General Resume Generation Flow
+
+Generating a generic resume without specific job tailoring.
+
+```mermaid
+graph TD
+    User[User Input] -->|Select Style| StyleManager[Style Manager]
+    StyleManager -->|Template Path| Generator[Resume Generator]
+
+    subgraph Generation Process
+        ResumeData[Load Resume Data] -->|Inject| Generator
+        Generator -->|Render| HTML[HTML Resume]
+        HTML -->|Convert| PDF[PDF Generator (Selenium)]
+    end
+
+    PDF --> Output[Output Folder]
+```
+
+## 4. LLM Request Lifecycle
+
+How the system handles requests to the Large Language Model, including logging and error handling.
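The dispatch-and-log cycle can be sketched in a few lines of Python. Everything here (`EchoModel`, `create_model`, `call_with_logging`) is an illustrative assumption, not the project's actual `AIAdapter` API:

```python
class EchoModel:
    """Stand-in for a concrete model adapter (OpenAI/Claude/Ollama)."""
    def __init__(self, provider: str):
        self.provider = provider

    def invoke(self, prompt: str) -> str:
        return f"[{self.provider}] response to: {prompt}"

def create_model(provider: str) -> EchoModel:
    # Factory step: pick the adapter for the configured provider.
    supported = {"openai", "claude", "ollama"}
    if provider not in supported:
        raise ValueError(f"unsupported provider: {provider}")
    return EchoModel(provider)

def call_with_logging(model: EchoModel, prompt: str, log: list) -> str:
    # Every request/response pair is appended to the call log.
    reply = model.invoke(prompt)
    log.append({"model": model.provider, "prompt": prompt, "reply": reply})
    return reply

calls: list = []
reply = call_with_logging(create_model("ollama"), "Summarize this job posting", calls)
```

The real lifecycle logs token counts and estimated cost as well, as shown in the diagram.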
+
+```mermaid
+graph LR
+    Request[App Request] --> Adapter[AI Adapter]
+    Adapter -->|Select Provider| ModelFactory{Provider?}
+
+    ModelFactory -->|OpenAI| OpenAI[OpenAI Model]
+    ModelFactory -->|Claude| Claude[Claude Model]
+    ModelFactory -->|Ollama| Ollama[Ollama Model]
+
+    OpenAI --> API[External API]
+    Claude --> API
+    Ollama --> Local[Local Inference]
+
+    API -->|Response| Logger[LLM Logger]
+    Local -->|Response| Logger
+
+    Logger -->|Log Token Usage| LogFile[open_ai_calls.json]
+    Logger -->|Return Content| App[Application Logic]
+```

diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
new file mode 100644
index 00000000..3653faf9
--- /dev/null
+++ b/docs/architecture/overview.md
@@ -0,0 +1,66 @@
+# System Architecture Overview
+
+## Introduction
+**Jobs_Applier_AI_Agent_AIHawk** is an automated tool designed to streamline the job application process. It leverages Large Language Models (LLMs) to parse job descriptions, tailor resumes and cover letters, and automate interactions via a web browser.
+
+## High-Level Architecture
+
+The system operates on a modular architecture where the **Core Controller** orchestrates interactions between the **User**, **LLM Service**, **Browser Automation**, and **Data Layer**.
+
+```mermaid
+graph TD
+    User[User] -->|Config & Commands| CLI[CLI Entry Point (main.py)]
+
+    subgraph Core Application
+        CLI --> Facade[ResumeFacade]
+        Facade --> generator[ResumeGenerator]
+        Facade --> Parser[LLMJobParser]
+    end
+
+    subgraph Services
+        Facade -->|Controls| Browser[Selenium / Chrome Driver]
+        Parser -->|Queries| LLM[LLM Manager (OpenAI/Claude/Ollama)]
+        generator -->|Queries| LLM
+    end
+
+    subgraph Data Layer
+        CLI -->|Reads| ConfigFiles[YAML Config & Secrets]
+        Facade -->|Reads| ResumeData[Plain Text Resume]
+        Facade -->|Writes| Output[PDF Output]
+    end
+```
+
+## Core Components
+
+### 1. Entry Point & Configuration (`main.py`)
+- **Responsibilities**:
+  - Handles user input via CLI.
+  - Validates configuration (`secrets.yaml`, `config.yaml`).
+  - Initializes the application environment.
+- **Key Classes**: `ConfigValidator`, `FileManager`.
+
+### 2. Logic Orchestration (`src/libs/resume_and_cover_builder/resume_facade.py`)
+- **Responsibilities**:
+  - Acts as the central hub connecting the UI (CLI) with backend logic.
+  - Manages the flow of parsing job descriptions and generating documents.
+- **Key Classes**: `ResumeFacade`.
+
+### 3. LLM Integration (`src/libs/llm_manager.py`)
+- **Responsibilities**:
+  - Abstracts interactions with various AI providers (OpenAI, Claude, Ollama, Gemini, etc.).
+  - Manages prompt templates and chains for specific tasks (e.g., summarizing skills, generating cover letters).
+- **Key Classes**: `GPTAnswerer`, `AIAdapter`.
+
+### 4. Resume Generation (`src/libs/resume_and_cover_builder/resume_generator.py`)
+- **Responsibilities**:
+  - Fills HTML templates with tailored content.
+  - Converts HTML to PDF.
+- **Key Classes**: `ResumeGenerator`.
+
+## Tech Stack
+
+- **Language**: Python 3.10+
+- **Browser Automation**: Selenium WebDriver, ChromeDriverManager
+- **LLM Orchestration**: LangChain
+- **Configuration**: YAML
+- **Data Validation**: Pydantic, Dataclasses

diff --git a/docs/modules/configuration.md b/docs/modules/configuration.md
new file mode 100644
index 00000000..cae44934
--- /dev/null
+++ b/docs/modules/configuration.md
@@ -0,0 +1,29 @@
+# Configuration & Validation
+
+The application relies on three main YAML configuration files located in the `data_folder`.
+
+## Configuration Files
+
+1. **`secrets.yaml`**: Stores sensitive API keys (e.g., `llm_api_key`).
+2. **`config.yaml`**: General settings like `remote`, `experience_level`, `locations`, `blacklists`.
+3. **`plain_text_resume.yaml`**: The user's resume data in YAML format.
+
+## Config Validator (`main.py`)
+
+The `ConfigValidator` class ensures that the `config.yaml` file contains valid settings before the app runs.
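The shape of this kind of validation can be sketched as follows — the keys and allowed values mirror the rules documented for `config.yaml`, but `validate_config` itself is an illustrative stand-in, not the project's `ConfigValidator`:

```python
# Allowed values taken from the documented validation rules.
APPROVED_DISTANCES = {0, 5, 10, 25, 50, 100}
DATE_FILTERS = {"all_time", "month", "week", "24_hours"}

def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    # Required keys must exist and be non-empty lists.
    for key in ("positions", "locations"):
        value = cfg.get(key)
        if not isinstance(value, list) or not value:
            errors.append(f"'{key}' must be a non-empty list")
    # Enum-style checks against the approved value sets.
    if cfg.get("distance") not in APPROVED_DISTANCES:
        errors.append(f"'distance' must be one of {sorted(APPROVED_DISTANCES)}")
    if cfg.get("date_filter", "all_time") not in DATE_FILTERS:
        errors.append("'date_filter' is not a recognised filter")
    return errors

problems = validate_config({"positions": ["Backend Engineer"],
                            "locations": ["Remote"], "distance": 25})
```

Collecting all problems into a list (rather than raising on the first one) lets the CLI report every configuration issue in a single run.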
+### Validation Rules
+- **Required Keys**: Checks for existence of keys like `positions`, `locations`, `distance`.
+- **Type Checking**: Ensures values are of correct types (list, bool, int).
+- **Enums**: Validates against allowed values:
+  - `EXPERIENCE_LEVELS`: internship, entry, associate, etc.
+  - `JOB_TYPES`: full_time, contract, part_time, etc.
+  - `DATE_FILTERS`: all_time, month, week, 24_hours.
+  - `APPROVED_DISTANCES`: 0, 5, 10, 25, 50, 100.
+- **Email Validation**: Regex checking for email formats.
+
+## File Manager (`main.py`)
+
+The `FileManager` class handles filesystem interactions.
+- **`validate_data_folder`**: Ensures `data_folder` exists and contains all required YAML files.
+- Creates the `output` directory if it doesn't exist.

diff --git a/docs/modules/core_logic.md b/docs/modules/core_logic.md
new file mode 100644
index 00000000..f3e604ff
--- /dev/null
+++ b/docs/modules/core_logic.md
@@ -0,0 +1,41 @@
+# Core Logic & Entry Point
+
+## Main Application Entry (`main.py`)
+
+The `main.py` file serves as the CLI entry point for the application.
+
+### Key Functions
+
+- **`main()`**: The primary execution function.
+  - Initializes `FileManager` to validate data directories.
+  - Calls `ConfigValidator` to ensure all YAML configs are correct.
+  - Invokes `prompt_user_action()` to determine the user's intent.
+  - Delegates execution to `handle_inquiries()`.
+
+- **`handle_inquiries(selected_actions, parameters, llm_api_key)`**:
+  - Routes the user's selection to the appropriate `create_*` function.
+  - Supports: "Generate Resume", "Generate Tailored Resume", "Generate Cover Letter".
+
+- **`prompt_user_action()`**:
+  - Uses the `inquirer` library to present an interactive CLI selection menu.
+
+## Resume Facade (`src/libs/resume_and_cover_builder/resume_facade.py`)
+
+The `ResumeFacade` class implements the Facade pattern to simplify the interface for resume generation operations.
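The pattern can be illustrated with a toy example — `ResumeFacadeSketch` and its collaborators are hypothetical stand-ins, showing only how a facade narrows the caller-facing surface while keeping the parser and generator swappable behind it:

```python
class _Parser:
    """Stand-in for the job-page parser."""
    def parse(self, page_text: str) -> dict:
        return {"role": page_text.strip()}

class _Generator:
    """Stand-in for the HTML resume generator."""
    def render(self, job: dict) -> str:
        return f"<h1>Resume for {job['role']}</h1>"

class ResumeFacadeSketch:
    """Callers see one method; the internals stay hidden and replaceable."""
    def __init__(self):
        self._parser = _Parser()
        self._generator = _Generator()

    def tailored_resume_html(self, page_text: str) -> str:
        job = self._parser.parse(page_text)
        return self._generator.render(job)

html = ResumeFacadeSketch().tailored_resume_html("Data Engineer")
# -> <h1>Resume for Data Engineer</h1>
```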
+### Responsibilities
+- **Initialization**: Sets up the environment, including API keys, style paths, and log output.
+- **Job Parsing**: Coordinates with `LLMJobParser` to extract structured data from a raw job URL.
+- **Browser Control**: Manages the Selenium driver instance for scraping and PDF generation.
+
+### Key Methods
+
+- **`create_resume_pdf_job_tailored()`**:
+  - Fetches the selected style.
+  - Generates HTML using `ResumeGenerator`.
+  - Converts HTML to PDF via the `HTML_to_PDF` utility.
+
+- **`link_to_job(job_url)`**:
+  - Navigates the browser to the provided URL.
+  - Extracts the HTML body.
+  - Initializes `LLMJobParser` to interpret the page content.

diff --git a/docs/modules/data_models.md b/docs/modules/data_models.md
new file mode 100644
index 00000000..b6549733
--- /dev/null
+++ b/docs/modules/data_models.md
@@ -0,0 +1,33 @@
+# Data Models & Schemas
+
+The application uses rigorous data validation to ensure that resume data and job application profiles are well-structured.
+
+## Resume Schema (`src/resume_schemas/resume.py`)
+
+The `Resume` class is defined using `Pydantic` models, ensuring type safety and validation for user-provided data.
+
+### Key Classes
+- **`Resume`**: The root model containing all sections.
+- **`PersonalInformation`**: Name, email, phone, location, links.
+- **`EducationDetails`**: List of education records.
+- **`ExperienceDetails`**: List of work experience records.
+- **`Project`**, **`Achievement`**, **`Certifications`**, **`Language`**.
+
+**Validation Features:**
+- Email format validation (`EmailStr`).
+- URL validation for links (`HttpUrl`).
+- `normalize_exam_format`: Helper to handle inconsistent data formats in YAML.
+
+## Job Application Profile (`src/resume_schemas/job_application_profile.py`)
+
+Defined as a Python `dataclass`, this model holds user preferences and the legal/demographic information often required by job boards.
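A minimal sketch of how a `dataclass` section can enforce types at construction time — the field names here are illustrative examples, not the real schema:

```python
from dataclasses import dataclass

@dataclass
class WorkPreferences:
    remote_work: str
    open_to_relocation: str

    def __post_init__(self):
        # Reject wrong types early with a descriptive error, in the spirit
        # of the detailed TypeError/ValueError reporting described here.
        for name, value in vars(self).items():
            if not isinstance(value, str):
                raise TypeError(
                    f"'{name}' must be a string, got {type(value).__name__}"
                )

prefs = WorkPreferences(remote_work="yes", open_to_relocation="no")
```

Failing fast inside `__post_init__` means a malformed YAML value is reported at load time, with the offending field named, rather than surfacing later mid-application.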
+### Sections
+- **`SelfIdentification`**: Gender, veteran status, disability, ethnicity.
+- **`LegalAuthorization`**: Work authorization status for the US, EU, Canada, and UK.
+- **`WorkPreferences`**: Remote/hybrid preferences, relocation.
+- **`SalaryExpectations`**: Desired salary range.
+- **`Availability`**: Notice period.
+
+## Data Loading
+The `Resume` and `JobApplicationProfile` classes both accept a YAML string in their `__init__` methods, parsing it into the object structure and raising detailed errors (`ValueError`, `TypeError`) if validation fails.

diff --git a/docs/modules/llm_integration.md b/docs/modules/llm_integration.md
new file mode 100644
index 00000000..21a303af
--- /dev/null
+++ b/docs/modules/llm_integration.md
@@ -0,0 +1,38 @@
+# LLM Integration
+
+The application relies heavily on Large Language Models (LLMs) for understanding job descriptions and generating human-like text for resumes and cover letters.
+
+## LLM Manager (`src/libs/llm_manager.py`)
+
+This module provides the abstraction layer for different AI providers.
+
+### AI Model Adapter Pattern
+
+The `AIAdapter` class acts as a factory, instantiating the correct model class based on the configuration (`LLM_MODEL_TYPE`).
+
+Supported Providers:
+- **OpenAI** (`OpenAIModel`)
+- **Claude** (`ClaudeModel`)
+- **Ollama** (`OllamaModel`) - for local inference
+- **Gemini** (`GeminiModel`)
+- **HuggingFace** (`HuggingFaceModel`)
+- **Perplexity** (`PerplexityModel`)
+
+### GPTAnswerer
+
+The `GPTAnswerer` class is a high-level service that uses the configured LLM to answer specific questions related to the resume or job application.
+
+**Key Features:**
+- **`answer_question_textual_wide_range`**: Determines which section of the resume (e.g., Experience, Education) is relevant to a question and uses an appropriate prompt chain to generate an answer.
+- **`is_job_suitable`**: Analyzes the job description against the resume to calculate a suitability score.
+- **`summarize_job_description`**: Compresses long job descriptions into concise summaries.
+
+### Logging (`LLMLogger`)
+All LLM requests and responses are logged to `open_ai_calls.json` for debugging and cost tracking. It captures:
+- Model Name
+- Token Usage (Input/Output/Total)
+- Estimated Cost
+- Prompts and Replies
+
+## Prompt Engineering
+Prompts are stored in `src/libs/llm/prompts.py` (referenced in `llm_manager.py`). The application uses `LangChain` templates to structure these prompts dynamically with input variables like `{resume_section}` or `{job_description}`.

diff --git a/docs/modules/utils.md b/docs/modules/utils.md
new file mode 100644
index 00000000..34a44ae2
--- /dev/null
+++ b/docs/modules/utils.md
@@ -0,0 +1,17 @@
+# Utilities and Helpers
+
+Common utility functions used throughout the application.
+
+## Chrome Utils (`src/utils/chrome_utils.py`)
+
+Handles interactions with the Chrome browser via Selenium.
+
+- **`init_browser()`**: Initializes a Selenium Chrome driver with specific options (headless mode, user-agent spoofing, window size).
+- **`HTML_to_PDF(html_content, driver)`**: Uses the browser's print-to-PDF capability to convert a rendered HTML string into a PDF byte stream.
+
+## Constants (`src/utils/constants.py`)
+
+Central repository for string constants used in prompts and configuration keys.
+- **LLM Command Keys**: `PERSONAL_INFORMATION`, `SELF_IDENTIFICATION`, `EXPERIENCE_DETAILS`, etc.
+- **File Names**: `SECRETS_YAML`, `WORK_PREFERENCES_YAML`.
+- **Model Aliases**: `OPENAI`, `CLAUDE`, `GEMINI`.
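The token and cost bookkeeping described for `LLMLogger` in the LLM Integration page can be sketched as follows; the log file name matches the docs, but the entry shape and per-1k pricing are illustrative assumptions:

```python
import json
import tempfile
from pathlib import Path

def log_llm_call(log_path: Path, model: str, input_tokens: int,
                 output_tokens: int, cost_per_1k_tokens: float = 0.002) -> dict:
    total = input_tokens + output_tokens
    entry = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total,
        # Hypothetical flat rate; real pricing varies per model and direction.
        "estimated_cost": round(total / 1000 * cost_per_1k_tokens, 6),
    }
    # Append one JSON object per call so the log can grow indefinitely.
    with log_path.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

log_file = Path(tempfile.mkdtemp()) / "open_ai_calls.json"
entry = log_llm_call(log_file, "gpt-4o-mini", 1200, 300)
```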
From fcdf1d9ea270c8d8eefca3912562a24312affd7f Mon Sep 17 00:00:00 2001
From: lokesh-univest
Date: Wed, 17 Dec 2025 12:31:43 +0530
Subject: [PATCH 2/2] fix errors

---
 docs/architecture/flows.md    | 12 ++++++------
 docs/architecture/overview.md | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/architecture/flows.md b/docs/architecture/flows.md
index 4eb2dc22..ea3ecf3d 100644
--- a/docs/architecture/flows.md
+++ b/docs/architecture/flows.md
@@ -61,16 +61,16 @@
 Generating a generic resume without specific job tailoring.
 
 ```mermaid
 graph TD
-    User[User Input] -->|Select Style| StyleManager[Style Manager]
-    StyleManager -->|Template Path| Generator[Resume Generator]
+    User["User Input"] -->|Select Style| StyleManager["Style Manager"]
+    StyleManager -->|Template Path| Generator["Resume Generator"]
 
     subgraph Generation Process
-        ResumeData[Load Resume Data] -->|Inject| Generator
-        Generator -->|Render| HTML[HTML Resume]
-        HTML -->|Convert| PDF[PDF Generator (Selenium)]
+        ResumeData["Load Resume Data"] -->|Inject| Generator
+        Generator -->|Render| HTML["HTML Resume"]
+        HTML -->|Convert| PDF["PDF Generator (Selenium)"]
     end
 
-    PDF --> Output[Output Folder]
+    PDF --> Output["Output Folder"]
 ```
 
 ## 4. LLM Request Lifecycle

diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 3653faf9..4011f07b 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -9,7 +9,7 @@ The system operates on a modular architecture where the **Core Controller** orch
 
 ```mermaid
 graph TD
-    User[User] -->|Config & Commands| CLI[CLI Entry Point (main.py)]
+    User[User] -->|Config & Commands| CLI["CLI Entry Point (main.py)"]
 
     subgraph Core Application
         CLI --> Facade[ResumeFacade]
         Facade --> generator[ResumeGenerator]
         Facade --> Parser[LLMJobParser]
@@ -18,15 +18,15 @@ graph TD
     end
 
     subgraph Services
-        Facade -->|Controls| Browser[Selenium / Chrome Driver]
-        Parser -->|Queries| LLM[LLM Manager (OpenAI/Claude/Ollama)]
+        Facade -->|Controls| Browser["Selenium / Chrome Driver"]
+        Parser -->|Queries| LLM["LLM Manager (OpenAI/Claude/Ollama)"]
         generator -->|Queries| LLM
     end
 
     subgraph Data Layer
-        CLI -->|Reads| ConfigFiles[YAML Config & Secrets]
-        Facade -->|Reads| ResumeData[Plain Text Resume]
-        Facade -->|Writes| Output[PDF Output]
+        CLI -->|Reads| ConfigFiles["YAML Config & Secrets"]
+        Facade -->|Reads| ResumeData["Plain Text Resume"]
+        Facade -->|Writes| Output["PDF Output"]
     end
 ```