How Lexithm works, how to use it, and what it can do for your codebase.
Lexithm is a code intelligence tool that reads your entire repository and lets you ask questions about it in plain English. It understands how every file connects — functions, imports, API routes, database models, and architecture layers. Instead of manually searching through files or pasting code into ChatGPT, you describe what you want to know and Lexithm finds the relevant parts across your whole codebase and gives you a clear answer.
The tool is designed for developers of all experience levels. Whether you are debugging a tricky issue, onboarding to a new project, reviewing a pull request, or just trying to understand how a piece of the system works, Lexithm gives you answers in simple language without requiring you to dig through unfamiliar files.
Sign in with your GitHub account. Lexithm uses GitHub for authentication only — no code is stored on external servers beyond what you choose to index. The OAuth flow only requests access to the repositories you explicitly select, and you can revoke access at any time from your GitHub settings.
Select a repository from the dropdown. Pick any repository you want to analyze. Both public and private repositories work the same way. The dropdown shows all repositories your GitHub account has access to, including ones from organizations you belong to.
Wait for indexing to complete. Most repositories finish in 2-5 minutes. Larger projects with over 100,000 files can take up to 20 minutes. You can watch the progress as each stage completes — file discovery, parsing, graph construction, embedding generation, and storage. The page shows real-time status updates so you know exactly what is happening.
Start asking questions. Type your question in plain English. Lexithm searches the full index and responds with answers that cite the specific files and code it references. You do not need to know the right terminology or file names — just describe what you want to know.
Deep code indexing. Lexithm parses your codebase using AST-level analysis for Python, JavaScript, TypeScript, Go, Java, and Rust. It extracts every symbol, function, class, import, export, API route, database model, and type definition. For other languages, it falls back to regex-based extraction for basic symbols and imports. The parser handles modern syntax including decorators, generics, async functions, type annotations, and pattern matching.
Dependency graph building. Every file is mapped against every other file. Lexithm knows who imports what, which functions call which, and how data flows between modules. This allows it to answer questions about architecture and impact analysis with high accuracy. When you ask about changing a function, it can tell you every file that would be affected.
Conversational interface. Each session keeps full context of your conversation. You can ask follow-up questions, dig deeper into specific areas, or ask for clarification on previous answers. The model remembers what was discussed earlier in the session so you can build understanding incrementally without repeating yourself.
Evidence-backed answers. Every response cites the files and line numbers it references. You can verify the source directly instead of trusting a black box. The model only answers based on what it finds in your actual code — it never guesses or hallucinates. If the answer is not in your codebase, it tells you that instead of making something up.
Real-time streaming. Responses appear as they are generated so you do not wait for the full answer. You see the text build up sentence by sentence, and you can start reading immediately. This makes the interaction feel faster and more natural, like talking to a person who thinks out loud.
Lexithm is split into two main parts: a Python backend and a Next.js frontend. The backend handles all the heavy lifting — code indexing, AST parsing, dependency graph construction, vector search, and LLM orchestration. The frontend provides the chat interface, repository management, and visualization of results.
The indexing pipeline is the core of the system. It reads every file in your repository, parses it into an AST (Abstract Syntax Tree), extracts symbols and relationships, generates embeddings for semantic search, and stores everything in a structured index. This index is what makes instant answers possible. Without it, every question would require re-scanning the entire codebase.
When you ask a question, the query understanding engine classifies your intent — whether you want a high-level overview, a deep code explanation, bug analysis, security review, or something else. It then retrieves the most relevant code chunks from the index and feeds them to the LLM along with a tailored system prompt. This means the LLM only sees the code that is relevant to your question, not your entire repository.
The system uses NVIDIA NIM as the primary LLM provider with OpenRouter as a fallback. Code context is sent to the model only when you ask a question. Your repository is never stored or retained by the LLM provider. The backend handles all caching and session management so repeated questions about the same code are fast.
Indexing is the process of reading every file in your repository and building a searchable map of everything in it. Think of it like creating an encyclopedia of your codebase that Lexithm can look up instantly. Once the index is built, every question you ask is answered by looking up the most relevant entries rather than scanning files from scratch.
The first stage is file discovery. Every file in the repository is found and categorized by type. Source files are queued for deep AST parsing. Configuration files like package.json, Dockerfile, docker-compose.yml, and tsconfig.json are cataloged for architecture context. Documentation files like README, CONTRIBUTING, and CHANGELOG are indexed for project understanding. Binary files and generated code are skipped to keep the index clean.
The second stage is AST parsing. Each source file is parsed into an Abstract Syntax Tree — a structured representation of the code that captures every function, class, variable, import, export, and type annotation. This is what allows Lexithm to understand the relationship between different parts of the code at a semantic level rather than just matching text patterns.
The third stage is graph construction. All the extracted symbols are connected into a dependency graph. This graph maps which files depend on which, which functions call which, and how data moves through the system. It captures both direct dependencies and transitive ones so impact analysis is complete.
The fourth stage is embedding generation. Each code chunk is converted into a vector embedding — a mathematical representation of its meaning. These embeddings are stored in a vector database for semantic search. When you ask a question, your question is also converted to an embedding, and the system finds the code chunks whose embeddings are most similar.
The final stage is storage. All the parsed data, graphs, and embeddings are written to disk. Future queries use this stored index directly without re-scanning the repository. The index is compressed and optimized for fast retrieval so answers come back in seconds even for large codebases.
When your code changes, you can re-index to pick up the latest updates. Re-indexing is incremental — it only processes files that have changed since the last index instead of starting from scratch. This means re-indexing a repository after a few commits takes seconds, not minutes.
You can ask anything about your codebase in plain English. Lexithm understands the intent behind your question and retrieves the most relevant parts of your code to answer it. You do not need to know file names, function names, or even the right terminology — just describe what you want to understand.
For high-level questions like "what does this project do" or "explain the architecture", Lexithm gives you a concise overview without diving into specific files. It describes the project purpose, main features, and overall structure based on what it finds in the code and documentation.
For specific questions like "how does authentication work" or "where are API routes defined", Lexithm finds the exact files and functions involved and explains how they work together. Answers include references to the relevant files with line numbers so you can verify the source.
For debugging questions like "why is the login failing" or "find the bug in the payment flow", Lexithm analyzes the code paths involved and identifies potential issues. It explains the root cause in plain English, shows you the relevant code, and suggests how to fix it based on patterns in the rest of your codebase.
For comparison questions like "what is the difference between these two approaches" or "which implementation is better", Lexithm compares the implementations side by side and explains the trade-offs in simple terms. It considers performance, maintainability, and consistency with the rest of your codebase.
Every answer maintains context from your previous questions in the same session. You can dig deeper, ask for clarification, or explore related areas without repeating yourself. The session context includes the repository structure, previous questions and answers, and the relevant code chunks that have been discussed.
Your repository stays on your machine. When you ask a question, only the relevant code context is sent to the LLM provider to generate an answer. Your code is never stored or retained by the LLM provider. The backend strips all identifying information from the context before sending it to the model.
Authentication is handled through GitHub OAuth. Lexithm never sees or stores your GitHub password. It only requests access to the repositories you explicitly choose to analyze. You can revoke access at any time from your GitHub settings under Applications.
The index data is stored locally on the backend server. It is not shared with any third party. You can delete the index at any time by re-indexing or removing the repository. The index is encrypted at rest and isolated by user account so no one else can access your code.
API keys are managed by the backend. You do not need to bring your own API key or set up any external service accounts. Everything runs through the backend infrastructure. The backend handles rate limiting, retries, and failover between providers automatically.
All communication between the frontend and backend is encrypted over TLS. The backend communicates with the LLM provider over encrypted channels as well. No plaintext data is ever transmitted over the network. Session tokens are short-lived and rotated regularly.
How is it different from just using ChatGPT? ChatGPT only sees what you paste into it. It has no context of your codebase beyond what you manually provide. Lexithm reads your entire repository — every file, function, and import — and understands how all the pieces connect. Answers are specific to your actual code, not generic advice. It also cites exact file locations so you can verify every claim.
Can I use it to understand a project I just joined? Yes. This is one of the most common use cases. Ask high-level questions like "what does this project do" or "explain the folder structure" to get oriented. Then dig into specific areas like "how does authentication work" or "where is the payment processing logic". It explains everything in plain English with references to the actual code.
Does it check the entire codebase or just what I ask about? It indexes the whole repository first. When you ask a question, it searches the full index to find the most relevant parts. This means even if you ask about something in a file you have never opened, it still finds the answer. The index covers 100% of your source files.
Do I need to install anything? No. Everything runs through your browser. Sign in with GitHub, pick a repository, and start asking questions. There is no CLI tool, plugin, or local setup required.
Is my code safe? Do you store it? Your repository is indexed on our backend, not stored by any LLM provider. Only the relevant code context is sent to generate each answer. You can delete the index or revoke GitHub access at any time.
How long does indexing take? Most repositories finish in 2-5 minutes. Larger projects with over 100,000 files can take up to 20 minutes. The progress screen shows each stage so you know exactly what is happening.
Is there a limit on questions? No. You can ask as many questions as you want across all your repositories. Each session maintains its own context so you can have multiple conversations about different topics or different parts of the codebase simultaneously.
What happens when my code changes? You can re-index a repository to pick up the latest changes. The index updates incrementally so it only processes files that have changed instead of starting from scratch. A typical re-index after a few commits takes seconds, not minutes.
Which languages are supported? Full AST-level parsing is supported for Python, JavaScript, TypeScript, Go, Java, and Rust. Regex-based extraction is used for other languages including Ruby, PHP, C++, C#, Swift, Kotlin, and Scala. The system works with any text-based file format including Markdown, YAML, JSON, and configuration files.
Ready to try it?
Get started© 2026 Lexithm