Product Roadmap
What we're building, what's next, and what just shipped.
Planned (37)
Promote proven calculations into the live method library
Work through the large catalogue of calculations from the old engine (already triaged into keep, improve, and drop) and add the keepers into the live method library one by one, each with its formula wired to the calculator, its inputs defined, and a sanity test. Unlocks cross-functional KPI questions (customer acquisition cost, churn, lifetime value, return on ad spend, inventory turnover, and so on) on any client that has the inputs, not just finance metrics. The engine that uses these methods already exists; this is the work to fill the library.
Knowledge base no longer errors on certain questions
Some questions fail with an internal database error instead of returning an answer. It is a pre-existing fault in the answer path, not caused by recent ingest changes. This fixes it so those questions return an answer or a clean gap. User-visible reliability fix.
Answer more of the questions the data can actually support
After the ingest rebuild, documents are clean and well-indexed and the system finds the right document more often, but the end-to-end answer rate stayed roughly flat (most questions still return no answer). The bottleneck is the retrieval-and-answer step, not document processing. Next step under evaluation: the newer planner and document-reading engine against the now-clean data.
Retry tool no longer reports false failures when processing actually succeeded
An internal retry tool sometimes reports a document as failed when it actually finished successfully a few seconds after the tool stopped waiting. Fix: wait longer for large documents, or check the processing job state directly instead of the document row. Internal tooling only, no customer impact.
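A minimal sketch of the second option (checking the job state directly, with a wait that grows with document size). The state names, the `get_job_state` callback, and the timeout numbers are all assumptions for illustration, not the tool's real interface.

```python
import time
from typing import Callable

def wait_for_job(get_job_state: Callable[[], str], size_mb: float,
                 base_timeout_s: float = 30.0, per_mb_s: float = 2.0,
                 poll_every_s: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the processing job until it is final or a size-scaled timeout expires."""
    timeout_s = base_timeout_s + per_mb_s * size_mb  # larger documents get longer
    waited = 0.0
    while True:
        state = get_job_state()  # hypothetical: reads the job, not the document row
        if state in ("succeeded", "failed"):
            return state
        if waited >= timeout_s:
            # Timing out is reported as "still running", never as a failure.
            return "still_running"
        sleep(poll_every_s)
        waited += poll_every_s
```

The point of the design is that only the job itself can say "failed"; the tool giving up on waiting is a separate outcome.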
Find the right column even when the names do not match exactly
Numeric questions return a no-data gap when the field the engine looks for (for example price in euros) does not exactly match what the spreadsheet calls it (for example revenue or amount). The column matching needs to be fuzzy or AI-driven so the engine finds the right field even when clients name things differently.
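One possible shape for the fuzzy part, sketched under assumptions: try an exact match first, then a shared synonym group, then close spelling. The synonym groups and the function name are hypothetical; an AI-driven matcher would replace or extend step 2.

```python
import difflib
import re
from typing import Optional

# Hypothetical synonym groups; a real list would be curated or model-generated.
SYNONYM_GROUPS = [
    {"price", "revenue", "amount", "sales", "income"},
    {"cost", "expense", "spend"},
]

def tokens(name: str) -> set[str]:
    """Lower-case words in a field or column name, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", name.lower()))

def match_column(wanted: str, columns: list[str], cutoff: float = 0.8) -> Optional[str]:
    wanted_tokens = tokens(wanted)
    # Step 1: exact match once case and punctuation are ignored.
    for col in columns:
        if tokens(col) == wanted_tokens:
            return col
    # Step 2: the wanted field and a column share a synonym group.
    for group in SYNONYM_GROUPS:
        if wanted_tokens & group:
            for col in columns:
                if tokens(col) & group:
                    return col
    # Step 3: close spelling, which catches typos such as "Revenu".
    lowered = {col.lower(): col for col in columns}
    close = difflib.get_close_matches(wanted.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None
```

With these groups, `match_column("price in euros", ["Date", "Amount", "Client"])` resolves to the `Amount` column, and an unknown field such as `headcount` still returns `None` so the engine can report a gap.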
Read picture-only and chart-only files (OCR and vision)
Some visual-heavy files (cashflow chart images, a marketing-leads image, an API process-map PDF) come back as no readable content because there is no extractable text. The fix is to use a vision model to describe the image so its content reaches the knowledge base. Note: describing images and charts already shipped for many files (see the image-only files card); the remaining piece here is reading text out of scanned, image-only documents.
Keyword search with smart ranking
Fast keyword search with typo tolerance. Results ranked by relevance, recency, and document status. Archived items stay findable but rank lower.
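A rough sketch of how the three ranking signals could combine. The weights, the half-life, and the `Hit` fields are illustrative assumptions; the card does not specify a formula.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    relevance: float  # 0..1 text-match score from the keyword index (assumed)
    age_days: int
    archived: bool

def rank_score(hit: Hit, half_life_days: float = 180.0,
               archived_penalty: float = 0.5) -> float:
    """Relevance, boosted by recency and damped for archived items."""
    recency = 0.5 ** (hit.age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
    status = archived_penalty if hit.archived else 1.0
    return hit.relevance * (0.7 + 0.3 * recency) * status

def rank(hits: list[Hit]) -> list[Hit]:
    return sorted(hits, key=rank_score, reverse=True)
```

Archived items are multiplied down rather than filtered out, so they stay findable but sit below live items of similar relevance.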
Search index across all features
A background index keeps a searchable copy of everything a user can access and updates live as data changes. Foundation layer, with no user-visible surface in this phase.
Cmd+K / Ctrl+K search palette
Press Cmd+K (Mac) or Ctrl+K (Windows) from anywhere in the app to search across documents, findings, clients, research, and reports, with results grouped by type.
Meeting recordings become part of the knowledge base, traceable to the minute
Audio and video recordings flow down the text route: the transcript (transcription already works today) is filed into pages, stated facts, and quotes like any document, with speaker names and timestamps preserved - so an answer can cite "the management meeting recording, 14:32, operations manager". Delivers what the parked transcript cards (F04.M3/M4) promised, on the new engine. Phase T-3/T-4 of the text-engine proposal.
One recipe book for both computing and explaining numbers
Today the knowledge base keeps two recipe code paths in sync by hand: one to compute a number (like margin) and a separate one to explain what a number is made of (the why flow). Adding a metric means editing both. The goal is one registry that serves both reads, so adding a metric is a one-place change and the two never drift. This is the foundation for expanding the why coverage.
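A minimal sketch of the single-registry idea: one recipe entry holds the inputs and the formula, and both the compute read and the why read go through it. The names and the margin formula used here, (revenue - cost) / revenue, are assumptions for illustration.

```python
from typing import Callable, NamedTuple

class Recipe(NamedTuple):
    inputs: tuple[str, ...]
    formula: Callable[..., float]

# Adding a metric means adding one entry here and nothing else.
RECIPES = {
    "margin": Recipe(("revenue", "cost"),
                     lambda revenue, cost: (revenue - cost) / revenue),
}

def compute(metric: str, values: dict[str, float]) -> float:
    """The 'compute a number' read."""
    recipe = RECIPES[metric]
    return recipe.formula(*(values[name] for name in recipe.inputs))

def explain(metric: str, before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """The 'why' read: how much each input of the same recipe moved."""
    recipe = RECIPES[metric]
    return {name: after[name] - before[name] for name in recipe.inputs}
```

Because `explain` reads its input list from the same entry as `compute`, the two cannot drift apart.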
Explain why more metrics moved, not just a few
The why explanation today covers only margin, profit, and income. Why-questions about other metrics (gross or net margin, EBITDA, churn, conversion rate, customer acquisition cost, cash runway, and so on) fall back to a generic answer instead of a cause-and-effect breakdown. Each new metric needs its inputs and formula defined and its inputs recognised by the Librarian. Best done after the one-recipe-book change above, which collapses the work into a single step.
Answer profit and loss questions directly
Add a profit method (profit equals revenue minus cost for a period) so a question like what was my profit in 2024 returns a clean number instead of a vague text answer. Flagged as the highest-leverage gap in the May 2026 coverage review.
Answer how-many-people questions
Add a headcount method so consultants can ask current team size, growth over time, and per-person ratios. From the May 2026 coverage review.
Compute client and cost concentration ratios
Add methods for client and cost concentration so questions like top 5 clients as a share of revenue, or salaries as a share of total cost, return a proper composition answer instead of a vague text one. From the May 2026 coverage review.
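The two example questions reduce to a share-of-total calculation. A small sketch with hypothetical function names and made-up figures:

```python
def top_n_share(amounts: dict[str, float], n: int = 5) -> float:
    """Share of the total held by the n largest entries (e.g. top 5 clients)."""
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("total is zero, share is undefined")
    largest = sorted(amounts.values(), reverse=True)[:n]
    return sum(largest) / total

def component_share(amounts: dict[str, float], key: str) -> float:
    """Share of the total held by one named line (e.g. salaries in total cost)."""
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("total is zero, share is undefined")
    return amounts[key] / total

revenue_by_client = {"Client A": 50.0, "Client B": 30.0, "Client C": 20.0}
print(top_n_share(revenue_by_client, n=2))  # 0.8
```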
Answer cash runway questions
Add cash-runway and monthly burn-rate methods (cash divided by monthly burn). Important for advising cash-tight small businesses. From the May 2026 coverage review.
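A worked sketch of the formula above. Taking burn as the average monthly net cash outflow is an assumption; the card only states cash divided by monthly burn.

```python
from typing import Optional

def monthly_burn(net_cash_flows: list[float]) -> float:
    """Average monthly cash decrease; a positive result means cash is being burned."""
    return -sum(net_cash_flows) / len(net_cash_flows)

def cash_runway_months(cash: float, burn: float) -> Optional[float]:
    """Months of cash left; None when the business is not burning cash."""
    if burn <= 0:
        return None
    return cash / burn

burn = monthly_burn([-10_000, -12_000, -8_000])  # 10000.0 per month
print(cash_runway_months(60_000, burn))          # 6.0
```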
Plot a metric across many years, not just year-on-year pairs
Add a multi-year trend method so the engine handles questions like revenue from 2020 to 2025 as a proper series across all the years, not just a two-year comparison. From the May 2026 coverage review.
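A small sketch of a series across every year in the range. Leaving a year with no data as `None` rather than zero is an assumption, chosen so a gap stays visible instead of reading as zero revenue.

```python
from typing import Optional

def yearly_series(rows: list[tuple[int, float]], start: int, end: int) -> dict[int, Optional[float]]:
    """Total per year for every year from start to end inclusive."""
    totals: dict[int, Optional[float]] = {year: None for year in range(start, end + 1)}
    for year, amount in rows:
        if start <= year <= end:
            totals[year] = (totals[year] or 0) + amount
    return totals

rows = [(2020, 100.0), (2021, 120.0), (2021, 30.0), (2023, 200.0)]
print(yearly_series(rows, 2020, 2023))
# {2020: 100.0, 2021: 150.0, 2022: None, 2023: 200.0}
```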
Full mobile optimization across the app
Platform-wide mobile readiness pass: every user-facing feature must be usable on a phone from first ship (Brief rule 2.8). Phase 1: the app shell mobile navigation (a hamburger menu plus a slide-in drawer) so signed-in pages are navigable on a narrow screen. Phase 2 (rolling): review each feature at phone widths and fix it in place until the whole app passes, including tables, forms, modals, the Data Room uploader, admin surfaces, the roadmap board, and the Activity Log. Success: every route reachable from the hamburger, all main buttons tappable, no sideways scrolling, forms usable one-handed, and the full flow demoable on a phone.
Process YouTube links (audio extraction)
The Data Room accepts Vimeo, Dailymotion, SoundCloud, and direct audio or video links, but rejects YouTube because Google blocks downloads from server addresses. Options on the table: route YouTube through a service whose home-user addresses dodge the block (cheap, but relies on YouTube auto-captions rather than our higher-quality transcription); pay for a residential proxy to keep the current transcription (recurring cost, ongoing cat-and-mouse); or keep rejecting and improve the download-and-upload guidance. The service route looks lowest-friction. Decision pending.
Connect live software (CRM, ERP, accounting, chat)
Consultants often keep data in live software (CRM, ERP, accounting, analytics, chat) that cannot be handed over as a file or link. The platform should connect to such software and pull data in. Today all intake is files, links, or media only. One connector capability, not one card per product. Examples worth keeping: paste a Zoom cloud-recording share link to import a recording, or connect a Zoom account once and browse recent recordings to import.
Import a whole cloud folder (Drive or Dropbox) at once
A consultant should be able to point at an entire Google Drive or Dropbox folder and have every file inside it imported at once. Today only a single cloud share link (one file) is accepted. This is bulk file import from cloud storage, distinct from connecting live software.
Write a describe-step summary for documents, not just Excel
Every file should produce a brief with two parts: a human paragraph (the Data Room overview) and an engine part. The Excel route already does this. The document route (PDFs, Word, pictures, audio, video) does not yet get a real describe step, so most summaries are just the first couple of lines of text. Depends on building the document text-engine. Update 2026-06-12: delivered inside F04.T2.CLAIMS (the Clerk's text route) - the document brief v1 seeds the claims map per the text-engine proposal section 4.6.
Keep every row's label when a PDF table has no header or is stacked
When a PDF table has no column header over its label column — or when one block actually holds two tables stacked together — each row should still keep its label attached to its numbers. Today those numbers are captured and queryable and the label is preserved in the cell, but the label may not lift into its own clean column, so a row can read as bare figures. This makes every such row self-describing. (Distinct from G-12.P2, which is about reading tables out of scanned/image PDFs in the first place.)
Pull labelled fact boxes out of PDFs, not just full tables
PDFs often carry labelled fact boxes — a company's registration number, status, founding date — that are not laid out as a full table. Today these ride the document text path. This would read them into queryable structured data the same way real tables are, so a question like “what is the registration number” is answered exactly rather than approximated from prose.
Fewer false “needs a look” flags on non-English documents
A document is flagged for review when our coverage check cannot confirm every topic and page is accounted for. On non-English documents (e.g. Macedonian) that check is stricter than it needs to be: it can miss that a topic is already covered by a table we filed, or that a page is genuinely just a heading, and so flags the document unnecessarily. This refines the check so consultants see a review flag only when something truly needs a human.
Read scanned documents and images, not just digital text
Some uploads are scans or images rather than digital text — a photographed brochure, a handwritten meeting note, a screenshot of a chart. Today the knowledge base reads only documents that carry real text; scanned/image documents are flagged for review rather than guessed at (we never let the AI narrate what it thinks a picture shows — every stored quote must be verifiable). This adds a dedicated reader for scanned and image documents so their content becomes searchable like everything else. Concrete cases parked here from the 3-client ingestion (2026-06-13): Echo's scanned company-profile PDF and handwritten diagnostic notes, and SmartClick's 3 PNG charts (sales cycle, leads status, social stats).
New documents become answerable the moment processing finishes
When you upload a set of documents, the knowledge base now runs its final indexing pass automatically once every file has finished processing — so the precise facts a document states (a role's required years of experience, a contract term, a quoted price) are reliably found right away. Before this, the index could be built while the last few documents were still being processed, leaving their facts stored but not yet searchable; a precise-fact question could then return a gap even though the answer was already in the system. This makes onboarding a new client's whole document set reliable end-to-end. (Root cause found in the Target Group eval, 2026-06-13: the per-document stated-facts table for a job-description PDF was created 16 minutes after the index was last built, so the planner never saw a card for it and the "minimum experience for the Sales Director role" question gapped despite the value "5 години" being stored. Blocks F04.T4.STATED-FACTS on freshly-ingested clients.)
Answer precise fact questions from your documents, not just find quotes
The knowledge base reliably surfaces what documents SAY (quotes and themes) and honestly reports gaps. A precise fact pulled from a document (e.g. "the Sales Director role requires a minimum of 5 years experience"), however, is stored correctly but not returned when asked: the engine searches the quote notebook instead of the per-document facts table. Root-caused during the 3-client ingestion (2026-06-13): the document catalogue the question-router reads is not fully built after import (a librarian build step the import path skips). This completes that wiring so precise document facts answer as cleanly as spreadsheet facts. Reproduction + detail in docs/handoff/2026-06-13_text_ingest_3clients.md.
Read tables locked inside scanned PDFs
Tables embedded in Word docs and digital PDFs are already pulled into queryable numbers (G-12). Tables inside SCANNED PDFs are different: the scan-reader returns their rows as loose text, so the numbers do not become a real table and individual rows can fail verification and drop. Seen in the 3-client ingestion (2026-06-13): financial-ratio tables (ROA, ROE, equity) in Target Group's scanned reports. This extends table extraction to cover scanned-PDF tables. Sibling of G-12.
Turn documents (PDF, Word, transcripts, web) into queryable data
A document gets cleaned and briefed, then stops: there is no engine that reads it, organises it, and makes it answerable. Only spreadsheets currently lead to answers. This builds the text-engine behind the document door, which unblocks the document describe-step and the picture and scanned-PDF cases (run image-only files through vision so the description becomes answerable; add OCR or vision for scanned PDFs that have no text layer).
Flag it when two uploaded files disagree on the same number
When two documents report different values for the same metric, period and currency (for example, Echo's two files each give a different September income: 4,272,549 vs 3,912,195 MKD), the platform should raise it for the client to confirm which is right, citing the exact file/tab/cell, and never silently pick one. The cross-source detector (detectConceptConflicts) exists and both September tables share concept+period+currency, but no conflict was raised on the Echo overlap, so this needs debugging (likely the per-(year,currency) totals comparison). Surfaced in Stage 4 Echo eval (E-B1, E4.04).
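A minimal sketch of the expected behaviour, not the existing detectConceptConflicts code: group facts by concept, period and currency, then flag any group whose values differ. The function name, the fact shape, and the source labels are hypothetical; the two values are the Echo figures quoted above.

```python
from collections import defaultdict

def find_conflicts(facts: list[dict]) -> list[dict]:
    """Flag every (concept, period, currency) key reported with more than one value."""
    by_key = defaultdict(list)
    for fact in facts:
        by_key[(fact["concept"], fact["period"], fact["currency"])].append(fact)
    conflicts = []
    for key, group in by_key.items():
        if len({fact["value"] for fact in group}) > 1:
            # Keep every source so the client can be shown exactly where each value lives.
            conflicts.append({"key": key,
                              "sources": [(fact["source"], fact["value"]) for fact in group]})
    return conflicts

facts = [
    {"concept": "income", "period": "September", "currency": "MKD",
     "value": 4_272_549, "source": "file A"},  # hypothetical source labels
    {"concept": "income", "period": "September", "currency": "MKD",
     "value": 3_912_195, "source": "file B"},
]
print(find_conflicts(facts))
```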
Answer questions that use English category names on non-English files
Category/vendor values stored in another language can't be matched when the user asks in English — e.g. asking "how much rent?" on a file whose cost line reads "Кирија Канцеларија" returns the whole table instead of the rent row. The Librarian already translates column NAMES (the title field is English); it should also translate dimension VALUES so English filters resolve. Stage 4 Echo eval (E1.06, E2.06).
Read official financial statements that put each year in its own column
Registry/statutory statements lay line items down the rows and years (2023, 2024) across side-by-side columns. The Clerk should reshape these into a clean per-year ledger so single-year questions ("total revenue in 2024", "net profit in 2023") resolve. Today the wide shape answers year-over-year comparisons but not single-year lookups. Stage 4 Echo eval (E1.12-14, E4.07).
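The reshape described here is a wide-to-long pivot. A sketch with a hypothetical function name and made-up figures:

```python
def wide_to_long(header: list[str], rows: list[list]) -> list[dict]:
    """Turn one row per line item with a column per year into one ledger row per (line item, year)."""
    years = [int(h) for h in header[1:]]  # the first column holds the line-item label
    ledger = []
    for label, *values in rows:
        for year, value in zip(years, values):
            ledger.append({"line_item": label, "year": year, "value": value})
    return ledger

header = ["Line item", "2023", "2024"]
rows = [["Total revenue", 100.0, 120.0],
        ["Net profit", 10.0, 15.0]]
ledger = wide_to_long(header, rows)

# A single-year question becomes a plain filter on the ledger.
print([r["value"] for r in ledger
       if r["line_item"] == "Total revenue" and r["year"] == 2024])  # [120.0]
```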
Rank best and worst months correctly, including loss-making ones
Month-by-month trends and best/worst-month rankings miss values on monthly-block files: loss (negative) months and the month that exists in two source files don't surface. The trend builder needs to read table-level periods (the sheet-name month stamp) and handle the dual-source September overlap. Stage 4 Echo eval (E4.02, E4.03, E4.05, E4.06).
Answer in the currency the question asks for
When the client default is MKD but the question says "in EUR", the engine should pick the euro figure from the same row rather than gap. Echo rows carry MKD and EUR side by side; the default-currency tiebreak picks MKD, but an explicit EUR request should override it. Stage 4 Echo eval (E3.07).
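A sketch of the override rule: an explicit currency code in the question wins, and the client default applies only when none is named. The function name and the row values are illustrative assumptions.

```python
import re

def pick_currency(question: str, row: dict[str, float], default: str = "MKD") -> tuple[str, float]:
    """Return the (currency, value) to answer with, for a row that carries several currencies."""
    for code in row:
        # A currency code named in the question overrides the client default.
        if re.search(rf"\b{re.escape(code)}\b", question, re.IGNORECASE):
            return code, row[code]
    return default, row[default]

row = {"MKD": 61_500.0, "EUR": 1_000.0}  # made-up figures
print(pick_currency("What was the rent in EUR?", row))  # ('EUR', 1000.0)
print(pick_currency("What was the rent?", row))         # ('MKD', 61500.0)
```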
Give each month's table a consistent name across the file
The same table family gets a different English title per month (Outsource Transactions / Outsource Work / Outsourced Work…), and identical titles collapse in the catalogue (10 "Employee-Client Assignments", 3 "Currency Convertor USD"). The fix is consistent family titling, so that a monthly series reads as one concept while per-month tables stay distinct. Stage 4 Echo eval (E-F1, B41; 64 duplicate-title flags raised).
Upgrade the web framework to the latest major version
The platform runs Next.js 15.5.x, security-patched against the recent known vulnerabilities. The next step is moving to the latest major version (16.x as of May 2026, likely newer when actioned) for long-term support and ongoing security patches. This is a major upgrade needing the official migration steps and a full regression pass plus a clean build, so it is deliberately deferred now that the active vulnerabilities are closed on the current line. Re-check the latest stable version when starting.
In Progress (1)
Onboard Target Group and SmartClick as full clients - tables and documents together
The second corpus wave (Dancho, 2026-06-12): once mixed text+table answering is proven on BizzBee, Target Group and SmartClick enter as new clients - their spreadsheets through the tabular engine and their text documents (first-meeting notes, SWOT, retro Q&A, research docs, Macedonian process-map PDFs) through the text route. Their diagnostic registries provide ready-made correct answers for grading. Proves the whole machine on fresh clients, multi-client isolation included. Phase T-3/T-4 of the text-engine proposal.
In Review (11)
Pull tables hidden inside PDFs and Word into queryable numbers
A PDF or Word file can contain tables, but the reader returns them as text, so they ride the document path as prose and never reach the structured-table engine. That means numbers locked in a PDF table are not answerable today. Only actual spreadsheet files get structured treatment.
Retire the old half-built document search completely
Delete the dormant retrieval experiment from the codebase: the chunk table population path, the similarity-search and embedding code, and the hidden LAKE_RAG switch. The new text engine answers from verified filed claims, not query-time search, so none of this code is reused - and no half-built engines stay in the codebase. Completes the cleanup that F04.UI.CLEANUP-RAG started on the UI side. Phase T-4 of the text-engine proposal.
One answer that combines your numbers and your documents - and shows disagreements openly
Cross-engine answering: "why did costs rise, and what did the management meeting say about it?" gets one answer - the number from the deterministic calculation, the narrative from verified quotes, each with its own citation. When a stated number disagrees with the ledger, both sides are shown, dated and attributed - the ledger answers, the disagreement is disclosed, the consultant resolves. Every element carries its full pedigree: quote, sentence, page, document, down to which audio or video it came from. Phase T-3; sections 4.3-4.4 of the text-engine proposal.
Answer qualitative questions from PDFs and Word docs, not just spreadsheets
When a consultant asks "what does the SWOT analysis reveal" or "what did the diagnostic meeting cover", the answer engine currently replies that it cannot answer qualitative questions, even when the document is indexed. This card teaches the one answering brain the text side: it reads the claims map next to the database map, recalls filed quotes, reads a whole document's pages when asked to summarize, and can compose a clearly-labeled impression from real quotes. Every quote is checked word-for-word against the stored pages before the answer ships - a quote that does not exist verbatim never reaches the consultant; no information means an honest gap. Phase T-2; sections 4.3-4.4 of docs/requirements/_F04_TEXT_ENGINE_PROPOSAL.md.
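A minimal sketch of the word-for-word check. The data shapes and function names are hypothetical, and collapsing runs of whitespace before comparing is an assumption (it lets a quote survive a line break in the stored page); the words themselves must match exactly.

```python
def normalise_ws(text: str) -> str:
    """Collapse whitespace runs so line breaks do not defeat an otherwise exact match."""
    return " ".join(text.split())

def verified_quotes(quotes: list[dict], pages: dict[int, str]) -> list[dict]:
    """Keep only quotes whose text appears verbatim on the stored page they cite."""
    kept = []
    for quote in quotes:
        page_text = pages.get(quote["page"], "")
        if normalise_ws(quote["text"]) in normalise_ws(page_text):
            kept.append(quote)
    return kept

pages = {1: "The team is exhausted.\nHiring is on hold until Q3."}
quotes = [
    {"page": 1, "text": "The team is exhausted."},
    {"page": 1, "text": "The team is very tired."},  # paraphrase: dropped
]
print(verified_quotes(quotes, pages))
```

If nothing survives the check, the caller has no quote to show and reports a gap.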
Prove no sentence is left behind when a document is filed
Coverage gates at ingest, mirroring the table engine's plan/execution/database-entry checks: the stored pages must reconstruct the full document text; every segment must either produce claims or be explicitly classified as no-content with a reason; every claim must anchor word-for-word to its page; and the document brief is reconciled against what was actually filed. Any failure flags the document Needs review at the door - a missed sentence is a bug caught at ingest, never a silent gap discovered at question time. Phase T-1; section 4.5 of the text-engine proposal.
Read each document once and file everything it contains: facts into a table, quotes into the notebook
The Clerk's text route (the textual engine, parallel to the tabular one). One reading pass per document at upload files everything into the 3-layer model: stated facts ("revenue 2025 is 20,000 EUR") become rows in a per-document stated-facts table the existing answering engine can read - marked weaker than ledger facts; qualitative quotes ("the team is exhausted") become claims with the exact sentence, speaker, date, topic tags, and entity links; impressions are never stored - composed at answer time. The document brief seeds the claims map so the planner can find everything. Phase T-1; sections 4.1-4.2a and 4.6 of the text-engine proposal.
Keep every document's text word-for-word, so answers can quote it exactly
The page store: every processed document's text saved verbatim, addressable by section, page, timestamp, and speaker. This is the ground truth that every quoted answer is checked against, and the engine's own copy of record - after processing, the original file retires, exactly as nobody reopens an Excel once the Receptionist has broken it into tables. Absorbs the never-used backbone_prose table. Phase T-1 of the text-engine proposal.
Write the exam for document answers before building the engine
A layered question map for text, mirroring the Stage-3 exam that brought the numeric engine to production quality: recall (what did X say about Y), synthesis (main concerns of a meeting), cross-document, timeline, cross-engine (numbers + narrative), false-premise questions that must be refused, and planted conflicts that must be disclosed. Ground truths come from the diagnostic registries and deterministic checks (a cited quote must exist word-for-word). Built against the frozen corpus; every later card is graded against this exam. Phase T-0 of the text-engine proposal.
Load BizzBee's documents into the platform - the first text corpus
Bring the first real text corpus into the platform: BizzBee ONLY (Dancho, 2026-06-12) - the five BizzBee proposal PDFs ingested through the Data Room and frozen as the first test corpus. BizzBee's spreadsheets are already in the knowledge base, so with its documents added the exam can mix table questions and text questions on ONE client and watch how the answering engine behaves across both. Target Group and SmartClick documents come later (F04.T4.CLIENTS), entered as full new clients with their tables. Phase T-0 of docs/requirements/_F04_TEXT_ENGINE_PROPOSAL.md.
Recognise the same customer, supplier, employee, or vendor under different spellings
When the same person or company is typed several ways across files (for example Masena, then 002 MasenaInvest (Mels), then Mels), the system should recognise they are the same and count every row regardless of spelling. It should look for this across customers, suppliers and vendors, employees, and cost centres or projects. The proper version finds likely matches with cheap signals, auto-merges only the clearly-identical ones, asks for confirmation on the borderline ones, and remembers approved matches. The naive substring version (built then disabled in May 2026) over-merged different people who shared a first name, so it needs the careful rebuild. The groundwork is in code behind a switch. Green-lit by Dancho on 2026-06-12, with one explicit guardrail: never treat two names as the same entity merely because they are a letter or two apart — spelling distance alone is only a candidate signal; a merge needs corroborating evidence (shared codes or IDs, same context) or human confirmation.
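A sketch of the guardrail only, under assumptions: spelling similarity nominates a candidate, and a merge happens automatically only when a shared code corroborates it. The threshold, field names, and the second pair of names are hypothetical. Matches with no spelling overlap at all (Masena and Mels in the example above) would need other signals and are not covered here.

```python
import difflib

def classify_pair(a: dict, b: dict, threshold: float = 0.85) -> str:
    """Return 'auto_merge', 'ask_human', or 'distinct' for two entity records."""
    name_a = a["name"].casefold().strip()
    name_b = b["name"].casefold().strip()
    similar = difflib.SequenceMatcher(None, name_a, name_b).ratio() >= threshold
    shared_code = a.get("code") is not None and a.get("code") == b.get("code")
    if similar and shared_code:
        return "auto_merge"  # spelling plus corroborating evidence
    if similar or shared_code:
        return "ask_human"   # one signal alone is only a candidate
    return "distinct"

print(classify_pair({"name": "Masena Invest", "code": "002"},
                    {"name": "MasenaInvest", "code": "002"}))  # auto_merge
print(classify_pair({"name": "Ana Petrova"},
                    {"name": "Ana Petrovska"}))                # ask_human
```

Approved matches from the `ask_human` path would then be stored so the same question is not asked twice.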
Knowledge base learns what each cell is about, for faster answers
Adds a Librarian step that tags every data cell after upload (revenue, cost, expense, cash, and so on) so the answer engine can jump straight to the right cell on each question instead of re-searching every time. Built behind two safety switches that stay off until it proves at least as accurate as today on the 16 controlled tests. Still in review: not yet switched on, accuracy parity not yet proven.
Ready to Ship (0)
Nothing queued for the next release.