# Dictato

> Context-aware voice dictation for macOS

This document contains the full content of all documentation pages for AI consumption.

---

## Audio

**URL:** https://docs.dictato.ai/docs/audio

**Description:** Microphone, sounds, and media behavior

Audio settings live in **Settings → Audio**.

![Audio settings](/screenshots/audio.png)

## Microphone mode

- **Automatic** - follows the system default input. Changes when you plug in headphones or a USB mic.
- **On-device** - always uses the built-in Mac microphone, even if another input is default.
- **Custom** - always uses a specific mic by UID. If that mic isn't plugged in, Dictato falls back to on-device.

A live level meter shows whatever mic is currently active, so you can verify Dictato is hearing you before you start.

## Pause media while recording

On by default. Dictato pauses system media playback (Music, Spotify, Safari video, etc.) when you start dictating and resumes after.

## Sound profile

Pick the audible tones Dictato plays when recording starts, stops, and locks. Choose **None** to go silent.

---

## Configuration files

**URL:** https://docs.dictato.ai/docs/config-files

**Description:** The JSON files Dictato reads and writes, for hand-editing, syncing, or scripting

Dictato stores every setting in plain JSON files inside your [config folder](/docs/folders). Hand-edit the files, commit them to a dotfiles repo, or script them: the app reads the files on launch and writes them back atomically when you change settings in the UI.

## `preferences.json`

Every UI setting except shortcuts.
Example:

```json
{
  "languages": ["en", "da"],
  "pauseMediaWhileRecording": true,
  "useScreenContext": true,
  "customVocabulary": "Dictato, WhisperKit, Parakeet",
  "saveAudioRecordings": true,
  "microphoneMode": "onDevice",
  "selectedModel": "parakeetV3",
  "soundProfile": "systemTink",
  "postProcessingEnabled": false,
  "postProcessingMethod": "onDevice",
  "middleMouseEnabled": false,
  "gesturesEnabled": true,
  "captureTextSelection": false,
  "captureCopy": false,
  "capturePaste": false
}
```

Unknown keys are ignored. Missing keys fall back to defaults, so you can ship a minimal file with only the knobs you care about.

## `shortcuts.json`

One entry per [shortcut action](/docs/shortcuts). Each action maps to an array of shortcuts; when an action has multiple bindings, any of them fires it.

```json
{
  "pushToTalk": [
    { "keyCode": null, "modifierFlags": 8388608 },
    { "keyCode": 15, "modifierFlags": 1310720 }
  ],
  "lockRecording": [
    { "keyCode": 49, "modifierFlags": 0 }
  ],
  "stopLockedRecording": [
    { "keyCode": 53, "modifierFlags": 1048576 }
  ],
  "regionScreenshot": [
    { "keyCode": 21, "modifierFlags": 1179648 }
  ],
  "drawOnScreen": [
    { "keyCode": null, "modifierFlags": 1572864 }
  ],
  "addTextQuote": [
    { "keyCode": 17, "modifierFlags": 524288 }
  ],
  "exitActiveTool": [
    { "keyCode": 53, "modifierFlags": 0 }
  ]
}
```

### Shortcut fields

- `keyCode` - virtual key code (UInt16) or `null` for a modifier-only shortcut. These are standard macOS virtual key codes (`kVK_ANSI_A = 0`, `kVK_Space = 49`, `kVK_Escape = 53`, `kVK_ANSI_R = 15`, etc.; see `Carbon/HIToolbox/Events.h`).
- `modifierFlags` - decimal `NSEvent.ModifierFlags` raw value. Sum the bits you want:

| Modifier | Bit | Decimal |
|---|---|---|
| Shift | `1 << 17` | `131072` |
| Control | `1 << 18` | `262144` |
| Option | `1 << 19` | `524288` |
| Command | `1 << 20` | `1048576` |
| Function (Fn / Globe) | `1 << 23` | `8388608` |

So `⇧ ⌘ 4` is `131072 + 1048576 = 1179648`. `Fn` alone is `8388608`.
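If you script your `shortcuts.json`, the arithmetic above is easy to automate. The following is a minimal Python sketch, not part of Dictato: the helper names `modifier_flags` and `modifier_names` are made up for illustration, and only the bit values come from the table.

```python
# Bit values copied from the modifier table above.
MODIFIER_BITS = {
    "shift": 1 << 17,
    "control": 1 << 18,
    "option": 1 << 19,
    "command": 1 << 20,
    "fn": 1 << 23,
}


def modifier_flags(*names: str) -> int:
    """Combine modifier names into the decimal value stored in shortcuts.json."""
    flags = 0
    for name in names:
        flags |= MODIFIER_BITS[name]
    return flags


def modifier_names(flags: int) -> list[str]:
    """Split a decimal modifierFlags value back into modifier names."""
    return [name for name, bit in MODIFIER_BITS.items() if flags & bit]


print(modifier_flags("shift", "command"))  # 1179648
print(modifier_names(1310720))             # ['control', 'command']
```

The second call decodes the `⌃ ⌘ R` push-to-talk binding from the example file, which can help when reading a hand-edited config.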
### Action names

`pushToTalk`, `lockRecording`, `stopLockedRecording`, `regionScreenshot`, `drawOnScreen`, `addTextQuote`, `exitActiveTool`.

## `history.json`

The index of your past [dictations](/docs/history). Written by the app and not meant for hand-editing: deletions and renames are handled through the UI. Safe to back up or inspect.

## `hooks.json`

Shell commands, Claude prompts, and webhooks to run on Dictato events. See [Hooks](/docs/hooks) for the full reference.

## Syncing

Point **Settings → Folders → Config folder** at an iCloud Drive, Dropbox, or Syncthing folder to keep preferences, shortcuts, history, and hooks in sync across Macs. Dictato re-reads the files when the folder path changes.

---

## Context capture

**URL:** https://docs.dictato.ai/docs/context

**Description:** Give Dictato extra context about what you're doing

Dictato can pull extra context from your screen and clipboard so the transcription understands *what* you're talking about. All capture toggles live in **Settings → Transcription**.

![Context capture settings](/screenshots/transcription.png)

## What can be captured

- **Use screen text as context** - runs OCR on what's visible and feeds it to the transcription or [post-processing](/docs/post-processing) prompt.
- **Capture text selection** - grabs whatever is currently highlighted in another app.
- **Capture copy (⌘C)** - remembers your most recent clipboard copy.
- **Capture paste (⌘V)** - remembers what you last pasted.
- **Capture app switches** - notes when you change active app and window during the session.
- **Circle gesture screenshots** - turns on the [circle gesture](/docs/tools#circle-gesture-screenshot) detector.

## Vocabulary

Add brand names, jargon, or oddly spelled words in **Vocabulary** so the transcription engine knows they exist. Separate entries with commas or line breaks; both work.
Example: `MyProduct, Smith Street, Acme Corp, Jürgen, kubectl`

## Custom dictation prompt

A short free-form prompt that biases the [transcription model](/docs/models) itself. Use it to describe your speaking style, your domain, or conventions you want Dictato to respect while it's turning audio into text.

### How it differs from the post-processing prompt

The **dictation prompt** is applied *during* transcription to guide how the model hears you. It's seen by the on-device speech model (Parakeet or WhisperKit) and helps with things like disambiguating homophones, picking the right spelling for names, and keeping filler words out.

The [post-processing prompt](/docs/post-processing#prompt-template) runs *after* transcription on the finished text. It's seen by an LLM (on-device or Claude) and is used for rewriting, punctuation, formatting, and applying style rules you can't express to a speech model.

Rule of thumb: if the rule is about *sound* (pronunciation, word choice, jargon), put it in the dictation prompt. If it's about *text* (tone, formatting, rewriting), put it in the post-processing prompt.

---

## Dictation

**URL:** https://docs.dictato.ai/docs/dictation

**Description:** Push-to-talk, hands-free lock, and stopping a recording

Dictato has two recording modes: a quick push-to-talk for short notes, and a locked hands-free mode for longer sessions.

## Push-to-talk

Hold **Fn** and speak. Release to transcribe and paste wherever your cursor is.

## Hands-free lock

For longer dictations, press **Space** while holding **Fn**. The recording locks and you can let go of the keyboard.

While locked, you can use your hands for anything: open another app, grab a screenshot, draw on screen with [tools](/docs/tools). The recording keeps going.
## Stopping a locked recording

Any of these will stop and paste:

- **⌘ Escape**
- **Middle-click** (if [middle-click recording](/docs/shortcuts#middle-mouse-button) is enabled)
- Click the **stop** button in the overlay

## Middle-click toggle

Enable **Settings → [Shortcuts](/docs/shortcuts#middle-mouse-button) → Middle mouse button to record** to start and stop hands-free recording with a middle-click. Useful if you have a three-button mouse and don't love reaching for Fn.

## Pause media while recording

By default, Dictato pauses whatever's playing (music, video) while you dictate and resumes after. Toggle this in **Settings → [Audio](/docs/audio)**.

## Shortcuts reference

These are the factory defaults. Every one is remappable, and each action can have multiple bindings. See [Shortcuts](/docs/shortcuts) to customize.

| Action | Default |
|---|---|
| Push-to-talk | Hold **Fn** |
| Lock recording (hands-free) | **Fn + Space** |
| Stop hands-free recording and paste | **⌘ Escape** |
| Toggle hands-free (optional) | **Middle-click** |
| [Region screenshot](/docs/tools#region-screenshot) | **⇧⌘4** |
| [Draw on screen](/docs/tools#draw-on-screen) | Hold **⌥⌘** |
| [Add text quote](/docs/tools#text-quotes) | **⌥T** |
| Exit active tool | **Escape** |

---

## Folders

**URL:** https://docs.dictato.ai/docs/folders

**Description:** Where Dictato stores your transcriptions and config

Dictato writes everything to regular folders on your Mac, with no proprietary database. Settings are in **Settings → Folders**.

![Folders settings](/screenshots/folders.png)

## Markdown folder

Each finished [dictation](/docs/dictation) is saved as a Markdown file here. Attachments (screenshots, drawings, audio) live in an `attachments/` sibling folder.

Change the location with **Choose…** or click **Reveal** to open it in Finder.
### What a dictation file looks like

A single dictation with a screenshot and saved audio ends up as something like this:

```markdown
---
date: 2026-04-11T14:26:10Z
source_app: Notes
duration: 18.3
language: en
title: Weekly plan for the garden project
audio: attachments/2026-04-11-14-26-10-a1b2c3.m4a
---

# Weekly plan for the garden project

Let's think about the schedule for this week. The back bed is ready for the new plants, so Saturday morning we can move the tomatoes in. The front path still needs clearing, and I want to finish that before we have people over on Sunday. Also, remember to order more mulch before Friday so it arrives in time.

[^A]: ![Screenshot A](attachments/2026-04-11-14-26-10-a1b2c3-A.png)
```

The filename follows `YYYY-MM-DD HH-MM-SS

---

## History

**URL:** https://docs.dictato.ai/docs/history

**Description:** Browse, search, and re-use past dictations

Every dictation is saved locally. Open the main Dictato window to browse them.

![History window](/screenshots/history.png)

## What you get

- **Title** - auto-generated if [post-processing](/docs/post-processing#generate-title) title extraction is on.
- **Text** - the final transcription (raw or post-processed).
- **Source app** - which app you were dictating into.
- **Duration** - how long you recorded.
- **Attachments** - screenshots, drawings, or quoted text attached to the session with [tools](/docs/tools).

## Search

Use the search field at the top to find a dictation by its text. The sidebar also groups entries by the app you were using, so you can click an app to filter.

## Re-using a dictation

Each row has a **Copy** button to put the text back on your clipboard.

## Opening a dictation

Double-click any row to open the full detail view. Screenshots and drawings from the session appear inline, exactly where you captured them in the flow of the transcription. The toolbar has **Copy** and **Share** buttons for the whole thing.
![Opened dictation with inline screenshots](/screenshots/history-detail.png)

## Where is it stored?

Transcriptions live in your **Markdown folder**, one file per dictation. See [Folders](/docs/folders) to change the location, and [Audio](/docs/audio) to turn audio saving on/off.

---

## Hooks

**URL:** https://docs.dictato.ai/docs/hooks

**Description:** Run shell commands, webhooks, or Claude prompts on Dictato events

Hooks let you react to what Dictato is doing: transform the transcription before it pastes, log events to another tool, block a paste into a specific app, or anything else you can script.

![Hooks settings](/screenshots/hooks.png)

## Configure

Create `hooks.json` in your [config folder](/docs/folders). Each key is an event name mapping to an array of matcher groups:

```json
{
  "TranscriptionComplete": [
    {
      "matcher": "en",
      "hooks": [
        { "type": "command", "command": "hooks/spellcheck.sh", "timeout": 10 }
      ]
    }
  ]
}
```

Relative command paths resolve from the config folder.

## Hook types

### `command`

Runs a shell command. Receives the event JSON on stdin. Optionally writes replacement JSON to stdout.

```json
{ "type": "command", "command": "hooks/my-script.sh", "timeout": 10 }
```

### `prompt`

Sends the event payload plus your prompt to the Claude API (reuses the key from [Post-processing](/docs/post-processing#method)). The model's response becomes the new text.

```json
{ "type": "prompt", "prompt": "Fix spelling and grammar. Return only the corrected text.", "timeout": 30 }
```

### `webhook`

HTTP POST with the JSON payload as the body.

- **200** - success; if the response is `{"text": "..."}` it replaces the current text
- **204** - success, no changes
- **422** - block the operation (for blockable events)
- **other** - logged as error

```json
{ "type": "webhook", "url": "https://example.com/hooks/dictato", "timeout": 15 }
```

## Events

| Event | Matcher field | Can block? | Payload |
|---|---|---|---|
| `SessionStart` | - | - | - |
| `RecordingStart` | `sourceApp` | yes | `sourceApp`, `sourceAppBundleId` |
| `RecordingLocked` | - | - | `duration` |
| `RecordingStop` | - | - | `duration`, `sampleCount` |
| `ScreenshotCaptured` | `sourceApp` | - | `label`, `timestamp`, `sourceApp` |
| `QuoteCaptured` | `sourceApp` | - | `label`, `text`, `sourceApp`, `sourceWindow`, `timestamp` |
| `TranscriptionStart` | `model` | - | `engine`, `model`, `languages` |
| `TranscriptionComplete` | `language` | yes | `text`, `language`, `duration` |
| `BeforePaste` | `targetApp` | yes | `text`, `targetApp`, `targetAppBundleId`, `hasAttachments` |
| `AfterPaste` | `targetApp` | - | `text`, `targetApp` |

## Protocol

- **Input (`command`)** - JSON on stdin with `event`, payload keys, and `configFolder`
- **Input (`webhook`)** - HTTP POST with the same JSON as body
- **Input (`prompt`)** - payload is sent as context alongside your prompt to Claude
- **Output** - JSON with `{"text": "..."}` replaces the current text. No output means no change.
- **Exit codes (`command`)** - `0` success, `2` block (blockable events only), anything else is an error
- Hooks with matching events run in parallel.

## Browse examples

The **Examples** picker in the Hooks settings pane shows ready-to-copy scripts living in your config folder. Use them as a starting point and edit in place.

---

## Welcome to Dictato

**URL:** https://docs.dictato.ai/docs

**Description:** Context-aware voice dictation for macOS

**Dictato** is a local, privacy-friendly dictation app for macOS. Hold **Fn** to dictate, release to paste. You can also attach screenshots, drawings, and quoted text to the same [dictation](/docs/dictation) so you can talk *about* what's on screen.

## Why Dictato?

- **Local first.** Transcription runs on-device with [Parakeet or WhisperKit](/docs/models). Nothing leaves your Mac unless you enable cloud [post-processing](/docs/post-processing).
- **Voice plus visuals.** Combine speech with screenshots, drawings, and [quoted text](/docs/tools#text-quotes) in a single dictation.
- **Context-aware.** Optionally attach on-screen text, the active app, and your recent selection with [context capture](/docs/context).
- **Scriptable.** [Hooks](/docs/hooks) let you run shell commands, webhooks, or Claude prompts on events like `BeforePaste` or `TranscriptionComplete`.

---

## Languages

**URL:** https://docs.dictato.ai/docs/languages

**Description:** Pick your dictation languages

Dictato auto-detects language on every dictation, but you can narrow it down in **Settings → Languages** for faster, more accurate results.

![Languages settings](/screenshots/languages.png)

## How it works

- **No languages selected** - Dictato auto-detects from every language the [model](/docs/models) supports.
- **One language** - all dictation is forced to that language. Best speed and accuracy if you only speak one.
- **Two or more** - Dictato auto-detects between the languages you've picked. If it can't tell, it falls back to the **Primary** (the first one in the list).

Add a language from the **Add Language** picker. Remove with the red minus button.

## Model support

Language availability depends on your [transcription model](/docs/models). Parakeet V3 covers 25 languages; Parakeet V2 is English-only; WhisperKit Medium and Base cover 100+.

---

## Transcription models

**URL:** https://docs.dictato.ai/docs/models

**Description:** Pick the on-device speech model that fits your Mac

Dictato runs two model families, both entirely on-device. Pick one in **Settings → Model**. Unselected models download on demand.

Once you've picked a model, see [Languages](/docs/languages) to narrow down what Dictato listens for.

![Model settings](/screenshots/models.png)

## Parakeet (recommended)

NVIDIA's Parakeet family, running on the Apple Neural Engine. Fastest on modern Apple Silicon.
| Model | Size | Best for |
|---|---|---|
| **Parakeet V3** | ~600 MB | Multilingual (25 languages), ~210× real-time on M4 |
| **Parakeet V2** | ~400 MB | English only, highest recall |

## WhisperKit

OpenAI's Whisper models compiled for CoreML. Slower than Parakeet but more widely supported.

| Model | Size | Best for |
|---|---|---|
| **Base** | ~150 MB | Lightweight, good for most languages |
| **Distil Large V3** | ~600 MB | 6× faster than Large V3, within 1% accuracy |
| **Medium** | ~1.5 GB | Higher accuracy across languages |
| **Medium (English)** | ~1.5 GB | Higher accuracy for English only |

## Which should I pick?

- **English only, fast machine:** Parakeet V2
- **Multilingual:** Parakeet V3
- **Older Mac or wide language support:** WhisperKit Base or Medium
- **Best accuracy, don't mind size:** WhisperKit Medium or Distil Large V3

---

## Post-processing

**URL:** https://docs.dictato.ai/docs/post-processing

**Description:** Clean up transcriptions with on-device or Claude AI

Post-processing runs the raw transcription through an LLM to fix punctuation, fix obvious typos, and apply your own rules. It runs *after* the speech model has finished, so it's distinct from the [custom dictation prompt](/docs/context#custom-dictation-prompt) that biases transcription itself.

Enable it in **Settings → Post-processing**.

![Post-processing settings](/screenshots/post-processing.png)

## Method

- **On Device** - runs locally on macOS 26 or later, with no API key required.
- **Anthropic** - sends the transcription to the Claude API. Paste your API key in the settings pane and choose a model (Haiku, Sonnet, Opus).

## Generate title

When enabled, Dictato asks the model to write a short title for each transcription. Titles show up in [History](/docs/history) so you can find a dictation later without re-reading it.

## Prompt template

The prompt you send to the model is fully editable.
It supports a few placeholders that Dictato fills in at runtime:

| Placeholder | Filled with |
|---|---|
| `{{transcription}}` | The raw transcription text |
| `{{vocabulary}}` | Your custom [vocabulary](/docs/context#vocabulary) from Settings → Transcription |
| `{{language}}` | The detected [language](/docs/languages) |
| `{{app}}` | The app you were focused on when dictating |

![Prompt template editor](/screenshots/post-processing-prompt.png)

Click **Reset to Default** to go back to the built-in prompt.

---

## Quick start

**URL:** https://docs.dictato.ai/docs/quickstart

**Description:** Install Dictato and record your first dictation

## Next steps

---

## Shortcuts

**URL:** https://docs.dictato.ai/docs/shortcuts

**Description:** Remap every keyboard shortcut and bind multiple triggers per action

Every Dictato shortcut is remappable. Each action can have one or many shortcuts bound to it, so you can set both **Fn** and **⌃⌘R** for push-to-talk and either one works.

Shortcuts live in **Settings → Shortcuts**.

![Shortcuts settings](/screenshots/shortcuts.png)

## Recording a shortcut

1. Click a shortcut field (it comes to life with an accent border).
2. Press the keys you want. The preview mirrors whatever's held: press **Shift**, see `⇧`; add **Cmd**, see `⇧ ⌘`.
3. A key-chord commits the moment you press a non-modifier key (e.g. **⇧ ⌘ L** commits `⇧ ⌘ L` on the `L` keydown).
4. A modifier-only shortcut (like **Fn** alone, or **⌃ ⌥**) commits when you release the last modifier.
5. Click **+ add** under an existing shortcut to bind a second trigger to the same action.
6. Click the **x** on a chip to remove that shortcut. If you remove the last one, the field collapses back to an empty recorder, and no trigger is bound until you add one.

To cancel without recording anything, press **Esc** with no modifiers held, or click outside the field.

## What you can bind

- Any key on its own: `L`, `F5`, `Return`.
- Any key with modifiers: `⌃ ⌘ R`, `⇧ ⌥ F7`.
- Modifiers alone: `Fn`, `⌘ ⌥`, `⌃ ⌘`.

Shortcuts with extra modifiers held at the time of press won't match unless those modifiers are part of the binding: it's exact-match, not subset-match.

## Hold vs tap

Some actions are naturally "hold" actions:

- **Push-to-talk** - recording starts on key-down, stops on key-up.
- **Draw on screen** - drawing is active while held, captured on release.

These work with any shortcut you bind: hold `⌃ ⌘ R` or hold `Fn`, same behavior. The rest of the actions fire once per press.

## Conflicts with macOS

If a shortcut you bound is already registered as a macOS system shortcut (Spotlight, Mission Control, etc.), a yellow warning triangle appears next to the chip. macOS will usually intercept the keys before Dictato sees them. Click the triangle for the full explanation and fix it in **System Settings → Keyboard → Keyboard Shortcuts**, or pick a different combination.

Dictato can only detect *system* shortcuts. Collisions with other apps (Raycast, Alfred, Karabiner, etc.) are invisible, so if a shortcut mysteriously doesn't fire, try a different one.

## Middle mouse button

At the bottom of the **Recording** section there's a toggle to start and stop hands-free recording with a middle-click. Off by default. This lives with the shortcuts because it's just another trigger for the same action.

## Reset to defaults

The **Reset to defaults** link below the panel restores the factory bindings across every action. Nothing else is touched.

## Defaults

| Action | Default |
|---|---|
| Push-to-talk dictation | Hold **Fn** |
| Lock recording (hands-free) | **Space** (while push-to-talk is active) |
| Stop hands-free recording and paste | **⌘ Esc** |
| Region screenshot | **⇧ ⌘ 4** |
| Draw on screen | Hold **⌥ ⌘** |
| Add text quote | **⌥ T** |
| Exit active tool | **Esc** |

## Storage

Shortcuts are saved to `shortcuts.json` in your [config folder](/docs/folders). See [Configuration files](/docs/config-files) for the format.
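Using the stored format, the exact-match rule from "What you can bind" can be illustrated with a small Python sketch. This is an assumption-laden illustration, not Dictato's actual matching code: the function names `binding_matches` and `action_fires` are hypothetical, and only the `keyCode` / `modifierFlags` shape and the bit values come from the documentation.

```python
# Modifier bits as documented for NSEvent.ModifierFlags.
SHIFT = 1 << 17
CONTROL = 1 << 18
COMMAND = 1 << 20
FN = 1 << 23


def binding_matches(binding: dict, key_code, held_flags: int) -> bool:
    """Exact match: same key and exactly the same modifier set, no extras."""
    return binding["keyCode"] == key_code and binding["modifierFlags"] == held_flags


def action_fires(bindings: list, key_code, held_flags: int) -> bool:
    """An action fires if any one of its bindings matches."""
    return any(binding_matches(b, key_code, held_flags) for b in bindings)


# Two bindings for one action, mirroring the Fn and ⌃ ⌘ R example.
push_to_talk = [
    {"keyCode": None, "modifierFlags": FN},               # Fn alone
    {"keyCode": 15, "modifierFlags": CONTROL | COMMAND},  # ⌃ ⌘ R
]

print(action_fires(push_to_talk, 15, CONTROL | COMMAND))          # True
print(action_fires(push_to_talk, 15, CONTROL | COMMAND | SHIFT))  # False
```

The second call returns `False` because Shift is held but is not part of the binding, which is the subset case the exact-match rule excludes.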
---

## Tools during recording

**URL:** https://docs.dictato.ai/docs/tools

**Description:** Screenshots, drawings, quoted text, and drag-and-drop attachments

While you're dictating (especially in hands-free lock), you can attach things to the same recording. Everything you attach becomes part of the transcription's context and history.

## Region screenshot

Press **⇧⌘4** to grab a region of the screen. The screenshot animates into the overlay and attaches to the current recording.

## Draw on screen

Hold **⌥⌘** to turn the screen into a canvas. Sketch arrows or annotations anywhere, release to capture. The drawing is saved as an image attachment.

## Circle gesture screenshot

Rough-circle something on screen with your mouse three times and Dictato grabs the region automatically, without any keyboard shortcut.

- First two loops arm the gesture
- Third loop captures the circled area as a screenshot
- Move away from the circled area (or press **Escape**) to cancel

Toggle in **Settings → Transcription → Circle gesture screenshots**.

## Text quotes

Select text anywhere on your screen, then press **⌥T** to attach it as a quoted snippet. Handy when you want to say "rewrite *this* paragraph" without re-typing or switching windows.

## Drag and drop

Drag anything onto the Dictato overlay to attach it:

- **Files** - added as file attachments
- **Images** - added as image attachments
- **URLs** - added as links
- **Text** - added as a text snippet
- **Colors** - attached as a color swatch

## Exiting a tool

Press **Escape** to exit the active tool without stopping the recording.

## Shortcut reference

| Tool | Shortcut |
|---|---|
| Region screenshot | **⇧⌘4** |
| Draw on screen | Hold **⌥⌘** |
| Text quote | **⌥T** |
| Circle gesture | Draw 3 circles |
| Exit tool | **Escape** |

![Shortcuts settings in Dictato](/screenshots/shortcuts.png)

---

## Links

- [Feedback](https://tally.so/r/PdX8o5)