Managing Word Lists

OCRR uses domain-specific word lists to improve OCR accuracy for specialized content. This feature is especially useful when working with industry-specific terminology, technical jargon, or proper names that might not be recognized by standard OCR processing.

Each domain (see "Understanding Domains" in the OCR Basics section) has its own word list, allowing you to customize terminology for different content types you work with.

Note: Word lists are different from correction rules. Word lists help OCRR recognize specialized terminology during the initial text recognition process, while correction rules fix errors after text has been recognized. For information about correction rules, see the Text Correction Rules page.

[Figure: Word List Editor showing domain words and controls]

Accessing the Word List Editor

There are two ways to open the Word List Editor:

  • From OCR Settings: Click "OCR Settings" → "OCR Wordlists"
  • From Batch Processing: Select a domain, then click "Text Processing Options" → "Word List"

Warning: You must select a domain before accessing the Word List Editor. The "OCR Wordlists" button will be disabled if no domain is selected.

Adding Words to a Domain

Adding Words from OCR Results

OCRR can automatically extract unique words from your OCR results, allowing you to quickly build a domain-specific vocabulary:

  1. Process a document with OCR.
  2. Select a domain from the dropdown.
  3. Open the Word List Editor.
  4. You'll see extracted words in the "New Words" section.
  5. Select the words you want to add.
  6. Click "Add Words" to add them to your domain.
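To make the idea of "new words" concrete, here is a minimal Python sketch of how unique candidate words could be pulled from recognized text and compared against an existing word list. It is not OCRR's actual extraction logic: the function name `extract_new_words`, the word pattern, and the case-insensitive comparison are all assumptions made for illustration.

```python
import re

def extract_new_words(ocr_text, known_words):
    """Return unique words from ocr_text that are not in known_words.

    Hypothetical helper: the word pattern and case-insensitive matching
    are assumptions, not OCRR's documented behavior.
    """
    known = {w.lower() for w in known_words}
    seen = set()
    new_words = []
    # Treat a run of letters (with inner apostrophes or hyphens) as a word.
    for word in re.findall(r"[A-Za-z][A-Za-z'-]*", ocr_text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            new_words.append(word)  # keep the first spelling encountered
    return new_words

candidates = extract_new_words(
    "The stent was placed. Stent patency confirmed.",
    known_words=["the", "was"],
)
print(candidates)
```

In practice you would still review the candidates by hand, as in step 5 above, because raw OCR output often contains misrecognized strings that should not be added to a domain.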

Manually Managing Words

You can also manually edit the word list for any domain:

  • Edit: Click the pencil icon next to any word to modify it.
  • Delete: Click the trash icon to remove a word from the domain.
  • Filter: Use the search field to filter words in the list.

Importing and Exporting Word Lists

OCRR lets you import and export word lists as CSV files, making it easy to share lists between devices or back them up.

Exporting a Word List

  1. Open the Word List Editor for your domain.
  2. Click "Export Dictionary".
  3. Choose a location to save the CSV file.
  4. The exported file will contain all words from the current domain.

Importing a Word List

  1. Open the Word List Editor for your domain.
  2. Click "Import Dictionary".
  3. Select a CSV file containing your word list.
  4. OCRR adds all words from the file to the selected domain. Duplicate words are skipped automatically.

Tip: CSV files should have one word per line. If your CSV has multiple columns, only the first column will be imported.
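If your terminology lives in a spreadsheet with extra columns, you can tidy it up before importing. The following Python sketch keeps only the first column, drops blank lines and exact duplicates, and writes one word per line, matching the layout described in the tip. The function names are hypothetical, and the sketch assumes the file has no header row; it does not reproduce how OCRR itself parses CSV files.

```python
import csv
import io

def read_first_column(csv_text):
    """Collect unique, non-empty values from the first column of csv_text."""
    words = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        word = row[0].strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words

def to_word_list_csv(words):
    """Serialize words as CSV text with one word per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for word in words:
        writer.writerow([word])
    return buffer.getvalue()

raw = "stent,noun\nangioplasty,noun\nstent,duplicate\n\n"
cleaned = to_word_list_csv(read_first_column(raw))
print(cleaned)
```

Save the resulting text to a .csv file and select it in step 3 of the import procedure.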

Using Domains with OCR

Once you've created a domain and added words to it, you can use it to improve OCR accuracy:

  1. Select your domain from the dropdown before processing a document.
  2. OCRR will use the domain's word list to improve recognition of specialized terms.
  3. If you've also set up correction rules for the domain, those will be applied according to your correction settings.

Tip: For best results, use both word lists and correction rules together. Word lists help with initial recognition, while correction rules fix any remaining errors.

Best Practices

  • Be Selective: Only add words that are specific to your domain and likely to be misrecognized by standard OCR.
  • Use Real Examples: Process typical documents first, then add specialized terms that were missed or incorrectly recognized.
  • Organize by Context: Create separate domains for different contexts rather than one large domain with mixed terminology.
  • Complement with Correction Rules: Use text correction rules alongside word lists for the best results. See the Text Correction Rules page for more information.
