Conversion and OCR

Not every document can be translated straight away. PDF files with complex graphic layouts, scans, and files without a text layer need to be converted before they reach the translator. This is why automatic translators often fail with scanned documents: they do not deliver the expected result, i.e. well-translated text in a layout that reflects the original.

To meet the needs of clients who encounter this challenge, we carry out manual and automated conversion as well as optical character recognition (OCR). We transform non-editable documents into files ready for translation or further processing.

Scope of work

Format conversion

  • PDF → DOCX, INDD, AI, XLSX, PPTX and others
  • Manual and automated conversion
  • Preserving structure, tables and formatting
  • Preparing text for printing in new formats

OCR – optical character recognition

  • Extracting text from scans and image-based PDFs
  • OCR for Polish and other languages with diacritical marks
  • Recognising text from scanned pages while preserving the page layout

DTP verification and correction

  • Manual checking of every file after conversion
  • Fixing recognition errors
  • Editing scanned text – corrections and formatting
  • Preparing files for translation in CAT tools

Tools

Use | Example tool
PDF conversion | Adobe Acrobat Pro, MS Word
OCR | ABBYY FineReader

ABBYY FineReader is one of the most accurate OCR tools on the market. It supports over 190 languages, including Polish with full support for diacritical marks. Our DTP department works with professional, production-grade tools, not free online converters.

Professional OCR vs free tools

Parameter | Free tools | Studio Gambit
Accuracy | 85–90% | 98–99%
Manual verification | None | Every file
Layout preservation | Partial | Full
Diacritical marks | Errors | Correct
Tables and graphics | Problematic | Recreated

What do you gain?

  • The ability to translate PDF files and scanned documents
  • Time savings – you do not have to prepare the text manually
  • High recognition accuracy, including in languages with diacritical marks
  • A final file that retains the original layout

Result: an editable file, faithful to the original, ready for translation or further processing.

Why us?

  • We use professional software such as Adobe Acrobat Pro and ABBYY FineReader (OCR)
  • We manually verify every file after conversion
  • We prepare files for translation as well as for printing and digital publication
  • We have experience in technical, medical, legal and marketing projects

Conversion and OCR

Do you have files that cannot be edited, or has an automatic PDF translator not met your expectations?

Send them to us – we will check whether they can be converted and prepare a free sample.

Desktop publishing

Do you need comprehensive DTP services?

We work in most formats and handle everything from preparing documents from scratch to delivering the final version.

FAQ

Converting a PDF for translation means turning a closed PDF file into a document that can be opened in a CAT tool (e.g. Trados, memoQ) and translated with translation memories or AI support. This makes translation faster, cheaper and more terminologically consistent.

Free tools often “break” text into separate frames, wrap lines in the wrong places, lose formatting and misinterpret tables. The result: the file is not suitable for translation in CAT tools or with AI support. The translator loses time on corrections, and the consistency of the text decreases. Professional conversion in the right tool, e.g. Adobe Acrobat Pro, makes it possible to preserve the document structure and carry out precise corrections before translation.

We adapt the conversion method to each file. A typical workflow looks like this: we open the PDF file in Adobe Acrobat Pro, export it to DOCX, and then check and correct it manually. Manual correction includes merging split paragraphs, correcting tables, removing unnecessary special characters, matching fonts and symbols, and aligning formatting. Only then does the file go to translation.

The choice depends on the type of file:

  • Do you have a PDF file with editable text (e.g. generated from Word)?

The best option is automated conversion in a professional tool such as Adobe Acrobat Pro, with export to DOCX followed by manual corrections that prepare the file for translation. Most of our assignments are exactly this: automated conversion with manual correction.

  • Do you have a scan or an image-based PDF (text as an image)?

Here, OCR is the right choice, e.g. in ABBYY FineReader, followed by manual verification.

OCR (optical character recognition) converts text contained in images into editable characters. We use it when a PDF has no text layer, as with scans and photos of documents. It turns a scan or an image-based PDF into an editable file that can be translated in CAT tools and with the help of AI.

Free online tools typically reach an accuracy of 85–90% and offer no verification of the results. We use ABBYY FineReader (98–99% accuracy) and manually check every file.

Typical problems with automated conversion: text broken into separate frames, misinterpreted tables, loss of formatting, and problems with multi-column layouts.

Typical difficulties with OCR: low scan resolution, unusual fonts, and text on a graphic background.

That is why we assess each file individually and select the appropriate method of preparing the file for editing.

Scan at a resolution of at least 300 dpi in greyscale, or 1200 dpi in black-and-white mode. Use a plain background and make sure the pages are free of distortions and creases.

Do not flatten layers, and do not save the file as an image-based PDF (the “print to PDF” option sometimes does this). If you have access to the source file (e.g. Word, InDesign), it is better to send the original, as conversion will be faster and more accurate. Even then, a PDF is still useful: because PDF is a closed format, it shows us exactly how the document is displayed on your side, regardless of the fonts you have installed, colour spaces or the version of the source file.

Turnaround for a standard document (10–20 pages): 1 working day. For larger projects, we set a deadline after analysing the materials.

For OCR we accept most graphic formats, e.g. JPG, PNG, TIFF, BMP and PDF without a text layer. Conversion is carried out on PDF files that have a text layer.

What can we do for you?

Write to us now for a tailor-made offer.


We value your privacy

The owner of this website collects and processes data about users in order to provide services through Studio Gambit Sp. z o.o. The data is processed in accordance with the law and in compliance with security rules. Processed data is not transferred to other companies.