Conversion and OCR

Not every document can be translated straight away. PDF files with complex graphic layouts, scans, and files without a text layer require conversion before they reach the translator. That is why an automatic translator for scanned documents often fails to deliver the expected result: well-translated text with a layout that reflects the original.

To meet the needs of clients who encounter this challenge, we carry out manual and automated conversion as well as optical character recognition (OCR). We transform non-editable documents into files ready for translation or further processing.

Scope of work

Format conversion

PDF → DOCX, INDD, AI, XLSX, PPTX and others
Manual and automated conversion
Preserving structure, tables and formatting
Preparing text for printing in new formats

OCR – optical character recognition

Extracting text from scans and image-based PDFs
OCR for Polish and other languages with diacritical marks
Recognising scanned text while preserving the page layout

DTP verification and correction

Manual checking of every file after conversion
Fixing recognition errors
Editing scanned text – corrections and formatting
Preparing for translation in CAT tools

Tools

| Use | Example tool |
|---|---|
| PDF conversion | Adobe Acrobat Pro, MS Word |
| OCR | ABBYY FineReader |

ABBYY FineReader offers one of the highest recognition accuracy levels on the market. It supports 190+ languages, including Polish with full support for diacritical marks. Our DTP department works with professional, production-grade tools, not free online tools.

Professional OCR vs free tools

| Parameter | Free tools | Studio Gambit |
|---|---|---|
| Accuracy | 85–90% | 98–99% |
| Manual verification | None | Every file |
| Layout preservation | Partial | Full |
| Diacritical marks | Errors | Correct |
| Tables and graphics | Problematic | Recreated |

What do you gain?

The ability to translate PDF files and scanned documents

Time savings – you do not have to prepare the text manually

High recognition accuracy (also in languages with diacritical marks)

The final file retains the original layout

Result: an editable file, faithful to the original, ready for translation or further processing.

Why us?

We work using professional software such as Adobe Acrobat Pro and ABBYY FineReader (OCR software)

After conversion, we manually verify every file

We prepare files for translation as well as for printing and digital publication

Experience in technical, medical, legal and marketing projects

Conversion and OCR

Do you have files that cannot be edited, and does an automatic PDF translator fall short of your expectations?

Send them – we will check whether they can be converted, and prepare a sample free of charge.

Desktop publishing

Do you need comprehensive DTP services?

We work in most formats, from preparing documents from scratch to the final version.

FAQ

What does converting a PDF into an editable format involve, and can it save you money?

It means turning a closed PDF file into a document that can be opened in a CAT tool (e.g. Trados, memoQ) and translated using translation memories or AI. This makes translation faster, cheaper and more terminologically consistent.

Does a free online converter recognise text well?

Free tools often “break” text into separate frames, wrap lines in the wrong places, lose formatting and misinterpret tables. The result: the file is not suitable for translation in CAT tools or with AI support. The translator loses time on corrections, and the consistency of the text decreases. Professional conversion in the right tool, e.g. Adobe Acrobat Pro, makes it possible to preserve the document structure and carry out precise corrections before translation.

What does professional conversion in Adobe Acrobat Pro look like?

We adapt the method to each file. A typical workflow looks like this: we open the PDF file in Adobe Acrobat Pro, export it to DOCX format, and then manually check and correct it. Manual correction includes merging split paragraphs, correcting tables, removing unnecessary special characters, matching fonts and symbols, and aligning formatting. Only then does the file go to translation.

When should you choose automated conversion, and when OCR?

The choice depends on the type of file:

Do you have a PDF file with editable text (e.g. generated from Word)?

The best option is automated conversion in a professional tool, e.g. Adobe Acrobat Pro, followed by export to DOCX and manual corrections that prepare the file for translation. Most of our assignments follow exactly this route: automated conversion with manual correction.

Do you have a scan or an image-based PDF (text as an image)?

Here, OCR is the right choice, e.g. in ABBYY FineReader, followed by manual verification.
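The rule of thumb above can be sketched as a small decision function. This is only an illustration, not our actual intake process: the name `choose_method`, the `has_text_layer` flag and the exact list of image extensions are assumptions made for the example, and in practice each file is assessed individually.

```python
from pathlib import Path

# Image formats treated as "text as an image" (assumed list, for illustration).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def choose_method(filename: str, has_text_layer: bool = False) -> str:
    """Suggest a preparation route based on file type and text layer."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        # Scans and photos contain no text layer, so they need OCR.
        return "ocr"
    if suffix == ".pdf":
        # A PDF with a text layer can be converted; otherwise it needs OCR.
        return "conversion" if has_text_layer else "ocr"
    # Anything else is left for a person to assess.
    return "manual assessment"


print(choose_method("manual.pdf", has_text_layer=True))
print(choose_method("scan.pdf"))
```

Whichever route is suggested, manual correction or verification still follows.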

What is OCR and why is it used in translation?

OCR (Optical Character Recognition) is the optical recognition of text from images. We use it when a PDF does not contain a text layer – i.e. with scans and photos of documents. It makes it possible to turn a scan or an image-based PDF into an editable file that can be translated in CAT tools and with the help of AI.

Automatic OCR vs professional OCR – what is the difference?

Free online tools have an accuracy of 85–90% and do not verify results. We use ABBYY FineReader (98–99% accuracy) and manually check every file.
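To give a feel for what these percentages mean in practice, here is a rough calculation. It assumes that accuracy is measured per character and that a page holds about 1,800 characters; both are assumptions for the sake of the example, not figures from any tool's documentation.

```python
def expected_errors(characters: int, accuracy_percent: int) -> int:
    """Approximate number of misrecognised characters at a given accuracy."""
    return round(characters * (100 - accuracy_percent) / 100)


CHARACTERS_PER_PAGE = 1800  # assumed page length, for illustration only

for accuracy in (85, 90, 98, 99):
    errors = expected_errors(CHARACTERS_PER_PAGE, accuracy)
    print(f"{accuracy}% accuracy: about {errors} errors per page")
```

Under these assumptions, a page processed at 85–90% accuracy would contain roughly 180–270 errors, compared with roughly 18–36 at 98–99%.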

Why is it that not every file can be “easily converted into an editable one”?

Typical reasons in the case of automated conversion: text broken into separate frames, misinterpreted tables, loss of formatting, problems with multi-column layouts.

Typical difficulties in the case of OCR: low scan resolution, unusual fonts, text on a graphic background.

That is why we assess each file individually and select the appropriate method of preparing the file for editing.

How do you prepare a scan to avoid problems with text recognition?

Scan at a resolution of at least 300 dpi in greyscale or 1200 dpi in black-and-white mode, with a plain background, without distortions or creases.
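If you only have the image file, you can estimate its resolution from its pixel width and the physical width of the page. The sketch below is a minimal illustration: the function names are hypothetical, and it assumes both thresholds above are treated as minimums.

```python
MM_PER_INCH = 25.4

# Minimum resolution per scan mode, taken from the guideline above.
MIN_DPI = {"greyscale": 300, "black-and-white": 1200}


def effective_dpi(pixel_width: int, page_width_mm: float) -> int:
    """Estimate scan resolution from image width and physical page width."""
    return round(pixel_width * MM_PER_INCH / page_width_mm)


def scan_meets_guideline(dpi: int, mode: str) -> bool:
    """Check a scan's resolution against the minimum for its mode."""
    if mode not in MIN_DPI:
        raise ValueError(f"unknown scan mode: {mode}")
    return dpi >= MIN_DPI[mode]


# Example: a greyscale scan of a 210 mm wide page that is 2480 pixels wide.
dpi = effective_dpi(2480, 210)
print(dpi, scan_meets_guideline(dpi, "greyscale"))
```

Resolution is only one factor: a plain background and a flat, undistorted page matter just as much.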

How do you prepare a PDF for conversion?

Do not flatten layers and do not save the file as an image-based PDF (the “print to PDF” option sometimes does this). If you have access to the source file (e.g. Word, InDesign), it is better to send the original: conversion will be faster and more accurate. Even then, the PDF remains useful. Because PDF is a closed format, opening it shows us how the document is displayed on your side, regardless of the fonts you have, the colour spaces or the version of the source file.

How long do conversion and OCR take?

A standard document (10–20 pages): 1 working day. For larger projects, we provide a deadline after analysing the materials.

Which formats do you accept for conversion, and which for OCR?

For OCR we accept most graphic formats, e.g. JPG, PNG, TIFF and BMP, as well as PDF files without a text layer. We carry out conversions on PDF files with a text layer.

What can we do for you?

Write now for a tailor-made offer.

