OCR Conversion and Processing
Convert your paper documents to fully editable electronic files using our OCR conversion solutions.
OCR (Optical Character Recognition) is the process of converting paper documents into fully editable electronic files such as Microsoft Word documents, Excel spreadsheets, XML, CSV, HTML, searchable PDFs and databases. The process begins by scanning the hard-copy material (e.g. books, novels, newspapers, documents, magazines, journals and directories) to produce high-resolution images such as TIFF, PDF or JPEG files. These images are then converted to a machine-readable, editable format.
- About OCR Conversion
- The Process
- Conversion Languages
About OCR Conversion
Our specialist OCR conversion services include scanning documents of various types and sizes and converting them to each client's desired format. The quality of the OCR-recognised text depends on the quality of the source documents. For example, if the documents were printed on a good-quality printer and are clear and legible, OCR conversion accuracy can be as high as 99.99%. However, if your documents are old or faintly printed, or contain marks, scratches and similar damage, the accuracy and quality of the OCR-recognised text will be affected.
For these types of documents, we provide the following additional OCR services:
- OCR Data Cleansing
- OCR Data Proof-reading
- OCR Data Restructuring (layout, format, fonts, pagination etc.)
Our OCR-to-Excel conversion services can be applied to structured documents (fully formatted tables with gridlines), semi-structured documents (a mix of text, tables, images etc.) and unstructured documents (loosely formatted). For example, if you have documents printed from an Excel spreadsheet or a CRM system, bank statements, or directories containing addresses and contact details, we can convert these to an accurate, fully formatted Excel spreadsheet.
We can further process the data and convert it to file formats such as CSV, XML, text-searchable PDF and SharePoint import files.
We typically convert:
- Books and Documents to Microsoft Word
- Catalogues to Microsoft Excel
- Documents to XML, HTML, CSV and SPSS
The OCR Conversion Process
The first step in OCR conversion is to assess the quality of the original documents and determine their layout and formatting. Once we have assessed the documents, we configure the scanning and OCR processing rules. We then run OCR tests and create samples for your approval. OCR accuracy depends on the quality of the original documents: if they have good definition, we can achieve up to 99.99% accuracy.
There are three levels of OCR conversion available depending upon what you require:
- OCR level 1
This is for the simplest files: plain-text documents to be converted to Microsoft Word.
- OCR level 2
This is for more complex file layouts that contain data in tables, flow charts, differing fonts and/or graphics. If you need to keep the original layout, fonts, page order etc., we recommend level 2.
- OCR level 3
OCR level 3 is the most in-depth level and includes manual proof-reading and correction of any errors that may occur through the OCR process. This ensures that specific areas are double-checked, corrected and cleansed as required.
Benefits of outsourcing data capture
Fully readable digital documents
Capturing the data from your forms enables you to access information immediately, rather than having to go through all the data yourself.
Instantly searchable files
Because you no longer have to go through each form yourself, you save time that can be spent on other important tasks.
Data provided in convenient, popular formats
Outsourcing data capture to us will give you the benefit of having your data returned in formats such as CSV, XML and Excel.
OCR Conversion Languages
With our OCR conversion service, we can process multilingual documents in languages such as English, French, German, Portuguese, Italian, Spanish, Urdu, Arabic and Russian.
We also support all other major world languages, subject to sample testing.