Mass Digitization at the Smithsonian – 30,000 Bees in 32 Days

The Smithsonian’s National Museum of Natural History has the second-largest entomological collection in the world, with approximately 34 million specimens representing over 60% of known insect families. When the time came to digitize a subset of this collection containing over 30,000 individual bees, speed, accuracy, and efficiency were all critical, which made Pixel Acuity’s experienced and technically advanced team an ideal partner.

This video from the Smithsonian highlights some of the amazing work done, including the digitization of over 30,000 specimens in 32 days (an average rate of one specimen every 30 seconds). Click below to watch this 1-minute clip, and see the collection online at https://collections.si.edu/.

For a more detailed look at the unique challenges of this project and the solutions our team provided, be sure to read our article on the technology Pixel Acuity created to achieve these results.
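
As a quick sanity check on the throughput quoted above, the arithmetic can be sketched in a few lines of Python. The figures of 30,000 specimens, 32 days, and 30 seconds per specimen come from the text; the function name and the choice to express the result as capture hours per day are illustrative assumptions, not figures published by the Smithsonian or Pixel Acuity.

```python
def capture_hours_per_day(specimens: int, days: int, seconds_per_specimen: float) -> float:
    """Average hours of capture per day implied by a per-specimen rate."""
    total_seconds = specimens * seconds_per_specimen
    return total_seconds / 3600 / days


# Figures quoted in the text: 30,000 specimens, 32 days, one specimen every 30 seconds.
hours = capture_hours_per_day(30_000, 32, 30)
print(f"{hours:.1f} capture hours per day")
```

At 30 seconds per specimen, 30,000 specimens take 250 hours of capture in total, or roughly 7.8 hours per day across 32 days, which is consistent with a full working day at the camera.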

Automatic OCR & Multipage PDF Creation

Introduction

DT Pixel Acuity (DT PA) and History Factory have teamed up to provide holistic digitization-related services. These services include the digitization of bound material such as books, newspapers, annual reports, and magazines. Bound material calls for particular care because any given capture is part of a larger whole (for example, one page in a book, or one leaf of a newspaper). To add further value to the digitized collection, DT PA and History Factory also provide OCR services. Our automation has improved throughput and turnaround time while simultaneously increasing the accuracy of the image set and OCR.

OCR and Multipage PDF Compilation

Previously, DT PA used an outside vendor to compile the individual pages of an object into a PDF; these PDFs also serve as an excellent vehicle for the OCR.

DT PA has now brought the PDF compilation and OCR services in-house and added them to a fully automated pipeline. The immediate and obvious benefit is increased productivity: the digital assets no longer have to be sent off-site (with the additional lag introduced by a third-party vendor) and can instead be processed immediately after capture. A less obvious benefit is a significant increase in accuracy. In evaluating OCR vendors, we’ve found a race to the bottom on the cost of the service, which seems to be driving an environment in which the quality of the OCR is assumed to be unimportant. We’ve found that simply by bringing OCR processing in-house, taking first-party responsibility for the results, and working from a quality-first mentality, we’ve been able to greatly increase the consistency and accuracy of the OCR we provide.
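
To illustrate the step that has to happen before any multipage PDF can be compiled, here is a minimal Python sketch that groups individual page captures by the object they belong to and puts them in page order. The filename convention ("<object_id>_<page number>.tif"), the sample filenames, and the function name are hypothetical and chosen only for illustration; this is not DT PA’s actual pipeline, and the PDF compilation and OCR calls themselves are left out because the tools used are not named here.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: "<object_id>_<page number>.tif"
PAGE_PATTERN = re.compile(r"^(?P<object_id>.+)_(?P<page>\d+)\.tif$")


def group_pages(filenames):
    """Group page captures by object and order each group by page number."""
    pages = defaultdict(list)
    for name in filenames:
        match = PAGE_PATTERN.match(name)
        if match is None:
            # Skip files that do not follow the assumed convention.
            continue
        pages[match["object_id"]].append((int(match["page"]), name))
    # Sort numerically so that page 10 follows page 9, not page 1.
    return {obj: [name for _, name in sorted(items)] for obj, items in pages.items()}


captures = [
    "report1952_0002.tif",
    "ledger_0001.tif",
    "report1952_0010.tif",
    "report1952_0001.tif",
]
for object_id, ordered in group_pages(captures).items():
    print(object_id, ordered)
```

Each ordered list would then be handed to whatever compilation and OCR step the pipeline uses, producing one PDF per object rather than one file per page.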

Summary

DT PA is dedicated to leading the industry in adopting new technology and new workflows wherever they help us deliver on our primary tenets: preservation-grade FADGI 4-star image quality, conservation-friendly material handling, production that can scale to a collection of any size, and customer-first work practices that make us a vendor you are genuinely thrilled to work with. These two case studies exemplify the benefits of automation, but are just a sliver of the investments we are currently making. To hear more about how DT PA can help you with digitization hardware, software, services, and consultation, please contact us.