Revolution Data Systems


Revolution Data Systems Channels Its Inner Dell EMC with Project RevoStore

They say, “necessity is the mother of invention,” but how about innovation?

Here at Revolution Data Systems, the need to improve our technology through leading applications and infrastructure is a key component of our ability to provide quality services to a diverse customer base. Our document scanning bureau is powered by OpenText Captiva, a leading enterprise capture platform.

Processing millions of digital images for clients puts a major strain on our internal infrastructure. Our double-digit growth during each of the last four years has forced us to increase our processing bandwidth faster than anticipated.

Instead of just adding more and more storage, we needed a way to scale quickly and economically. Born from this need was Project RevoStore. Project RevoStore, in its simplest form, is the design, build, and configuration of a robust hybrid storage array with the scanning bureau environment in mind. Our scanning bureau is where we provide document scanning and business process outsourcing services to our customer base.

There are countless storage devices on the market sold by some of the largest global technology brands, like Dell EMC, NetApp, HPE, IBM, Oracle, and Hitachi, to name a few. We did not create Project RevoStore to compete with the big boys; instead, we created it with the idea that we could accomplish the same thing at a much lower price.

Goal

After our explosive growth over the past four years, RDS found itself spending more time shuffling data around our internal servers and workstations than producing it. Anyone running a service bureau will tell you that the longer it takes to produce a job, the less profitable the job becomes. Our end goal was to shrink our overall project completion time.

Process

Most document scanning projects go through the same process:

1. Pickup of records

2. Check-in of records at the scanning bureau

3. Logging

4. Document preparation

5. Document scanning

6. Quality control

7. Document indexing

8. Processing images into output for final delivery

Steps 1-7 all require human intervention, but our many years of experience and documented methodologies have made these steps extremely efficient. The last step, however, requires computer processing time and someone to monitor the process to ensure everything is going as planned. The lion's share of processing time comes from image cleanup, metadata analysis, formatting, and creation of the final delivery, all of which require storage and CPU processing power.
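To make step 8 concrete, here is a minimal sketch of what a per-page output pipeline looks like in shape. This is purely illustrative: the actual Captiva-based workflow is proprietary, and every function and type name below is a hypothetical stand-in, not RDS code.

```python
# Hypothetical sketch of the step-8 output pipeline: each page flows
# through cleanup -> metadata analysis -> formatting for delivery.
# All names here are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Page:
    image: bytes    # raw scanned image data
    metadata: dict  # index fields captured during steps 3-7


def clean_up(page: Page) -> Page:
    """Placeholder for despeckle/deskew image cleanup."""
    return page


def analyze_metadata(page: Page) -> Page:
    """Placeholder for validating and normalizing index fields."""
    page.metadata.setdefault("validated", True)
    return page


def format_output(page: Page) -> Page:
    """Placeholder for converting to the client's delivery format."""
    return page


def process_for_delivery(page: Page) -> Page:
    # The chain mirrors the order described above: cleanup first,
    # then metadata analysis, then final formatting.
    return format_output(analyze_metadata(clean_up(page)))
```

Because each page moves through the chain independently, this is exactly the kind of work that parallelizes well, which is what drove the hardware design below.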

Design

Fifteen to twenty-five years ago, any project over a few hundred thousand pages was considered a large project. Today it's not uncommon for projects to be in the millions of pages, while data conversions can easily reach into the billions of objects. Thankfully, the technology we use has also advanced. With the right planning and a little programming magic, data can now be processed asynchronously in a fraction of the time. To take advantage of the new technologies available to us, we wanted a system capable of doing many things at once. For this we need as many threads as possible.
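As a rough illustration of the many-threads idea (our production software is custom C#, so this Python sketch is only the shape of the approach, not the implementation):

```python
# Minimal sketch of fanning per-page work out across worker threads.
# convert_page is a hypothetical stand-in for the real conversion step.
from concurrent.futures import ThreadPoolExecutor


def convert_page(page_id: int) -> str:
    # Stand-in for resize/resample/threshold work on one page.
    return f"page-{page_id}-done"


def convert_batch(page_ids, max_workers=32):
    # On a 16-core/32-thread CPU, 32 workers can keep every hardware
    # thread busy while disk and network I/O overlap with compute.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_page, page_ids))
```

The point is simply that with enough threads and fast enough storage, thousands of independent pages can be in flight at once instead of being processed one at a time.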

Since we are working with a larger volume of data, we also must have plenty of storage and a way to get all this data to and from the system quickly.  

The RevoStore v1.0 specs:

- 4U rackmount chassis with 16 hot-swappable SATA/SAS drive bays and redundant power supplies

- CPU: 16 cores / 32 threads

- Memory: 32GB DDR in a low-latency quad-channel configuration

- Boot drive: 512GB M.2

- RAID controller: 16-port, dual-core 1.2 GHz with 2GB DDR3 cache

- Hard drives: 8 × 8TB SATA helium 7200RPM enterprise-class

- Network: Intel X550-T1 10G

- Switch: 10-port 10G managed

- UPS: 2200VA, rack-mounted

Configuration and Testing 

With testing still ongoing, I will keep this section short. The 8 × 8TB hard drives are in a RAID 6 configuration. We are currently testing different stripe sizes (4K, 8K, 16K) to determine which performs best, since our file sizes vary greatly, from 2KB all the way to 100GB+, with about 75 percent under 500KB. Regardless of what we choose, the 40+ TB of storage and the 10G network will ensure we spend less time shuffling data around than we did in the past.
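For readers keeping score, the "40+ TB" figure follows from how RAID 6 works: two drives' worth of capacity are given up for dual parity, so the array survives any two simultaneous drive failures.

```python
# RAID 6 usable capacity: total drives minus two drives' worth of parity.
drives = 8
drive_tb = 8                       # per-drive size in decimal TB, as marketed
usable_tb = (drives - 2) * drive_tb
print(usable_tb)                   # 48
# 48 decimal TB is roughly 43.7 TiB once the OS reports it in binary
# units, comfortably above the "40+ TB" quoted above.
```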

We are currently testing different operating systems. With most of our custom software written in C# for Windows, I'm a little limited in my operating system choices. Currently we run Windows 10 Enterprise. I am also looking into running FreeNAS with a Windows VM, but I'm a little concerned about the overhead and how many threads it will eat up.

Test 1

I have 3,959 files comprised of multi-page TIFFs and PDFs, containing a total of 138,455 pages scanned in 24-bit color and grayscale. Page sizes range from A0 to A6. This test simply converts them all to single-page bi-tonal TIFFs no larger than 8.5 × 14 inches. To accomplish this, each page needs to be resized, resampled, smoothed, and adaptive-thresholded. On my 8-core/16-thread i7 Dell workstation with 16GB of RAM and a 1TB SSD, I could process at a rate of 87 images (pages) per minute to and from the local drive.

I just ran the test on our new hybrid storage array, and it completed the job so fast I thought something went wrong. Thankfully I was mistaken. All the images were processed as expected at an astounding 4,448 images a minute!
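A quick back-of-the-envelope check on those numbers shows why I thought something had broken:

```python
# Comparing the baseline workstation against the new array,
# using the figures measured in Test 1.
pages = 138_455
workstation_rate = 87   # images per minute on the i7 workstation
array_rate = 4_448      # images per minute on the RevoStore array

speedup = array_rate / workstation_rate
hours_before = pages / workstation_rate / 60
minutes_after = pages / array_rate

print(round(speedup, 1))       # 51.1  -> roughly a 51x speedup
print(round(hours_before, 1))  # 26.5  -> over a day of workstation time
print(round(minutes_after))    # 31    -> about half an hour on the array
```

A job that would have tied up the workstation for more than a full day finished in about half an hour.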

We have plenty more testing to do, but I like what I'm seeing so far. I will publish another article once the system is finished and in production, sometime in the near future.

Channeling my inner Michael Dell has been a blast and great learning experience. I am excited about our future and look forward to the continued evolution within our organization as technology constantly advances.