Many, many images and videos
It is a common problem: you have hundreds of thousands, or even millions, of images, videos and other documents. They might be neatly organized into folders, but most often they aren't. Searching a collection of that size is hard enough, let alone browsing it freely and discovering the value hidden within it.
The Zorroa Platform
We are very excited about the product we are building at Zorroa. Our Platform allows companies to manage very large collections of visual assets such as images, video and PDFs. Zorroa makes it easy to run machine learning algorithms over a collection of these assets, and then provides modern search, navigation and discovery on the results.
Here is a simplified overview of the Zorroa Platform:
Documents are imported via an analysis pipeline that is completely modular. The pipeline is made up of what we call processors: snippets of Python or Java code that take each asset, do something to it, and export a little bit of JSON.
Some examples of processors:
- Proxy (makes a thumbnail of the document, used as a visual representation of it)
- Image (reads EXIF/IPTC data along with other image metadata)
- Color Analysis (generates statistics based on the colors found in images and video)
- Face Recognition (finds faces and decides whether they are known; if so, tags the asset with a name)
- Image Classification (uses a neural network to classify the image into a set of predefined categories)
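To make the idea of a processor concrete, here is a minimal sketch in Python. It is not the Zorroa processor API: the class name, the `process` method and the shape of the asset dictionary are all assumptions made for illustration. It only shows the contract described above, where a processor takes an asset, does something to it, and exports a little bit of JSON.

```python
import json
from collections import Counter


class ColorAnalysisProcessor:
    """Hypothetical processor (not the real Zorroa API): summarizes asset colors."""

    name = "color_analysis"

    def process(self, asset):
        # Assumption: the pixels are already decoded into (r, g, b) tuples.
        pixels = asset["pixels"]
        count = len(pixels)
        # Average each channel over all pixels.
        mean_rgb = [sum(p[channel] for p in pixels) / count for channel in range(3)]
        # The most frequent exact color stands in for a real dominant-color analysis.
        dominant = Counter(pixels).most_common(1)[0][0]
        return {"mean_rgb": mean_rgb, "dominant_rgb": list(dominant)}


asset = {
    "path": "example.jpg",
    "pixels": [(200, 0, 0), (200, 0, 0), (0, 100, 0), (0, 0, 40)],
}
result = ColorAnalysisProcessor().process(asset)

# The processor's output is a small JSON document keyed by the processor name.
print(json.dumps({ColorAnalysisProcessor.name: result}))
```

A real processor would read pixels from the file itself. The toy asset keeps the sketch self-contained.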
You can arrange processors into a pipeline in order to achieve complex results. For example, a processor that detects text in scanned images could write out bounding boxes for each paragraph into the asset. Later in the pipeline, another processor could use these bounding boxes to optimize OCR.
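The text-detection and OCR example can be sketched as two chained steps. This is a toy illustration under stated assumptions, not Zorroa code: the "scan" is a list of strings instead of an image, and `detect_text`, `ocr` and `run_pipeline` are hypothetical names. What it shows is the hand-off, where the first processor writes bounding boxes into the asset and the second reads only inside those boxes.

```python
def detect_text(asset):
    """Record a bounding box for every non-blank row of the toy scan."""
    boxes = []
    for row, line in enumerate(asset["scan"]):
        stripped = line.strip()
        if stripped:
            start = len(line) - len(line.lstrip())
            boxes.append({"row": row, "start": start, "end": start + len(stripped)})
    asset["text_boxes"] = boxes
    return asset


def ocr(asset):
    """Read only the regions that an earlier processor marked as text."""
    asset["text"] = [
        asset["scan"][box["row"]][box["start"]:box["end"]]
        for box in asset["text_boxes"]
    ]
    return asset


def run_pipeline(asset, processors):
    """Apply each processor in order; later ones see what earlier ones wrote."""
    for processor in processors:
        asset = processor(asset)
    return asset


asset = {"scan": ["   Invoice 42   ", "                ", "  Total: 10 EUR "]}
asset = run_pipeline(asset, [detect_text, ocr])
print(asset["text_boxes"])
print(asset["text"])
```

Order matters here: running `ocr` before `detect_text` would fail, because the bounding boxes it depends on would not exist yet.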
The processing of assets is done by what we call an Analyst. You can set up as many Analysts as you have machines available on your network, which parallelizes the import of assets and lets the platform scale up to huge datasets.
The Archivist controls a job queue and sends jobs to each available Analyst. The Archivist also manages an Elasticsearch database of all the assets, and provides a REST interface for implementing a user front end.
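The division of labor between the Archivist and the Analysts can be sketched with a shared job queue. This is only an analogy under loud assumptions: threads stand in for Analyst machines, a plain dictionary stands in for the asset database, and the `analyst` function and document fields are hypothetical. The real platform distributes work over the network.

```python
import queue
import threading


def analyst(name, jobs, index, lock):
    """Hypothetical Analyst: pull jobs until a None sentinel arrives."""
    while True:
        path = jobs.get()
        if path is None:
            break
        # Stand-in for running the processing pipeline on the asset.
        document = {
            "path": path,
            "analyst": name,
            "extension": path.rsplit(".", 1)[-1],
        }
        with lock:
            index[path] = document  # stand-in for writing to the asset database


jobs = queue.Queue()  # the queue the "Archivist" controls
index = {}
lock = threading.Lock()

workers = [
    threading.Thread(target=analyst, args=(f"analyst-{i}", jobs, index, lock))
    for i in range(3)
]
for worker in workers:
    worker.start()

paths = ["a.jpg", "b.mov", "c.pdf", "d.png", "e.jpg"]
for path in paths:
    jobs.put(path)
for _ in workers:
    jobs.put(None)  # one sentinel per Analyst so every worker stops
for worker in workers:
    worker.join()

print(sorted(index))
```

Adding capacity in this sketch means starting more workers against the same queue, which mirrors the idea of adding more Analysts. Which worker handles which asset is not deterministic.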
We have two different front ends. One is Curator, a web interface that allows search, organization and discovery of assets. The other is a Python SDK, which talks to the Archivist's REST interface directly.
In future posts we'll look into these building blocks in more detail. We'll also show how the platform becomes a great way to develop, implement and deploy vision algorithms.
Get Zorroa working on your images, videos and documents. Request a demo ›