Herd - a managed data lake for the cloud
The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.

Unified Data Catalog

A centralized, auditable catalog for operational usage and data governance.

Track Lineage

Capture data ancestry for regulatory, forensic, and analytical purposes

Manage Clusters

Create and launch clusters; load data into clusters from catalog entries

Orchestrate Jobs

Orchestrate clusters and catalog services to automate processing jobs

Just released! Herd-UI, a search and discovery tool for business and technical users.

Getting Started

      What is herd?

      Demo Installation

      Quick Start to Registering Data

Get Involved


  1. We encourage everyone who has an idea to fork the code, experiment and share their experiences with us through our GitHub Issues.

  2. If you believe that you have a worthwhile contribution, please open an issue on GitHub and explain your idea.

  3. The herd team will review your idea and either prioritize it or start a discussion about the issue.

  4. If the issue is agreed upon, start coding.

    • Remember to write unit tests to maintain our code coverage.

    • Make sure your code does not contain any passwords or encryption keys from your environment.

  5. Once you have written your code, please make sure to sign off your work when you commit it.

    git commit -s -m 'YOUR COMMIT DESCRIPTION'

    or, equivalently:

    git commit --signoff -m 'YOUR COMMIT DESCRIPTION'

    When you sign off, you are agreeing to the following:

    Developer's Certificate of Origin (adapted from the Linux kernel)
    By making a contribution to this project, I certify that:
    (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
    (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
    (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.
    (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

  6. Once you have completed the commit, you can make a pull request referencing the initial issue created for the work.

  7. The herd team will process your pull request in a timely fashion and may perform additional testing. You will receive either feedback or a notification that your code will be accepted, along with the expected timeframe for acceptance.

  8. All activity regarding the issue, including contributor and herd team discussions, will occur within the GitHub Issues system, so check back frequently and watch issues that interest you.
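
For step 5 above, it can help to confirm that a commit actually carries the sign-off before opening a pull request. The following is a minimal sketch, not part of herd or git: the function name `has_signoff` is hypothetical, and it only checks that the commit message contains a `Signed-off-by: Name <email>` line of the kind that `git commit -s` appends.

```shell
#!/bin/sh
# has_signoff (hypothetical helper): reads a commit message on stdin and
# exits 0 if it contains a "Signed-off-by: Name <email>" trailer line,
# exits non-zero otherwise.
has_signoff() {
    grep -Eq '^Signed-off-by: .+ <.+>$'
}

# Example usage, run inside a git repository (left commented out so the
# sketch has no side effects when sourced):
#   git log -1 --format=%B | has_signoff && echo "last commit is signed off"
```

If the check fails for your most recent commit, `git commit --amend -s` adds the sign-off without changing the commit's content.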

   Report Bugs

    When you encounter a bug, first check whether it has already been reported by searching the existing GitHub Issues. If the issue is new, create a new ticket. Please provide a clear description of the issue along with a testable scenario.

    If you want to get involved in fixing bugs, first read the contribution steps outlined above. Then you can assign yourself to the bug as the owner and start developing your code. Once you have built and tested your code, please make a pull request for review. Once the review is complete, your fix will be merged into the master branch of the herd repository.


    We are actively seeking organizations and individuals that are interested in adopting herd and contributing to the development effort. If you want to get involved, you can start by getting to know herd on the wiki or the GitHub project. Or don't hesitate to reach out to herd@finra.org.

    If you have any questions or discussion topics, please post them on GitHub Issues.

    GitHub Help

    Evan 'Levi' Allen
    Kapil Agarwal
    Mona Annaparthi
    David Balash
    Michael Chao
    Aniruddha Das
    Sundari Diwakarla
    Shane Ebersole
    Arthur Felde
    Man of Mystery
    Thomas Frank
    Pragnya Gandhi
    Tim Griesbach
    Patricia Hu
    Mahesh Kambli
    Taras Katkov
    Karishma Patel
    Andrew Pach
    Max Seo
    Kumar Siddhartha
    Val Sorokine
    Keni Steward
    Bala Sundaramoorthy
    Sai Suryanarayanan
    Wayne Wang
    Nate Weisz
    Jen Wenner
    Greg Wolff
    Jim Zhang

    The FINRA developer community is actively supporting the herd project. FINRA has allocated time for its internal development resources to enhance herd and encourages participation in the open source community.

    http://www.finra.org | http://technology.finra.org

    Want to join FINRA? Visit finra.org/careers