Herd - a managed data lake for the cloud
The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.


Unified Data Catalog

A centralized, auditable catalog for operational usage and data governance.

Track Lineage

Capture data ancestry for regulatory, forensic, and analytical purposes.

Manage Clusters

Create and launch clusters, and load data into them from catalog entries.

Orchestrate Jobs

Orchestrate clusters and catalog services to automate processing jobs.


Just released! Herd-UI, a search and discovery tool for business and technical users.

 
Getting Started

      What is herd?

      Demo Installation

      Quick Start to Registering Data

 
Get Involved

   Contribute

  1. We encourage everyone who has an idea to fork the code, experiment, and share their experiences with us through GitHub Issues.

  2. If you believe that you have a worthwhile contribution, please open an issue on GitHub and explain your idea.

  3. The herd team will review your idea and either prioritize it or start a discussion on the issue.

  4. If the issue is agreed upon, start coding.

    • Remember to write unit tests to maintain our code coverage.

    • Make sure your code does not contain any passwords or encryption keys from your environment.

  5. Once you have written your code, please make sure to sign off your work when you commit it.

    git commit -s -m 'YOUR COMMIT DESCRIPTION'

    or, equivalently:

    git commit --signoff -m 'YOUR COMMIT DESCRIPTION'

    When you sign off, you are agreeing to the following:
    Developer's Certificate of Origin (adapted from the Linux kernel)
    By making a contribution to this project, I certify that:
    (a). The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
    (b). The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
    (c). The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.
    (d). I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

  6. Once you have completed the commit, you can make a pull request referencing the initial issue created for the work.

  7. The herd team will process your pull request in a timely fashion and may perform additional testing. You will receive either feedback on your code or a notification that it will be accepted, with an indication of the timeframe for acceptance.

  8. All activity regarding the issue, including contributor and herd team discussions, will occur within the GitHub Issues system, so check back frequently and watch issues that interest you.
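
The sign-off in step 5 can be tried out safely before touching a real fork. The following is a minimal sketch, not part of the herd tooling: it creates a throwaway repository, runs a crude check for credentials in the staged changes, makes a signed-off commit, and prints the resulting trailer. The contributor name, email address, file name, and the credential pattern are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: signed-off commit in a throwaway repository.
# All names below are hypothetical placeholders, not herd project values.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q .

# The sign-off trailer is built from the configured identity.
git config user.name "Jane Contributor"
git config user.email "jane@example.com"

echo "example change" > CHANGES.txt
git add CHANGES.txt

# Crude, illustrative check (an assumption, not a herd tool): stop if the
# staged diff mentions something that looks like a credential.
if git diff --cached | grep -Eiq 'password|secret|private key'; then
    echo "possible credential in staged changes" >&2
    exit 1
fi

# -s is the short form of --signoff.
git commit -q -s -m 'YOUR COMMIT DESCRIPTION'

# Extract the trailer that the sign-off added to the commit message.
signoff_line=$(git log -1 --format=%B | grep '^Signed-off-by:')
echo "$signoff_line"
```

With the placeholder identity above, the last line printed is `Signed-off-by: Jane Contributor <jane@example.com>`. In a real contribution the identity comes from your own git configuration, and the pattern check is only a rough first pass rather than a substitute for reviewing your diff.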

   Report Bugs

    When you encounter a bug, please first check whether it has already been reported by searching GitHub Issues. If the issue is new, create a new ticket. Please provide a clear description of the issue along with a testable scenario.

    If you want to get involved in fixing bugs, first read the steps outlined above under "Contribute". Then assign yourself to the bug as the owner and start developing your code. Once you have built and tested your code, please make a pull request for review. Once the review is complete, your fix will be merged into the master branch of the herd repository.

   Communicate

    We are actively seeking organizations and individuals interested in adopting herd and contributing to the development effort. If you want to get involved, you can start by getting to know herd on the wiki or the GitHub project, or reach out to herd@finra.org.

    If you have any questions or discussion topics, please post them on GitHub Issues.

    GitHub Help


About
    Team
    Evan 'Levi' Allen
    Kapil Agarwal
    Mona Annaparthi
    David Balash
    Michael Chao
    Aniruddha Das
    Sundari Diwakarla
    Shane Ebersole
    Arthur Felde (Man of Mystery)
    Thomas Frank
    Pragnya Gandhi
    Tim Griesbach
    Patricia Hu
    Mahesh Kambli
    Taras Katkov
    Karishma Patel
    Andrew Pach
    Max Seo
    Kumar Siddhartha
    Val Sorokine
    Keni Steward
    Bala Sundaramoorthy
    Sai Suryanarayanan
    Wayne Wang
    Nate Weisz
    Jen Wenner
    Greg Wolff
    Jim Zhang

    Sponsor
    The FINRA developer community actively supports the herd project. FINRA has graciously allocated time for its internal developers to enhance herd, and it encourages participation in the open source community.

    http://www.finra.org | http://technology.finra.org

    Want to join FINRA? Visit finra.org/careers