Docs

Community Data is an open and extensible repository of more than 1M people and companies, focused primarily on the startup ecosystem. The data is collaboratively maintained in a public Git repository and stored as JSON documents that conform to a collection of JSON Schema definitions.

Data repository

The central data repository contains one folder for each top-level data model supported by the platform. Each top-level folder is a single-level collection of JSON documents that conform to a set of canonical schema definitions.
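A minimal sketch of reading this layout, assuming each top-level folder holds one `.json` file per document. The folder name used in the usage note and the `load_collection` helper are hypothetical, not part of the project:

```python
import json
from pathlib import Path


def load_collection(repo_root, folder):
    """Return {file stem: parsed document} for one top-level folder.

    Only files directly inside the folder are read, matching the
    single-level layout; nested folders are ignored.
    """
    documents = {}
    for path in sorted(Path(repo_root, folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            documents[path.stem] = json.load(handle)
    return documents
```

For example, `load_collection(".", "companies")` would return every company document keyed by file name, provided a `companies` folder exists in a local clone.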

Schemas

  • A unique individual who has associations with other Community Data entities.
  • A company is a legal entity formed by a group of individuals to engage in and operate a business enterprise in a commercial or industrial capacity.
  • The geographic location of the associated person or company entity according to a predefined list of cities, regions, countries and continents.
  • A company sector is a broad category of companies that produce similar products or services.
  • The company's funding stage according to traditional venture capital funding rounds.
  • A curated list of other entities on Community Data.

Data validation

Each document is validated against its corresponding JSON Schema on every commit to the repo. If any document fails validation, the commit is rejected. You can find the JSON Schema definitions for each of the canonical data models in the Schemas section.
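The idea can be illustrated with a deliberately simplified check. This is a sketch only: it covers just the `required` keyword and per-property `type` keywords, whereas the project presumably runs a full JSON Schema validator. The `company_schema` below is a made-up example, not one of the canonical schemas.

```python
# Maps JSON Schema type names to Python types (subset only).
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def _matches(value, type_name):
    # bool subclasses int in Python, but JSON booleans are not numbers.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES[type_name])


def validate(document, schema):
    """Return a list of error strings; an empty list means the document passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in document:
            errors.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        type_name = rule.get("type")
        known = isinstance(type_name, str) and type_name in _TYPES
        if name in document and known and not _matches(document[name], type_name):
            errors.append(f"{name}: expected {type_name}")
    return errors


# Hypothetical schema, for illustration only.
company_schema = {
    "required": ["name", "website"],
    "properties": {"name": {"type": "string"}, "website": {"type": "string"}},
}
errors = validate({"name": "Acme"}, company_schema)
# errors == ["missing required property: website"]
```

A commit check built this way would reject the commit whenever `errors` is non-empty for any changed document.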

Data contributions

Anyone can currently contribute data to the repository by creating or editing a JSON file. While it is possible to edit the raw JSON in GitHub or in a code or text editor, it is difficult to make the data conform to the appropriate schema definitions by hand. We have therefore created a few mechanisms for end users to contribute new data to the repository.

Data editor

The Community Data Editor is a proof-of-concept git interface written as a Next.js application that interacts with the GitHub API. Contributors are required to authenticate with their GitHub accounts to make contributions. The editor provides a simple form for each data model that is generated using the React JSON Schema Form Library. This enables the editor to update automatically as the core JSON Schema definitions evolve.

Data enhancers

Data enhancers are scripts that consume public APIs or websites, structure the information according to the canonical JSON schemas, and perform mass imports to the repository. For example, there are enhancers that scrape the portfolio pages of venture capital websites to discover new investments and add them to the "investments" collection in the corresponding company JSON document for the investor (or person JSON document if it was an angel investment). Anyone can write data enhancers as standalone private or public scripts or post them to the enhancer repository for further collaboration.
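The final step of such an enhancer, adding a discovered investment to the investor's document, might look like the sketch below. The "investments" key comes from the description above; the `add_investment` helper and the fields inside each investment entry are assumptions for illustration.

```python
def add_investment(investor_doc, investment):
    """Append `investment` to the investor's "investments" list if it is new.

    Returns True when the document was changed, so the caller can decide
    whether there is anything to commit.
    """
    investments = investor_doc.setdefault("investments", [])
    if investment in investments:
        return False  # already recorded; keeps repeated runs idempotent
    investments.append(investment)
    return True
```

Checking for an existing entry first means the enhancer can be re-run against the same portfolio page without creating duplicates.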

Data ownership

One of the core principles of the Community Data project is that individual data documents can be maintained and owned by select individuals or groups of individuals. Ownership of a data document is proven by posting a verification proof to any of the core disambiguation attributes for the object type. For example, a Person document would require posting a proof to Twitter, LinkedIn or a personal website. The Verification Service then adds an owners attribute to the data document that contains the GitHub username of the user who posted the proof. Any pull request to the data document from a GitHub user listed in the owners array will be merged automatically without review.
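The auto-merge rule reduces to a membership test on the owners array. A minimal sketch, assuming `owners` is a list of GitHub usernames stored as strings (the `can_auto_merge` name is hypothetical):

```python
def can_auto_merge(document, pull_request_author):
    """True when the pull request author is listed in the document's owners."""
    owners = document.get("owners", [])
    # GitHub usernames are case-insensitive, so compare in lowercase.
    author = pull_request_author.lower()
    return any(owner.lower() == author for owner in owners)
```

Documents without an owners attribute return False here and would go through normal review.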

Data consumption

There is a growing collection of interfaces for consuming data from the repository. Since the data is open-source, anyone is free to create their own API on top of the data. The core project contributors also provide a few public and private interfaces for convenience.

API

The core contributors maintain a GraphQL API for developers to consume for public or private use. Please see the GraphQL API documentation for more details.

3rd-party applications

There is a growing list of 3rd-party applications built upon Community Data. We encourage developers to check out the GraphQL API for details on how to consume data in their applications. We also encourage developers to list their projects in the apps repository.

Disambiguation

Disambiguation is handled via any required, unique attributes for any Community Data entity. For Company entities, the website, twitter and linkedin attributes all serve as disambiguation attributes as they identify unique Community Data entities.
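As an illustration, a duplicate check over the three Company attributes could be sketched as follows. It assumes exact string equality; a real implementation would likely normalize URLs and handles first. The `find_existing` helper is hypothetical.

```python
# Attribute names taken from the Company description above.
DISAMBIGUATION_KEYS = ("website", "twitter", "linkedin")


def find_existing(candidate, companies):
    """Return the first company sharing a disambiguation value with `candidate`.

    Returns None when no attribute matches. Empty or missing values
    never count as a match.
    """
    for company in companies:
        for key in DISAMBIGUATION_KEYS:
            value = candidate.get(key)
            if value and value == company.get(key):
                return company
    return None
```

A contribution tool could call this before creating a new document and update the matching document instead of adding a duplicate.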

Governance

Requested changes to any schemas should be made as pull requests to this repository. The Community Data core contributors review proposals during monthly governance calls and publish meeting minutes to the meetings folder of this repository within a few days.

License

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.