---
title: Normalized ConceptNet Explorer
emoji: ⚡
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
license: cc-by-sa-4.0
tags:
- conceptnet
- knowledge-graph
- sqlite
- normalized
- gradio
- fast-queries
---

# ⚡ Normalized ConceptNet Explorer (V7)

This application is a high-performance explorer for a normalized, filtered, and optimized version of the ConceptNet 5.5 knowledge graph. It is designed to be **extremely fast**, returning query results in milliseconds instead of minutes. It queries a 1.78 GB optimized SQLite database using integer-based joins, rather than the 23.6 GB un-normalized original.

## Features

This app provides a full suite of tools for exploring the normalized database:

- **⚡ Semantic Profile**: Explore the relations of any word in real time. A profile is now built with about 4 fast SQL queries instead of 24+ slow ones.
- **⚡ Query Builder**: Build custom queries (start node, relation, end node) that are executed with fast, integer-based joins.
- **⚡ Raw SQL**: Run SQL queries directly against the normalized database schema (browse it in the Schema tab).
- **⚡ Schema**: Browse the database schema, including all tables, indexes, and row counts.

## How It Works: The Normalized Database

This app's speed and correctness come from the database it queries: [cstr/conceptnet-normalized-multi](https://huggingface.co/datasets/cstr/conceptnet-normalized-multi). This database was created by a V7 normalization script that fixed critical issues found in the original data:

1. **Normalization (Speed & Size)**: The original 23.6 GB `edge` table (34M rows) was bloated with repeated text URIs. The new 1.78 GB `edge_norm` table replaces them with compact integer foreign keys.
2. **Data Correctness (V7 Fix)**: The original `node` table (28M rows) was used as the source of truth. All 28M nodes were migrated together with their authoritative `language` values.
3. **Preserves Cross-Language Links**: The 34M edges were filtered to keep every edge where at least one node (start or end) belongs to one of our 11 target languages (`en`, `de`, `fr`, `it`, `es`, `ar`, `fa`, `grc`, `he`, `la`, `hbo`). This is critical, because it preserves cross-language connections (e.g., `犬 (ja) -> hund (de)`), which were broken in previous attempts.

The result is a clean, fast, and data-correct database that contains all relevant connections for the target languages.

## Supported Languages

This normalized version includes edges for 11 languages:

- English (en)
- German (de)
- French (fr)
- Italian (it)
- Spanish (es)
- Arabic (ar)
- Persian (fa)
- Ancient Greek (grc)
- Hebrew (he)
- Latin (la)
- Biblical Hebrew (hbo)

Edges that connect a node in any other language to one of these target languages are also preserved.

## Original Dataset Information

This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0) from http://conceptnet.io.

For a full list of licenses and attributions for included resources such as WordNet, Open Multilingual WordNet, and Wikimedia projects, please see the original dataset card.

## Citation Information

If you use this data in your work, please cite the original ConceptNet 5.5 paper:

```bibtex
@inproceedings{speer2017conceptnet,
  author    = {Robyn Speer and Joshua Chin and Catherine Havasi},
  title     = {ConceptNet 5.5: An Open Multilingual Graph of General Knowledge},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {4444--4451},
  url       = {http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972}
}
```
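## Illustrative Sketch: Integer Joins

To make the normalization and filtering steps from "How It Works" concrete, here is a minimal sketch using Python's standard-library `sqlite3` module on a tiny in-memory database. It is not the V7 script and does not reproduce the real schema. Only the `node` and `edge_norm` table names and the `language` column come from this README. The `relation` table, every other column name, the indexes, and the sample URIs and weights are hypothetical. The sketch migrates all nodes, keeps an edge when at least one endpoint is in a target language, stores edges as integer IDs, and resolves a text URI only once before following integer joins.

```python
import sqlite3

# The 11 target languages listed in this README.
TARGET_LANGS = {"en", "de", "fr", "it", "es", "ar", "fa", "grc", "he", "la", "hbo"}

# Hypothetical sample data: node URI -> language (mirrors the authoritative `language` column).
NODES = {
    "/c/ja/犬": "ja",
    "/c/de/hund": "de",
    "/c/en/dog": "en",
    "/c/ja/猫": "ja",
    "/c/zh/猫": "zh",
}

# Hypothetical text edges: (start URI, relation URI, end URI, weight).
EDGES = [
    ("/c/ja/犬", "/r/Synonym", "/c/de/hund", 1.0),   # ja -> de: kept (cross-language link)
    ("/c/en/dog", "/r/Synonym", "/c/de/hund", 2.0),  # en -> de: kept
    ("/c/ja/猫", "/r/Synonym", "/c/zh/猫", 1.0),     # ja -> zh: dropped (no target language)
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the normalized schema (columns are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (id INTEGER PRIMARY KEY, uri TEXT UNIQUE, language TEXT);
        CREATE TABLE relation (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
        CREATE TABLE edge_norm (
            start_id INTEGER NOT NULL REFERENCES node(id),
            rel_id   INTEGER NOT NULL REFERENCES relation(id),
            end_id   INTEGER NOT NULL REFERENCES node(id),
            weight   REAL
        );
        CREATE INDEX idx_edge_norm_start ON edge_norm(start_id);
        CREATE INDEX idx_edge_norm_end ON edge_norm(end_id);
        """
    )
    return conn


def load(conn: sqlite3.Connection, nodes: dict, edges: list) -> int:
    """Migrate all nodes, then store filtered edges as integer foreign keys. Returns edges kept."""
    conn.executemany("INSERT INTO node (uri, language) VALUES (?, ?)", nodes.items())
    ids = {uri: node_id for node_id, uri in conn.execute("SELECT id, uri FROM node")}
    kept = 0
    for start, rel, end, weight in edges:
        # Keep the edge if at least ONE endpoint is in a target language.
        if nodes[start] not in TARGET_LANGS and nodes[end] not in TARGET_LANGS:
            continue
        conn.execute("INSERT OR IGNORE INTO relation (uri) VALUES (?)", (rel,))
        (rel_id,) = conn.execute("SELECT id FROM relation WHERE uri = ?", (rel,)).fetchone()
        conn.execute(
            "INSERT INTO edge_norm (start_id, rel_id, end_id, weight) VALUES (?, ?, ?, ?)",
            (ids[start], rel_id, ids[end], weight),
        )
        kept += 1
    conn.commit()
    return kept


def outgoing(conn: sqlite3.Connection, uri: str) -> list:
    """Look up a node's URI once, then follow its edges through integer joins."""
    return conn.execute(
        """
        SELECT r.uri, e.uri, ed.weight
        FROM node AS s
        JOIN edge_norm AS ed ON ed.start_id = s.id
        JOIN relation AS r ON r.id = ed.rel_id
        JOIN node AS e ON e.id = ed.end_id
        WHERE s.uri = ?
        ORDER BY ed.weight DESC
        """,
        (uri,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("edges kept:", load(conn, NODES, EDGES))
    print(outgoing(conn, "/c/ja/犬"))
```

Running it keeps 2 of the 3 sample edges and returns `[('/r/Synonym', '/c/de/hund', 1.0)]` for `/c/ja/犬`, showing that a Japanese-to-German link survives the filter while the Japanese-to-Chinese edge does not. Each text URI is stored once in `node` or `relation`, and the `UNIQUE` constraint on `node.uri` gives the initial lookup an index, so the joins themselves compare only integers.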