Commit a299ff0 by HarleyCoops
1 parent: 6500f68
Clarify Grammar Gym roadmap
README.md CHANGED
@@ -46,3 +46,19 @@ This interface provides insights into Christian H. Cooper's groundbreaking work
## Updates

**March 8th, 2025**: Updated the Gemini model name to the latest version and refreshed the API key for improved performance and reliability.

## Grammar Gym & RL Training Roadmap

Work on the Stoney Grammar Gym is in the research-and-design stage. The latest design notes summarize how the existing dictionary work will eventually flow into a reinforcement-learning loop based on verifier-style reward functions:

- **Pipeline concept** – Extract rules from the grammar PDF, curate them, and generate task datasets for GRPO-style training with custom verifiers.
- **Reward coverage** – Plan for multi-dimensional rewards (letter accuracy, word accuracy, semantic similarity, edit distance) to reflect cultural nuance rather than single-score grading.
- **Integration target** – Re-use the same bilingual dataset plumbing so Grammar Gym training artifacts can live alongside the fine-tuning JSONL files published to the community.
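
To make the reward-coverage idea concrete, the following is a minimal sketch of how three of the listed dimensions (letter accuracy, word accuracy, edit distance) could be scored separately instead of being collapsed into one number. It is not code from the design document: the function names `levenshtein`, `positional_accuracy`, and `score_answer` are hypothetical, the accuracy definitions (position-wise matches divided by the longer length) are one assumption among several reasonable choices, and semantic similarity is left out because it would need an embedding model that the README does not specify.

```python
from typing import Dict, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def positional_accuracy(pred: Sequence, ref: Sequence) -> float:
    """Fraction of aligned positions that match, relative to the longer input."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    matches = sum(1 for p, r in zip(pred, ref) if p == r)
    return matches / longest


def score_answer(prediction: str, reference: str) -> Dict[str, float]:
    """Return one score per dimension (hypothetical layout, all in [0, 1])."""
    longest = max(len(prediction), len(reference))
    distance = levenshtein(prediction, reference)
    return {
        "letter_accuracy": positional_accuracy(prediction, reference),
        "word_accuracy": positional_accuracy(prediction.split(), reference.split()),
        "edit_similarity": 1.0 if longest == 0 else 1.0 - distance / longest,
    }


if __name__ == "__main__":
    # Placeholder strings, not real Stoney vocabulary.
    print(score_answer("a b", "a c"))
```

Keeping the scores in a dictionary lets a later stage weight or inspect each dimension on its own, which is the point of avoiding single-score grading.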

Although no runnable Grammar Gym scripts ship with this Space yet, the specification is ready for implementation. The next development sprint should focus on:

1. Building the extraction tooling (`pdf_ingest.py`, `rule_extractor.py`, `rule_organizer.py`, `task_generator.py`) exactly as defined in the design document.
2. Wiring the generated tasks into a verifiers-compatible environment and standing up GRPO training experiments.
3. Publishing artifacts (rules, tasks, training telemetry) back into the public dataset so community reviewers can audit each stage.
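
As background for step 2, GRPO-style training compares several sampled completions for the same task and normalizes each reward against its group. The sketch below illustrates that normalization together with a weighted combination of per-dimension scores. It is an assumption-laden illustration, not the project's training code: `combine_rewards`, `group_relative_advantages`, and the weight values are hypothetical, and actual trainers differ in details such as the standard-deviation estimator and the epsilon used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def combine_rewards(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores (weights are illustrative)."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Center and scale rewards within one group of completions for a task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, trainers vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical weights and scores for three completions of one task.
    weights = {"letter_accuracy": 1.0, "word_accuracy": 2.0}
    group_scores = [
        {"letter_accuracy": 1.0, "word_accuracy": 1.0},
        {"letter_accuracy": 0.5, "word_accuracy": 0.0},
        {"letter_accuracy": 0.8, "word_accuracy": 0.5},
    ]
    rewards = [combine_rewards(s, weights) for s in group_scores]
    print(group_relative_advantages(rewards))
```

Completions that score above their group's mean receive positive advantages and those below receive negative ones; when every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.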

Once those deliverables are in place, we can expand the README with concrete execution instructions and add automation hooks so the Space surfaces the latest RL progress inside the UI.