update: readme
README.md
CHANGED
|
@@ -3,7 +3,7 @@
|
|
| 3 |
<div align="center">
|
| 4 |
<table>
|
| 5 |
<tr>
|
| 6 |
-
<td><img src="
|
| 7 |
<td><h1>Semantic Motion Generation (SMoG): <br>A PyTorch Implementation</h1></td>
|
| 8 |
</tr>
|
| 9 |
</table>
|
|
@@ -39,7 +39,7 @@ This implementation:
|
|
| 39 |
|
| 40 |
## Results
|
| 41 |
|
| 42 |
-
 or dissimilarity (negative), maximizing positive pair similarity and minimizing negative pair similarity.
|
|
@@ -138,7 +138,7 @@ During the forward pass, inputs are processed by both branches: the base branch
|
|
| 138 |
|
| 139 |
| Wings | Swan Lake | Running | Opened the door and walked in | Lift the weights |
|
| 140 |
|:------------------------------------------------------------------:|:--------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
|
| 141 |
-
| <img src="
|
| 142 |
|
| 143 |
|
| 144 |
# Usage
|
|
|
|
| 3 |
<div align="center">
|
| 4 |
<table>
|
| 5 |
<tr>
|
| 6 |
+
<td><img src="visuals/smog_logo.png" width="600"></td>
|
| 7 |
<td><h1>Semantic Motion Generation (SMoG): <br>A PyTorch Implementation</h1></td>
|
| 8 |
</tr>
|
| 9 |
</table>
|
|
|
|
| 39 |
|
| 40 |
## Results
|
| 41 |
|
| 42 |
+

|
| 43 |
</div>
|
| 44 |
|
| 45 |
|
|
|
|
| 101 |
## 🚶 SMoG Model
|
| 102 |
|
| 103 |
|
| 104 |
+

|
| 105 |
|
| 106 |
|
| 107 |
**SMoG** is built on MotionCLIP, a 3D motion autoencoder trained to reconstruct poses under natural-language supervision. Its latent space is a compressed representation that captures abstract features not directly apparent in the input space; when visualized, points in this space cluster by similarity. This approach reduces reliance on classical data labeling by using contrastive learning to tell whether text-motion pairs are similar, identical, or different. During training, action-text pairs are matched for similarity (positive) or dissimilarity (negative), maximizing positive pair similarity and minimizing negative pair similarity.
|
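The contrastive objective described in the SMoG paragraph can be illustrated with a small sketch. This is not taken from the SMoG code base: it assumes a CLIP-style symmetric cross-entropy loss over cosine similarities, and the names `contrastive_loss`, `text_emb`, `motion_emb`, and `temperature` are hypothetical. It is written in dependency-free Python for readability; a real training loop would presumably use batched PyTorch tensor operations instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def contrastive_loss(text_emb, motion_emb, temperature=0.07):
    """Symmetric contrastive loss (assumed form, not SMoG's confirmed loss).

    text_emb[i] and motion_emb[i] form a positive pair; every other
    combination in the batch is treated as a negative pair.
    """
    n = len(text_emb)
    # Scaled similarity matrix: rows index text, columns index motion.
    sim = [[cosine(t, m) / temperature for m in motion_emb] for t in text_emb]
    # Text-to-motion: each text should pick out its own motion.
    t2m = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Motion-to-text: each motion should pick out its own text.
    m2t = sum(
        cross_entropy([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (t2m + m2t)


# Toy embeddings: aligned pairs should score a lower loss than swapped ones.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(text, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(text, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls each positive pair together and pushes the negatives apart, which is the behavior the paragraph describes; the temperature value is an arbitrary illustrative choice.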
|
|
|
| 138 |
|
| 139 |
| Wings | Swan Lake | Running | Opened the door and walked in | Lift the weights |
|
| 140 |
|:------------------------------------------------------------------:|:--------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
|
| 141 |
+
| <img src="visuals/gen_0_0.gif"/> | <img src="visuals/gen_0_3.gif"/> | <img src="visuals/gen_2_2.gif"/> | <img src="visuals/gen_4_2.gif"/> | <img src="visuals/gen_5_2.gif"/> |
|
| 142 |
|
| 143 |
|
| 144 |
# Usage
|