Philip Kehl committed on
Commit 6a05eb6 · 1 Parent(s): bdb093b

edit readme with dataset, add gitignore

Files changed (2)
  1. .gitignore +53 -0
  2. README.md +1 -1
.gitignore ADDED
@@ -0,0 +1,53 @@
+ # Project specific
+ drugs.csv
+
+ # macOS system files
+ .DS_Store
+ .AppleDouble
+ .LSOverride
+ ._*
+
+ # Node.js
+ node_modules/
+ npm-debug.log
+ yarn-debug.log*
+ yarn-error.log*
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ *.egg-info/
+ dist/
+ build/
+
+ # IDE specific files
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+ *.swn
+ *.bak
+
+ # Logs and databases
+ *.log
+ *.sqlite
+ *.db
+
+ # Environment variables
+ .env
+ .env.local
+ .env.*.local
+
+ # Compiled files
+ *.com
+ *.class
+ *.dll
+ *.exe
+ *.o
+ *.so
README.md CHANGED
@@ -5,7 +5,7 @@ Project of the Modeling and Scaling of Generative AI Systems lecture at the Univ
  The project aims to transform images of analog medication lists (e.g., handwritten or printed lists) into structured digital formats.
  This involves several key steps:
  - Image to Text Conversion: Utilizing a pre-trained docling model to extract text and tables from images.
- - Mapping to Vocabulary: Converting the extracted text into a predefined vocabulary of medications.
+ - Mapping to Vocabulary: Converting the extracted text into a predefined vocabulary of medications. As a predefined vocabulary we use a csv-file with all FDA Drugs, available at https://www.kaggle.com/datasets/protobioengineering/united-states-fda-drugs-feb-2024.
  - Transform to Structured Format: Organizing the mapped data into a structured format such as JSON or CSV for further processing.
 
  The project is oriented on the Granit Docling WebGPU demo on huggingface (https://huggingface.co/spaces/ibm-granite/granite-docling-258M-WebGPU).
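
The "Mapping to Vocabulary" step described in the README change could look roughly like the sketch below. It is only an illustration, not the project's implementation: the column name `drug_name`, the helper names `load_vocabulary` and `map_to_vocabulary`, the 0.8 similarity cutoff, and the in-memory sample standing in for `drugs.csv` are all assumptions. Fuzzy matching with the standard-library `difflib` is used here as one plausible way to tolerate OCR or handwriting errors.

```python
import csv
import difflib
import io


def load_vocabulary(csv_file, column="drug_name"):
    """Read one column of an open CSV file into a sorted list of unique lowercase names.

    The column name is a hypothetical placeholder; the real dataset may use another header.
    """
    reader = csv.DictReader(csv_file)
    names = {row[column].strip().lower() for row in reader if row.get(column)}
    return sorted(names)


def map_to_vocabulary(token, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry for an extracted token, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        token.strip().lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Tiny in-memory stand-in for drugs.csv (assumed layout, not the real file).
sample_csv = io.StringIO("drug_name\nIbuprofen\nMetformin\nLisinopril\n")
vocabulary = load_vocabulary(sample_csv)

# Tokens as they might come out of the image-to-text step, including a misspelling.
extracted = ["Ibuprofin", "metformin", "aspirin"]
mapped = {token: map_to_vocabulary(token, vocabulary) for token in extracted}
print(mapped)
```

Tokens with no sufficiently similar entry map to `None`, so they can be flagged for manual review rather than silently assigned to the wrong medication. For the real file, `open("drugs.csv", newline="")` would replace the `io.StringIO` stand-in.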