Hi everyone,
I’m building an AI application where a user can take a photo of their environment, and the AI should:
- Determine if the environment is “Good” or “Not Good”
- If “Not Good,” detect and localize the object(s) causing it
However, I’m facing several challenges since my dataset is highly diverse, which is expected with real-world environmental images.
What I’ve Tried
1. Vision-Language Models (Gemma3, LLaVA)
- I input the image and ask the VL model: “Is this Good or Not Good?” (see the sketch after this list)
- I also provide prompts describing what a Good image looks like vs. Not Good.
- Result:
  - The model can describe the image well.
  - But classification is terrible: it almost always says “Not Good,” even when the image is actually Good.
  - It feels like the model is “overly cautious,” or unable to map the descriptive rules to a binary decision.
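For context, this is roughly how I query the model. It's a minimal sketch using LLaVA through Hugging Face transformers; the model ID, the `rules` text, and the one-word-answer instruction are illustrative of my setup, not exact code:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Illustrative model choice; I have tried both Gemma3 and LLaVA variants.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def classify(image_path: str, rules: str) -> str:
    """Ask the VLM for a binary verdict, given my written rules."""
    image = Image.open(image_path)
    prompt = (
        f"USER: <image>\n{rules}\n"
        "Based on these rules, answer with exactly 'Good' or 'Not Good'.\n"
        "ASSISTANT:"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    inputs = inputs.to(model.device, torch.float16)
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    text = processor.decode(out[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()  # e.g. "Not Good"
```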
2. Image Classification (ConvNeXt)
- Built a binary classification dataset: Good / Not Good (training sketch below).
- Training loss and accuracy look good.
- Result:
  - Works in some cases (e.g., empty table = Good).
  - But fails in others (e.g., a full table that’s still acceptable gets classified as Not Good).
  - Seems to overfit to simple visual cues like clutter = bad.
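For reference, my training setup is essentially the following. This is a simplified sketch with torchvision; the `data/train` layout with `good/` and `not_good/` subfolders and all hyperparameters are just how I happen to run it:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing; data/train contains good/ and not_good/ subfolders.
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("data/train", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Pretrained ConvNeXt-Tiny with the head swapped for a 2-class output.
model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)
model.classifier[2] = nn.Linear(model.classifier[2].in_features, 2)

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for x, y in train_dl:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```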
3. Object Detection (YOLO)
- Labeled Not Good examples with bounding boxes showing the issues.
- Trained YOLO to detect only Not Good objects (no detection = Good); setup sketch below.
- Result:
  - Very poor training accuracy.
  - I think the main problem is inconsistent bounding boxes: varied size, position, and coverage across images.
  - The dataset is too inconsistent for the model to learn clear patterns.
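The YOLO setup is basically the standard Ultralytics recipe (a sketch; `data.yaml`, the model size, and the confidence threshold are my own choices):

```python
from ultralytics import YOLO

# data.yaml lists the image/label paths and defines the "Not Good" issue classes.
model = YOLO("yolov8n.pt")  # pretrained weights as a starting point
model.train(data="data.yaml", epochs=100, imgsz=640)

# Inference rule: no detections above the threshold => environment is "Good".
results = model("photo.jpg", conf=0.25)
is_good = len(results[0].boxes) == 0
```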
My Challenges
- Data variability: “Not Good” situations can look very different.
- Subtlety in rules: some environments are “full” but still acceptable, which confuses binary classifiers.
What I’m Looking For
- Advice on which model architecture or processing pipeline to try, ideally with examples, so that both classification and detection work effectively.