GET /files/{task_id} returning 404 for every task_id with file_name

https://huggingface.co/proxy/agents-course-unit4-scoring.hf.space/files/cca530fc-4052-43b2-b130-b30968d8aa44.png

I am encountering a persistent 404 Not Found error when trying to download a file required for a specific task in the Agents Course, despite having confirmed multiple troubleshooting steps.

  • The error occurs when attempting to access the file via the API (GET /files/{task_id}) and when attempting to download it using an external web browser utility.

  • My personal Hugging Face Space is correctly set to public, confirming that the issue is not related to my local access permissions or authentication.

    Thanks!


I think the issue that came up a little while ago has probably been resolved… In this case, adding .png might be causing the error?


1. What the /files/{task_id} endpoint is supposed to do

According to the official Unit 4 Hands-On page, the scoring API exposes three routes: (Hugging Face)

  • GET /questions – returns questions and their task_id.
  • GET /files/{task_id} – returns the file attached to that question (image, audio, spreadsheet, etc.).
  • POST /submit – used only to send answers.
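
To make the relationship between /questions and /files concrete, here is a minimal sketch that builds the /questions URL and picks out the tasks that carry an attachment. The base URL is the one from the link at the top of this thread; the assumption that each question is a dict with a file_name key (empty when there is no attachment) is mine, as are the helper names and the placeholder "task-without-file" entry.

```python
# Base URL copied from the link in the original post.
BASE_URL = "https://huggingface.co/proxy/agents-course-unit4-scoring.hf.space"


def questions_url(base_url=BASE_URL):
    """Build the URL of the GET /questions route."""
    return f"{base_url.rstrip('/')}/questions"


def tasks_with_files(questions):
    """Return the task_id of every question that declares an attachment.

    Assumes each question is a dict with "task_id" and an optional,
    possibly empty "file_name" (an assumption, not a confirmed schema).
    """
    return [q["task_id"] for q in questions if q.get("file_name")]


# Hand-written sample in the assumed shape; not real API output.
sample = [
    {"task_id": "task-without-file", "file_name": ""},
    {
        "task_id": "cca530fc-4052-43b2-b130-b30968d8aa44",
        "file_name": "cca530fc-4052-43b2-b130-b30968d8aa44.png",
    },
]
```

Only the task_ids returned by tasks_with_files(sample) would need a call to GET /files/{task_id}.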

Internally, the scoring Space loads the GAIA dataset (gaia-benchmark/GAIA) and uses its task_id, file_name, and file_path fields. GAIA’s docs show that file_path is a relative path and must be joined with the dataset root: (Hugging Face)

import os
from datasets import load_dataset
from huggingface_hub import snapshot_download

data_dir = snapshot_download("gaia-benchmark/GAIA", repo_type="dataset")
dataset = load_dataset(data_dir, "2023_level1", split="test")
for example in dataset:
    file_path = os.path.join(data_dir, example["file_path"])

So, conceptually:

  1. GAIA: task_id → file_path (relative, inside the dataset).
  2. Scoring Space: converts that to a local path, then serves it via GET /files/{task_id}.

2. Why you get 404 – two layers of causes

You see the 404 both from your code and from a browser. Given what you have already ruled out, two causes remain:

Cause A – Using file_name instead of task_id in the URL

  • The path is defined as /files/{task_id} (one segment).
  • Your example URL uses:
    .../files/cca530fc-4052-43b2-b130-b30968d8aa44.png
    which is the file_name, not the pure task_id.
  • The correct URL is:
    .../files/cca530fc-4052-43b2-b130-b30968d8aa44
    (no .png, .mp3, etc.).

If you put the filename in the path instead of the task_id, 404 is expected.
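
A small helper can guard against this mistake. The sketch below assumes, as in your example, that the file_name is just the task_id plus an extension; files_url is a hypothetical name, and using the task_id field from /questions directly is still the safer option.

```python
import os


def files_url(base_url, task_id_or_file_name):
    """Build a /files/{task_id} URL, dropping any file extension.

    Assumes the file_name is "<task_id><ext>", which matches the
    example in this thread but is not guaranteed in general.
    """
    task_id, _ext = os.path.splitext(task_id_or_file_name)
    return f"{base_url.rstrip('/')}/files/{task_id}"
```

Passing either "cca530fc-4052-43b2-b130-b30968d8aa44.png" or the bare id yields the same URL ending in /files/cca530fc-4052-43b2-b130-b30968d8aa44.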

Cause B – Current server-side bug in the scoring Space

Even if you correct the URL to use the bare task_id, there is a separate problem:

  1. Multiple users report that even GET /questions is currently returning 404 from the same scoring Space. That is tracked in a public discussion “404 Error on Agents Course Unit 4 Scoring Page”. (Hugging Face)
  2. GAIA’s maintainers have recently updated the dataset format and explicitly kept file_path as a relative column, intended to be joined with data_dir. (Hugging Face)
  3. Judging from student templates and mirrors of the scoring Space, its logic appears to still use file_path directly as a local path. If it checks os.path.exists(file_path) without first joining it to the dataset root, the check fails for every row, no task_id → file_path mapping is registered, and GET /files/{task_id} always returns 404 (“no file path associated for this task”).

So the public reports point in the same direction:

  • The scoring Space appears to be mis-resolving GAIA file paths.
  • 404 on /files/{task_id} is happening for many users and many valid tasks, not just for you.
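
The suspected failure mode is easy to reproduce locally. The sketch below is not the scoring Space's actual code; it only illustrates, with a throwaway directory and a made-up row, how checking a relative file_path without joining it to the dataset root leaves the mapping empty. The function names and the "2023/validation" layout are assumptions for the demo.

```python
import os
import tempfile


def register_naive(rows):
    """Hypothesised buggy behaviour: test the relative path as-is."""
    return {
        r["task_id"]: r["file_path"]
        for r in rows
        if os.path.exists(r["file_path"])
    }


def register_joined(rows, data_dir):
    """Intended behaviour: join file_path with the dataset root first."""
    mapping = {}
    for r in rows:
        full_path = os.path.join(data_dir, r["file_path"])
        if os.path.exists(full_path):
            mapping[r["task_id"]] = full_path
    return mapping


def demo():
    """Return the task_ids registered by each strategy."""
    with tempfile.TemporaryDirectory() as data_dir:
        # Made-up relative path inside a temporary "dataset root".
        rel = os.path.join("2023", "validation", "demo-task-0000.png")
        os.makedirs(os.path.join(data_dir, "2023", "validation"))
        with open(os.path.join(data_dir, rel), "wb") as fh:
            fh.write(b"x")
        rows = [{"task_id": "demo-task-0000", "file_path": rel}]
        return sorted(register_naive(rows)), sorted(register_joined(rows, data_dir))
```

Unless the current working directory happens to contain the same relative path, the naive version registers nothing, while the joined version registers the task.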

Your own Space being public and your authentication setup do not affect this endpoint. The files are served entirely from the course Space.


3. Clear mapping from symptoms to causes

Your situation, step by step:

  1. You call GET /files/{task_id}:

    • First you tried with ...aa44.png in the URL. That fails because the API expects the pure task_id.
    • When you use only the task_id, you still get 404.
  2. You try with an external HTTP tool or browser:

    • Same 404, showing it is not a client library or CORS issue.
  3. You confirm your own Space is public:

    • That only matters for /submit reading your agent code, not for file download.

Given the current state of the scoring Space and the public reports, the most likely explanation is:

  • Even when you call the route correctly (with the bare task_id), the scoring Space cannot find a local file path for that task_id because its GAIA file-path handling is out of date.

4. What you can do: concrete, easy-to-apply solutions

Solution 1 – Fix your URL usage (necessary but not sufficient)

Always call the endpoint with the bare task_id:

# correct
GET https://huggingface.co/proxy/agents-course-unit4-scoring.hf.space/files/cca530fc-4052-43b2-b130-b30968d8aa44

If you still get 404 after this change, you are hitting the upstream bug, not a mistake on your side.
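
To tell the two cases apart in code, a minimal sketch using only the standard library could look like this. fetch_task_file and its opener parameter are hypothetical names; the opener exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_task_file(base_url, task_id, opener=urllib.request.urlopen):
    """Return the attachment bytes, or None if the server answers 404.

    Any other HTTP error is re-raised so it is not silently hidden.
    """
    url = f"{base_url.rstrip('/')}/files/{task_id}"
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A None result for a bare task_id that /questions lists with a file_name is the signal that the problem is upstream.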

Solution 2 – For development: bypass /files and use GAIA directly

Because /files/{task_id} is unreliable right now, the simplest robust approach is:

  1. Accept the GAIA dataset terms.

  2. Download GAIA and build your own mapping:

    import os
    from datasets import load_dataset
    from huggingface_hub import snapshot_download

    # Dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA
    data_dir = snapshot_download("gaia-benchmark/GAIA", repo_type="dataset")
    ds = load_dataset(data_dir, "2023_level1", split="validation")

    # Map each task_id to the absolute path of its attachment.
    id_to_path = {}
    for ex in ds:
        # Skip questions that have no attachment.
        if ex["file_path"] and ex["file_name"]:
            # file_path is relative, so join it with the dataset root.
            full_path = os.path.join(data_dir, ex["file_path"])
            if os.path.exists(full_path):
                id_to_path[ex["task_id"]] = full_path
    
  3. In your agent:

    • Use id_to_path[task_id] to open attachments directly (images, audio, spreadsheets, etc.).
    • Do not depend on the scoring Space to serve files.

You can still use POST /submit to get scored; file serving is only needed for your agent’s reasoning, not for the scoring endpoint itself. (Hugging Face)

Solution 3 – Optional: host your own copies

If you want HTTP URLs in your agent instead of local paths:

  1. From GAIA, collect just the attachment files referenced in the 20 questions used by the course.

  2. Put those files in:

    • Your Space’s data/ folder, or
    • A gated/private dataset you control, respecting GAIA’s terms. (Hugging Face)
  3. Have your agent fetch from these URLs instead of the broken /files/{task_id} route.
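
For step 2, a minimal sketch of copying the needed attachments into a folder could look like the following. It assumes you already have a task_id → local path dict such as the id_to_path mapping from Solution 2; copy_attachments and the default "data" folder name are illustrative only.

```python
import os
import shutil


def copy_attachments(id_to_path, task_ids, dest_dir="data"):
    """Copy the attachments of the given tasks into dest_dir.

    Returns a dict task_id -> copied path; tasks without an entry in
    id_to_path (no attachment) are skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = {}
    for task_id in task_ids:
        src = id_to_path.get(task_id)
        if src is None:
            continue
        dst = os.path.join(dest_dir, os.path.basename(src))
        shutil.copy2(src, dst)
        copied[task_id] = dst
    return copied
```

Whatever destination you choose, keep it consistent with GAIA's terms, as noted in step 2.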

Solution 4 – Treat the 404 as an upstream issue, not your bug

Given the public “404 Error” discussion and GAIA’s recent changes, this is very likely a temporary course-infrastructure problem. (Hugging Face)

Practically:

  • Log the 404 in your agent as “file not available from official API”.
  • Fall back to local GAIA files or your own hosted copies.
  • Optionally, watch or comment in the Unit 4 discussion so you know when the official /files/{task_id} is fixed.
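
The log-and-fall-back behaviour from the first two bullets can be sketched as below. load_attachment is a hypothetical helper: fetch_remote stands for whatever function you use to call the official API (assumed to return bytes, or None on 404), and id_to_path is a task_id → local path dict you built yourself.

```python
import logging

logger = logging.getLogger("agent.files")


def load_attachment(task_id, fetch_remote, id_to_path):
    """Try the official API first, then fall back to a local copy.

    Returns the file bytes, or None if neither source has the file.
    """
    try:
        data = fetch_remote(task_id)
    except OSError as exc:
        # Network-level failures (urllib's URLError is an OSError).
        logger.warning("official API request failed for %s: %s", task_id, exc)
        data = None
    if data is not None:
        return data

    logger.warning("file not available from official API: %s", task_id)
    local_path = id_to_path.get(task_id)
    if local_path is None:
        return None
    with open(local_path, "rb") as fh:
        return fh.read()
```

With this shape, nothing in the agent needs to change once the official route works again; the remote branch simply starts succeeding.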

5. Very short recap

  • Your permissions and Space visibility are not the problem.
  • You must use GET /files/{task_id} with the bare task_id, not the file_name.
  • Even with the correct task_id, the official scoring Space is currently mis-configured and often returns 404 for valid tasks, because of how it handles GAIA’s file_path.
  • Work around it by loading GAIA directly (or hosting the needed files yourself) and only using the scoring API for /submit.