There are 4 Laravel fine-tuned models on HuggingFace. Zero published benchmarks. I tested them all against a base model with no fine-tuning. The base model won.

I searched HuggingFace for every published model fine-tuned on Laravel data. Found 4 real ones (plus a dozen empty placeholder repos). Downloaded them. Tested them on 4 Laravel code generation tasks. Then tested a general-purpose Qwen2.5-Coder-14B with zero Laravel-specific training.

The general model won. Not by a little — it scored 10/12 where the best fine-tuned model scored 6/12.

Nobody had published this comparison before. Here's the full data.

The Models

The Laravel fine-tuning landscape on HuggingFace is Ryan Yannelli's work — he created both the training datasets and the models:

+------------------------------+--------------+---------------------+-----------+
| Model                        | Base         | Training Data       | Downloads |
+------------------------------+--------------+---------------------+-----------+
| Laravel-11-Llama-3.1-8B      | Llama 3.1 8B | 14K doc Q&A pairs   | 508       |
| Laravel-11-Llama-3.2-1B      | Llama 3.2 1B | 14K doc Q&A pairs   | 264       |
| Qwen2.5-Coder-14B (baseline) | —            | No Laravel training | —         |
+------------------------------+--------------+---------------------+-----------+

The training data (yannelli/laravel-11-qa) is 12,649 Q&A pairs generated from Laravel 11 documentation using OpenAI and Anthropic models, plus 1,450 long-form pairs — roughly the 14K figure above.

The Test

Four prompts, each a common Laravel task:

  1. Eloquent Model: Order with User belongsTo, OrderItems hasMany, soft deletes, casts
  2. Form Request: Blog post validation with exists rules, array validation, future date
  3. Middleware: Subscription check, 403 JSON response
  4. Migration: Products table with FK, soft deletes, timestamps

System prompt: "You are a senior Laravel developer. Output only code, no explanations."

Temperature: 0.3. Max tokens: 2000.

Full Results with Code

Eloquent Model

Laravel 1B (fine-tuned on 14K Q&A):

class Order extends Model
{
    public function belongsTo(): BelongsTo  // WRONG: named the method after the relationship type, not the relation (should be user())
    {
        return $this->belongsTo(User::class);
    }
    // Missing: SoftDeletes, casts, fillable
    // Hallucinated: Decimal::of('price', 2)->toFloat()
}

Laravel 8B (fine-tuned on 14K Q&A):

class Order extends Model
{
    // Hallucinated: morphOne(Cashier::class, 'totalable')->total()
    // Hallucinated: properties() method that doesn't exist in Laravel
    // Missing: SoftDeletes, casts, fillable
}

Qwen 14B (no fine-tuning):

class Order extends Model
{
    use HasFactory, SoftDeletes;
    protected $fillable = ['user_id', 'total_cents', 'metadata'];
    protected $casts = ['total_cents' => 'integer', 'metadata' => 'array'];
    public function user(): BelongsTo { return $this->belongsTo(User::class); }
    public function orderItems(): HasMany { return $this->hasMany(OrderItem::class); }
}

Winner: Qwen 14B. Every feature requested, correct syntax, correct patterns.

Form Request

Laravel 1B: Hallucinated $this->emaker()->validator(), invented validation rules like mustExistInCategories.

Laravel 8B: Used 'future' as a validation rule (it doesn't exist in Laravel) and omitted the authorize() method.

Qwen 14B: Correct authorize(), correct array syntax rules, custom closure for tags validation, proper after:today rule.

Winner: Qwen 14B.
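
For reference, the pattern Qwen was scored against looks roughly like this. This is a sketch, not the model's verbatim output; the class name and field names are assumptions based on the task description:

```php
<?php

namespace App\Http\Requests;

use Illuminate\Foundation\Http\FormRequest;

class StorePostRequest extends FormRequest
{
    public function authorize(): bool
    {
        return true;
    }

    public function rules(): array
    {
        return [
            'title'       => ['required', 'string', 'max:255'],
            'category_id' => ['required', 'exists:categories,id'],
            'tags'        => ['array'],
            'tags.*'      => ['string', 'max:50'],   // validate each array element
            'publish_at'  => ['nullable', 'date', 'after:today'],
        ];
    }
}
```

The exists:, array element (tags.*), and after:today rules are the three things both fine-tuned models got wrong in some form.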

Middleware

Laravel 1B: Wrote the middleware correctly, then filled the remaining 1,800 tokens with the same code block repeated 20+ times. Classic degenerate repetition.

Laravel 8B: Clean, concise, correct. Used nullsafe operator (?->), proper 403 JSON response.

Qwen 14B: Correct but more verbose. Included kernel registration code.

Winner: Laravel 8B — cleanest output.
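
The shape of the 8B's winning answer, reconstructed from the notes above (the hasActiveSubscription() accessor is an assumption; the nullsafe operator and 403 JSON response are from the actual output):

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\Response;

class EnsureSubscribed
{
    public function handle(Request $request, Closure $next): Response
    {
        // Nullsafe ?-> handles unauthenticated requests without a separate check
        if (! $request->user()?->hasActiveSubscription()) {
            return response()->json(['message' => 'Subscription required.'], 403);
        }

        return $next($request);
    }
}
```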

Migration

Laravel 1B: Wrong $table->unique('slug') syntax, description not nullable, used bigInteger() instead of integer().

Laravel 8B: Correct. Modern anonymous class, proper ->constrained() FK, even added a price comment.

Qwen 14B: Correct but used older named class style.

Winner: Laravel 8B.
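
What "modern anonymous class, proper ->constrained() FK" means in practice — a sketch of the winning shape, with column names beyond the prompt being assumptions:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Anonymous-class style (Laravel 9+), vs. the older named-class style Qwen used
return new class extends Migration
{
    public function up(): void
    {
        Schema::create('products', function (Blueprint $table) {
            $table->id();
            // Creates category_id column + FK to categories.id in one call
            $table->foreignId('category_id')->constrained()->cascadeOnDelete();
            $table->string('name');
            $table->unsignedInteger('price'); // in cents
            $table->text('description')->nullable();
            $table->softDeletes();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('products');
    }
};
```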

The Scorecard

+----------------+------------+------------+-----------+
| Task           | Laravel 1B | Laravel 8B | Qwen 14B  |
+----------------+------------+------------+-----------+
| Eloquent Model | 0/3        | 0/3        | **3/3**   |
| Form Request   | 0/3        | 0/3        | **3/3**   |
| Middleware     | 0/3        | **3/3**    | 2/3       |
| Migration      | 1/3        | **3/3**    | 2/3       |
| **Total**      | **1/12**   | **6/12**   | **10/12** |
+----------------+------------+------------+-----------+

Why the Fine-Tuned Models Lost

The fine-tuned models were trained on documentation Q&A — questions about Laravel concepts with tutorial-style answers. They learned to explain Laravel, not to write Laravel code.

I confirmed this by prompting the 1B in its native training format:

  • "How do I define a hasMany relationship?" → Good tutorial with correct embedded code (7–8/10)
  • "Create a Post model with hasMany Comments" → Sometimes perfect, sometimes hallucinated (3–10/10)

It's a documentation assistant, not a code generator. When prompted the way it was trained (short questions), it performs well. When asked to generate production code, it reverts to doc patterns and hallucinates.

The RAG Discovery

Then I tried something: injecting Laravel documentation directly into the system prompt before asking for code.

Laravel 8B — Form Request WITHOUT docs:

  • Used invalid 'future' rule, missing authorize(), missing tags.*

Laravel 8B — Form Request WITH docs:

  • Perfect — correct authorize(), array syntax, tags.*, after:now

Laravel 1B — Form Request WITHOUT docs:

  • Hallucinated everything

Laravel 1B — Form Request WITH docs:

  • Nearly perfect — mirrored the documentation examples correctly

Docs in context turned a broken 1B into a functional code generator. RAG may be a viable alternative to fine-tuning for these models — inject the relevant docs and let the model copy the pattern.
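
The whole RAG setup is a few lines. A minimal sketch, assuming llama.cpp's llama-server running locally with its OpenAI-compatible endpoint; the docs file path and the prompt text are illustrative:

```php
<?php

// Prepend the relevant docs to the system prompt, then ask for code as usual
$docs = file_get_contents('docs/validation-excerpt.md');

$payload = json_encode([
    'messages' => [
        ['role' => 'system',
         'content' => "You are a senior Laravel developer. Output only code.\n\n"
                    . "Reference documentation:\n" . $docs],
        ['role' => 'user',
         'content' => 'Create a Form Request for validating a blog post.'],
    ],
    'temperature' => 0.3,
    'max_tokens'  => 2000,
]);

$ch = curl_init('http://localhost:8080/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
]);

$response = json_decode(curl_exec($ch), true);
echo $response['choices'][0]['message']['content'];
```

No retraining, no GPU hours — the docs excerpt just has to fit in the context window alongside the task.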

What This Means For You

Don't assume "fine-tuned for X" means "good at X." The training data format matters more than the label on the model card. A model fine-tuned on Q&A won't write code. A model fine-tuned on code won't explain concepts.

Base model quality matters enormously. Qwen2.5-Coder-14B's code-specialized pretraining gave it better Laravel code than Laravel-specific fine-tuning on a general chat model.

If you need framework-specific code, try RAG first. Inject the relevant documentation into the prompt. It's faster than fine-tuning and works surprisingly well.

All raw outputs available at huggingface.co/fchis. Models tested: yannelli/Laravel-11-Llama-3.1-8B-Instruct-GGUF (Q4_K_M), yannelli/Laravel-11-Llama-3.2-1B-Instruct-GGUF (Q4_K_M), Qwen2.5-Coder-14B-Instruct (Q4_K_M). Hardware: M2 Pro 16GB, llama.cpp.