evaluating-code-models by davila7

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.

Coding

15.7K Stars

1.4K Forks

Updated Jan 12, 2026, 05:31 AM

Why Use This

This skill provides specialized capabilities for davila7's codebase.

Use Cases

Developing new features in the davila7 repository
Refactoring existing code to follow davila7 standards
Understanding and working with davila7's codebase structure

Install Guide

2 steps

1

Download Ananke

Skip this step if Ananke is already installed.
2

Install inside Ananke

Click Install Skill, paste the link below, then press Install.

https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/ai-research/evaluation-bigcode-evaluation-harness

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Repository claude-code-templates

Skill Version

main

Community

15.7K 1.4K

Updated At Jan 12, 2026, 05:31 AM

Skill Stats

SKILL.md 406 Lines

Total Files 1

Total Size 0 B

License MIT

Source

GitHub Repository ↗ Commit main ↗ skill.extrachatgpt.com ↗