Greetings and thank you for visiting our job post.
Supercoder is an AI-powered career development platform that connects developers worldwide with remote job opportunities offering competitive pay.
- Type of work: 100% Remote
Overview
The client is hiring Python/Linux Engineers to design complex, system-level benchmark tasks that evaluate the capabilities of modern Large Language Models (LLMs) such as ChatGPT, Claude, and other AI systems.
This role focuses on building realistic, technically challenging engineering scenarios that test model reasoning, debugging, and problem-solving abilities.
What You Will Do
- Design complex, realistic engineering tasks to evaluate LLM reasoning, coding, debugging, and system understanding.
- Build Python- and Linux-based workflows, pipelines, and multi-step scenarios.
- Create reproducible environments using Python, Shell, and CLI tools.
- Develop tasks that measure code comprehension, debugging, refactoring, and optimization.
- Write clear technical documentation: problem statements, constraints, expected outputs, and detailed edge cases.
- Use LLM tools (ChatGPT, Claude, etc.) to validate tasks and analyze model performance.
Must-Have Qualifications
- 5+ years of professional software development experience.
- Strong Python: modular code design, debugging complex programs, structured codebases.
- Proficiency with Linux, shell scripting (Bash), and command-line tools.
- Solid technical writing skills in English.
- Strong reasoning, analytical thinking, and problem-solving skills.
- Ability to design logical multi-step engineering scenarios.
Nice-to-Have Skills
- Experience creating benchmark datasets, online judge problems, coding tests, or technical challenges.
- Background in competitive programming (e.g., ICPC, Codeforces) or data science competitions such as Kaggle.
- Familiarity with Docker, Git, and CI/CD pipelines.
- Experience with ML/AI or data-intensive engineering environments.
Who Will Excel in This Role
- Engineers who enjoy designing difficult problems rather than simple feature development.
- Developers who are strong at debugging, identifying subtle issues, and understanding complex system interactions.
- Engineers who work well independently and can define their own approach.
- Individuals interested in LLM evaluation, AI reliability, and technical task design.