> Coding Benchmarks
- LiveCodeBench
- Provides a continuously updated, contamination-free benchmark by evaluating models on coding problems released after their training cutoffs, which avoids data leakage
- Tests: Code generation, automated self-repair, code execution, and test-output prediction on real contest problems from LeetCode, AtCoder, and CodeForces
- BigCodeBench
- Assesses LLMs on practical, challenging function-level programming tasks that reflect real-world coding complexity beyond synthetic examples
- Tests: Instruction following for detailed docstring requirements, composition of multiple library calls across 139 libraries, and complex reasoning; tasks average 5.6 test cases each, giving 99% branch coverage
- SWE-Bench ( _ , multimodal, agent)
> Why do we need static analysis of code
> Special Features of Rust
- Ownership and Borrowing
- Ownership: Every value in Rust has a single owner, and when the owner goes out of scope the value is dropped (its memory is freed)
- Borrowing: Temporarily “lend” access to a value via immutable borrows (&T, any number at once) or a mutable borrow (&mut T, exclusive while it lasts)
- Together, these rules guarantee no null or dangling references and no data races, all without a garbage collector, but they also produce some of Rust’s most challenging compiler errors (especially around lifetimes and overlapping borrows)
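The ownership and borrowing rules above can be illustrated with a minimal sketch. The function names (`total_len`, `push_greeting`, `consume`) are made up for this example; the commented-out lines show the kind of code the compiler rejects.

```rust
/// Immutable borrow: reads the data without taking ownership.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Mutable borrow: exclusive access for the duration of the call.
fn push_greeting(words: &mut Vec<String>) {
    words.push(String::from("hello"));
}

/// Takes ownership: `words` is dropped (freed) when this function returns.
fn consume(words: Vec<String>) -> usize {
    words.len()
}

fn main() {
    let mut words = vec![String::from("rust")];

    push_greeting(&mut words); // mutable borrow ends when the call returns
    let bytes = total_len(&words); // immutable borrow
    println!("{} words, {} bytes", words.len(), bytes);

    // Overlapping borrows are rejected at compile time (error E0502):
    // let first = &words[0];
    // words.push(String::from("oops"));
    // println!("{}", first);

    let count = consume(words); // ownership moves into `consume`
    // Using `words` here would be a compile error (E0382: use of moved value).
    println!("consumed {} words", count);
}
```

No runtime check or garbage collector is involved: each rejected line fails at compile time, which is why these errors dominate when translating code into Rust.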
> How to ensure memory safety and reliability - how do you enforce those qualities in your translated code
> How to solve ownership and borrowing
- Rich Context Extraction
- The reflection loop dynamically selects how much context to extract based on the error type. For example, some syntactic errors can be fixed by passing only the erroneous line of code, while other errors require the entire function or the 20 lines of code above and below it
- Change log: This section of the pipeline handles the correction in the code
- Finetuning
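The error-dependent context selection described under Rich Context Extraction could look roughly like the sketch below. It is not the pipeline's actual code: `ErrorKind`, `extract_context`, and the two-way split between syntax and semantic errors are assumptions made for illustration, and whole-function extraction (which needs a parser) is left out in favour of the 20-line window.

```rust
/// Hypothetical error categories; a real pipeline would derive these
/// from the compiler's diagnostic codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    /// Fixable from the offending line alone.
    Syntax,
    /// Needs surrounding code (e.g. borrow or lifetime errors).
    Semantic,
}

/// Number of lines to include above and below the error line.
const WINDOW: usize = 20;

/// Returns the source lines to send to the model.
/// `error_line` is 1-based, as in compiler diagnostics.
fn extract_context<'a>(source: &'a str, error_line: usize, kind: ErrorKind) -> Vec<&'a str> {
    let lines: Vec<&str> = source.lines().collect();
    if error_line == 0 || error_line > lines.len() {
        return Vec::new(); // line number out of range: nothing to extract
    }
    let idx = error_line - 1;
    let (start, end) = match kind {
        // Only the erroneous line.
        ErrorKind::Syntax => (idx, idx + 1),
        // Up to WINDOW lines on each side, clamped to the file bounds.
        ErrorKind::Semantic => (
            idx.saturating_sub(WINDOW),
            (idx + WINDOW + 1).min(lines.len()),
        ),
    };
    lines[start..end].to_vec()
}

fn main() {
    // A synthetic 100-line "file" for demonstration.
    let source: String = (1..=100).map(|i| format!("line {}\n", i)).collect();

    let syntax_ctx = extract_context(&source, 50, ErrorKind::Syntax);
    let semantic_ctx = extract_context(&source, 50, ErrorKind::Semantic);

    println!("syntax context: {} line(s)", syntax_ctx.len());
    println!("semantic context: {} line(s)", semantic_ctx.len());
}
```

Keeping the window size in one constant makes it easy to widen the context on a retry if the first repair attempt fails.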
> Difference between cargo check vs rustc
cargo check: Runs only the compiler front end (parsing, macro expansion, name resolution, type-checking, borrow-checking, MIR generation) and stops before code generation and linking. It checks that your entire Cargo project (including dependencies, feature flags, build scripts, and workspace members) compiles without errors.
rustc: The compiler itself. Invoked directly, it runs all compilation stages, including code generation and linking, on a single crate and does no dependency management.
- Development: cargo check for rapid feedback
- Runnable binaries: cargo build (which under the hood invokes rustc)
- One-off compilation of a single file: rustc file.rs
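As a rough sketch of how a tool might drive the two commands, the snippet below builds (without running) a `cargo check` invocation and a direct `rustc` invocation using `std::process::Command`. The helper names are hypothetical; `--message-format=json` is a real Cargo flag that emits diagnostics as JSON, which is convenient when errors are parsed programmatically.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `cargo check` invocation for a Cargo project.
/// Diagnostics are requested as JSON so a tool can parse them.
fn cargo_check_command(project_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("check")
        .arg("--message-format=json")
        .current_dir(project_dir);
    cmd
}

/// Build (but do not run) a direct `rustc` invocation for one source file.
/// This compiles and links the file with no Cargo project involved.
fn rustc_command(source_file: &Path) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.arg(source_file);
    cmd
}

fn main() {
    let check = cargo_check_command(Path::new("."));
    let compile = rustc_command(Path::new("file.rs"));

    // Print the commands; calling `.output()` on either would execute it.
    println!("{:?}", check);
    println!("{:?}", compile);
}
```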
> Structured Output