# proteasome

# 1. Testing The Framework

This section is intended for framework developers. After modifying the framework's core code, you can test it using mocked datasets by executing these commands:

```shell
# Generate mocked datasets:
# - Creates "_mocked_benchmark" case set in "data/case_set/"
# - Creates "_mocked_samples" sample set in "data/sample_set/"
# - Creates "_mocked_eval_task" evaluation task in "data/eval_task/"
python mock.py -case -sample -task

# Execute evaluation (mock mode):
# 1. Tasks prefixed with "_mocked_" will run in mock mode
# 2. Runtime profiles will be generated in "data/eval_task/task_running/"
# 3. Final results will be stored in "data/result/"
python evaluate.py _mocked_eval_task

# Cleanup mocked assets (optional):
# Note: This removes only the mocked datasets/tasks, not runtime profiles/results
python mock.py clean -case -sample -task
```

# 2. Creating The Case Set

Before evaluating, you need a case set. If you do not already have one, create it with the following command:

```shell
# Replace "my_benchmark" with an appropriate name.
# Note: This command will ask you for the title and description of the new case set; you can enter any text for them.
python create_case_set.py my_benchmark
```

# 3. Making A Case

This section provides an example of how to create cases for a resource-creation operation.

## 3.1. Creating Workshop Directory

Create a directory named `workshop_of_case`.

## 3.2. Initializing Environment

Navigate into the `workshop_of_case` directory and create an HCL source file, e.g. `main.tf`, as shown below:

```hcl
# A provider block is required. You can specify the provider requirement block if needed.
provider "alicloud" {
  region = "cn-shanghai"
}
```

Next, run the `init` and `apply` commands to initialize the environment:

```shell
# The init command downloads the provider plugins and creates the .terraform.lock.hcl file.
tofu init

# The apply command generates the Terraform state file.
tofu apply
```

At this point, the environment is initialized and ready for resource creation.

## 3.3. Saving The Pre-Operation Environment

Run the `save_env.py` script to capture the environment state before performing any operations:

```shell
python save_env.py pre_op
```

This will create a directory named `pre_op` inside the workshop directory, containing the necessary files to represent the pre-operation environment state.

## 3.4. Creating An OSS Bucket

Update the `main.tf` file as follows:

```hcl
provider "alicloud" {
  region = "cn-shanghai"
}

# Declare an OSS bucket.
resource "alicloud_oss_bucket" "demo_bucket" {
  bucket = "demo-bucket-2025-0507-1348"
  acl    = "public-read"
}
```

Then, run the `apply` command to create the OSS bucket:

```shell
tofu apply
```

## 3.5. Saving The Post-Operation Environment

Run the `save_env.py` script to capture the environment state after performing the operations:

```shell
python save_env.py post_op
```

This will create a directory named `post_op` inside the workshop directory, containing the necessary files to represent the post-operation environment state.
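At this point the workshop directory holds both snapshots. As a rough sketch, its contents should look something like the listing below, assuming default OpenTofu file names; the internal layout of the snapshot directories is framework-defined:

```shell
ls -a workshop_of_case
# Expected, roughly:
#   .terraform/           provider plugins downloaded by `tofu init`
#   .terraform.lock.hcl   dependency lock file created by `tofu init`
#   main.tf               the HCL source file
#   terraform.tfstate     state file written by `tofu apply`
#   pre_op/               pre-operation snapshot from save_env.py
#   post_op/              post-operation snapshot from save_env.py
```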
## 3.6. Preparing A Manifest File

Create a file named `manifest.yaml` in the workshop directory to define the test cases. Below is an example:

```yaml
resource_type: alicloud_oss_bucket
operation_type: create
singularity: s0
essentials:
  - >
    count(alicloud_oss_bucket) == 1
  - >
    opt("alicloud_oss_bucket_acl") is None
    or count(alicloud_oss_bucket_acl) == 1
  - >
    provider_of(first_of(alicloud_oss_bucket)).region == "cn-shanghai"
  - >
    opt("alicloud_oss_bucket_acl") is None
    or provider_of(first_of(alicloud_oss_bucket_acl)).region == "cn-shanghai"
  - >
    first_of(alicloud_oss_bucket).bucket == "demo-bucket-2025-0507-1348"
  - >
    len(list(filter(lambda x: x is not None,
    {first_of(alicloud_oss_bucket).opt("acl"), first_of(opt("alicloud_oss_bucket_acl"))}))) == 1
  - >
    first_of(alicloud_oss_bucket).opt("acl") == "public-read"
    or opt("alicloud_oss_bucket_acl") is not None
    and first_of(alicloud_oss_bucket_acl).opt("acl") == "public-read"
  - >
    opt("alicloud_oss_bucket_acl") is None
    or first_of(alicloud_oss_bucket_acl).bucket == "${" + first_of(alicloud_oss_bucket).self_path() + ".bucket}"
  - >
    first_of(alicloud_oss_bucket).opt("storage_class") in {None, "Standard"}
  - >
    first_of(alicloud_oss_bucket).opt("lifecycle_rule") is None
actions:
  side_effect: deny
  required:
    - action: create
      pattern: alicloud_oss_bucket.*
      count: 1
  optional:
    - action: create
      pattern: alicloud_oss_bucket_acl.*
      count: 1
    - action: create
      pattern: alicloud_oss_bucket_versioning.*
      count: 1
security: []
misc:
  - >
    alicloud_oss_bucket.opt("demo_bucket") is not None
  - >
    alicloud_oss_bucket.demo_bucket.opt("versioning") is None
    or alicloud_oss_bucket.demo_bucket.versioning.opt("status") == "Suspended"
  - >
    opt("alicloud_oss_bucket_versioning") is None
    or first_of(alicloud_oss_bucket_versioning).bucket == "${" + first_of(alicloud_oss_bucket).self_path() + ".bucket}"
    and first_of(alicloud_oss_bucket_versioning).opt("status") in {None, "Suspended"}
  - >
    alicloud_oss_bucket.demo_bucket.opt("cors_rule") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("website") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("logging") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("server_side_encryption_rule") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("transfer_acceleration") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("redundancy_type") is None
    or alicloud_oss_bucket.demo_bucket.redundancy_type == "LRS"
  - >
    alicloud_oss_bucket.demo_bucket.opt("access_monitor") is None
    or alicloud_oss_bucket.demo_bucket.access_monitor.opt("status") == "Disabled"
user_inputs:
  i3: >
    Create an Alibaba Cloud OSS bucket in the cn-shanghai region.
    Name the bucket demo-bucket-2025-0507-1348, and set the resource block name to demo_bucket.
    Set the ACL to public-read.
    Do not enable versioning or lifecycle management.
  i2: >
    Create an Alibaba Cloud OSS bucket in the cn-shanghai region.
    Name the bucket demo-bucket-2025-0507-1348.
    Set the ACL to public-read.
  i1: >
    Create an OSS bucket named demo-bucket-2025-0507-1348 in Shanghai that anyone can read.
  i0: >
    Create an OSS bucket.
clarity: [c1, c0]
```

Each case is defined by a combination of five dimensions: **resource type**, **operation type**, **integrity**, **clarity**, and **singularity**. In this `manifest.yaml`, the resource type (`alicloud_oss_bucket`), operation type (`create`), and singularity (`s0`) are fixed, while four integrity levels (`i0` through `i3` under `user_inputs`) and two clarity levels (`c1`, `c0`) vary, so up to 4 × 2 = 8 unique cases can be described.
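As a quick illustration of that count, the cross product of the integrity and clarity levels can be enumerated in the shell. The printed labels are purely illustrative and are not the framework's actual case identifiers:

```shell
# 4 integrity levels x 2 clarity levels = 8 case combinations
# (the label format below is illustrative only).
for i in i0 i1 i2 i3; do
  for c in c1 c0; do
    echo "alicloud_oss_bucket.create.s0.${i}.${c}"
  done
done
```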
## 3.7. Saving The Cases To The Case Set

Use the `save_cases.py` script to save the cases defined in the workshop directory to a specified case set:

```shell
# "my_benchmark" refers to the name of an existing case set.
python save_cases.py my_benchmark
```

After saving the cases, restore the real environment to the state captured in the `pre_op` snapshot (in this walkthrough, where the pre-operation environment contained no resources, running `tofu destroy` in the workshop directory accomplishes this). Now that the cases have been created, you can repeat the steps above to add more cases as needed.

# 4. Setting Samples and Evaluation Task

Create a sample set file in `data/sample_set`, e.g. `samples_testing.yaml`:

```yaml
kind: sample_set
sample_set:
  title: Sample For Testing
  description: This sample set is for testing.
  samples:
    - [ qwen-max-2025-01-25, builtin_agent.d1.e1 ]
    - [ deepseek-chat, builtin_agent.d0.e1 ]
```

In the `samples` section of the YAML file, each sample is a pair consisting of a model name and a decorated agent name.

- The model name refers to a specific LLM.
- The decorated agent name includes the agent's name and two feature flags, a documentation flag and an environment flag:
  - "d0": the agent does not support documentation.
  - "d1": the agent does support documentation.
  - "e0": the agent is not aware of the IaC environment.
  - "e1": the agent is aware of the IaC environment.
  - "es": the agent is only aware of the IaC state.
  - "ec": the agent is only aware of the IaC code (HCL).

The `builtin_agent` is a special type of agent that can enable or disable specific features using these feature flags. For example, `builtin_agent.d1.es` would denote the built-in agent with documentation support that is only aware of the IaC state.

Finally, create an evaluation task file in `data/eval_task`, e.g. `eval_testing.yaml`:

```yaml
kind: eval_task
eval_task:
  case_set: my_benchmark
  sample_set: samples_testing
  repeats: 3
  iac_tool:
    name: tofu
    version: 1.9.0
    path: /opt/homebrew/bin/tofu
```

Key fields explained:

- `case_set`: The name of the case set to be evaluated.
- `sample_set`: The name of the sample set to be used in this evaluation.
- `repeats`: The number of times each case will be evaluated.
- `iac_tool`: Configuration for the Infrastructure-as-Code (IaC) tool, including:
  - `name`: The name of the IaC tool (e.g., `tofu`)
  - `version`: The version to be used
  - `path`: The full path to the executable

# 5. Before Evaluating

Before running the evaluation, make sure the following preparations are complete:

- **Provider credentials**: Store the credentials for the provider plugins in environment variables (e.g., `ALICLOUD_ACCESS_KEY`, `ALICLOUD_SECRET_KEY`); see the export example after this list.
- **Configuration file**: Create or update the `config.yaml` file. Example:

  ```yaml
  eval_concurrency: 4
  models:
    - name: qwen-max-2025-01-25
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: qwen-plus-2025-01-25
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: qwen-turbo-2025-02-11
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: deepseek-chat
      endpoint: https://api.deepseek.com
      base_url_path: /v1
      secret_key_env_var: DEEPSEEK_SK
    - name: qwq-plus
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
  builtin_agent:
    default_model: qwen-max-2025-01-25
  informer_agent:
    c1_model: qwq-plus
    c0_model: qwq-plus
  ```

- **Custom agents**: If you're using agents other than `builtin_agent`, you must implement corresponding adapters and register them in `adapter/factory.py`.
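For example, the environment variables referenced above and in `config.yaml` might be set like this (the values are placeholders to replace with your own):

```shell
# Provider credentials for the alicloud plugin:
export ALICLOUD_ACCESS_KEY="your-access-key-id"
export ALICLOUD_SECRET_KEY="your-access-key-secret"

# LLM API keys referenced by secret_key_env_var in config.yaml:
export LLM_SK="your-dashscope-api-key"
export DEEPSEEK_SK="your-deepseek-api-key"
```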
# 6. Evaluating

To start the evaluation, run `evaluate.py` with the task name:

```shell
python evaluate.py eval_testing
```

If the evaluation is interrupted, you can resume it by running `resume.py` with the same task name:

```shell
python resume.py eval_testing
```

# 7. Analyzing The Result

After the evaluation completes, a result file will be generated in the `data/result` directory. You can analyze the results as needed based on your specific criteria.
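To locate the run artifacts from the command line, the paths documented in section 1 apply here as well; the exact file names inside them are framework-defined:

```shell
ls data/eval_task/task_running/   # runtime profiles of the run
ls data/result/                   # final result files
```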