# proteasome

# 1. Testing The Framework

This section is intended for framework developers. After modifying the framework's core code, you can test it using mocked datasets by executing these commands:

```shell
# Generate mocked datasets:
# - Creates "_mocked_benchmark" case set in "data/case_set/"
# - Creates "_mocked_samples" sample set in "data/sample_set/"
# - Creates "_mocked_eval_task" evaluation task in "data/eval_task/"
python mock.py -case -sample -task

# Execute evaluation (mock mode):
# 1. Tasks prefixed with "_mocked_" will run in mock mode
# 2. Runtime profiles will be generated in "data/eval_task/task_running/"
# 3. Final results will be stored in "data/result/"
python evaluate.py _mocked_eval_task

# Cleanup mocked assets (optional):
# Note: This removes only the mocked datasets/tasks, not runtime profiles/results
python mock.py clean -case -sample -task
```

# 2. Creating The Case Set

Before evaluating, you need a case set. If you do not already have one, create it with the following command:

```shell
# Replace "my_benchmark" with an appropriate name.
# Note: This command will ask you for the title and description of the new case set; you can enter any text for them.
python create_case_set.py my_benchmark
```

# 3. Making A Case

This section provides an example of how to create cases for a resource-creation operation.

## 3.1. Creating Workshop Directory

Create a directory named `workshop_of_case`.

## 3.2. Initializing Environment

Navigate into the `workshop_of_case` directory and create an HCL source file, e.g. `main.tf`, as shown below:

```hcl
# A provider block is required. You can specify the provider requirement block if needed.
provider "alicloud" {
  region = "cn-shanghai"
}
```

Next, run the `init` and `apply` commands to initialize the environment:

```shell
# The init command downloads the provider plugins and creates the .terraform.lock.hcl file.
tofu init

# The apply command generates the Terraform state file.
tofu apply
```

At this point, the environment is initialized and ready for resource creation.

## 3.3. Saving The Pre-Operation Environment

Run the `save_env.py` script to capture the environment state before performing any operations:

```shell
python save_env.py pre_op
```

This will create a directory named `pre_op` inside the workshop directory, containing the necessary files to represent the pre-operation environment state.

## 3.4. Creating An OSS Bucket

Update the `main.tf` file as follows:

```hcl
provider "alicloud" {
  region = "cn-shanghai"
}

# Declare an OSS bucket.
resource "alicloud_oss_bucket" "demo_bucket" {
  bucket = "demo-bucket-2025-0507-1348"
  acl    = "public-read"
}
```

Then, run the `apply` command to create the OSS bucket:

```shell
tofu apply
```

## 3.5. Saving The Post-Operation Environment

Run the `save_env.py` script to capture the environment state after performing the operations:

```shell
python save_env.py post_op
```

This will create a directory named `post_op` inside the workshop directory, containing the necessary files to represent the post-operation environment state.
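At this point the workshop directory holds both snapshots. As a rough sketch, its contents should look something like the listing below, assuming default OpenTofu file names; the internal layout of the snapshot directories is framework-defined:

```shell
ls -a workshop_of_case
# Expected, roughly:
#   .terraform/           provider plugins downloaded by `tofu init`
#   .terraform.lock.hcl   dependency lock file created by `tofu init`
#   main.tf               the HCL source file
#   terraform.tfstate     state file written by `tofu apply`
#   pre_op/               pre-operation snapshot from save_env.py
#   post_op/              post-operation snapshot from save_env.py
```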
## 3.6. Preparing A Manifest File

Create a file named `manifest.yaml` in the workshop directory to define the test cases. Below is an example:

```yaml
resource_type: alicloud_oss_bucket
operation_type: create
singularity: s0
essentials:
  - >
    count(alicloud_oss_bucket) == 1
  - >
    opt("alicloud_oss_bucket_acl") is None
    or count(alicloud_oss_bucket_acl) == 1
  - >
    provider_of(first_of(alicloud_oss_bucket)).region == "cn-shanghai"
  - >
    opt("alicloud_oss_bucket_acl") is None
    or provider_of(first_of(alicloud_oss_bucket_acl)).region == "cn-shanghai"
  - >
    first_of(alicloud_oss_bucket).bucket == "demo-bucket-2025-0507-1348"
  - >
    len(list(filter(lambda x: x is not None,
    {first_of(alicloud_oss_bucket).opt("acl"), first_of(opt("alicloud_oss_bucket_acl"))}))) == 1
  - >
    first_of(alicloud_oss_bucket).opt("acl") == "public-read"
    or opt("alicloud_oss_bucket_acl") is not None
    and first_of(alicloud_oss_bucket_acl).opt("acl") == "public-read"
  - >
    opt("alicloud_oss_bucket_acl") is None
    or first_of(alicloud_oss_bucket_acl).bucket == "${" + first_of(alicloud_oss_bucket).self_path() + ".bucket}"
  - >
    first_of(alicloud_oss_bucket).opt("storage_class") in {None, "Standard"}
  - >
    first_of(alicloud_oss_bucket).opt("lifecycle_rule") is None
actions:
  side_effect: deny
  required:
    - action: create
      pattern: alicloud_oss_bucket.*
      count: 1
  optional:
    - action: create
      pattern: alicloud_oss_bucket_acl.*
      count: 1
    - action: create
      pattern: alicloud_oss_bucket_versioning.*
      count: 1
security: []
misc:
  - >
    alicloud_oss_bucket.opt("demo_bucket") is not None
  - >
    alicloud_oss_bucket.demo_bucket.opt("versioning") is None
    or alicloud_oss_bucket.demo_bucket.versioning.opt("status") == "Suspended"
  - >
    opt("alicloud_oss_bucket_versioning") is None
    or first_of(alicloud_oss_bucket_versioning).bucket == "${" + first_of(alicloud_oss_bucket).self_path() + ".bucket}"
    and first_of(alicloud_oss_bucket_versioning).opt("status") in {None, "Suspended"}
  - >
    alicloud_oss_bucket.demo_bucket.opt("cors_rule") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("website") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("logging") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("server_side_encryption_rule") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("transfer_acceleration") is None
  - >
    alicloud_oss_bucket.demo_bucket.opt("redundancy_type") is None
    or alicloud_oss_bucket.demo_bucket.redundancy_type == "LRS"
  - >
    alicloud_oss_bucket.demo_bucket.opt("access_monitor") is None
    or alicloud_oss_bucket.demo_bucket.access_monitor.opt("status") == "Disabled"
user_inputs:
  i3: >
    Create an Alibaba Cloud OSS bucket in the cn-shanghai region.
    Name the bucket demo-bucket-2025-0507-1348, and set the resource block name to demo_bucket.
    Set the ACL to public-read.
    Do not enable versioning or lifecycle management.
  i2: >
    Create an Alibaba Cloud OSS bucket in the cn-shanghai region.
    Name the bucket demo-bucket-2025-0507-1348.
    Set the ACL to public-read.
  i1: >
    Create an OSS bucket named demo-bucket-2025-0507-1348 in Shanghai that anyone can read.
  i0: >
    Create an OSS bucket.
clarity: [c1, c0]
```

Each case is defined by a combination of five dimensions: **resource type**, **operation type**, **integrity**, **clarity**, and **singularity**. In this `manifest.yaml`, the resource type (`alicloud_oss_bucket`), operation type (`create`), and singularity (`s0`) are fixed, while four integrity levels (`i0` through `i3` under `user_inputs`) and two clarity levels (`c1`, `c0`) vary, so up to 4 × 2 = 8 unique cases can be described.
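As a quick illustration of that count, the cross product of the integrity and clarity levels can be enumerated in the shell. The printed labels are purely illustrative and are not the framework's actual case identifiers:

```shell
# 4 integrity levels x 2 clarity levels = 8 case combinations
# (the label format below is illustrative only).
for i in i0 i1 i2 i3; do
  for c in c1 c0; do
    echo "alicloud_oss_bucket.create.s0.${i}.${c}"
  done
done
```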
## 3.7. Saving The Cases To The Case Set

Use the `save_cases.py` script to save the cases defined in the workshop directory to a specified case set:

```shell
# "my_benchmark" refers to the name of an existing case set.
python save_cases.py my_benchmark
```

After saving the cases, restore the real environment to the state captured in the `pre_op` snapshot (in this walkthrough, where the pre-operation environment contained no resources, running `tofu destroy` in the workshop directory accomplishes this). Now that the cases have been created, you can repeat the steps above to add more cases as needed.

# 4. Setting Samples and Evaluation Task

Create a sample set file in `data/sample_set`, e.g. `samples_testing.yaml`:

```yaml
kind: sample_set
sample_set:
  title: Sample For Testing
  description: This sample set is for testing.
  samples:
    - [ qwen-max-2025-01-25, builtin_agent.d1.e1 ]
    - [ deepseek-chat, builtin_agent.d0.e1 ]
```

In the `samples` section of the YAML file, each sample is a pair consisting of a model name and a decorated agent name.

- The model name refers to a specific LLM.
- The decorated agent name includes the agent's name and two feature flags, a documentation flag and an environment flag:
  - "d0": the agent does not support documentation.
  - "d1": the agent does support documentation.
  - "e0": the agent is not aware of the IaC environment.
  - "e1": the agent is aware of the IaC environment.
  - "es": the agent is only aware of the IaC state.
  - "ec": the agent is only aware of the IaC code (HCL).

The `builtin_agent` is a special type of agent that can enable or disable specific features using these feature flags. For example, `builtin_agent.d1.es` would denote the built-in agent with documentation support that is only aware of the IaC state.

Finally, create an evaluation task file in `data/eval_task`, e.g. `eval_testing.yaml`:

```yaml
kind: eval_task
eval_task:
  case_set: my_benchmark
  sample_set: samples_testing
  repeats: 3
  iac_tool:
    name: tofu
    version: 1.9.0
    path: /opt/homebrew/bin/tofu
```

Key fields explained:

- `case_set`: The name of the case set to be evaluated.
- `sample_set`: The name of the sample set to be used in this evaluation.
- `repeats`: The number of times each case will be evaluated.
- `iac_tool`: Configuration for the Infrastructure-as-Code (IaC) tool, including:
  - `name`: The name of the IaC tool (e.g., `tofu`)
  - `version`: The version to be used
  - `path`: The full path to the executable

# 5. Before Evaluating

Before running the evaluation, make sure the following preparations are complete:

- **Provider credentials**: Store the credentials for the provider plugins in environment variables (e.g., `ALICLOUD_ACCESS_KEY`, `ALICLOUD_SECRET_KEY`); see the export example after this list.
- **Configuration file**: Create or update the `config.yaml` file. Example:

  ```yaml
  eval_concurrency: 4
  models:
    - name: qwen-max-2025-01-25
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: qwen-plus-2025-01-25
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: qwen-turbo-2025-02-11
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
    - name: deepseek-chat
      endpoint: https://api.deepseek.com
      base_url_path: /v1
      secret_key_env_var: DEEPSEEK_SK
    - name: qwq-plus
      endpoint: https://dashscope.aliyuncs.com
      base_url_path: /compatible-mode/v1
      secret_key_env_var: LLM_SK
  builtin_agent:
    default_model: qwen-max-2025-01-25
  informer_agent:
    c1_model: qwq-plus
    c0_model: qwq-plus
  ```

- **Custom agents**: If you're using agents other than `builtin_agent`, you must implement corresponding adapters and register them in `adapter/factory.py`.
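For example, the environment variables referenced above and in `config.yaml` might be set like this (the values are placeholders to replace with your own):

```shell
# Provider credentials for the alicloud plugin:
export ALICLOUD_ACCESS_KEY="your-access-key-id"
export ALICLOUD_SECRET_KEY="your-access-key-secret"

# LLM API keys referenced by secret_key_env_var in config.yaml:
export LLM_SK="your-dashscope-api-key"
export DEEPSEEK_SK="your-deepseek-api-key"
```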
# 6. Evaluating

To start the evaluation, run `evaluate.py` with the task name:

```shell
python evaluate.py eval_testing
```

If the evaluation is interrupted, you can resume it by running `resume.py` with the same task name:

```shell
python resume.py eval_testing
```

# 7. Analyzing The Result

After the evaluation completes, a result file will be generated in the `data/result` directory. You can analyze the results as needed based on your specific criteria.
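To locate the run artifacts from the command line, the paths documented in section 1 apply here as well; the exact file names inside them are framework-defined:

```shell
ls data/eval_task/task_running/   # runtime profiles of the run
ls data/result/                   # final result files
```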