Example prompt for synthetic data generation (for training a model to correct Excel formula errors)
{
"role": "system",
"content": "\nYou are an Excel expert responsible for generating accurate and realistic data to validate a model for Excel formula repair. Your role is to carefully analyze each scenario, think step-by-step, and provide the appropriate response.\nEnsure each step is precise, logical, and complete. Think carefully and deliver structured, well-reasoned outputs that conforms to the user requirements. Especially for #VALUE! and #REF! errors, ensure that the faulty formula generates the specified error and the correct formula resolves it logically and coherently.\n"
},
{
"role": "user",
"content": "\n### Role\nYou are an expert Excel data generator with extensive knowledge in Excel formulas, spreadsheet data, and common user challenges. \nYour role is to create high-quality, realistic data samples to validate a model used for Excel formula repair, specifically focusing on generating faulty formula that result in a #N/A runtime error.\n\n### Instructions\nYour job is to simulate a realistic Excel scenario where a user encounters a runtime error in a formula.\nThinking step-by-step, your task is carefully and methodically generate the following:\n**User Utterance**: \nCreate a plausible and concise user query that mimics a real-world user struggling with an Excel problem related to formulas and facing the specified runtime error. \nThe query should accurately reflect the description of the problem the user is facing.\n\nFaulty Formula:\nGenerate an Excel formula that results in the specified #N/A runtime error. This formula must be directly related to the user's query and use references from the Excel table you generate.\nThe formula should contain at least THREE Excel functions or more and produce the specified runtime error in the faulty formula location.\n\n**Excel Table (JSON)**:\nConstruct a dictionary of cells (JSON format) in a realistic Excel table format. Each cell must have a value (v) and a type (t). If the cell is empty, replace it with \"nan\". \nEnsure that the table structure supports the faulty formula you are suggesting and that it's coherent with the user query.\n\n**faulty formula location**: The location of the faulty formula in the excel table you generate. This cell location must be in the same location as one of the cells in the data dictionary that contains the runtime error.\n\n**Correct Formula**:\nPropose a formula that fixes the error and addresses the user utterance. Ensure that the formula resolves the specific runtime error by directly addressing the issue in the faulty formula.\nThe correct formula must work with the data when placed in the same cell location where the faulty formula was.\n\n**Explanation**:\nProvide a detailed, yet concise, explanation of how the correct formula resolves the error in the faulty formula. Your explanation should focus on addressing the error and fixing the formula, ensuring logical coherence.\n\n### Key Rules:\nThe faulty formula must generate the specified #N/A.\nThe correct formula must resolve the #N/A in a logically sound way.\nDo not use the NA() function for #N/A errors.\nThe output must follow the JSON format strictly and be surrounded by <JSON> and </JSON> tags.\nPay attention to logical coherence between the query, table, formulas, and the explanation.\nDo not use the IF or IFERROR functions in the correct formula. However, for cases where IF or IFERROR are used in the correct formula and the \"otherwise\" condition is involved, it should return an empty string (\"\") instead of random values. For example: \"correct_formula\": \"=IF(COUNTIF(B:B, A2) > 0, A2, \"\")\"\n\n### Requirements:\n**Thorough Reasoning**: Take your time to ensure the logical flow between the user query, faulty formula, correct formula, and explanation is sound.\n**Strict Adherence to Format**: Ensure the output format is followed exactly. Errors in format or logic will result in failed validation.\n**Realism**: All components (queries, formulas, tables) must be realistic and mimic real-world Excel usage.\n\nGiven the following information:\nExcel Runtime Error: #N/A\nPound Error Description: A definition of the Excel runtime error Occurs when a value is not available to a function or formula, indicating that something is missing or not applicable.\nTypical example format of a real world correctly annotated example:\n<JSON>{'cells': {'Sheet1': {'cells': {'A1': {'t': 's', 'v': 'List A'}, 'A2': {'t': 'n', 'v': '10.0'}, 'A3': {'t': 'n', 'v': '20.0'}, 'A4': {'t': 'n', 'v': '30.0'}, 'A5': {'t': 'n', 'v': '40.0'}, 'A6': {'t': 'n', 'v': '50.0'}, 'A7': {'t': 'n', 'v': '60.0'}, 'A8': {'t': 'n', 'v': '70.0'}, 'A9': {'t': 'n', 'v': '80.0'}, 'A10': {'t': 'n', 'v': '90.0'}, 'B1': {'t': 's', 'v': 'List B'}, 'B2': {'t': 'n', 'v': '2398.0'}, 'B3': {'t': 'n', 'v': '70.0'}, 'B4': {'t': 'n', 'v': '3.0'}, 'B5': {'t': 'n', 'v': '3948.0'}, 'B6': {'t': 'n', 'v': '69.0'}, 'B7': {'t': 'n', 'v': '50.0'}, 'B8': {'t': 'n', 'v': '90.0'}, 'B9': {'t': 'n', 'v': '10.0'}, 'B10': {'t': 'n', 'v': '8.0'}}}}, 'utterance': 'Lets say we have 2 lists of values in A1:A10 & B1:B10.Now, how do you find the number of common values in both lists? We just want the count, not list of common values it self. Also, if a value appears multiple times in both lists, it should be counted multiple times in the number of common values.', 'faulty_formula': '=SUM(--(MATCH(A2:A10,B2:B10,0)>0))', 'correct_formula': '=SUM(COUNTIF(A2:A10,\"=\"&B2:B10))', 'runtime_errors': ['#N/A'], 'faulty_formula_cell_location': None}</JSON>\n\n### Output\nPlease take your time to think step by step and generate a realistic JSON response following this strict format:\n\n<JSON>\n{\n \"utterance\": \"Your user query mimicking a user-facing problem in Excel.\",\n \"cells\": {\n // JSON representation of the Excel table cells, e.g.,\n \"A1\": { \"v\": \"10\", \"t\": \"n\" },\n \"A2\": { \"v\": \"nan\", \"t\": \"n\" }\n // Continue for other cells in the table.\n },\n \"faulty_formula\": \"faulty formula that results in the specified runtime error\",\n \"correct_formula\": \"a corrected formula that resolves the runtime error\",\n \"explanation\": \"Explanation of how the correct formula resolves the runtime error.\",\n \"runtime_errors\": [\"runtime error related to the faulty formula\"],\n \"faulty_formula_location\": \"Location of the faulty formula in the excel table.\"\n}\n</JSON>\n\n\n### Final Check\nEnsure that the faulty formula indeed returns the specified runtime error and that the correct formula addresses and fixes the error logically and coherently."
}