Add automatic column synchronization for Excel-to-table syncs #198
envyrovasileski wants to merge 9 commits into botpress:master from
Conversation
issue: Please remove PAT requirement from integration configuration.
You can access table methods through the underlying client using something like:
const getRealClient = (
client: sdk.IntegrationSpecificClient<bp.TIntegration>
): Client => (client as any)._client as Client;
//example usage:
const realClient = getRealClient(client)
const tables = await realClient.listTables({})
In the initial development, I think there may have been some methods that didn't work without the PAT + API setup.
I can investigate again to double check, but I'd recommend we leave this for now, and add this as a future enhancement.
issue: The schema is only fetched for existing tables, not newly created ones. When a new table is created above, foundTable remains false. This leaves tableSchema as null, which results in values being converted to strings in line 474, even though the table schema expects numeric types.
Good catch, changed this to account for that.
issue: Examples provided are incorrect - Table names must end with 'Table'. The examples 'table1' and 'table2' will fail with error: "Table name cannot start with a number, must be 30 characters or less, can contain only letters, numbers, and underscores, and end with 'Table'".
Good catch, done.
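For reference, a quick sketch of that naming rule as a check, derived from the error message quoted above (this helper is hypothetical, not part of the PR):
// Table names must not start with a number, be at most 30 characters,
// contain only letters, numbers, and underscores, and end with 'Table'.
const isValidTableName = (name: string): boolean =>
  /^(?![0-9])[A-Za-z0-9_]{1,30}$/.test(name) && name.endsWith('Table')

isValidTableName('customersTable') // true
isValidTableName('table1') // false - does not end with 'Table'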
columnsToRemove.forEach((col) => {
  schemaWithRemovals[col] = null
})
updateSchemaPayload.schema = schemaWithRemovals
issue: The entire schema is replaced with the removal object, so all other columns (unchanged or new) are lost.
Example: Table has columns [Name, Email, Phone], Excel has [Name, Email, Age]
- Lines 339-360 build: {Name, Email, Phone, Age}
- Line 384 replaces with: {Phone: null}
- API call receives incorrect schema missing Name, Email, Age
The schema API call is an update and will only remove the specified column, e.g. {Phone: null}. It doesn't overwrite the entire schema.
const updateResponse = await axios.put(`${apiBaseUrl}/${tableId}`, updateSchemaPayload, { headers: httpHeaders })
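For illustration, a minimal sketch of the partial-update semantics described above (the endpoint URL and header values here are placeholders, not the integration's actual configuration):
import axios from 'axios'

// Placeholder values for illustration only.
const apiBaseUrl = 'https://api.example.com/tables' // hypothetical endpoint
const tableId = 'customersTable'
const httpHeaders = { Authorization: 'Bearer <token>' }

// Per the comment above, sending only { Phone: null } removes that single column;
// columns not mentioned in the payload are left untouched.
const updateSchemaPayload = { schema: { Phone: null } }

axios.put(`${apiBaseUrl}/${tableId}`, updateSchemaPayload, { headers: httpHeaders })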
question: sheetName and processAllSheets aren't defined in the input schema - they will always be undefined. sheetName isn't used and processAllSheets is used at line 527, making that error handling logic unreachable. Were you planning to use these elsewhere?
Should avoid typecasting as any here
  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
  const cleanExcelHeaders = excelHeaders.map((h) => String(h).trim()).filter((h) => h !== '')
issue/question: Duplicate excel column titles will result in one being ignored. Should we document/consider this?
Good catch, changed this to throw an error if there are duplicate columns so users can see this before continuing with a sync.
actions: {
-   syncExcelFile: async ({ ctx, input, logger }) => {
+   syncExcelFile: async ({ ctx, input, logger, client }) => {
Ermek - issue: Could you separate the action implementation into a separate file? Refer to other integrations.
const realClient = getRealClient(client)

- const { sharepointFileUrl, sheetName, processAllSheets, sheetTableMapping } = input as any
+ const { sharepointFileUrl, sheetTableMapping, processAllSheets } = input
Ermek - question: Could you clarify what the purpose of processAllSheets is?
This was originally an option for the user to decide whether the integration should continue syncing all sheets even if one sheet failed, but I removed this in favour of a default path.
The action now always continues on errors - failed sheets are logged and returned in failedSheets[] while successful ones appear in processedSheets[]. This maximizes the amount of data synced while giving full visibility into failures.
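For illustration, the output shape this implies might look roughly like this (field names beyond processedSheets and failedSheets are assumptions):
// Sketch of the action output described above; exact fields are an assumption.
type SyncOutput = {
  processedSheets: { sheetName: string; tableName: string; rowsInserted: number }[]
  failedSheets: { sheetName: string; error: string }[]
}

const exampleOutput: SyncOutput = {
  processedSheets: [{ sheetName: 'Sheet1', tableName: 'customersTable', rowsInserted: 120 }],
  failedSheets: [{ sheetName: 'Sheet2', error: 'Duplicate column headers: "Email" (2x)' }],
}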
Ermek - issue: Please refrain from using nested try/catch blocks
Done. Separated out a lot of the code into helper functions and avoided nested try/catch blocks.
Ermek - suggestion: You can use this function to meaningfully extract error message:
import { isAxiosError } from 'axios'
import { ZodError, ZodIssue } from '@botpress/sdk'
const formatZodErrors = (issues: ZodIssue[]) =>
'Validation Error: ' +
issues
.map((issue) => {
const path = issue.path?.length ? `${issue.path.join('.')}: ` : ''
return path ? `${path}${issue.message}` : issue.message
})
.join('\n')
export const getErrorMessage = (err: unknown): string => {
if (isAxiosError(err)) {
// server dependent error
const status = err.response?.status
const data = err.response?.data
// always present
const message = err.message
if (typeof data === 'string' && data.trim()) {
return status ? `${data} (Status: ${status})` : data
}
return status ? `${message} (Status: ${status})` : message
}
if (err instanceof ZodError) {
return formatZodErrors(err.errors)
}
if (err instanceof Error) {
return err.message
}
if (typeof err === 'string') {
return err
}
if (err && typeof err === 'object' && 'message' in err) {
const message = (err as { message: unknown }).message
if (typeof message === 'string') {
return message
}
}
return 'An unexpected error occurred'
}
  continue
}

// Check for duplicate headers
Ermek - issue: This code is hard to follow. Could you refactor it?
Done, extracted this logic to a helper function with inline comments.
if (rowsData.length > 0 && tableId) {
  logger.forBot().info(`Clearing all existing rows from table "${tableNameForSheet}" (ID: ${tableId}).`)
  try {
Ermek - suggestion: Please try to use one try/catch block
logger
  .forBot()
  .warn(
    `Could not fetch table schema for "${tableNameForSheet}": ${error.message}. Will use string types for all columns.`
Ermek - question: Will a silent fail prevent the user from knowing there is an issue?
Good catch, changed this to throw an error instead of just a warn.
const { columnsToAdd, columnsToRemove, unchangedColumns } = compareColumns(excelHeaders, tableSchema)

// Log schema changes (only adding columns, not removing)
if (columnsToAdd.length > 0 || columnsToRemove.length > 0) {
Ermek - question: Should we log or throw errors in these checks?
Schema changes are expected as Excel files evolve. Throwing errors here would break normal workflows every time someone adds/removes a column. Current behavior (auto-update schema + log changes) handles this gracefully.
})

// Preserve columns that are no longer in Excel (to maintain KB links)
columnsToRemove.forEach((col) => {
Ermek - issue: This code is very hard to follow. Could you refactor it to make more readable?
Refactored this to use two new helper functions:
- updateTableSchema() - handles all schema updates
- insertTableRows() - builds rows and inserts with batching
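Roughly, the split described might look like this (signatures and the batch size are illustrative assumptions; the actual helpers live in the PR):
type Row = Record<string, string | number>

// Hypothetical signature: reconcile the Botpress table schema with the Excel columns.
async function updateTableSchema(tableId: string, excelColumns: string[]): Promise<void> {
  // fetch the current schema, add missing columns, and set removed columns to null
}

// Hypothetical signature: build row objects and insert them in batches.
async function insertTableRows(tableId: string, rows: Row[], batchSize = 50): Promise<void> {
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize)
    // e.g. await client.createTableRows({ table: tableId, rows: batch }) - exact call is an assumption
    void batch
  }
}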
- Fix schema not being fetched for newly created tables, causing numeric columns to be converted to strings
- Add duplicate column header detection with clear error messages
- Add processAllSheets input parameter to control error handling behavior
- Update examples to use valid table names ending with 'Table'
- Remove unused variables and type casting

- Separate action implementation into dedicated actions/ directory
- Add getErrorMessage utility for consistent error handling across all catch blocks
- Remove nested try/catch blocks by extracting focused helper functions
- Refactor schema update logic into updateTableSchema() function
- Refactor row insertion logic into insertTableRows() function
- Extract duplicate header validation into validateUniqueHeaders() function
- Remove processAllSheets parameter - always continue on errors and return failedSheets[] in output
- Make schema fetch operations fail fast instead of using fallback schemas
- Fix misleading logs about preserving removed columns (they are actually deleted)
- Use `unknown` type for error handling instead of `any`
Force-pushed 8e93157 to aee0435
- Split syncExcelFile into focused helper functions (parseSheetTableMapping, ensureTableExists, processSheet)
- Standardize documentation examples to use generic table names
- Make output schema consistent by always returning both processedSheets and failedSheets arrays
Pull Request Overview
This PR adds automatic column synchronization to the SharePoint Excel integration, enabling it to detect and sync schema changes between Excel files and Botpress tables. The refactoring extracts a large action into a modular file structure and upgrades to newer SDK versions.
Key Changes:
- Automatic detection of added/removed columns with schema updates
- Refactored 350+ line action into dedicated src/actions/sync-excel-file.ts file
- Added centralized error handling utility (getErrorMessage)
- Removed personalAccessToken requirement in favor of SDK client authentication
- Major dependency upgrades (SDK 0.11.8 → 4.15.2, TypeScript 4.x → 5.x)
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| src/utils.ts | Added getErrorMessage utility for consistent error formatting across Axios, Zod, and generic errors |
| src/index.ts | Removed inline action implementation, now imports from src/actions |
| src/actions/sync-excel-file.ts | New modular implementation with column sync logic, schema updates, and validation helpers |
| src/actions/index.ts | Export barrel for actions |
| package.json | Major version bumps for Botpress SDK (v4), TypeScript (v5), and related dependencies |
| integration.definition.ts | Removed personalAccessToken config, added failedSheets to output schema, updated description |
const schemaWithRemovals: any = {}
columnsToRemove.forEach((col) => {
  schemaWithRemovals[col] = null
})
updateSchemaPayload.schema = schemaWithRemovals
issue: The schema update logic has a critical bug. When columnsToRemove.length > 0, the code overwrites the entire updateSchemaPayload.schema with only the columns to remove (set to null), discarding all the columns to add and keep that were just built in updatedProperties.
This means if you're both adding new columns and removing old ones in the same sync, only the removal will be executed and the additions will be lost.
The fix should merge the removals into the existing schema properties:
// Handle column removals by setting to null
if (columnsToRemove.length > 0) {
  columnsToRemove.forEach((col) => {
    updateSchemaPayload.schema.properties[col] = null
  })
}

Suggested change:
- const schemaWithRemovals: any = {}
- columnsToRemove.forEach((col) => {
-   schemaWithRemovals[col] = null
- })
- updateSchemaPayload.schema = schemaWithRemovals
+ columnsToRemove.forEach((col) => {
+   updateSchemaPayload.schema.properties[col] = null
+ })
function validateUniqueHeaders(headers: string[], sheetName: string): void {
  const cleanHeaders = headers.map((h) => String(h).trim()).filter((h) => h !== '')

  // Count occurrences of each header
  const headerCounts = new Map<string, number>()
  cleanHeaders.forEach((header) => {
    headerCounts.set(header, (headerCounts.get(header) || 0) + 1)
  })

  // Find duplicates
  const duplicates = Array.from(headerCounts.entries())
    .filter(([_, count]) => count > 1)
    .map(([header, count]) => `"${header}" (${count}x)`)

  if (duplicates.length > 0) {
    throw new sdk.RuntimeError(
      `Sheet "${sheetName}" has duplicate column headers: ${duplicates.join(', ')}. Each column must have a unique name.`
    )
  }
}
suggestion: The validateUniqueHeaders function performs duplicate detection but the intermediate cleanHeaders variable creates a filtered list that might not align with the original indices. If there are empty headers, the count might be misleading. Consider validating against the original headers array or providing more context in the error message about which positions have duplicates.
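A minimal sketch of the position-aware reporting suggested here (hypothetical helper, not the PR's code):
// Report duplicates together with their 1-based column positions so users can
// locate them in the sheet. Empty headers are ignored, as in the original.
function findDuplicateHeaders(headers: string[]): Map<string, number[]> {
  const positions = new Map<string, number[]>()
  headers.forEach((raw, index) => {
    const header = String(raw).trim()
    if (header === '') return
    positions.set(header, [...(positions.get(header) ?? []), index + 1])
  })
  return new Map([...positions].filter(([, cols]) => cols.length > 1))
}

// e.g. findDuplicateHeaders(['Name', '', 'Email', 'Name']) returns Map { 'Name' => [1, 4] }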
@@ -19,7 +19,6 @@ export default new IntegrationDefinition({
  privateKey: z.string().min(1).describe('PEM‑formatted certificate private key'),
  primaryDomain: z.string().min(1).describe('SharePoint primary domain (e.g. contoso)'),
  siteName: z.string().min(1).describe('SharePoint site name'),
issue: The personalAccessToken configuration field was removed from the integration definition (line 22 removed), but the old code relied on it for authentication to the Tables API. The new code uses the client parameter instead, which is correct. However, users who have this field configured will experience a breaking change. Consider:
- Adding a migration note in the changelog
- Documenting that users should remove this configuration field
- Potentially keeping it as optional/deprecated with a warning if still present
Suggested change:
- siteName: z.string().min(1).describe('SharePoint site name'),
+ siteName: z.string().min(1).describe('SharePoint site name'),
+ // Deprecated: This field is no longer used for authentication. Please remove it from your configuration.
+ personalAccessToken: z
+   .string()
+   .optional()
+   .describe(
+     '[DEPRECATED] Personal Access Token for legacy authentication. No longer required; will be removed in a future version.'
+   ),
function detectColumnType(columnIndex: number, rowsData: any[][]): 'number' | 'string' {
  let isNumeric = true
  let hasData = false

  for (const row of rowsData) {
    const value = row[columnIndex]
    if (value !== undefined && value !== null && value !== '') {
      hasData = true
      if (isNaN(Number(value))) {
        isNumeric = false
        break
      }
    }
  }

  return hasData && isNumeric ? 'number' : 'string'
}
issue: The column type detection in detectColumnType only checks for numeric vs string, but the code at line 315 already handles the case where isNaN(numValue) is true by falling back to string. This means the detection could be overly optimistic - if even one value in the column fails to convert to a number, the column should probably be typed as string.
However, the current implementation only checks during type detection, not during insertion. Consider making the type detection more conservative, or logging warnings when values don't match their detected type.
// Convert to appropriate type based on schema
if (tableSchema?.properties?.[cleanHeader]?.type === 'number') {
  const numValue = Number(value)
  rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
issue: The type coercion fallback behavior is inconsistent. When a value is expected to be a number but isNaN(numValue) is true, the code falls back to String(value) (line 315). However, if the schema says it should be a number, inserting a string could cause validation errors or data corruption.
Consider either:
- Logging a warning when type coercion fails
- Setting the value to null instead of a string when the schema expects a number
- Throwing an error to fail fast
Current code silently inserts incorrect types which could lead to hard-to-debug issues later.
Suggested change:
- rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
+ if (isNaN(numValue)) {
+   rowObject[cleanHeader] = null
+   logger.forBot().warn(
+     `Could not coerce value "${value}" for column "${cleanHeader}" to number; setting to null`
+   )
+ } else {
+   rowObject[cleanHeader] = numValue
+ }
@@ -41,7 +40,7 @@ export default new IntegrationDefinition({
  sheetTableMapping: z
    .string()
issue: The description for sheetTableMapping states "Table names must end with 'Table'" but there's no validation in the code to enforce this requirement. If this is truly a requirement, add validation that throws an error when table names don't end with 'Table'. If it's not actually required, remove it from the description to avoid confusion.
Suggested change:
- .string()
+ .string()
+ .refine((val) => {
+   // Accept either comma-separated or JSON mapping
+   let mapping: Record<string, string> = {};
+   try {
+     if (val.trim().startsWith('{')) {
+       mapping = JSON.parse(val);
+     } else {
+       val.split(',').forEach(pair => {
+         const [sheet, table] = pair.split(':').map(s => s.trim());
+         if (sheet && table) mapping[sheet] = table;
+       });
+     }
+   } catch {
+     // If parsing fails, let Zod handle invalid format
+     return false;
+   }
+   // Validate all table names
+   return Object.values(mapping).every(tableName => typeof tableName === 'string' && tableName.endsWith('Table'));
+ }, {
+   message: "All table names must end with 'Table'."
+ })
| "@botpress/client": "1.22.0", | ||
| "@botpress/sdk": "4.15.2", | ||
| "axios": "^1.8.1", | ||
| "@botpress/sdk": "0.11.8", | ||
| "xlsx": "^0.18.5" | ||
| }, | ||
| "devDependencies": { | ||
| "@botpress/cli": "^0.11.5", | ||
| "@types/node": "^18.11.17", | ||
| "@botpress/cli": "^4.17.2", | ||
| "@types/node": "^20.19.0", | ||
| "nodemon": "^3.1.7", | ||
| "ts-node": "^10.9.1", | ||
| "typescript": "^4.9.4" | ||
| "ts-node": "^10.9.2", | ||
| "typescript": "^5.8.2" |
issue: Major version bumps in dependencies without checking for breaking changes. The package.json shows:
- @botpress/sdk from 0.11.8 to 4.15.2 (major version jump)
- @botpress/client from ^1.13.0 to 1.22.0
- @botpress/cli from ^0.11.5 to ^4.17.2
- typescript from ^4.9.4 to ^5.8.2
These are significant version jumps that likely include breaking changes. Ensure:
- All breaking changes have been reviewed and addressed
- The code has been tested against these new versions
- The version pinning strategy is intentional (note the removal of ^ from @botpress/client)
if (value !== undefined && value !== null && value !== '') {
  // Convert to appropriate type based on schema
  if (tableSchema?.properties?.[cleanHeader]?.type === 'number') {
    const numValue = Number(value)
    rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
  } else {
    rowObject[cleanHeader] = String(value)
  }
}
[nitpick] issue: Empty/whitespace-only values are handled differently depending on column type. For number columns, they become null (line 320), but for string columns they become empty strings ''. This inconsistency could be confusing. Consider using null for all empty values regardless of type, or document why this distinction exists.
function compareColumns(
  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
  const cleanExcelHeaders = excelHeaders.map((h) => String(h).trim()).filter((h) => h !== '')
  const existingColumns = tableSchema?.properties ? Object.keys(tableSchema.properties) : []

  const columnsToAdd = cleanExcelHeaders.filter((header) => !existingColumns.includes(header))
  const columnsToRemove = existingColumns.filter((col) => !cleanExcelHeaders.includes(col))
  const unchangedColumns = cleanExcelHeaders.filter((header) => existingColumns.includes(header))

  return { columnsToAdd, columnsToRemove, unchangedColumns }
}
[nitpick] suggestion: The function compareColumns returns three arrays but doesn't document what "unchanged columns" means in the context of synchronization. Adding a JSDoc comment explaining the purpose and return values would improve code maintainability, especially since this is a key function for the auto-sync feature.
  })
  return rowObject
})
.filter((obj) => Object.keys(obj).length > 0)
question: Should empty rows be explicitly skipped or counted? The current logic filters out rows where all columns are empty via .filter((obj) => Object.keys(obj).length > 0) (line 324), but there's no logging to indicate how many rows were skipped. This could be confusing when the user expects N rows but only M are inserted. Consider adding a debug log if rows are filtered out.
ptrckbp left a comment
question: any chance all fetch, update, and insert operations do not require excelHeaders, so that we may separate concerns? That would allow us to move the methods above to ./helpers/botpress-tables.ts and ./helpers/sharepoint-excel.ts or something like that.
- get all data from sharepoint. extract all types and data so that nothing is excel specific.
- update or create all tables from the types above. -> clear them
- populate the data.
These sync integrations are hard to debug when there's a problem and hard to visualize. Work on clarity and separation of concerns is worthwhile.
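As a sketch of what that boundary could look like (file and function names are illustrative, not prescriptive):
// helpers/sharepoint-excel.ts (sketch) - everything SharePoint/Excel specific
export type ParsedSheet = {
  columns: { name: string; type: 'number' | 'string' }[]
  rows: Record<string, string | number>[]
}
export type ParseSheet = (fileUrl: string, sheetName: string) => Promise<ParsedSheet>

// helpers/botpress-tables.ts (sketch) - Botpress table operations, no Excel knowledge
export type EnsureTable = (tableName: string, columns: ParsedSheet['columns']) => Promise<string>
export type ClearTable = (tableId: string) => Promise<void>
export type InsertRows = (tableId: string, rows: ParsedSheet['rows']) => Promise<void>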
import { getErrorMessage } from '../utils'

type Logger = Parameters<bp.IntegrationProps['actions']['syncExcelFile']>[0]['logger']
type RealClient = any
question: why the any? and what makes it real?
Fixed! Replaced any with the actual Client type from @botpress/client and renamed RealClient → BotpressClient for clarity.
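In other words, something along these lines (a sketch of the rename described above):
import { Client } from '@botpress/client'

// The underlying Botpress client, renamed from RealClient for clarity.
type BotpressClient = Client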
function compareColumns(
  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
todo: the verb tenses are confusing here, there's a mix of todo (to add) and past tense (unchanged). Use one or the other.
Fixed! Changed unchangedColumns → columnsToKeep to match the consistent "todo/future" tense style with columnsToAdd and columnsToRemove.
 */
function compareColumns(
  excelHeaders: string[],
  tableSchema: any
thought: excelHeaders makes it clear we are talking about sharepoint, but tableSchema isn't clear it's Botpress. Also you might want to change the type from any to record, or array at least.
Fixed! Renamed tableSchema → botpressTableSchema throughout for clarity. Also replaced any type with Table['schema'] (imported from @botpress/client) for proper type safety.
/**
 * Detects the type of a column based on its data values
 */
function detectColumnType(columnIndex: number, rowsData: any[][]): 'number' | 'string' {
question: why is this being run twice? should we not detect columns once when importing from sharepoint?
Fixed! You're right, it was being called twice: once when creating new tables and again when adding new columns to existing tables. Refactored to detect column types once during Excel parsing in parseExcelSheet(), then pass the typed ColumnDefinition[] to all table operations. This eliminated the redundant type detection.
// Fetch and update schema if needed
let tableSchema = await fetchTableSchema(realClient, tableId, tableName, logger)
tableSchema = await updateTableSchema(realClient, tableId, tableName, excelHeaders, rowsData, tableSchema, logger)
question: any chance all fetch, update, and insert operations do not require excelHeaders, so that we may separate concerns? That would allow us to move the methods above to ./helpers/botpress-tables.ts and ./helpers/sharepoint-excel.ts or something like that.
- get all data from sharepoint. extract all types and data so that nothing is excel specific.
- update or create all tables from the types above. -> clear them
- populate the data.
Done! Refactored to fully separate concerns:
- Created helpers/sharepoint-excel.ts - parses Excel, detects types once, returns ParsedSheetData (columns + typed rows)
- Created helpers/botpress-tables.ts - all table operations (create, update, insert) with zero Excel dependencies
- sync-excel-file.ts now orchestrates: parse Excel → ensure table exists → update schema → insert rows
All Botpress functions now work with ColumnDefinition[] and Record<string, string | number>[] instead of excelHeaders and raw arrays.
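A rough sketch of the shapes this describes (field names beyond those stated above are assumptions):
// helpers/types.ts (sketch)
export type ColumnDefinition = {
  name: string
  type: 'number' | 'string'
}

export type ParsedSheetData = {
  columns: ColumnDefinition[]
  rows: Record<string, string | number>[]
}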
…e operations
- Replace `any` types with proper types (`Client`, `Table['schema']`)
- Rename `RealClient` → `BotpressClient` for clarity
- Rename `tableSchema` → `botpressTableSchema` throughout
- Rename `unchangedColumns` → `columnsToKeep` for consistent verb tense
- Detect column types once during Excel parsing (not twice)
- Create `helpers/types.ts` with `ColumnDefinition` and `ParsedSheetData`
- Create `helpers/sharepoint-excel.ts` for Excel-specific parsing logic
- Create `helpers/botpress-tables.ts` for Botpress table operations
- Remove Excel dependencies from all Botpress table functions
Add automatic column detection and synchronization to the sharepoint-excel integration:
The integration now keeps Botpress table schemas fully synchronized with Excel structure while maintaining backward compatibility with existing KB links.