Add automatic column synchronization for Excel-to-table syncs #198

Open

envyrovasileski wants to merge 9 commits into botpress:master from Envyro-Inc:sharepoint-excel-feat

Conversation

@envyrovasileski
Contributor

Add automatic column detection and synchronization to the sharepoint-excel integration:

  • Automatically adds new columns when detected in Excel files
  • Removes columns from Botpress tables when deleted from Excel (preserves table ID and KB links)

The integration now keeps Botpress table schemas fully synchronized with Excel structure while maintaining backward compatibility with existing KB links.

@envyrovasileski requested a review from a team as a code owner on October 22, 2025 at 04:31
Comment on lines 151 to 163


issue: Please remove PAT requirement from integration configuration.

You can access table methods by accessing the underlying client using something like:

const getRealClient = (
  client: sdk.IntegrationSpecificClient<bp.TIntegration>
): Client => (client as any)._client as Client;

// example usage:
const realClient = getRealClient(client)
const tables = await realClient.listTables({})

Contributor Author

In the initial development, I think there may have been some methods that didn't work without the PAT + API setup.

I can investigate again to double check, but I'd recommend we leave this for now, and add this as a future enhancement.


issue: The schema is only fetched for existing tables, not newly created ones. When a new table is created above, foundTable remains false. This leaves tableSchema as null, which results in values being converted to strings in line 474, even though the table schema expects numeric types.
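
One way to handle this, as a rough sketch (buildSchemaFromHeaders is a hypothetical helper, and the exact return shapes of listTables/createTable should be verified against the client):

// Hedged sketch: keep the schema of a newly created table instead of leaving it null,
// so numeric columns are not coerced to strings during row insertion.
let tableSchema: Record<string, any> | null = null

const { tables } = await realClient.listTables({})
const existingTable = tables.find((t: { name: string }) => t.name === tableNameForSheet)

if (existingTable) {
  tableSchema = existingTable.schema
} else {
  const { table } = await realClient.createTable({
    name: tableNameForSheet,
    schema: buildSchemaFromHeaders(excelHeaders, rowsData), // hypothetical helper
  })
  tableSchema = table.schema // use the schema returned by the create call
}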

Contributor Author

Good catch, changed this to account for that.


issue: Examples provided are incorrect - Table names must end with 'Table'. The examples 'table1' and 'table2' will fail with error: "Table name cannot start with a number, must be 30 characters or less, can contain only letters, numbers, and underscores, and end with 'Table'".
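
For reference, an up-front check along these lines (purely illustrative, not the integration's actual validator) would catch invalid names before a sync starts:

// Hedged sketch of the naming rule quoted above: letters/numbers/underscores only,
// must not start with a number, 30 characters or less, and must end with 'Table'.
const TABLE_NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*Table$/

const isValidTableName = (name: string): boolean =>
  name.length <= 30 && TABLE_NAME_PATTERN.test(name)

isValidTableName('table1')       // false - does not end with 'Table'
isValidTableName('ordersTable')  // true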

Contributor Author

Good catch, done.

columnsToRemove.forEach((col) => {
  schemaWithRemovals[col] = null
})
updateSchemaPayload.schema = schemaWithRemovals


issue: The entire schema is replaced with the removal object. This causes all other columns, whether unchanged or new, to be lost.

Example: Table has columns [Name, Email, Phone], Excel has [Name, Email, Age]

  • Lines 339-360 build: {Name, Email, Phone, Age}
  • Line 384 replaces with: {Phone: null}
  • API call receives incorrect schema missing Name, Email, Age

Contributor Author

The schema API call is an update and will only remove the specified column, e.g. {Phone: null}. It doesn't overwrite the entire schema.
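
Under that assumption, the payload sent through the PUT call quoted below would carry only the removal, e.g. (a sketch of the assumed merge semantics, not verified API behavior):

// Hedged sketch: a partial schema update that only nulls out the removed column.
// Existing columns are expected to be merged/preserved by the API, not replaced.
const updateSchemaPayload = {
  schema: {
    Phone: null, // column that no longer exists in Excel
  },
}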

const updateResponse = await axios.put(`${apiBaseUrl}/${tableId}`, updateSchemaPayload, { headers: httpHeaders })


question: sheetName and processAllSheets aren't defined in the input schema, so they will always be undefined. sheetName isn't used at all, and processAllSheets is only used at line 527, which makes that error-handling logic unreachable. Were you planning to use these elsewhere?


Should avoid typecasting as any here

Contributor Author

Done.

  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
  const cleanExcelHeaders = excelHeaders.map((h) => String(h).trim()).filter((h) => h !== '')


issue/question: Duplicate Excel column titles will result in one being ignored. Should we document/consider this?

Contributor Author

Good catch, changed this to throw an error if there are duplicate columns so users can see this before continuing with a sync.


  actions: {
-   syncExcelFile: async ({ ctx, input, logger }) => {
+   syncExcelFile: async ({ ctx, input, logger, client }) => {
Contributor Author

Ermek - issue: Could you separate the action implementation into a separate file? Refer to other integrations for examples.

Contributor Author

Done.

const realClient = getRealClient(client)

- const { sharepointFileUrl, sheetName, processAllSheets, sheetTableMapping } = input as any
+ const { sharepointFileUrl, sheetTableMapping, processAllSheets } = input
Contributor Author

Ermek - question: Could you clarify what the purpose of processAllSheets is?

Contributor Author

This was originally an option for the user to decide whether the integration should continue syncing all sheets even if one sheet failed, but I removed this in favour of a default path.

The action now always continues on errors: failed sheets are logged and returned in failedSheets[], while successful ones appear in processedSheets[]. This maximizes the amount of data synced while giving full visibility into failures.
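
A rough sketch of that control flow (parsedMapping is an illustrative name for the already-parsed sheet-to-table mapping; processSheet and getErrorMessage are the helpers mentioned elsewhere in this PR, with assumed signatures):

// Hedged sketch of the per-sheet loop: failures are collected and reported, never fatal.
const processedSheets: string[] = []
const failedSheets: { sheet: string; error: string }[] = []

for (const [sheetName, tableName] of Object.entries(parsedMapping)) {
  try {
    await processSheet(sheetName, tableName) // per-sheet parse + sync
    processedSheets.push(sheetName)
  } catch (err) {
    const message = getErrorMessage(err)
    logger.forBot().error(`Failed to sync sheet "${sheetName}": ${message}`)
    failedSheets.push({ sheet: sheetName, error: message })
  }
}

return { processedSheets, failedSheets }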

Contributor Author

Ermek - issue: Please refrain from using nested try/catch blocks

Contributor Author
@envyrovasileski Nov 5, 2025

Done. Separated out a lot of the code into helper functions and avoided using any nested try/catch blocks.

Contributor Author
@envyrovasileski Nov 5, 2025

Ermek - suggestion: You can use this function to meaningfully extract error messages:

import { isAxiosError } from 'axios'
import { ZodError, ZodIssue } from '@botpress/sdk'

const formatZodErrors = (issues: ZodIssue[]) =>
  'Validation Error: ' +
  issues
    .map((issue) => {
      const path = issue.path?.length ? `${issue.path.join('.')}: ` : ''
      return path ? `${path}${issue.message}` : issue.message
    })
    .join('\n')

export const getErrorMessage = (err: unknown): string => {
  if (isAxiosError(err)) {
    // server dependent error
    const status = err.response?.status
    const data = err.response?.data
    // always present
    const message = err.message

    if (typeof data === 'string' && data.trim()) {
      return status ? `${data} (Status: ${status})` : data
    }
    return status ? `${message} (Status: ${status})` : message
  }

  if (err instanceof ZodError) {
    return formatZodErrors(err.errors)
  }

  if (err instanceof Error) {
    return err.message
  }

  if (typeof err === 'string') {
    return err
  }

  if (err && typeof err === 'object' && 'message' in err) {
    const message = (err as { message: unknown }).message
    if (typeof message === 'string') {
      return message
    }
  }
  return 'An unexpected error occurred'
}
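
A typical call site might then look like this (illustrative usage only; syncSheetToTable is a hypothetical operation):

// Hedged usage sketch: normalize any thrown value into a readable message.
try {
  await syncSheetToTable() // may throw Axios, Zod, or plain errors
} catch (err: unknown) {
  const message = getErrorMessage(err)
  logger.forBot().error(`Excel sync failed: ${message}`)
}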

Contributor Author

Done.

continue
}

// Check for duplicate headers
Contributor Author

Ermek - issue: This code is hard to follow. Could you refactor it?

Contributor Author

Done, extracted this logic to a helper function with inline comments.


if (rowsData.length > 0 && tableId) {
  logger.forBot().info(`Clearing all existing rows from table "${tableNameForSheet}" (ID: ${tableId}).`)
  try {
Contributor Author

Ermek - suggestion: Please try to use one try/catch block

Contributor Author

Done.

logger
  .forBot()
  .warn(
    `Could not fetch table schema for "${tableNameForSheet}": ${error.message}. Will use string types for all columns.`
Contributor Author

Ermek - question: Will a silent failure prevent the user from knowing there is an issue?

Contributor Author

Good catch, changed this to throw an error instead of just a warning.

const { columnsToAdd, columnsToRemove, unchangedColumns } = compareColumns(excelHeaders, tableSchema)

// Log schema changes (only adding columns, not removing)
if (columnsToAdd.length > 0 || columnsToRemove.length > 0) {
Contributor Author

Ermek - question: Should we log or throw errors in these checks?

Contributor Author

Schema changes are expected as Excel files evolve. Throwing errors here would break normal workflows every time someone adds/removes a column. Current behavior (auto-update schema + log changes) handles this gracefully.

})

// Preserve columns that are no longer in Excel (to maintain KB links)
columnsToRemove.forEach((col) => {
Contributor Author

Ermek - issue: This code is very hard to follow. Could you refactor it to make it more readable?

Contributor Author

Refactored this to use two new helper functions:

  • updateTableSchema() - handles all schema updates
  • insertTableRows() - builds rows and inserts them with batching (sketched below)
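
A condensed sketch of what the batching in insertTableRows might look like (the batch size, the createTableRows method name, and the exact signature are assumptions, not the merged implementation):

// Hedged sketch: insert rows in fixed-size batches to stay within API limits.
import { Client } from '@botpress/client'

const BATCH_SIZE = 100 // assumed value

async function insertTableRows(
  client: Client,
  tableId: string,
  rows: Record<string, string | number>[],
  logger: { forBot: () => { debug: (msg: string) => void } }
): Promise<void> {
  for (let i = 0; i < rows.length; i += BATCH_SIZE) {
    const batch = rows.slice(i, i + BATCH_SIZE)
    await client.createTableRows({ table: tableId, rows: batch }) // assumed client method
    logger.forBot().debug(`Inserted rows ${i + 1}-${i + batch.length} of ${rows.length}`)
  }
}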

envyrovasileski and others added 7 commits November 8, 2025 17:42
- Fix schema not being fetched for newly created tables, causing numeric columns to be converted to strings
- Add duplicate column header detection with clear error messages
- Add processAllSheets input parameter to control error handling behavior
- Update examples to use valid table names ending with 'Table'
- Remove unused variables and type casting
- Separate action implementation into dedicated actions/ directory
- Add getErrorMessage utility for consistent error handling across all catch blocks
- Remove nested try/catch blocks by extracting focused helper functions
- Refactor schema update logic into updateTableSchema() function
- Refactor row insertion logic into insertTableRows() function
- Extract duplicate header validation into validateUniqueHeaders() function
- Remove processAllSheets parameter - always continue on errors and return failedSheets[] in output
- Make schema fetch operations fail fast instead of using fallback schemas
- Fix misleading logs about preserving removed columns (they are actually deleted)
- Use `unknown` type for error handling instead of `any`
- Split syncExcelFile into focused helper functions (parseSheetTableMapping, ensureTableExists, processSheet)
- Standardize documentation examples to use generic table names
- Make output schema consistent by always returning both processedSheets and failedSheets arrays
Copilot AI left a comment

Pull Request Overview

This PR adds automatic column synchronization to the SharePoint Excel integration, enabling it to detect and sync schema changes between Excel files and Botpress tables. The refactoring extracts a large action into a modular file structure and upgrades to newer SDK versions.

Key Changes:

  • Automatic detection of added/removed columns with schema updates
  • Refactored 350+ line action into dedicated src/actions/sync-excel-file.ts file
  • Added centralized error handling utility (getErrorMessage)
  • Removed personalAccessToken requirement in favor of SDK client authentication
  • Major dependency upgrades (SDK 0.11.8 → 4.15.2, TypeScript 4.x → 5.x)

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 13 comments.

Show a summary per file

  • src/utils.ts: Added getErrorMessage utility for consistent error formatting across Axios, Zod, and generic errors
  • src/index.ts: Removed inline action implementation, now imports from src/actions
  • src/actions/sync-excel-file.ts: New modular implementation with column sync logic, schema updates, and validation helpers
  • src/actions/index.ts: Export barrel for actions
  • package.json: Major version bumps for Botpress SDK (v4), TypeScript (v5), and related dependencies
  • integration.definition.ts: Removed personalAccessToken config, added failedSheets to output schema, updated description


Comment on lines 266 to 270
const schemaWithRemovals: any = {}
columnsToRemove.forEach((col) => {
  schemaWithRemovals[col] = null
})
updateSchemaPayload.schema = schemaWithRemovals
Copilot AI Nov 17, 2025

issue: The schema update logic has a critical bug. When columnsToRemove.length > 0, the code overwrites the entire updateSchemaPayload.schema with only the columns to remove (set to null), discarding all the columns to add and keep that were just built in updatedProperties.

This means if you're both adding new columns and removing old ones in the same sync, only the removal will be executed and the additions will be lost.

The fix should merge the removals into the existing schema properties:

// Handle column removals by setting to null
if (columnsToRemove.length > 0) {
  columnsToRemove.forEach((col) => {
    updateSchemaPayload.schema.properties[col] = null
  })
}
Suggested change
- const schemaWithRemovals: any = {}
- columnsToRemove.forEach((col) => {
-   schemaWithRemovals[col] = null
- })
- updateSchemaPayload.schema = schemaWithRemovals
+ columnsToRemove.forEach((col) => {
+   updateSchemaPayload.schema.properties[col] = null
+ })

Comment on lines 52 to 71
function validateUniqueHeaders(headers: string[], sheetName: string): void {
  const cleanHeaders = headers.map((h) => String(h).trim()).filter((h) => h !== '')

  // Count occurrences of each header
  const headerCounts = new Map<string, number>()
  cleanHeaders.forEach((header) => {
    headerCounts.set(header, (headerCounts.get(header) || 0) + 1)
  })

  // Find duplicates
  const duplicates = Array.from(headerCounts.entries())
    .filter(([_, count]) => count > 1)
    .map(([header, count]) => `"${header}" (${count}x)`)

  if (duplicates.length > 0) {
    throw new sdk.RuntimeError(
      `Sheet "${sheetName}" has duplicate column headers: ${duplicates.join(', ')}. Each column must have a unique name.`
    )
  }
}
Copilot AI Nov 17, 2025

suggestion: The validateUniqueHeaders function performs duplicate detection but the intermediate cleanHeaders variable creates a filtered list that might not align with the original indices. If there are empty headers, the count might be misleading. Consider validating against the original headers array or providing more context in the error message about which positions have duplicates.
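
One way to report positions, as a purely illustrative sketch:

// Hedged sketch: collect duplicate headers together with their 1-based column positions,
// validating against the original (unfiltered) headers array so indices stay meaningful.
function findDuplicateHeaderPositions(headers: string[]): Map<string, number[]> {
  const positions = new Map<string, number[]>()
  headers.forEach((raw, index) => {
    const header = String(raw).trim()
    if (header === '') return // ignore empty headers but keep original column indices
    const seenAt = positions.get(header) ?? []
    seenAt.push(index + 1)
    positions.set(header, seenAt)
  })
  return new Map([...positions].filter(([, indices]) => indices.length > 1))
}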

@@ -19,7 +19,6 @@ export default new IntegrationDefinition({
privateKey: z.string().min(1).describe('PEM‑formatted certificate private key'),
primaryDomain: z.string().min(1).describe('SharePoint primary domain (e.g. contoso)'),
siteName: z.string().min(1).describe('SharePoint site name'),
Copilot AI Nov 17, 2025

issue: The personalAccessToken configuration field was removed from the integration definition (line 22 removed), but the old code relied on it for authentication to the Tables API. The new code uses the client parameter instead, which is correct. However, users who have this field configured will experience a breaking change. Consider:

  1. Adding a migration note in the changelog
  2. Documenting that users should remove this configuration field
  3. Potentially keeping it as optional/deprecated with a warning if still present
Suggested change
- siteName: z.string().min(1).describe('SharePoint site name'),
+ siteName: z.string().min(1).describe('SharePoint site name'),
+ // Deprecated: This field is no longer used for authentication. Please remove it from your configuration.
+ personalAccessToken: z
+   .string()
+   .optional()
+   .describe(
+     '[DEPRECATED] Personal Access Token for legacy authentication. No longer required; will be removed in a future version.'
+   ),

Comment on lines 30 to 46
function detectColumnType(columnIndex: number, rowsData: any[][]): 'number' | 'string' {
  let isNumeric = true
  let hasData = false

  for (const row of rowsData) {
    const value = row[columnIndex]
    if (value !== undefined && value !== null && value !== '') {
      hasData = true
      if (isNaN(Number(value))) {
        isNumeric = false
        break
      }
    }
  }

  return hasData && isNumeric ? 'number' : 'string'
}
Copilot AI Nov 17, 2025

issue: The column type detection in detectColumnType only checks for numeric vs string, but the code at line 315 already handles the case where isNaN(numValue) is true by falling back to string. This means the detection could be overly optimistic - if even one value in the column fails to convert to a number, the column should probably be typed as string.

However, the current implementation only checks during type detection, not during insertion. Consider making the type detection more conservative, or logging warnings when values don't match their detected type.

// Convert to appropriate type based on schema
if (tableSchema?.properties?.[cleanHeader]?.type === 'number') {
  const numValue = Number(value)
  rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
Copilot AI Nov 17, 2025

issue: The type coercion fallback behavior is inconsistent. When a value is expected to be a number but isNaN(numValue) is true, the code falls back to String(value) (line 315). However, if the schema says it should be a number, inserting a string could cause validation errors or data corruption.

Consider either:

  1. Logging a warning when type coercion fails
  2. Setting the value to null instead of a string when the schema expects a number
  3. Throwing an error to fail fast

Current code silently inserts incorrect types which could lead to hard-to-debug issues later.

Suggested change
- rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
+ if (isNaN(numValue)) {
+   rowObject[cleanHeader] = null
+   logger.forBot().warn(
+     `Could not coerce value "${value}" for column "${cleanHeader}" to number; setting to null`
+   )
+ } else {
+   rowObject[cleanHeader] = numValue
+ }

@@ -41,7 +40,7 @@ export default new IntegrationDefinition({
sheetTableMapping: z
.string()
Copilot AI Nov 17, 2025

issue: The description for sheetTableMapping states "Table names must end with 'Table'" but there's no validation in the code to enforce this requirement. If this is truly a requirement, add validation that throws an error when table names don't end with 'Table'. If it's not actually required, remove it from the description to avoid confusion.

Suggested change
- .string()
+ .string()
+ .refine((val) => {
+   // Accept either comma-separated or JSON mapping
+   let mapping: Record<string, string> = {};
+   try {
+     if (val.trim().startsWith('{')) {
+       mapping = JSON.parse(val);
+     } else {
+       val.split(',').forEach(pair => {
+         const [sheet, table] = pair.split(':').map(s => s.trim());
+         if (sheet && table) mapping[sheet] = table;
+       });
+     }
+   } catch {
+     // If parsing fails, let Zod handle invalid format
+     return false;
+   }
+   // Validate all table names
+   return Object.values(mapping).every(tableName => typeof tableName === 'string' && tableName.endsWith('Table'));
+ }, {
+   message: "All table names must end with 'Table'."
+ })

Comment on lines +14 to +24
"@botpress/client": "1.22.0",
"@botpress/sdk": "4.15.2",
"axios": "^1.8.1",
"@botpress/sdk": "0.11.8",
"xlsx": "^0.18.5"
},
"devDependencies": {
"@botpress/cli": "^0.11.5",
"@types/node": "^18.11.17",
"@botpress/cli": "^4.17.2",
"@types/node": "^20.19.0",
"nodemon": "^3.1.7",
"ts-node": "^10.9.1",
"typescript": "^4.9.4"
"ts-node": "^10.9.2",
"typescript": "^5.8.2"
Copilot AI Nov 17, 2025

issue: Major version bumps in dependencies without checking for breaking changes. The package.json shows:

  • @botpress/sdk from 0.11.8 to 4.15.2 (major version jump)
  • @botpress/client from ^1.13.0 to 1.22.0
  • @botpress/cli from ^0.11.5 to ^4.17.2
  • typescript from ^4.9.4 to ^5.8.2

These are significant version jumps that likely include breaking changes. Ensure:

  1. All breaking changes have been reviewed and addressed
  2. The code has been tested against these new versions
  3. The version pinning strategy is intentional (note the removal of ^ from @botpress/client)

Comment on lines 311 to 320
if (value !== undefined && value !== null && value !== '') {
  // Convert to appropriate type based on schema
  if (tableSchema?.properties?.[cleanHeader]?.type === 'number') {
    const numValue = Number(value)
    rowObject[cleanHeader] = isNaN(numValue) ? String(value) : numValue
  } else {
    rowObject[cleanHeader] = String(value)
  }
}
}
Copilot AI Nov 17, 2025

[nitpick] issue: Empty/whitespace-only values are handled differently depending on column type. For number columns, they become null (line 320), but for string columns they become empty strings ''. This inconsistency could be confusing. Consider using null for all empty values regardless of type, or document why this distinction exists.

Comment on lines 13 to 25
function compareColumns(
  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
  const cleanExcelHeaders = excelHeaders.map((h) => String(h).trim()).filter((h) => h !== '')
  const existingColumns = tableSchema?.properties ? Object.keys(tableSchema.properties) : []

  const columnsToAdd = cleanExcelHeaders.filter((header) => !existingColumns.includes(header))
  const columnsToRemove = existingColumns.filter((col) => !cleanExcelHeaders.includes(col))
  const unchangedColumns = cleanExcelHeaders.filter((header) => existingColumns.includes(header))

  return { columnsToAdd, columnsToRemove, unchangedColumns }
}
Copilot AI Nov 17, 2025

[nitpick] suggestion: The function compareColumns returns three arrays but doesn't document what "unchanged columns" means in the context of synchronization. Adding a JSDoc comment explaining the purpose and return values would improve code maintainability, especially since this is a key function for the auto-sync feature.
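
For example, a doc comment along these lines (wording is only a suggestion):

/**
 * Compares the Excel sheet headers against the existing Botpress table schema.
 *
 * @param excelHeaders - column headers read from the Excel sheet (trimmed, empties dropped)
 * @param tableSchema  - current Botpress table schema, if the table already exists
 * @returns columnsToAdd     - headers present in Excel but missing from the table schema
 *          columnsToRemove  - schema columns no longer present in Excel
 *          unchangedColumns - headers present in both, left untouched by the sync
 */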

})
return rowObject
})
.filter((obj) => Object.keys(obj).length > 0)
Copilot AI Nov 17, 2025

question: Should empty rows be explicitly skipped or counted? The current logic filters out rows where all columns are empty via .filter((obj) => Object.keys(obj).length > 0) (line 324), but there's no logging to indicate how many rows were skipped. This could be confusing when the user expects N rows but only M are inserted. Consider adding a debug log if rows are filtered out.
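
A small sketch of the kind of logging meant here (rowObjects and sheetName are assumed names taken from the surrounding code, not the exact variables in the PR):

// Hedged sketch: surface how many empty rows were dropped before insertion.
const nonEmptyRows = rowObjects.filter((obj) => Object.keys(obj).length > 0)
const skippedCount = rowObjects.length - nonEmptyRows.length

if (skippedCount > 0) {
  logger.forBot().debug(`Skipped ${skippedCount} empty row(s) out of ${rowObjects.length} in sheet "${sheetName}".`)
}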

Collaborator
@ptrckbp left a comment

question: Any chance all fetch, update, and insert operations do not require excelHeaders, so that we can separate concerns? That would allow us to move the methods above to ./helpers/botpress-tables.ts and ./helpers/sharepoint-excel.ts or something like that.

  1. Get all data from SharePoint; extract all types and data so that nothing is Excel-specific.
  2. Update or create all tables from the types above, then clear them.
  3. Populate the data.

These sync integrations are hard to debug when there's a problem and hard to visualize. Work on clarity and separation of concerns is worthwhile.

import { getErrorMessage } from '../utils'

type Logger = Parameters<bp.IntegrationProps['actions']['syncExcelFile']>[0]['logger']
type RealClient = any
Collaborator

question: why the any? and what makes it real?

Contributor Author

Fixed! Replaced any with the actual Client type from @botpress/client and renamed RealClient → BotpressClient for clarity.

function compareColumns(
  excelHeaders: string[],
  tableSchema: any
): { columnsToAdd: string[]; columnsToRemove: string[]; unchangedColumns: string[] } {
Collaborator

todo: The verb tenses are confusing here; there's a mix of to-do (to add) and past tense (unchanged). Use one or the other.

Contributor Author

Fixed! Changed unchangedColumns → columnsToKeep to match the consistent "todo/future" tense style with columnsToAdd and columnsToRemove.

 */
function compareColumns(
  excelHeaders: string[],
  tableSchema: any
Collaborator

thought: excelHeaders makes it clear we are talking about SharePoint, but tableSchema doesn't make it clear it's Botpress. Also, you might want to change the type from any to a record, or at least an array.

Contributor Author

Fixed! Renamed tableSchema → botpressTableSchema throughout for clarity. Also replaced any type with Table['schema'] (imported from @botpress/client) for proper type safety.

/**
 * Detects the type of a column based on its data values
 */
function detectColumnType(columnIndex: number, rowsData: any[][]): 'number' | 'string' {
Collaborator

question: Why is this being run twice? Should we not detect column types once when importing from SharePoint?

Contributor Author

Fixed! You're right, it was being called twice: once when creating new tables and again when adding new columns to existing tables. Refactored to detect column types once during Excel parsing in parseExcelSheet(), then pass the typed ColumnDefinition[] to all table operations. This eliminated the redundant type detection.
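
The shapes referenced above might look roughly like this (a sketch only; the actual helpers/types.ts may differ):

// Hedged sketch of the shared types referenced in this thread (helpers/types.ts)
export type ColumnDefinition = {
  name: string
  type: 'number' | 'string'
}

export type ParsedSheetData = {
  sheetName: string
  columns: ColumnDefinition[]
  rows: Record<string, string | number>[]
}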


// Fetch and update schema if needed
let tableSchema = await fetchTableSchema(realClient, tableId, tableName, logger)
tableSchema = await updateTableSchema(realClient, tableId, tableName, excelHeaders, rowsData, tableSchema, logger)
Collaborator

question: Any chance all fetch, update, and insert operations do not require excelHeaders, so that we can separate concerns? That would allow us to move the methods above to ./helpers/botpress-tables.ts and ./helpers/sharepoint-excel.ts or something like that.

  1. Get all data from SharePoint; extract all types and data so that nothing is Excel-specific.
  2. Update or create all tables from the types above, then clear them.
  3. Populate the data.

Contributor Author

Done! Refactored to fully separate concerns:

  • Created helpers/sharepoint-excel.ts - parses Excel, detects types once, returns ParsedSheetData (columns + typed rows)
  • Created helpers/botpress-tables.ts - all table operations (create, update, insert) with zero Excel dependencies
  • sync-excel-file.ts now orchestrates: parse Excel → ensure table exists → update schema → insert rows

All Botpress functions now work with ColumnDefinition[] and Record<string, string | number>[] instead of excelHeaders and raw arrays.
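
At a high level, the per-sheet orchestration described above might then read something like this (a sketch of the described structure with assumed signatures, not the literal file contents):

// Hedged sketch of the flow after the split: Excel parsing and Botpress table
// operations live in separate helpers; the action only wires them together.
import { Client } from '@botpress/client'
import { parseExcelSheet } from '../helpers/sharepoint-excel'
import { ensureTableExists, updateTableSchema, insertTableRows } from '../helpers/botpress-tables'

const syncOneSheet = async (client: Client, workbook: Buffer, sheetName: string, tableName: string) => {
  const parsed = await parseExcelSheet(workbook, sheetName)                  // 1. read data from SharePoint/Excel
  const tableId = await ensureTableExists(client, tableName, parsed.columns) // 2. create or look up the table
  await updateTableSchema(client, tableId, parsed.columns)                   //    sync added/removed columns
  await insertTableRows(client, tableId, parsed.rows)                        // 3. populate the rows
}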

…e operations

- Replace `any` types with proper types (`Client`, `Table['schema']`)
- Rename `RealClient` → `BotpressClient` for clarity
- Rename `tableSchema` → `botpressTableSchema` throughout
- Rename `unchangedColumns` → `columnsToKeep` for consistent verb tense
- Detect column types once during Excel parsing (not twice)
- Create `helpers/types.ts` with `ColumnDefinition` and `ParsedSheetData`
- Create `helpers/sharepoint-excel.ts` for Excel-specific parsing logic
- Create `helpers/botpress-tables.ts` for Botpress table operations
- Remove Excel dependencies from all Botpress table functions