@jackjii79
Contributor

@jackjii79 jackjii79 commented Jan 2, 2026

https://github.com/h2oai/h2oai/issues/34827

Because of TabPFN's complexity, automated regression testing was skipped; manual testing results are shown below.

Copilot AI review requested due to automatic review settings January 2, 2026 22:00
Copilot AI left a comment

Pull request overview

This PR introduces three new TabPFN-based transformers for Driverless AI that leverage pre-trained TabPFN models for outlier detection and embedding generation. The implementation includes both transformer and model components for unsupervised outlier detection, along with a supervised embedding transformer.

Key Changes:

  • Adds TabPFN-based outlier detection transformer with chunked processing and memory optimization
  • Implements TabPFN embedding transformer using supervised learning with SVD dimensionality reduction
  • Introduces unsupervised outlier detection model with Random Forest-based feature selection and density-aware sampling
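The SVD-based dimensionality reduction mentioned above can be sketched as follows. This is a minimal NumPy-only illustration, not the PR's implementation: `svd_reduce` is a hypothetical helper, and random data stands in for real TabPFN embeddings.

```python
import numpy as np


def svd_reduce(embeddings: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows onto the top right-singular vectors (truncated SVD).

    Hypothetical helper for illustration; the PR's own code may differ.
    """
    # Thin SVD: embeddings = U @ diag(S) @ Vt
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    # Never ask for more components than the SVD produced.
    k = min(n_components, vt.shape[0])
    return (embeddings @ vt[:k].T).astype(np.float32)


rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 64))  # assumption: stand-in for TabPFN embeddings
reduced = svd_reduce(fake_embeddings, n_components=8)
```

The projected columns are mutually orthogonal, so downstream models receive decorrelated features at a fixed width regardless of the original embedding size.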

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| transformers/outliers/tabpfn_outlier.py | Implements outlier detection transformer with chain-rule probability estimation across feature permutations, supporting chunked processing for large datasets |
| transformers/generic/tabpfn_embedding.py | Provides supervised embedding extraction from TabPFN models with automatic classification/regression detection and SVD-based dimensionality reduction |
| models/unsupervised/tabpfn_outlier.py | Implements unsupervised outlier model with surrogate RF for feature selection, density-aware sampling, and score calibration for probabilistic interpretation |
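For readers unfamiliar with chain-rule probability estimation across feature permutations, here is a rough sketch of the idea: the joint log-density of a row is the sum of per-feature conditional log-densities, averaged over several feature orderings. The conditional model below is a simple linear-Gaussian fit used purely as a stand-in (an assumption; the PR uses TabPFN for this step), and all function names are hypothetical.

```python
import numpy as np


def gaussian_conditional_logpdf(X_train, X_eval, target, given):
    """Stand-in conditional density: linear-Gaussian fit of column `target`
    on columns `given` (assumption: replaces the TabPFN predictor)."""
    def design(X):
        # Intercept column plus the conditioning features.
        return np.column_stack([np.ones(len(X))] + [X[:, g] for g in given])

    A = design(X_train)
    y = X_train[:, target]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    var = max((y - A @ coef).var(), 1e-9)  # guard against zero variance
    err = X_eval[:, target] - design(X_eval) @ coef
    return -0.5 * (np.log(2 * np.pi * var) + err**2 / var)


def chain_rule_log_density(X_train, X_eval, n_permutations=4, seed=0):
    """Average of sum_j log p(x_j | earlier features) over random orderings."""
    rng = np.random.default_rng(seed)
    total = np.zeros(len(X_eval))
    for _ in range(n_permutations):
        order = rng.permutation(X_train.shape[1])
        for pos, target in enumerate(order):
            total += gaussian_conditional_logpdf(X_train, X_eval, target, order[:pos])
    return total / n_permutations


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X[:, 1] += 0.8 * X[:, 0]               # make two features correlated
outlier = np.array([[8.0, -8.0, 8.0]])  # far from the training distribution
scores = chain_rule_log_density(X, np.vstack([X[:5], outlier]))
```

Lower log-density means a more anomalous row, so the synthetic outlier should receive the lowest score in `scores`.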

Comment on lines 644 to 646
finals = None
if full > final_output.shape[0]:
finals = np.full((full, 2 if self.return_flag else 1,), fill_value=0.0, dtype=np.float32)
Copilot AI Jan 2, 2026

The variable name 'finals' is unclear and doesn't convey its purpose. A more descriptive name like 'full_output' or 'padded_output' would better indicate that this array holds the full dataset with zero-filled values for unsampled rows.
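A possible shape for the rename is sketched below. The excerpt does not show how sampled rows are written back into the array, so `sampled_idx` and the helper `pad_to_full` are hypothetical names introduced only for illustration.

```python
import numpy as np


def pad_to_full(final_output: np.ndarray, full: int, sampled_idx: np.ndarray) -> np.ndarray:
    """Return an array with `full` rows; rows not in `sampled_idx` stay zero.

    Assumption: `sampled_idx` holds the original row positions of the
    sampled rows, in the same order as the rows of `final_output`.
    """
    if full <= final_output.shape[0]:
        return final_output  # nothing was subsampled, no padding needed
    padded_output = np.zeros((full, final_output.shape[1]), dtype=np.float32)
    padded_output[sampled_idx] = final_output
    return padded_output


scores = np.array([[0.9], [0.1]], dtype=np.float32)
padded = pad_to_full(scores, full=5, sampled_idx=np.array([1, 3]))
```

With a name like `padded_output`, the intent (full-length output, zeros for unsampled rows) is visible at the point of allocation.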

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.

@jackjii79
Contributor Author

Testing results:

  • TabPFNEmbeddingTransformer
    ✅ Multiclass classification
    ✅ Binary classification
    ✅ Regression
[Image: Screenshot 2026-01-05 at 8.36.10 AM]

@jackjii79
Contributor Author

TabPFNOutlierScorerModel
✅ Unsupervised learning

[Image: Screenshot 2026-01-05 at 9.08.46 PM]
