Hi! 👋 I am Philippe Bich, a Research Scientist at Huawei Research in Zürich, where I am part of the AI Team and focus on model compression and quantization for LLMs/VLMs.
Before joining Huawei, I completed my Ph.D. at Politecnico di Torino under the supervision of Prof. Gianluca Setti, working on AI model compression and on making deep neural networks more efficient for resource-constrained platforms. During my Master’s thesis at the Boston University Robotics Lab with Prof. John Baillieul, I built a strong interest in AI at the edge, which later guided my doctoral research.
On this page, I keep track of my most recent work, talks, and publications. Feel free to reach out if any of it sparks your curiosity!
🔥 News
-
Jun 2026
🚀 Coming soon: “KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks”. Stay tuned!
-
May 2026
🎉 My paper “SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights” has been accepted at ICML 2026!
-
Feb 2026
🤗 SINQ is now integrated into Hugging Face Transformers! Check out the code and docs on GitHub.
📝 Selected Publications
-
Coming soon
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
Preprint, 2026.
TL;DR: A state-of-the-art variance-normalized KV-cache quantization scheme that limits compounding error in long reasoning traces and outperforms Google Research's TurboQuant, with higher accuracy at fewer bits.
-
ICML 2026
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
International Conference on Machine Learning (ICML), 2026.
TL;DR: A calibration-free quantization method based on Sinkhorn normalization that delivers strong low-bit LLM weights out of the box. Integrated into 🤗 Hugging Face Transformers.
-
IEEE TPAMI 2025
On the Universal Approximation Properties of Deep Neural Networks using MAM Neurons
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
TL;DR: Theoretical foundations showing that Multiply-And-Max/min (MAM) neurons preserve the universal approximation property while enabling aggressive structured pruning.
📖 Education
-
Politecnico di Torino
- Ph.D. in Electrical, Electronics and Communications Engineering
- M.Sc. in Mechatronics Engineering — 110/110 cum laude
- B.Sc. in Computer Engineering — 109/110
📚 All Publications
-
ICML
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
-
J-STARS
FOREST-GC: A conFOrmable Rendering Engine for Synthetic Tree Generation and Counting
-
TNNLS
A Multiply-And-Max/min Neuron Paradigm for Aggressively Prunable Deep Neural Networks
-
TPAMI
On the Universal Approximation Properties of Deep Neural Networks using MAM Neurons
-
MLJ
Linearly-Interpretable Concept Embedding Models for Text Analysis
-
xAI
V-CEM: Bridging Performance and Intervenability in Concept-based Models
-
ECML PKDD
Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models
-
TCAS-II
MESA: A Dynamical Attention-based Pre-processing Pipeline for High-throughput Event-based Computer Vision Tasks
-
CVPRW
Event-based Eye Tracking: AIS 2024 Challenge Survey
-
BioCAS
Memory in Motion: Exploring Leaky Integration of Time Surfaces for Event-Based Eye-Tracking
-
AICAS
Optimizing Vision Transformers: Leveraging Max and Min Operations for Efficient Pruning
-
CVPRW
Pedro: an Event-based Dataset for Person Detection in Robotics
-
MWSCAS
Multiply-and-Max/min Neurons at the Edge: Pruned Autoencoder Implementation
-
ICRA
Visual Navigation Using Sparse Optical Flow and Time-to-Transit
-
BioCAS
Aggressively Prunable MAM²-based Deep Neural Oracle for ECG Acquisition by Compressed Sensing