Hi! 👋 I am Philippe Bich, a Research Scientist at Huawei Research in Zürich, where I am part of the AI Team and focus on model compression and quantization for LLMs/VLMs.

Before joining Huawei, I completed my Ph.D. at Politecnico di Torino under the supervision of Prof. Gianluca Setti, working on AI model compression and on making deep neural networks more efficient for resource-constrained platforms. While working on my Master’s thesis at the Boston University Robotics Lab with Prof. John Baillieul, I developed a strong interest in AI at the edge, which later guided my doctoral research.

On this page, I keep track of my most recent work, talks, and publications. Feel free to reach out if any of it sparks your curiosity!

🔥 News

Jun 2026
🚀 Coming soon: “KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks”. Stay tuned!
May 2026
🎉 My paper “SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights” has been accepted at ICML 2026!
Feb 2026
🤗 SINQ is now integrated into Hugging Face Transformers! Check out the code and docs on GitHub.

📝 Selected Publications

Coming soon

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
Lorenz K. Mueller, Philippe Bich, Chiara Boretti, Hyun Min Chang, Jiawei Zhuang, Lukas Cavigelli
Preprint, 2026.
TL;DR: A state-of-the-art variance-normalized KV-cache quantization scheme that limits compounding error in long reasoning traces and outperforms Google Research's TurboQuant, with better accuracy at lower bit widths.

ICML 2026

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Lorenz K. Mueller, Philippe Bich, Jiawei Zhuang, Ahmet Çalik, Luca Benfenati, Lukas Cavigelli
International Conference on Machine Learning (ICML), 2026.
TL;DR: A calibration-free quantization method based on Sinkhorn normalization that yields accurate low-bit LLM weights out of the box. Integrated into 🤗 Hugging Face Transformers.

IEEE TPAMI 2025

On the Universal Approximation Properties of Deep Neural Networks using MAM Neurons
Philippe Bich, Andriy Enttsel, Luciano Prono, Alex Marchioni, Fabio Pareschi, Mauro Mangia, Gianluca Setti, Riccardo Rovatti
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
TL;DR: Theoretical foundations showing that Multiply-And-Max/min (MAM) neurons preserve the universal approximation property while enabling aggressive structured pruning.

📖 Education

Politecnico di Torino
- Ph.D. in Electrical, Electronics and Communications Engineering · Nov 2021 – 2025 · Advisor: Prof. Gianluca Setti · AI model compression and quantization
- M.Sc. in Mechatronics Engineering · 110/110 cum laude · 2018 – 2021 · Master's thesis at the Boston University Robotics Lab with Prof. John Baillieul
- B.Sc. in Computer Engineering · 109/110 · 2015 – 2018

📚 All Publications

2026

ICML
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
International Conference on Machine Learning (ICML), 2026.
J-STARS
FOREST-GC: A conFOrmable Rendering Engine for Synthetic Tree Generation and Counting
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026.

2025

TNNLS
A Multiply-And-Max/min Neuron Paradigm for Aggressively Prunable Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 8, pp. 14414–14427, 2025.
TPAMI
On the Universal Approximation Properties of Deep Neural Networks using MAM Neurons
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
MLJ
Linearly-Interpretable Concept Embedding Models for Text Analysis
Machine Learning, vol. 114, no. 10, art. 224, 2025.
xAI
V-CEM: Bridging Performance and Intervenability in Concept-based Models
World Conference on Explainable Artificial Intelligence (xAI), pp. 48–67, 2025.
ECML PKDD
Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models
Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2025.
TCAS-II
MESA: A Dynamical Attention-based Pre-processing Pipeline for High-throughput Event-based Computer Vision Tasks
IEEE Transactions on Circuits and Systems II: Express Briefs, 2025.

2024

CVPRW
Event-based Eye Tracking: AIS 2024 Challenge Survey
CVPR 2024 Workshops (AIS: Vision, Graphics and AI for Streaming).
BioCAS
Memory in Motion: Exploring Leaky Integration of Time Surfaces for Event-Based Eye-Tracking
IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 1–5, 2024.
AICAS
Optimizing Vision Transformers: Leveraging Max and Min Operations for Efficient Pruning
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2024.

2023

CVPRW
Pedro: an Event-based Dataset for Person Detection in Robotics
CVPR 2023 Workshops (4th International Workshop on Event-Based Vision).
MWSCAS
Multiply-and-Max/min Neurons at the Edge: Pruned Autoencoder Implementation
IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2023.

2022

ICRA
Visual Navigation Using Sparse Optical Flow and Time-to-Transit
IEEE International Conference on Robotics and Automation (ICRA), pp. 9397–9403, 2022.
BioCAS
Aggressively Prunable MAM²-based Deep Neural Oracle for ECG Acquisition by Compressed Sensing
IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 163–167, 2022.