Full disclosure first - I'm the lead author on the UMI-tools paper you reference for the 'directional' method.

Some really interesting approaches here, e.g. truncating the UMIs to induce collisions. I was especially interested in the observation that using the incorrect method to identify the true cell barcodes can introduce false clusters of cells. This could have some serious implications, and I'd be very interested to see if there are more concrete examples of this in the published literature.

I've got a question about Fig. S5. It appears that as the UMI length is increased towards its true length (10bp), the performance of directional relative to your method, "Bayesian", improves dramatically. Indeed, with a UMI length of 9bp (the closest one can get to the true UMI length here), directional is the most accurate. Given this, isn't the statement in the manuscript that 'We found that the Bayesian approach proposed here significantly outperforms existing method' a bit misleading? The Bayesian approach does appear to outperform existing methods when the UMI is short (<=8bp). However, one couldn't conclude that it would be more accurate than directional for the very data used here for testing. What's more, the dataset used is probably fairly representative of the majority of future datasets. So it would be more accurate to say that the Bayesian approach appears to be more accurate when the UMI is < 9bp, and to make clear that this is not the case for e.g. 10X.