I wrote this article after running into a fairly big(?) problem. It was the first time I had hit a problem while using Databricks, and it was very hard to resolve, mostly because I could not find any related information.
To give away the ending, it was a bug in Databricks. In the end, I opened a CASE and a backend engineer at headquarters resolved the problem.
⚠️ Problem found
We have a task that runs a DELETE. One day, the DELETE failed with the following ERROR:
[DELTA_DELETION_VECTOR_SIZE_MISMATCH] Deletion vector integrity check failed. Encountered a size mismatch.
My first instinct was to google it. I tried various fixes and even asked the AI assistant provided by Databricks about the problem. Its eventual suggestion was to run REFRESH TABLE, so I ran that and then ran the DELETE again, but it failed with the same error.
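For reference, the "REFRESH TABLE, then DELETE again" attempt looks roughly like the sketch below. It is only an illustration, not the actual task: the helper name `refresh_then_delete`, the table name, and the predicate are all made up, and `run_sql` stands for whatever executes a SQL statement (in a Databricks notebook that would typically be `spark.sql`). The only thing taken from the real incident is the error class in the message.

```python
from typing import Callable

# Error class copied from the message in this article.
ERROR_CLASS = "DELTA_DELETION_VECTOR_SIZE_MISMATCH"


def refresh_then_delete(
    run_sql: Callable[[str], object], table: str, predicate: str
) -> bool:
    """Run REFRESH TABLE, then retry the DELETE (hypothetical helper).

    Returns True if the DELETE succeeds, and False if it fails again with
    the deletion vector size mismatch. Any other error is re-raised.
    """
    run_sql(f"REFRESH TABLE {table}")
    try:
        run_sql(f"DELETE FROM {table} WHERE {predicate}")
    except Exception as exc:
        # Match on the error class in the message text, since that is
        # the part that showed up in the failure.
        if ERROR_CLASS in str(exc):
            return False
        raise
    return True
```

In a notebook this could be called as `refresh_then_delete(spark.sql, "my_schema.my_table", "event_date < '2024-01-01'")`, where the table and predicate are placeholders. In my case the equivalent of this returned `False` every time, which is a hint that the problem is not something a refresh can fix.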
🛠️ Symptoms
Here is a list of symptoms. Some operations worked, some did not, and some worked only partially.
There was also an ERROR message reporting a problem reading a file, so I located the file, downloaded it, and read it directly, but there was nothing wrong with it. So the file itself did not seem to be the problem.
💡 Conclusion
As for the ERROR message, it turned out to be something I could not solve on my own. Databricks acknowledged that it was a bug, and after they fixed it, I confirmed that everything was working normally. I had struggled for 3 to 4 days before opening the CASE, and then I was told right away that it was a bug, so I was a little... discouraged. So if neither Google nor the documentation turns up anything, don't agonize over it and just open a CASE.