August 18, 2022
More to come on this post. Here’s a summary of what recently happened.
I work with SQL. Although I don’t call myself an absolute expert with SQL, I’m pretty good. I hired a true expert. We had to figure out why two datasets didn’t match.
That’s perfect for CrushErrors.com, but all of our data was in SQL and we considered ourselves SQL experts, so we initially opted to try to identify discrepancies in SQL.
First problem was that AWS’ SQL doesn’t have OUTER JOIN. So we need to UNION a LEFT and RIGHT join. But it quickly became way too cumbersome. Within 20 minutes, I said, let’s do this in Crush.
We exported about 70,000 rows for each dataset (A and B).
We ran CrushErrors on it. It immediately identified roughly 700 rows in data A that wasn’t in data B. It also figured out that all the rows in B were already in A.
That was the OUTER JOIN. We studied a few of these items and that helped us greatly to understand why A and B did not match perfectly.
Then, we learned additional information from the INNER JOIN portion. To summarize, Crush did an INNER and OUTER join at the same time.
But Crush’s INNER is nothing like SQL. It’s much more informative. We only had two columns to compare. One column was a label and the other column was a number. Just about as easy as it can get.
So the 95% of the INNER JOIN results showed the rows matched perfectly. Same label, same number.
But the remaining 5% showed that the SUM matched but not the count of the rows. For example, for A rows, a particular label a +1, -1, and a +1 so the total equaled +1. The B side, for that label was one row that equaled +1. Crush not only figured out they were INNER JOIN matches but explained WHY they were INNER JOIN matches.
My SQL expert was blown away!