Many people already know that you can optimize cost and performance in BigQuery by filtering on partition columns, so that queries read only the data they need instead of scanning unnecessary parts of the table.
Partition columns bring many advantages like this, but there are cases where you cannot rely on them directly.
Here is the problem: suppose a table has several date columns, and the date column you need to filter on is not the partition column. Unlike a typical RDBMS, BigQuery does not provide multiple indexes; a table can have only one partition column, and all partition pruning has to go through that single column.
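To make the constraint concrete, here is a minimal sketch of the kind of table layout discussed below. The project, dataset, table, and column names are my illustration, not from the original post:

```sql
-- Hypothetical table: only LOGTIME drives partitioning.
-- TODAY is an ordinary column, so filtering on it alone
-- cannot prune any partitions.
CREATE TABLE `my_project.my_dataset.events` (
  TODAY   DATE,     -- the business date we actually need to filter on
  LOGTIME DATETIME  -- load time; the single partition column
)
PARTITION BY DATETIME_TRUNC(LOGTIME, DAY);
```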
This creates a dilemma. If you filter on the partition column, the results don't match what the date column you actually care about would return; but if you filter only on that non-partitioned date column to stay correct, you scan the whole table and give up the performance and cost benefits. The method below lets you capture both performance and cost while still getting accurate data.
The method is easiest to understand with a concrete example. Consider a table with two date-related columns:

- TODAY: a DATE column, with values like 2019-03-01, 2019-03-02, ...
- LOGTIME: the partition column, a DATETIME with values like 2019-03-01 12:30:20

Now suppose you need to aggregate and show the number of rows per date, based on TODAY, from 2020-02-25 to 2020-03-03. What should you do?
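Here is a sketch of the kind of query this technique describes, using the hypothetical table from above: widen the LOGTIME range by one day on each side so partition pruning kicks in, then filter exactly on TODAY.

```sql
-- Sketch: prune partitions with a widened LOGTIME range, then
-- filter precisely on TODAY. Table and column names are illustrative.
SELECT
  TODAY,
  COUNT(*) AS row_count
FROM `my_project.my_dataset.events`
WHERE
  -- Partition filter: one day of slack on each side, so no row
  -- whose TODAY falls in the target range is missed.
  LOGTIME >= DATETIME '2020-02-24 00:00:00'
  AND LOGTIME < DATETIME '2020-03-05 00:00:00'
  -- Exact filter on the date column we actually need.
  AND TODAY BETWEEN DATE '2020-02-25' AND DATE '2020-03-03'
GROUP BY TODAY
ORDER BY TODAY;
```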
With this, the cost is reduced tremendously. But will the results be the same?
Of course, the result is exactly the same. As you can see above, we widen the LOGTIME range by one extra day on each side (so nothing is missed at the boundaries) and then filter again on TODAY. Partition pruning on LOGTIME shrinks the scanned range dramatically, while the TODAY filter keeps the data completely accurate. The one assumption is that LOGTIME and TODAY never drift apart by more than the one-day buffer; if they can, widen the buffer accordingly.
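If you want to convince yourself of this on your own data, one way (again a sketch, using the same hypothetical table) is to compare the two result sets directly. Note that this check itself performs the full scan once, so it is only for one-off verification:

```sql
-- Sanity check (illustrative): the naive full-scan aggregation and
-- the partition-pruned aggregation should never disagree.
WITH naive AS (
  SELECT TODAY, COUNT(*) AS cnt
  FROM `my_project.my_dataset.events`
  WHERE TODAY BETWEEN DATE '2020-02-25' AND DATE '2020-03-03'
  GROUP BY TODAY
),
optimized AS (
  SELECT TODAY, COUNT(*) AS cnt
  FROM `my_project.my_dataset.events`
  WHERE LOGTIME >= DATETIME '2020-02-24 00:00:00'
    AND LOGTIME < DATETIME '2020-03-05 00:00:00'
    AND TODAY BETWEEN DATE '2020-02-25' AND DATE '2020-03-03'
  GROUP BY TODAY
)
-- Rows where the two aggregations disagree; expect zero rows.
SELECT TODAY, n.cnt AS naive_cnt, o.cnt AS optimized_cnt
FROM naive n
FULL OUTER JOIN optimized o USING (TODAY)
WHERE n.cnt IS NULL OR o.cnt IS NULL OR n.cnt != o.cnt;
```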
This is an extreme scenario I made up, but there are many variations of it, and you can see a real example of this technique in use in a post I wrote earlier. (Go to Example)
In my case, this technique has reduced query costs by more than a factor of 100. It is especially helpful on BigQuery, since smaller companies often operate under a daily quota limit.