- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.2k
reduce CGO calls when scanning rows #1291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| 
 Edit: this PR now includes #1294 since it fixes the CIFuzz and macOS tests. | 
aa590cf    to
    fd9d096      
    Compare
  
    | 
 Codecov ReportAttention: Patch coverage is  
 
 ❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@            Coverage Diff             @@
##           master    #1291      +/-   ##
==========================================
- Coverage   47.16%   47.14%   -0.02%     
==========================================
  Files          12       12              
  Lines        1533     1542       +9     
==========================================
+ Hits          723      727       +4     
- Misses        669      672       +3     
- Partials      141      143       +2     ☔ View full report in Codecov by Sentry. | 
c65c946    to
    fb4206b      
    Compare
  
    | Note: this PR includes #1294 since it fixes the CIFuzz and macOS tests. | 
fb4206b    to
    8cf69a9      
    Compare
  
    | // _sqlite3_column_decltypes stores the declared column type in the typs array. | ||
| // This function must always be called before _sqlite3_column_types since it | ||
| // overwrites the datatype. | ||
| static void _sqlite3_column_decltypes(sqlite3_stmt* stmt, uint8_t *typs, int ntyps) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it somewhat confusing that this ends up using one integer to contain both the derived declared type and also the real column type.
How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it somewhat confusing that this ends up using one integer to contain both the derived declared type and also the real column type.
Yeah, it's a bit tricky. I originally used two slices to store the derived declared type and real type separately, but switching to using one slice and a the bit mask since it saves an alloc.
How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.
Most of the performance improvement here comes from batching the CGO calls needed to get the column's declared and real type (basically, the performance improvement comes from reducing the number of CGO calls - see BenchmarkStmt10Cols). So going back to having to call a CGO function for each column would negate most of the performance improvements here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow up on my above comment, BenchmarkStmt10Cols shows how batching CGO calls instead of performing them for each column improves performance (here we see a 30% perf improvement, which roughly scales with the number of columns being scanned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much savings would there be relative to today if instead of doing this, we just added a function like static int _sqlite3_column_derived_decltype(sqlite3_stmt* stmt,int iCol)? Then that could return values from your "enum" (0, GO_SQLITE3_DECL_BOOL, or GO_SQLITE3_DECL_DATE), and would not need any bitmasking.
Just to clarify, my above comments assume that you're suggesting we call this new function in the per-column loop in bind (that is we call it for each column instead of batching the calls). If that is the case then it would negate most of the performance gains here. If you meant something else could you elaborate (just want to make sure we're on the same page here)?
8cf69a9    to
    8338d7f      
    Compare
  
    8338d7f    to
    008c11f      
    Compare
  
    This commit improves the performance of queries by at least 20% by only
collecting the information needed to convert sqlite3 values to Go values
and by batching CGO calls when scanning rows (the performance
improvement scales with the number of columns being scanned).
This commit adds a new coltype field to the SQLiteRows struct which
stores the declared column type (either data/time or boolean) and the
sqlite3 datatype. Previously, this library would fetch the string
representation of each column, which is inefficient and rarely needed
since the non-standard SQLiteRows.DeclTypes method is rarely called.
It also changes the benchmark suite to use an in-memory database since
we do not want the file system interfering with benchmark results.
goos: darwin
goarch: arm64
pkg: github.com/mattn/go-sqlite3
cpu: Apple M1 Max
                             │ base.10.txt  │             new.10.txt              │
                             │    sec/op    │   sec/op     vs base                │
CustomFunctions-10              3.318µ ± 2%   3.115µ ± 2%   -6.10% (p=0.000 n=10)
Suite/BenchmarkExec-10          1.236µ ± 1%   1.240µ ± 2%        ~ (p=0.617 n=10)
Suite/BenchmarkQuery-10         4.004µ ± 7%   3.363µ ± 2%  -16.02% (p=0.000 n=10)
Suite/BenchmarkParams-10        4.241µ ± 1%   3.758µ ± 2%  -11.40% (p=0.000 n=10)
Suite/BenchmarkStmt-10          2.830µ ± 0%   2.378µ ± 2%  -15.97% (p=0.000 n=10)
Suite/BenchmarkRows-10          126.3µ ± 1%   101.3µ ± 1%  -19.79% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      124.9µ ± 1%   100.5µ ± 2%  -19.56% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10   10.130µ ± 0%   7.042µ ± 1%  -30.48% (p=0.000 n=10)
geomean                         8.655µ        7.328µ       -15.33%
                             │ base.10.txt  │               new.10.txt               │
                             │     B/op     │     B/op      vs base                  │
CustomFunctions-10               568.0 ± 0%     576.0 ± 0%   +1.41% (p=0.000 n=10)
Suite/BenchmarkExec-10           128.0 ± 0%     128.0 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10          688.0 ± 0%     648.0 ± 0%   -5.81% (p=0.000 n=10)
Suite/BenchmarkParams-10       1.078Ki ± 0%   1.031Ki ± 0%   -4.35% (p=0.000 n=10)
Suite/BenchmarkStmt-10           920.0 ± 0%     872.0 ± 0%   -5.22% (p=0.000 n=10)
Suite/BenchmarkRows-10         9.305Ki ± 0%   9.188Ki ± 0%   -1.26% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10     9.289Ki ± 0%   9.164Ki ± 0%   -1.35% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10     992.0 ± 0%     696.0 ± 0%  -29.84% (p=0.000 n=10)
geomean                        1.181Ki        1.106Ki        -6.35%
¹ all samples are equal
                             │ base.10.txt │              new.10.txt              │
                             │  allocs/op  │ allocs/op   vs base                  │
CustomFunctions-10              18.00 ± 0%   18.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkExec-10          7.000 ± 0%   7.000 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkQuery-10         23.00 ± 0%   23.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkParams-10        27.00 ± 0%   27.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkStmt-10          25.00 ± 0%   25.00 ± 0%        ~ (p=1.000 n=10) ¹
Suite/BenchmarkRows-10          525.0 ± 0%   519.0 ± 0%   -1.14% (p=0.000 n=10)
Suite/BenchmarkStmtRows-10      524.0 ± 0%   518.0 ± 0%   -1.15% (p=0.000 n=10)
Suite/BenchmarkStmt10Cols-10    39.00 ± 0%   19.00 ± 0%  -51.28% (p=0.000 n=10)
geomean                         46.26        42.17        -8.86%
¹ all samples are equal
    008c11f    to
    e1a8f32      
    Compare
  
    | @rittneje any chance you could take another look at this PR? I created it because I have a project that needs these performance gains and would like to get it off my fork. | 
This commit improves the performance of queries by at least 15% by only collecting the information needed to convert sqlite3 values to Go values and by batching CGO calls when scanning rows (the performance improvement scales with the number of columns being scanned).
This commit adds a new coltype field to the SQLiteRows struct which stores the declared column type (either data/time or boolean) and the sqlite3 datatype. Previously, this library would fetch the string representation of each column, which is inefficient and rarely needed since the non-standard SQLiteRows.DeclTypes method is rarely called.
It also changes the benchmark suite to use an in-memory database since we do not want the file system interfering with benchmark results.