@mhauru
Member

@mhauru mhauru commented Oct 30, 2025

Tighten element types more aggressively, after each setindex_internal!!. This fixes performance for some models, most notably Loop univariate 10k, which still showed a substantial disadvantage compared to Metadata in #1098.

Also, add tests for tighten_ and loosen_types!!. These tests matter more now that we call those two functions all the time: both must be compile-time no-ops when there is nothing to change, otherwise they would add a large overhead. The tests check this as best they can (I've checked more thoroughly by hand with code_typed).

Also fix a type instability in loosen_types!! caught by the new tests.
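
For readers unfamiliar with the idea, here is a minimal sketch of what "tightening" an element type and testing the no-op path can look like. The function `tighten` below is a hypothetical stand-in written for this illustration, not DynamicPPL's implementation, and the checks are only an approximation of what the PR's tests might cover.

```julia
using Test

# Hypothetical stand-in for a "tighten" step; NOT DynamicPPL's implementation.
# If the element type is already concrete, return the input untouched;
# otherwise rebuild the vector so its eltype is as narrow as its contents allow.
tighten(v::Vector{T}) where {T} = isconcretetype(T) ? v : map(identity, v)

# A loosely typed vector gets a concrete element type.
loose = Any[1.0, 2.0]
tight = tighten(loose)
@test eltype(tight) === Float64

# An already tight vector comes back as the very same object,
# and the return type is fully inferred.
v = [1.0, 2.0]
@test tighten(v) === v
@test (@inferred tighten(v)) === v
```

The `===` check only shows that no copy is made on the already-tight path; confirming that the call compiles away entirely still needs something like `code_typed`, as mentioned above.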

Benchmarking now. Depending on the results, I will either mark this ready or add more performance improvements if needed.

@github-actions
Contributor

github-actions bot commented Oct 30, 2025

Benchmark Report for Commit 48e93ef

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            6.5 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          708.2 │            43.5 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          413.6 │            60.1 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │          781.5 │            37.0 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         7463.0 │            27.1 │
│           Smorgasbord │   201 │ forwarddiff │      typed_vector │   true │          745.5 │            41.4 │
│           Smorgasbord │   201 │ forwarddiff │    untyped_vector │   true │          804.6 │            37.6 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          894.6 │            45.6 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          701.8 │             5.7 │
│           Smorgasbord │   201 │      enzyme │             typed │   true │          872.6 │             3.9 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         3849.6 │             5.7 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          977.4 │             9.0 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        42416.6 │             5.3 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8637.8 │             9.9 │
│               Dynamic │    10 │    mooncake │             typed │   true │          119.8 │            11.2 │
│              Submodel │     1 │    mooncake │             typed │   true │            8.5 │             6.5 │
│                   LDA │    12 │ reversediff │             typed │   true │          996.4 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

@codecov

codecov bot commented Oct 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.23%. Comparing base (80cf12d) to head (48e93ef).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1102      +/-   ##
==========================================
+ Coverage   81.17%   81.23%   +0.05%     
==========================================
  Files          40       40              
  Lines        3793     3805      +12     
==========================================
+ Hits         3079     3091      +12     
  Misses        714      714              

@github-actions
Contributor

DynamicPPL.jl documentation for PR #1102 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1102/

@mhauru
Member Author

mhauru commented Oct 30, 2025

Current benchmarks:

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │            4.5 │             5.8 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │            8.1 │             6.6 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │           36.0 │             1.6 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │            6.3 │             7.3 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │          370.5 │            50.8 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │          407.4 │            46.5 │
│           Smorgasbord │   201 │ reversediff │        untyped │  false │         4072.0 │             4.6 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │  false │          336.3 │            56.4 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         1742.8 │             4.7 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         1822.0 │             4.7 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         1779.7 │            18.0 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1818.4 │             4.6 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │          425.4 │             8.4 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │          437.9 │             8.6 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         1625.9 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │          419.1 │             8.2 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        17784.3 │             4.9 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        19867.9 │             4.7 │
│   Loop univariate 10k │ 10000 │    mooncake │        untyped │   true │        19939.9 │            18.1 │
│   Loop univariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        18634.1 │             4.7 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │         3693.4 │             9.4 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │         3759.9 │             9.2 │
│      Multivariate 10k │ 10000 │    mooncake │        untyped │   true │        14833.3 │             2.4 │
│      Multivariate 10k │ 10000 │    mooncake │ untyped_vector │   true │         3545.0 │             9.5 │
│               Dynamic │    10 │    mooncake │          typed │   true │           71.1 │             6.2 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │           97.1 │             5.7 │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │           78.2 │             6.8 │
│              Submodel │     1 │    mooncake │          typed │   true │            5.4 │             5.2 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │            9.9 │             5.7 │
│              Submodel │     1 │    mooncake │        untyped │   true │            4.5 │            10.8 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │            8.1 │             5.7 │
│                   LDA │    12 │ reversediff │          typed │   true │          468.5 │             2.0 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │          497.3 │             1.9 │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

Still not happy with that overhead, will try to understand it better and fix.

@yebai
Member

yebai commented Oct 31, 2025

Thanks, Markus. To clarify, is the below accurate?

  • typed = typed varinfo with namedtuple of metadata
  • untyped = varinfo with Metadata
  • typed vector = varinfo with namedtuple of VNV (i.e., VarNamedVector)
  • untyped vector = varinfo with VNV

@mhauru
Member Author

mhauru commented Oct 31, 2025

Yes, that's correct.

@mhauru
Member Author

mhauru commented Oct 31, 2025

Some of the overheads turned out to be in unflatten, where some unnecessary recontiguification was being done. With that fixed:

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │    mooncake │          typed │  false │           11.2 │             5.6 │
│ Simple assume observe │     1 │    mooncake │   typed_vector │  false │           11.2 │             7.8 │
│ Simple assume observe │     1 │    mooncake │        untyped │  false │           87.7 │             1.7 │
│ Simple assume observe │     1 │    mooncake │ untyped_vector │  false │            6.7 │            11.3 │
│           Smorgasbord │   201 │ reversediff │          typed │  false │          921.6 │            51.3 │
│           Smorgasbord │   201 │ reversediff │   typed_vector │  false │          944.1 │            50.3 │
│           Smorgasbord │   201 │ reversediff │        untyped │  false │         9901.3 │             4.8 │
│           Smorgasbord │   201 │ reversediff │ untyped_vector │  false │          802.4 │            59.9 │
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │         4385.3 │             4.8 │
│    Loop univariate 1k │  1000 │    mooncake │   typed_vector │   true │         4524.7 │             4.9 │
│    Loop univariate 1k │  1000 │    mooncake │        untyped │   true │         4464.0 │            17.5 │
│    Loop univariate 1k │  1000 │    mooncake │ untyped_vector │   true │         4377.5 │             4.7 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │         1049.7 │             8.4 │
│       Multivariate 1k │  1000 │    mooncake │   typed_vector │   true │         1061.0 │             8.4 │
│       Multivariate 1k │  1000 │    mooncake │        untyped │   true │         4075.2 │             2.3 │
│       Multivariate 1k │  1000 │    mooncake │ untyped_vector │   true │         1000.3 │             8.5 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │        44469.4 │             5.0 │
│   Loop univariate 10k │ 10000 │    mooncake │   typed_vector │   true │        45690.0 │             5.1 │
│   Loop univariate 10k │ 10000 │    mooncake │        untyped │   true │        49740.4 │            16.6 │
│   Loop univariate 10k │ 10000 │    mooncake │ untyped_vector │   true │        44590.8 │             5.0 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │         9332.6 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │   typed_vector │   true │         9323.7 │             9.3 │
│      Multivariate 10k │ 10000 │    mooncake │        untyped │   true │        37636.3 │             2.5 │
│      Multivariate 10k │ 10000 │    mooncake │ untyped_vector │   true │         9418.0 │             9.1 │
│               Dynamic │    10 │    mooncake │          typed │   true │          179.9 │             6.2 │
│               Dynamic │    10 │    mooncake │   typed_vector │   true │          193.3 │             6.4 │
│               Dynamic │    10 │    mooncake │ untyped_vector │   true │          175.3 │             7.4 │
│              Submodel │     1 │    mooncake │          typed │   true │           13.5 │             5.0 │
│              Submodel │     1 │    mooncake │   typed_vector │   true │           13.5 │             6.8 │
│              Submodel │     1 │    mooncake │        untyped │   true │           11.2 │            11.6 │
│              Submodel │     1 │    mooncake │ untyped_vector │   true │           11.2 │             7.2 │
│                   LDA │    12 │ reversediff │          typed │   true │         1191.3 │             2.0 │
│                   LDA │    12 │ reversediff │   typed_vector │   true │         1166.6 │             2.1 │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴────────────────┴─────────────────┘

Loop univariate has gotten a lot better, from a 12% slowdown to 3%. I might still look into e.g. Dynamic to see if I can squeeze that down, but if there's nothing obvious to be found I would call this good enough and start a PR replacing Metadata with VNV.
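
The percentages can be reproduced from the tables, assuming they compare typed_vector against typed for Loop univariate 10k (that pairing is an inference from the numbers, not something stated in the thread). A small sketch of the arithmetic:

```julia
# t(eval)/t(ref) for "Loop univariate 10k", copied from the two tables in this thread.
before = (typed = 17784.3, typed_vector = 19867.9)  # before the unflatten fix
after  = (typed = 44469.4, typed_vector = 45690.0)  # after the unflatten fix

# Relative slowdown of typed_vector versus typed, in percent.
slowdown(r) = 100 * (r.typed_vector / r.typed - 1)

println(round(slowdown(before); digits = 1))  # 11.7
println(round(slowdown(after); digits = 1))   # 2.7
```

Only ratios within one table are compared here. The absolute t(eval)/t(ref) values differ a lot between the two tables, presumably because they come from separate benchmark runs, so they are not compared across tables.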

@mhauru mhauru marked this pull request as ready for review October 31, 2025 16:32
@mhauru mhauru requested a review from penelopeysm October 31, 2025 16:32