@madelinevibes
Collaborator

@madelinevibes madelinevibes commented Nov 3, 2025

A few bug fixes had already been supplied to large node runners. This point release includes those fixes as a formality.

rustyrussell and others added 18 commits November 3, 2025 03:31
… restart.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We complain:
```
lightningd-1 2025-10-31T00:55:00.377Z **BROKEN** plugin-bookkeeper: Unparsable blockheight datastore entry: {"key":["bookkeeper","blockheights","756999f870a7a7c97f5c143f12b9096a50d1b1acd74aeb9ab2dc251a5c361494"],"generation":0,"hex":"00000067"}
```

And we don't have the blockheight:

```
                   {
                       'account': 'external',
         -             'blockheight': 103,
         ?                            - -
         +             'blockheight': 0,
                       'credit_msat': 555555000,
                       'currency': 'bcrt',
                       'debit_msat': 0,
                       'origin': 'wallet',
                       'outpoint': '756999f870a7a7c97f5c143f12b9096a50d1b1acd74aeb9ab2dc251a5c361494:0',
                       'tag': 'deposit',
                       'timestamp': 1761872097,
                       'type': 'chain',
                   },
```

Reported-by: @michael1011
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Plugins: `bookkeeper` now correctly restores chain event blockheights it has derived.
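
For reference, the datastore entry in the log above carries `"hex":"00000067"`, and the expected blockheight in the test diff is 103. A minimal sketch of the decoding, assuming the entry is a 4-byte big-endian integer (inferred only from those two values matching; `decode_blockheight` is a hypothetical helper, not bookkeeper code):

```python
def decode_blockheight(hex_value: str) -> int:
    """Decode a datastore hex string into a blockheight.

    Assumption: the value is a 4-byte big-endian unsigned integer.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    return int.from_bytes(raw, "big")


print(decode_blockheight("00000067"))  # 103
```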
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…with 1 remaining before maxparts.

1. We would find a flow.
2. refine_flow would reduce it so it doesn't deliver enough.
3. So we need to find another, but we are at the limit.
4. So we remove the flow we found.
5. Goto 1.

This can be fixed by disabling a channel which caused us to reduce the flow,
so we always make forward progress.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Plugins: `askrene` could enter an infinite loop when maxparts is restricted.
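
The loop and the fix described above can be modelled with a toy example. This is only a sketch: the channel table, `find_flow` and `refine_flow` here are hypothetical stand-ins, not askrene's data structures, and it assumes a single remaining part. Each retry disables one more channel, so the loop runs at most once per channel plus one.

```python
# Toy model only: names and data layout are hypothetical, not askrene's code.
# Each channel maps to (advertised capacity, capacity after refinement).

def find_flow(channels, disabled, amount):
    """Pick the first enabled channel whose advertised capacity fits."""
    for name, (advertised, _real) in channels.items():
        if name not in disabled and advertised >= amount:
            return name
    return None


def refine_flow(channels, flow, amount):
    """Return what the flow really delivers once constraints are applied."""
    _advertised, real = channels[flow]
    return min(amount, real)


def route_single_part(channels, amount):
    """Return (flow, attempts); flow is None if nothing can deliver amount."""
    disabled = set()
    attempts = 0
    while True:
        attempts += 1
        flow = find_flow(channels, disabled, amount)
        if flow is None:
            return None, attempts
        if refine_flow(channels, flow, amount) >= amount:
            return flow, attempts
        # Disable the channel that forced the reduction, so the next
        # search cannot hand back the same flow again.
        disabled.add(flow)


channels = {"a": (100, 40), "b": (100, 100)}
print(route_single_part(channels, 50))  # ('b', 2)
```

Without the `disabled.add(flow)` line, the search would return channel `a` on every pass and never terminate.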
We have another report of looping.  This maxparts code is being completely
rewritten, but it's good to have a catchall for any other cases which might
emerge.

I had to make it customizable since our tests under valgrind are SLOW!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
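
A catchall of this kind can be sketched as a deadline check inside the search loop. This assumes the catchall is a configurable time limit, which the message above does not spell out; `solve_with_deadline`, `SolverTimeout` and the `clock` parameter are hypothetical names for illustration.

```python
import time


class SolverTimeout(Exception):
    """Raised when the search exceeds its time budget."""


def solve_with_deadline(step, max_seconds, clock=time.monotonic):
    """Call step() until it returns a result or the deadline passes.

    max_seconds is the customizable part: slow environments (for
    example tests under valgrind) can pass a larger budget.
    """
    deadline = clock() + max_seconds
    while True:
        result = step()
        if result is not None:
            return result
        if clock() >= deadline:
            raise SolverTimeout(f"no solution within {max_seconds} seconds")
```

Passing the clock in makes the limit testable without real sleeps.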
…own on large numbers of requests.

Note that we create a destructor on the command to reset request->cmd
pointer if the cmd is freed (so we know not to call the callback).
But attaching hundreds of thousands of them is slow: it's a
single-linked list, which is iterated in several places.

But that's redundant: the request is now allocated off the cmd, so freeing the command
will free the request anyway.

Hacking in something to print progress to a file, here's the number of
requests processed every 10 seconds before and after:

Before:
	$ while sleep 10; do wc -l /tmp/bkpr-progress; done
	181529 /tmp/bkpr-progress
	195994 /tmp/bkpr-progress
	207083 /tmp/bkpr-progress
	226336 /tmp/bkpr-progress
	234319 /tmp/bkpr-progress
	241514 /tmp/bkpr-progress
	247421 /tmp/bkpr-progress
	255292 /tmp/bkpr-progress
	261367 /tmp/bkpr-progress
	269085 /tmp/bkpr-progress
	276953 /tmp/bkpr-progress
	282233 /tmp/bkpr-progress
	286193 /tmp/bkpr-progress
	290930 /tmp/bkpr-progress
	295276 /tmp/bkpr-progress
	301086 /tmp/bkpr-progress

After:
	169505 /tmp/bkpr-progress
	196010 /tmp/bkpr-progress
	219370 /tmp/bkpr-progress
	235671 /tmp/bkpr-progress
	244242 /tmp/bkpr-progress
	255362 /tmp/bkpr-progress
	265636 /tmp/bkpr-progress
	276966 /tmp/bkpr-progress
	284451 /tmp/bkpr-progress
	288836 /tmp/bkpr-progress
	296578 /tmp/bkpr-progress
	304571 /tmp/bkpr-progress

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This significantly speeds up the query which bookkeeper often does:

		      "SELECT created_index"
		      " FROM channelmoves"
		      " WHERE payment_hash = X'%s'"
		      "   AND credit_msat = %"PRIu64
		      "   AND created_index <= %"PRIu64,

On large databases this scan is expensive, and a payment_hash index
cuts it down a great deal.  It does take longer to load the channelmoves
in the first place though (about 3x).
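
The effect of such an index can be seen with SQLite's `EXPLAIN QUERY PLAN`. This is a sketch against a cut-down, hypothetical `channelmoves` schema (three columns only) and a hypothetical index name, not the sql plugin's real table definition:

```python
import sqlite3

QUERY = (
    "SELECT created_index FROM channelmoves"
    " WHERE payment_hash = ? AND credit_msat = ? AND created_index <= ?"
)


def query_plan(db):
    """Return SQLite's plan for QUERY as one string."""
    rows = db.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (b"\x00" * 32, 1000, 10)
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


db = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
db.execute(
    "CREATE TABLE channelmoves ("
    " created_index INTEGER, payment_hash BLOB, credit_msat INTEGER)"
)
plan_before = query_plan(db)
db.execute(
    "CREATE INDEX channelmoves_payment_hash_idx"
    " ON channelmoves (payment_hash)"
)
plan_after = query_plan(db)

print("before:", plan_before)  # a full table scan
print("after: ", plan_after)   # a search using the new index
```

The exact plan wording varies between SQLite versions, but the first plan is a scan of the whole table and the second is a search using the index.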

Before:
	$ while sleep 10; do wc -l /tmp/bkpr-progress; done
	169505 /tmp/bkpr-progress
	196010 /tmp/bkpr-progress
	219370 /tmp/bkpr-progress
	235671 /tmp/bkpr-progress
	244242 /tmp/bkpr-progress
	255362 /tmp/bkpr-progress
	265636 /tmp/bkpr-progress
	276966 /tmp/bkpr-progress
	284451 /tmp/bkpr-progress
	288836 /tmp/bkpr-progress
	296578 /tmp/bkpr-progress
	304571 /tmp/bkpr-progress

After:
	$ while sleep 10; do wc -l /tmp/bkpr-progress; done
	161421 /tmp/bkpr-progress
	238273 /tmp/bkpr-progress
	281185 /tmp/bkpr-progress
	305787 /tmp/bkpr-progress
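
To put a rough number on the samples above, the average rate over each sampled window can be computed from the first and last counts. This sketch assumes each line of `/tmp/bkpr-progress` is one processed request and that samples are exactly 10 seconds apart; `average_rate` is a hypothetical helper.

```python
def average_rate(samples, interval_seconds=10):
    """Average requests per second across a series of cumulative counts."""
    intervals = len(samples) - 1
    return (samples[-1] - samples[0]) / (intervals * interval_seconds)


before_index = [169505, 196010, 219370, 235671, 244242, 255362,
                265636, 276966, 284451, 288836, 296578, 304571]
after_index = [161421, 238273, 281185, 305787]

print(round(average_rate(before_index)))  # 1228
print(round(average_rate(after_index)))   # 4812
```

That is roughly a 3.9x improvement over these windows. The rate falls off within each series, so treat it as a coarse comparison only.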

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: plugins: the sql plugin now keeps an index on `channelmoves` by `payment_hash`.
If we read all of them, we might get 1.6M at once (after initial
migration).  Then we submit a few hundred thousand simultaneous
requests to lightningd, and it gets upset, queueing them all on the
xpay command hook and running out of memory.
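
One way to bound this is to walk the records in fixed-size windows rather than reading everything at once. A minimal sketch; the window size and the `batched_ranges` helper are hypothetical, and the real fix's limit is not stated here:

```python
def batched_ranges(first_index, total, limit):
    """Yield (start, count) windows covering total records, limit at a time."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = first_index
    remaining = total
    while remaining > 0:
        count = min(limit, remaining)
        yield start, count
        start += count
        remaining -= count


print(list(batched_ranges(1, 10, 4)))  # [(1, 4), (5, 4), (9, 2)]
```

With 1.6M records and a window of 1000, that is 1600 bounded reads, and only one window's worth of follow-up requests is outstanding at a time.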

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: plugins: bookkeeper first invocation after migration from prior to 25.09 with very large databases will not crash.
The workflow error `gpg: using "4129A994AA7E9852"` is thrown due to incorrect gpg parsing. Update the awk parsing logic to properly locate and extract the key fingerprint within the `gpgconf --list-options` output structure, ensuring automated signing uses the correct key.

Changelog-None.
…uilds

clnrest's `utoipa-swagger-ui` library has an indirect `rust-embed` dependency which by default includes timestamps in the build, making clnrest builds non-deterministic. Setting the environment variable `SOURCE_DATE_EPOCH` to a fixed value enforces a consistent timestamp across builds.

Also add the `--locked` flag so the release build uses the exact dependencies from Cargo.lock. This is particularly important for deterministic builds because it prevents Cargo from updating the lockfile.

Fixes ElementsProject#8288.

Changelog-Fixed: Core lightning builds for Ubuntu Focal, Jammy and Noble are deterministic again.
The Publish distribution stage was failing because it executed the update-pyln-versions script from within the WORKDIR, which created an invalid context. To resolve this, we have decoupled the process, separating the updating of version state into its own step that runs from the root directory before the publish operation.

Changelog-None.
If both refresh new events, we will get an assertion:

```
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
bookkeeper: plugins/bkpr/bookkeeper.c:1226: parse_and_log_chain_move: Assertion `e->db_id > bkpr->chainmoves_index' failed.
bookkeeper: FATAL SIGNAL 6 (version v25.09-245-g901714b-modded)
0x5d7d8718b40f send_backtrace
        common/daemon.c:36
0x5d7d8718b4ab crashdump
        common/daemon.c:81
0x7a6086c4532f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7a6086c9eb2c __pthread_kill_implementation
        ./nptl/pthread_kill.c:44
0x7a6086c9eb2c __pthread_kill_internal
        ./nptl/pthread_kill.c:78
0x7a6086c9eb2c __GI___pthread_kill
        ./nptl/pthread_kill.c:89
0x7a6086c4527d __GI_raise
        ../sysdeps/posix/raise.c:26
0x7a6086c288fe __GI_abort
        ./stdlib/abort.c:79
0x7a6086c2881a __assert_fail_base
        ./assert/assert.c:96
0x7a6086c3b516 __assert_fail
        ./assert/assert.c:105
0x5d7d8717505d parse_and_log_chain_move
        plugins/bkpr/bookkeeper.c:1226
0x5d7d871754f4 listchainmoves_done
        plugins/bkpr/bookkeeper.c:169
0x5d7d87182a4b handle_rpc_reply
        plugins/libplugin.c:1072
0x5d7d87182b5c rpc_conn_read_response
        plugins/libplugin.c:1361
0x5d7d871ba660 next_plan
        ccan/ccan/io/io.c:60
0x5d7d871bab31 do_plan
        ccan/ccan/io/io.c:422
0x5d7d871babee io_ready
        ccan/ccan/io/io.c:439
```

Reported-by: @michael1011
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: plugins: assertion crash in bookkeeper when fresh records arrive while multiple queries in progress.
We call it once at the end, but calling on each allocation is
excessive, and it shows when dealing with large PSBTs.  Testing a
700-input PSBT was unusably slow without this: after this the entire
test ran in 9 seconds.

Changelog-Fixed: JSON-RPC: Dealing with giant PSBTs (700 inputs!) is now much faster.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
lightningd-1 2025-10-27T11:26:04.285Z **BROKEN** plugin-bcli: bitcoin-cli exec failed: Argument list too long
```

Use -stdin to bitcoin-cli: we can then handle arguments of arbitrary length.

Fixes: ElementsProject#8634
Changelog-Fixed: plugins: `bcli` would fail with "Argument list too long" when sending a giant tx.
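
A sketch of the idea: `bitcoin-cli -stdin` reads extra arguments from standard input, one per line, so a large parameter never appears on the command line. The helper below is hypothetical and only builds the invocation; how bcli actually splits arguments between the command line and stdin is an assumption.

```python
def bitcoin_cli_invocation(method, params):
    """Return (argv, stdin_text) for a bitcoin-cli call using -stdin.

    Assumption: the method name stays on the command line and every
    parameter is written to stdin, one per line.
    """
    argv = ["bitcoin-cli", "-stdin", method]
    stdin_text = "".join(f"{param}\n" for param in params)
    return argv, stdin_text


giant_tx_hex = "00" * 200_000  # placeholder bytes, not a real transaction
argv, stdin_text = bitcoin_cli_invocation("sendrawtransaction", [giant_tx_hex])
print(argv)             # ['bitcoin-cli', '-stdin', 'sendrawtransaction']
print(len(stdin_text))  # 400001

# To actually run it (requires bitcoin-cli and a running bitcoind):
# import subprocess
# subprocess.run(argv, input=stdin_text, text=True, capture_output=True)
```

The command line stays three words long regardless of transaction size.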
…t queue it to the channeld for the peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…sendpays "pending".

If we failed after we register (e.g. channeld not available), we don't
mark it failed.  We shouldn't register until we've definitely created
the htlc.

Changelog-Fixed: `xpay` would sometimes leave payment parts status `pending` in failure cases (as seen in listpays or listsendpays).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: ElementsProject#8629
@madelinevibes madelinevibes requested review from rustyrussell and removed request for cdecker November 3, 2025 04:36
@endothermicdev
Collaborator

I think this PR should refer to the 25.09.1 branch. Ask-rene seems to be running very stable - I haven't encountered the timeout condition. Looks good to me.

@madelinevibes madelinevibes changed the base branch from master to release-v25.09.1 November 3, 2025 22:59
rustyrussell and others added 4 commits November 4, 2025 02:20
Since we're synchronous, these only reach lightningd after we're done:
in the case of 1.6M channelmoves, that can give it major heartburn.

In practice, this reduces the first bkpr command on a fresh upgrade
from 349 to 235 seconds (but this was before other improvements we did
this release).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `bookkeeper` reduced logging for large imports to increase speed.
This makes a big difference for large tables.  For example, a table of 1.6M
channelmoves took 82 seconds to populate and now takes 17 seconds:

Before:
	plugin-sql: Time to call listchannelmoves: 10.380341485 seconds
	plugin-sql: Time to refresh channelmoves: 82.311287310 seconds

After:

	plugin-sql: Time to call listchannelmoves: 9.962815480 seconds
	plugin-sql: Time to refresh channelmoves: 15.711549299 seconds
	plugin-sql: Time to refresh + create indices for channelmoves: 17.100151235 seconds

tests/test_coinmoves.py::test_generate_coinmoves (50,000):
	Time (from start to end of l2 node):	27 seconds
	Worst latency:				16.0 seconds

Changelog-Changed: Plugins: `sql` initial load for tables is much faster (e.g. 82 to 17 seconds for very large channelmoves table).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Contributor

MISSION ACCOMPLISHED!
