Skip to content

Conversation

DavisVaughan
Copy link
Collaborator

@DavisVaughan DavisVaughan commented Jun 3, 2025

Extracted from #126

I'm very excited for this one!

Overview

The general idea of this PR is to introduce our own SourceFile to replace Biome's LineIndex, which:

  • Owns the contents: String for a single file
  • Tracks the line_starts: Vec<TextSize> for that file

It then exposes two mirrored API endpoints:

  • SourceFile::offset() for Position + PositionEncodingKind (row / column) -> TextSize (byte offset)
  • SourceFile::source_location() for TextSize (byte offset) -> Position + PositionEncodingKind (row / column)

These are then used in the LSP to build to_proto.rs and from_proto.rs

That's really the whole PR. The rest is all the nitty gritty details of dealing with UTF conversion, documentation, and updating the LSP to use this new thing.

I've let SourceFile own the contents: String because it also exposes SourceFile::replace_range(), which simultaneously updates the contents and rebuilds the line_starts (for when the document changes). It also removes a footgun in the API to do it this way. If the SourceFile doesn't own the contents, then they must be provided as an argument to source_location(), which leaves the door open for you to provide the wrong contents on accident (this is how LineIndex currently works).

The implementation is based on ruff's LineIndex, but I've modified it a bit. I also feel like I fully understand how it works at this point, which I've tried to capture in my top level documentation in source_file.rs:
https://github.com/astral-sh/ruff/blob/e8ea40012afda872ef9bdf496a8b888ba80b533b/crates/ruff_source_file/src/line_index.rs#L17

Motivation 1 - tower-lsp removal

As we worked on ripping out tower-lsp in favor of lsp-server, we got blocked by the fact that we wanted to use lsp-types directly, but biome-lsp-converters's helpers (which we relied on in a deep way) use the flavor of lsp-types that was reexported by tower-lsp, i.e. biome-lsp-converters used tower_lsp::lsp_types::* and we used lsp_types::* and Rust does not consider those to be the same.

This PR rolls our own low level converters between an LSP lsp_types::Position (row / column) and a biome_text_size::TextSize (byte offset), handling all the nitty gritty details of PositionEncodingKind and UTF support.

This allows us to:

  • Remove biome-lsp-converters as a dependency
  • Remove tower-lsp as a transitive dependency we were getting through biome-lsp-converters (i.e. even if we were to work around this problem some other way and keep using biome-lsp-converters, we'd have still had tower-lsp in our dependency chain, which kind of sucks, because it is heavy due to tokio)

We should now be completely unblocked to continue the tower-lsp removal

Motivation 2 - use in Ark

I've implemented the new source_file crate in such a way that it can be dropped into Ark alongside biome_text_size, and then we'll be able to remove:

  • The Rope in Document, in favor of a SourceFile
  • The encoding helpers, which I don't think are particularly efficient or ergonomic, the Rope really gets in the way

I really don't love that Rope anymore:

  • I don't really think it is necessary for us, our files aren't that big
  • String::replace_range() will reuse internal storage when it can
    • During deletions or exact replacements it should not reallocate
    • During insertions it will reserve() more memory which has a very good chance of just overallocating to avoid reallocating on every individual character insertion

We've successfully worked without a Rope in Air and it's worked well, so I think we can drop it in Ark too

Motivation 3 - Owning a bit more of the stack

We've been happy with Biome for the most part, but I think it's generally good to own as much of the stack as we can so we are free to change things as needed. This will be particularly important when working with Salsa in the future, so we can modify types as needed to best work with it.

Comment on lines -1 to -2
// TODO: Remove this
#![allow(dead_code)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this in favor of a few local #[allow(dead_code)]s, or just removing some dead things entirely if they looked outdated.

I'm very thankful to reenable dead code analysis, because it really helps me remember to remove things I didn't end up using

/// The set of tree-sitter document parsers managed by the `GlobalState`.
pub(crate) parsers: HashMap<Url, tree_sitter::Parser>,
/// translate UTF-16 positions sent by the client to UTF-8 byte offsets.
pub(crate) encoding: LineOffsetEncoding,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We end up translating the lsp_types::PositionKindEncoding from the Client to a LineOffsetEncoding as fast as possible, and that's what we hold onto and use throughout the LSP.

(PositionEncoding was also from biome-lsp-converters and we don't need that anymore)

use settings::LineEnding;
use tower_lsp::lsp_types;

pub(crate) fn text_edit(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These did not get removed, they got lifted up to the top level to_proto.rs, and are now implemented on top of our own to_proto::range() helper, not the one from biome_lsp_converters

Comment on lines -82 to -85
// --- Start Posit
pub fn diff(text: &str, replace_with: &str) -> TextEdit {
super::diff::diff(text, replace_with)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved rust_analyzer/text_edit.rs to just text_edit.rs, and rust_analyzer/diff.rs to just diff.rs, but I've left all the rust-analyzer attribution in place

I did remove this one helper in favor of just calling crate::diff::diff() directly, so we can keep this file as a pure rust-analyzer copy/paste

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file reads really nicely from top to bottom now, i.e. it builds up the converter helpers in order of their complexity:

  • position()
  • range()
  • text_edit()
  • text_edit_vec()
  • ...

Comment on lines +60 to +66
/// A 0-indexed offset into a line, represented as a number of code units under one of the
/// three possible [LineOffsetEncoding]s
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct LineOffset {
raw: u32,
encoding: LineOffsetEncoding,
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other footgun / confusion point that I think I've nicely removed is that the offset into the line is the only thing that actually relies on the encoding, so the offset itself also owns the encoding.

I think this just really clarifies why the encoding is required, and how it gets used

@@ -0,0 +1,966 @@
//! This top level documentation details the algorithms and terminology behind
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent a lot of time on this documentation. I don't think you need to read it in detail unless you really want to know how offset() and source_location() work

Comment on lines -373 to -383
[[package]]
name = "biome_lsp_converters"
version = "0.1.0"
source = "git+https://github.com/biomejs/biome?rev=2648fa4201be4afd26f44eca1a4e77aac0a67272#2648fa4201be4afd26f44eca1a4e77aac0a67272"
dependencies = [
"anyhow",
"biome_rowan",
"rustc-hash",
"tower-lsp 0.20.0 (registry+https://github.com/rust-lang/crates.io-index)",
]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woop

Comment on lines -2340 to 2303

[[package]]
name = "tower-lsp"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d4ba052b54a6627628d9b3c34c176e7eda8359b7da9acd497b9f20998d118508"
dependencies = [
"async-trait",
"auto_impl",
"bytes",
"dashmap 5.5.3",
"futures",
"httparse",
"lsp-types",
"memchr",
"serde",
"serde_json",
"tokio",
"tokio-util",
"tower",
"tower-lsp-macros 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)",
"tracing",
]

[[package]]
name = "tower-lsp"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note how we drop the transitive dep on tower-lsp that we got through biome-lsp-converters

We now currently only depend on the lionel-/tower-lsp fork we have, which we will drop when we use lsp-server, also allowing us to drop all tokio dependencies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant