-
Notifications
You must be signed in to change notification settings - Fork 22
Replace biome_lsp_converters
with home grown SourceFile
#353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// TODO: Remove this | ||
#![allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this in favor of a few local #[allow(dead_code)]
s, or just removing some dead things entirely if they looked outdated.
I'm very thankful to reenable dead code analysis, because it really helps me remember to remove things I didn't end up using
/// The set of tree-sitter document parsers managed by the `GlobalState`. | ||
pub(crate) parsers: HashMap<Url, tree_sitter::Parser>, | ||
/// translate UTF-16 positions sent by the client to UTF-8 byte offsets. | ||
pub(crate) encoding: LineOffsetEncoding, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We end up translating the lsp_types::PositionKindEncoding
from the Client to a LineOffsetEncoding
as fast as possible, and that's what we hold onto and use throughout the LSP.
(PositionEncoding
was also from biome-lsp-converters and we don't need that anymore)
use settings::LineEnding; | ||
use tower_lsp::lsp_types; | ||
|
||
pub(crate) fn text_edit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These did not get removed, they got lifted up to the top level to_proto.rs
, and are now implemented on top of our own to_proto::range()
helper, not the one from biome_lsp_converters
// --- Start Posit | ||
pub fn diff(text: &str, replace_with: &str) -> TextEdit { | ||
super::diff::diff(text, replace_with) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved rust_analyzer/text_edit.rs
to just text_edit.rs
, and rust_analyzer/diff.rs
to just diff.rs
, but I've left all the rust-analyzer attribution in place
I did remove this one helper in favor of just calling crate::diff::diff()
directly, so we can keep this file as a pure rust-analyzer copy/paste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file reads really nicely from top to bottom now, i.e. it builds up the converter helpers in order of their complexity:
position()
range()
text_edit()
text_edit_vec()
- ...
/// A 0-indexed offset into a line, represented as a number of code units under one of the | ||
/// three possible [LineOffsetEncoding]s | ||
#[derive(Debug, Copy, Clone, PartialEq, Eq)] | ||
pub struct LineOffset { | ||
raw: u32, | ||
encoding: LineOffsetEncoding, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other footgun / confusion point that I think I've nicely removed is that the offset into the line is the only thing that actually relies on the encoding, so the offset itself also owns the encoding.
I think this just really clarifies why the encoding is required, and how it gets used
@@ -0,0 +1,966 @@ | |||
//! This top level documentation details the algorithms and terminology behind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent a lot of time on this documentation. I don't think you need to read it in detail unless you really want to know how offset()
and source_location()
work
[[package]] | ||
name = "biome_lsp_converters" | ||
version = "0.1.0" | ||
source = "git+https://github.com/biomejs/biome?rev=2648fa4201be4afd26f44eca1a4e77aac0a67272#2648fa4201be4afd26f44eca1a4e77aac0a67272" | ||
dependencies = [ | ||
"anyhow", | ||
"biome_rowan", | ||
"rustc-hash", | ||
"tower-lsp 0.20.0 (registry+https://github.com/rust-lang/crates.io-index)", | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woop
|
||
[[package]] | ||
name = "tower-lsp" | ||
version = "0.20.0" | ||
source = "registry+https://github.com/rust-lang/crates.io-index" | ||
checksum = "d4ba052b54a6627628d9b3c34c176e7eda8359b7da9acd497b9f20998d118508" | ||
dependencies = [ | ||
"async-trait", | ||
"auto_impl", | ||
"bytes", | ||
"dashmap 5.5.3", | ||
"futures", | ||
"httparse", | ||
"lsp-types", | ||
"memchr", | ||
"serde", | ||
"serde_json", | ||
"tokio", | ||
"tokio-util", | ||
"tower", | ||
"tower-lsp-macros 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)", | ||
"tracing", | ||
] | ||
|
||
[[package]] | ||
name = "tower-lsp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how we drop the transitive dep on tower-lsp that we got through biome-lsp-converters
We now currently only depend on the lionel-/tower-lsp
fork we have, which we will drop when we use lsp-server, also allowing us to drop all tokio dependencies
Extracted from #126
I'm very excited for this one!
Overview
The general idea of this PR is to introduce our own
SourceFile
to replace Biome'sLineIndex
, which:contents: String
for a single fileline_starts: Vec<TextSize>
for that fileIt then exposes two mirrored API endpoints:
SourceFile::offset()
forPosition + PositionEncodingKind
(row / column) ->TextSize
(byte offset)SourceFile::source_location()
forTextSize
(byte offset) ->Position + PositionEncodingKind
(row / column)These are then used in the LSP to build
to_proto.rs
andfrom_proto.rs
That's really the whole PR. The rest is all the nitty gritty details of dealing with UTF conversion, documentation, and updating the LSP to use this new thing.
I've let
SourceFile
own thecontents: String
because it also exposesSourceFile::replace_range()
, which simultaneously updates thecontents
and rebuilds theline_starts
(for when the document changes). It also removes a footgun in the API to do it this way. If theSourceFile
doesn't own the contents, then they must be provided as an argument tosource_location()
, which leaves the door open for you to provide the wrongcontents
on accident (this is howLineIndex
currently works).The implementation is based on ruff's
LineIndex
, but I've modified it a bit. I also feel like I fully understand how it works at this point, which I've tried to capture in my top level documentation insource_file.rs
:https://github.com/astral-sh/ruff/blob/e8ea40012afda872ef9bdf496a8b888ba80b533b/crates/ruff_source_file/src/line_index.rs#L17
Motivation 1 - tower-lsp removal
As we worked on ripping out tower-lsp in favor of lsp-server, we got blocked by the fact that we wanted to use lsp-types directly, but biome-lsp-converters's helpers (which we relied on in a deep way) use the flavor of lsp-types that was reexported by tower-lsp, i.e. biome-lsp-converters used
tower_lsp::lsp_types::*
and we usedlsp_types::*
and Rust does not consider those to be the same.This PR rolls our own low level converters between an LSP
lsp_types::Position
(row / column) and abiome_text_size::TextSize
(byte offset), handling all the nitty gritty details ofPositionEncodingKind
and UTF support.This allows us to:
We should now be completely unblocked to continue the tower-lsp removal
Motivation 2 - use in Ark
I've implemented the new
source_file
crate in such a way that it can be dropped into Ark alongsidebiome_text_size
, and then we'll be able to remove:Rope
inDocument
, in favor of aSourceFile
Rope
really gets in the wayI really don't love that
Rope
anymore:String::replace_range()
will reuse internal storage when it canreserve()
more memory which has a very good chance of just overallocating to avoid reallocating on every individual character insertionWe've successfully worked without a
Rope
in Air and it's worked well, so I think we can drop it in Ark tooMotivation 3 - Owning a bit more of the stack
We've been happy with Biome for the most part, but I think it's generally good to own as much of the stack as we can so we are free to change things as needed. This will be particularly important when working with Salsa in the future, so we can modify types as needed to best work with it.