feat: add @stdlib/plot/table/unicode #2407

Open
wants to merge 1 commit into
base: develop

Snehil-Shah
Member

Towards #2067

Description

What is the purpose of this pull request?

This pull request:

  • adds @stdlib/plot/table/unicode

Table options:

  • alignment: datum's cell alignment. Default: 'right'.
  • borders: border characters. Default: '─ │ ─ │'.
  • cellPadding: cell padding. Default: 1.
  • columnSeparator: column separator character. Default: '│'.
  • corners: corner characters. Default: '┌ ┐ ┘ └'.
  • headerSeparator: header separator character. Default: '─'.
  • joints: joint characters. Default: '┼ ┬ ┤ ┴ ├'.
  • marginX: horizontal output margin. Default: 0.
  • marginY: vertical output margin. Default: 0.
  • maxCellWidth: maximum cell width (excluding padding). Default: FLOAT64_MAX.
  • maxOutputWidth: maximum output width (including margin). Default: FLOAT64_MAX.
  • rowSeparator: row separator character. Default: 'None'.

Table methods:

  • addRow(row): adds a row to data.
  • getData(): gets the current data and headers as an object.
  • render(): renders the table.
  • setData(data,[headers]): sets the data and, optionally, the headers.

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

The current implementation allows for loose parsing. So, if the data is not exactly tabular, in some cases it still tries to parse, adjust, and make sense of the data.

These are the only two cases where this can be seen:

data = {
    'col1': [ ... ],
    'col2': [ ... ]
};
headers = [ 'col1', 'col2', 'col3' ];

The parser automatically ignores 'col3', and the value of this._headers after parsing is [ 'col1', 'col2' ].

data = [ { 'col1': 1, 'col2': 4 }, { 'col1': 5 } ];
headers = [ 'col1', 'col2' ];

The parser automatically fills the missing col2 value with undefined.

Should we be raising an error instead?

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.


@stdlib-js/reviewers

@kgryte kgryte added the Feature label (Issue or pull request for adding a new feature.) Jun 21, 2024
@kgryte
Member

kgryte commented Jun 22, 2024

Re: loose parsing. I'd just raise an error. If the number of headers is off, that is probably a user bug.

When an object is missing a field, that is trickier, as it could mean missing data. However, again, I'd raise an error here. A user should arguably be explicit about what value represents missing data (e.g., undefined, null, NaN, '', etc).
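
A minimal sketch of the strict behavior suggested above, assuming rows are provided as an array of objects. The `validateRows` helper and its error messages are hypothetical and are not taken from the PR.

```javascript
/**
* Validates that every row defines exactly the fields named by the headers.
*
* NOTE: hypothetical helper for illustration; not part of the PR.
*
* @param {Array<Object>} rows - rows as objects
* @param {Array<string>} headers - expected field names
* @throws {Error} a row has too many, too few, or missing fields
* @returns {boolean} `true` when all rows are valid
*/
function validateRows( rows, headers ) {
    var keys;
    var i;
    var j;
    for ( i = 0; i < rows.length; i++ ) {
        keys = Object.keys( rows[ i ] );
        if ( keys.length !== headers.length ) {
            throw new Error( 'invalid argument. Row ' + i + ' has ' + keys.length + ' fields, but ' + headers.length + ' headers were provided.' );
        }
        for ( j = 0; j < headers.length; j++ ) {
            if ( !Object.prototype.hasOwnProperty.call( rows[ i ], headers[ j ] ) ) {
                throw new Error( 'invalid argument. Row ' + i + ' is missing field `' + headers[ j ] + '`.' );
            }
        }
    }
    return true;
}

// Missing data must be stated explicitly (here, `null`) rather than omitted:
validateRows( [ { 'col1': 1, 'col2': 4 }, { 'col1': 5, 'col2': null } ], [ 'col1', 'col2' ] );
```

With this approach, both loose-parsing cases from the PR description (an extra header, or a row lacking a field) throw instead of being silently adjusted.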

Re: methods. I'd opt for closer parity with sparklines. Namely,

addRow => push()
getData/setData => data (accessor)

and I would add a headers accessor for getting/setting headers. You shouldn't have to provide data again in order to update headers. This should be a separate accessor to allow independent updating.

And similar to sparklines, I'd make the table an event emitter with render and change events.
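
The accessor-and-event shape described above could look roughly like the following sketch. It assumes Node's built-in `events` module; the private field names, the event payloads, and the placeholder rendering are assumptions for illustration and do not reflect the PR's implementation.

```javascript
var EventEmitter = require( 'events' ).EventEmitter;

/**
* Table sketch exposing `data` and `headers` accessors, `push()`, and events.
*
* NOTE: illustrative only; names and event payloads are assumptions.
*/
function Table() {
    EventEmitter.call( this );
    this._data = [];
    this._headers = [];
}

Table.prototype = Object.create( EventEmitter.prototype );
Table.prototype.constructor = Table;

// Data accessor (replaces `getData`/`setData`):
Object.defineProperty( Table.prototype, 'data', {
    'enumerable': true,
    'get': function get() {
        return this._data.slice();
    },
    'set': function set( rows ) {
        this._data = rows.slice();
        this.emit( 'change' );
    }
});

// Headers accessor, updatable independently of the data:
Object.defineProperty( Table.prototype, 'headers', {
    'enumerable': true,
    'get': function get() {
        return this._headers.slice();
    },
    'set': function set( headers ) {
        this._headers = headers.slice();
        this.emit( 'change' );
    }
});

// Appends a row (replaces `addRow`):
Table.prototype.push = function push( row ) {
    this._data.push( row );
    this.emit( 'change' );
    return this;
};

Table.prototype.render = function render() {
    var lines;
    var str;
    var i;

    // Placeholder rendering; a real implementation draws borders, padding, etc.
    lines = [ this._headers.join( ' | ' ) ];
    for ( i = 0; i < this._data.length; i++ ) {
        lines.push( this._data[ i ].join( ' | ' ) );
    }
    str = lines.join( '\n' );
    this.emit( 'render', str );
    return str;
};

var table = new Table();
table.headers = [ 'col1', 'col2' ]; // emits 'change'
table.push( [ 45, 33 ] );           // emits 'change'
```

Because `headers` is its own accessor, updating headers does not require passing the data again.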

Re: maxCellWidth. I wonder if it would be better to adopt the border-box box model, as in CSS. Namely, the cell width should include padding, not exclude it.
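
Under a border-box model, the space left for content is the cell width minus the padding on both sides. A small sketch, assuming symmetric left and right padding; the `contentWidth` helper is hypothetical.

```javascript
/**
* Returns the width available for cell content under a border-box model.
*
* NOTE: hypothetical helper; assumes padding is applied on both the left and right.
*
* @param {number} maxCellWidth - maximum cell width, including padding
* @param {number} cellPadding - padding applied to each side
* @throws {RangeError} padding must leave room for at least one character
* @returns {number} maximum content width
*/
function contentWidth( maxCellWidth, cellPadding ) {
    var w = maxCellWidth - ( 2 * cellPadding );
    if ( w < 1 ) {
        throw new RangeError( 'invalid option. `maxCellWidth` must exceed twice the cell padding. Value: `' + maxCellWidth + '`.' );
    }
    return w;
}

contentWidth( 10, 1 ); // returns 8
```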

Re: maxOutputWidth. I'd rename to simply maxWidth, as in CSS.

Re: alignment. This one is trickier. E.g., I may want to align columns differently, as I would in a spreadsheet. So my initial inclination is that alignment can either be a string (apply to all columns) or an array of strings, in which an alignment must be provided for each column.

One could argue that the same logic applies to cellPadding and maxCellWidth.
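
One way to accept either form is to normalize the option to one value per column. The sketch below assumes the number of columns is already known; `resolveAlignment` and its error messages are hypothetical. The same pattern would carry over to `cellPadding` and `maxCellWidth`.

```javascript
var ALIGNMENTS = [ 'left', 'center', 'right' ];

/**
* Normalizes an alignment option to one alignment per column.
*
* NOTE: hypothetical helper for illustration.
*
* @param {(string|Array<string>)} alignment - a single alignment or one per column
* @param {number} ncols - number of columns
* @throws {TypeError} must be a string or an array of recognized alignments
* @throws {RangeError} an array must have one entry per column
* @returns {Array<string>} per-column alignments
*/
function resolveAlignment( alignment, ncols ) {
    var out;
    var i;
    if ( typeof alignment === 'string' ) {
        // Apply the same alignment to all columns:
        out = [];
        for ( i = 0; i < ncols; i++ ) {
            out.push( alignment );
        }
    } else if ( Array.isArray( alignment ) ) {
        if ( alignment.length !== ncols ) {
            throw new RangeError( 'invalid option. Expected ' + ncols + ' alignments, but received ' + alignment.length + '.' );
        }
        out = alignment.slice();
    } else {
        throw new TypeError( 'invalid option. `alignment` must be a string or an array of strings.' );
    }
    for ( i = 0; i < out.length; i++ ) {
        if ( ALIGNMENTS.indexOf( out[ i ] ) === -1 ) {
            throw new TypeError( 'invalid option. Unrecognized alignment: `' + out[ i ] + '`.' );
        }
    }
    return out;
}

resolveAlignment( 'right', 3 );                     // returns [ 'right', 'right', 'right' ]
resolveAlignment( [ 'left', 'center', 'right' ], 3 ); // returns [ 'left', 'center', 'right' ]
```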

@kgryte kgryte added the REPL label (Issue or pull request specific to the project REPL.) Jun 22, 2024
@kgryte
Member

kgryte commented Jun 22, 2024

For the row separator default, wouldn't an empty string make more sense?


b.tic();
for ( i = 0; i < b.iterations; i++ ) {
    str = table.setData( data(), headers() ).render();
Member

You definitely do not want to be generating fresh data and headers for every benchmark iteration. Otherwise, you confound results.
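
To illustrate the point, the sketch below generates the inputs once and times only the rendering call. It is a plain-Node approximation: the `benchmark` function, the fixed data shape, and the `renderTable` callback (a stand-in for `table.setData( ... ).render()`) are assumptions, not the benchmark harness used in the PR.

```javascript
/**
* Times repeated rendering of a fixed data set.
*
* NOTE: plain-Node approximation; `renderTable` is a hypothetical stand-in.
*
* @param {Function} renderTable - function accepting `( data, headers )` and returning a string
* @param {number} iterations - number of timed iterations
* @returns {number} average nanoseconds per iteration
*/
function benchmark( renderTable, iterations ) {
    var headers;
    var start;
    var data;
    var str;
    var i;

    // Generate inputs once, outside of the timed region:
    headers = [ 'col1', 'col2', 'col3' ];
    data = [];
    for ( i = 0; i < 100; i++ ) {
        data.push( [ i, i * 2, i * 3 ] );
    }
    start = process.hrtime.bigint();
    for ( i = 0; i < iterations; i++ ) {
        // Only the operation under test runs inside the loop:
        str = renderTable( data, headers );
        if ( typeof str !== 'string' ) {
            throw new Error( 'something went wrong' );
        }
    }
    return Number( process.hrtime.bigint() - start ) / iterations;
}

benchmark( function render( data, headers ) {
    return headers.join( ',' ) + '\n' + data.join( '\n' );
}, 10 );
```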

/**
* Create a Unicode table.
*
* @module @stdlib/plot/table/unicode
Member

Missing example code.


// VARIABLES //

var CHARACTER_LENGTH = 1;
Member

This is incorrect. Users should be able to provide emojis, etc, which may be composed of multiple code points. Instead, you need to check the number of grapheme clusters.
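
A sketch of the difference between string length and grapheme cluster count, using the built-in `Intl.Segmenter` (available in recent Node.js versions). The helper names are hypothetical; within stdlib, a package such as `@stdlib/string/num-grapheme-clusters` would likely be the more natural choice.

```javascript
// Create the segmenter once at module scope:
var SEGMENTER = new Intl.Segmenter( undefined, {
    'granularity': 'grapheme'
});

/**
* Counts grapheme clusters (user-perceived characters) in a string.
*
* NOTE: hypothetical helper built on `Intl.Segmenter`.
*
* @param {string} str - input string
* @returns {number} number of grapheme clusters
*/
function numGraphemeClusters( str ) {
    return Array.from( SEGMENTER.segment( str ) ).length;
}

/**
* Tests whether a string is exactly one visual character.
*
* @param {string} str - input string
* @returns {boolean} result
*/
function isSingleGrapheme( str ) {
    return numGraphemeClusters( str ) === 1;
}

// A thumbs-up emoji with a skin tone modifier spans four UTF-16 code units...
var thumbsUp = '\uD83D\uDC4D\uD83C\uDFFD';
var len = thumbsUp.length; // 4

// ...but is a single grapheme cluster:
var ok = isSingleGrapheme( thumbsUp ); // true
```

A check based on `str.length === 1` would reject this emoji even though it occupies one visual character.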

Member

Also, why must a row separator only be one character? Couldn't I also want a pattern? E.g., -+-+-, or something similar?

Member Author

> Also, why must a row separator only be one character? Couldn't I also want a pattern? E.g., -+-+-, or something similar?

I wanted to generalize the arguments. We have column separators and borders as well, and as they are vertical, having them span multiple characters makes things more complex. For instance, how do we place the corners and joints if we have a 3-character-long vertical border? Multiple characters do make sense for horizontal lines, but I figured it was making the API design inconsistent and messy. For instance, borders takes a top-right-bottom-left shorthand, so we would have to allow only the top and bottom properties to span multiple characters (or grapheme clusters, to be accurate).

Member

Single grapheme clusters are fine to use across the board for now. And understood regarding the difficulty in supporting multiple visual characters for vertical borders/separators. While horizontal support for multiple grapheme clusters would be straightforward to add, we can wait until a user requests such a feature.

Member Author

@Snehil-Shah Snehil-Shah Jun 22, 2024

Wait, now that I think about it again, maybe I was seeing it the wrong way. We can have multiple characters for a vertical line. Say it's -*#. We just print:

-
*
#

This way, the "width/thickness" of a line is never more than a single grapheme cluster, so placing corners and joints becomes straightforward.
We would still have to restrict joints and corners to a single grapheme cluster, though, for obvious reasons.
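
The idea can be sketched as cycling through the pattern's grapheme clusters, one per output line. The `verticalBorder` helper is hypothetical, and the use of `Intl.Segmenter` to split the pattern is an assumption about how clusters would be obtained.

```javascript
var SEGMENTER = new Intl.Segmenter( undefined, {
    'granularity': 'grapheme'
});

/**
* Returns the border character to print on each line of a vertical border.
*
* NOTE: hypothetical helper; each line receives exactly one grapheme cluster.
*
* @param {string} pattern - border pattern (e.g., '-*#')
* @param {number} nrows - number of output lines
* @throws {Error} pattern must contain at least one grapheme cluster
* @returns {Array<string>} one grapheme cluster per line
*/
function verticalBorder( pattern, nrows ) {
    var clusters;
    var out;
    var i;

    clusters = Array.from( SEGMENTER.segment( pattern ), function getSegment( s ) {
        return s.segment;
    });
    if ( clusters.length === 0 ) {
        throw new Error( 'invalid argument. Pattern must contain at least one grapheme cluster.' );
    }
    out = [];
    for ( i = 0; i < nrows; i++ ) {
        // Cycle through the pattern, top to bottom:
        out.push( clusters[ i % clusters.length ] );
    }
    return out;
}

verticalBorder( '-*#', 5 ); // returns [ '-', '*', '#', '-', '*' ]
```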

Member

That makes sense. Thanks for circling back!

Member Author

Also, now that we are allowing multiple-character strings, I was wondering if we could change the shorthand properties ('a b c d') into an array (['a', 'b', 'c', 'd']). This would also allow the user to have spaces as part of their line characters.

Member

That also makes sense. Same thing: either a string or an array of strings.

Member

@kgryte kgryte Jun 22, 2024

...meaning I shouldn't have to do ['a']; I should be able to provide either 'a' or ['a'] and have them both result in the same thing.

Member

Sorry. This is for the shorthand properties. Yeah, just supporting an array of strings seems reasonable.
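
For the shorthand properties, an array-only form might be validated as in the sketch below. It assumes the top-right-bottom-left order mentioned earlier in this thread; the `parseBorders` helper and its error messages are hypothetical.

```javascript
/**
* Parses a `borders` option provided as an array of four strings.
*
* NOTE: hypothetical helper; assumes top-right-bottom-left order.
*
* @param {Array<string>} borders - border strings
* @throws {TypeError} must be an array of four strings
* @returns {Object} border strings keyed by side
*/
function parseBorders( borders ) {
    var i;
    if ( !Array.isArray( borders ) || borders.length !== 4 ) {
        throw new TypeError( 'invalid option. `borders` must be an array of four strings.' );
    }
    for ( i = 0; i < borders.length; i++ ) {
        if ( typeof borders[ i ] !== 'string' ) {
            throw new TypeError( 'invalid option. `borders` must be an array of four strings.' );
        }
    }
    return {
        'top': borders[ 0 ],
        'right': borders[ 1 ],
        'bottom': borders[ 2 ],
        'left': borders[ 3 ]
    };
}

// A space is now a valid border string, which the 'a b c d' form could not express:
parseBorders( [ '─', ' ', '─', ' ' ] );
```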

if ( !isString( separator ) ) {
    throw new TypeError( format( 'invalid assignment. `%s` must be a string. Value: `%s`.', 'rowSeparator', separator ) );
}
if ( separator === 'None' ) {
Member

I wouldn't make None the sentinel.

Member Author

Do you mean we should use another value to denote None (like null or undefined)?

Member

Yes, or the empty string.

Member

You should not use undefined as the sentinel.
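
With the empty string as the sentinel, the check might reduce to the following sketch. The `joinRows` helper is hypothetical and assumes a single-character separator repeated across the row width.

```javascript
/**
* Joins rendered rows, inserting separator lines unless disabled.
*
* NOTE: hypothetical helper; the empty string means "no row separator".
*
* @param {Array<string>} rows - rendered row strings
* @param {number} width - width of a separator line
* @param {string} rowSeparator - separator character or the empty string
* @throws {TypeError} separator must be a string
* @returns {string} joined output
*/
function joinRows( rows, width, rowSeparator ) {
    var out;
    var i;
    if ( typeof rowSeparator !== 'string' ) {
        throw new TypeError( 'invalid assignment. `rowSeparator` must be a string. Value: `' + rowSeparator + '`.' );
    }
    out = [];
    for ( i = 0; i < rows.length; i++ ) {
        // Skip separator lines entirely when the sentinel is provided:
        if ( i > 0 && rowSeparator !== '' ) {
            out.push( rowSeparator.repeat( width ) );
        }
        out.push( rows[ i ] );
    }
    return out.join( '\n' );
}

joinRows( [ 'a b', 'c d' ], 3, '' );  // returns 'a b\nc d'
joinRows( [ 'a b', 'c d' ], 3, '-' ); // returns 'a b\n---\nc d'
```

Unlike `'None'`, the empty string cannot collide with a separator a user might actually want, and unlike `undefined`, it still passes the string type check.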

* @throws {Error} output must be able to accommodate every column individually
* @returns {Array<number>} list of column indices
*/
function resolveWrapping() {
Member

Rather than use a closure, I suggest either figuring out a way to move these to the parent scope or adding them as private methods on the table prototype. Otherwise, each time render is invoked, these functions have to be allocated, etc, which will hurt perf.

Member Author

We can move it to the module scope and just take in the private properties as arguments? Adding them to the table prototype can be avoided (I think?), as these functions are only used when rendering and nowhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine, as well.
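
A sketch of the module-scope arrangement agreed on above. The body of `resolveWrapping` shown here is a hypothetical greedy strategy, since the PR's actual logic is not visible in this excerpt; the point is only that the helper is defined once and receives private state as arguments.

```javascript
/**
* Finds the column indices at which the table must wrap.
*
* NOTE: defined once at module scope; the greedy strategy is an assumption.
*
* @param {Array<number>} columnWidths - width of each column
* @param {number} maxWidth - maximum output width
* @throws {Error} output must be able to accommodate every column individually
* @returns {Array<number>} list of column indices
*/
function resolveWrapping( columnWidths, maxWidth ) {
    var total;
    var out;
    var i;

    total = 0;
    out = [];
    for ( i = 0; i < columnWidths.length; i++ ) {
        if ( columnWidths[ i ] > maxWidth ) {
            throw new Error( 'invalid operation. Output must be able to accommodate every column individually.' );
        }
        if ( total + columnWidths[ i ] > maxWidth ) {
            // Start a new chunk at this column:
            out.push( i );
            total = 0;
        }
        total += columnWidths[ i ];
    }
    return out;
}

function Table( columnWidths, maxWidth ) {
    this._columnWidths = columnWidths;
    this._maxWidth = maxWidth;
}

Table.prototype.render = function render() {
    // No closure is allocated per call; private state is passed explicitly.
    // A real implementation would go on to draw each chunk.
    return resolveWrapping( this._columnWidths, this._maxWidth );
};

new Table( [ 5, 5, 5, 5 ], 10 ).render(); // returns [ 2 ]
```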


#### UnicodeTable.prototype.alignment

Alignment of datum in cell. The value must be either `'right'`, `'left'` or `'center'`.
Member

Suggested change:
- Alignment of datum in cell. The value must be either `'right'`, `'left'` or `'center'`.
+ Alignment of datum in cell. The value must be either `'right'`, `'left'`, or `'center'`.

The project uses Oxford commas.

Comment on lines +362 to +365
data = new Float64Array( 50 );
for ( i = 0; i < data.length; i++ ) {
    data[ i ] = randu() * 100.0;
}
Member

@kgryte kgryte Jun 22, 2024

Use @stdlib/random/array/uniform instead. That way you can avoid loops.

Comment on lines +371 to +374
headers = new Float64Array( 5 );
for ( i = 0; i < headers.length; i++ ) {
    headers[ i ] = randu() * 100.0;
}
Member

I suggest just hardcoding a list of strings here. Generating random headers is unlikely in user code.

@Snehil-Shah
Member Author

> For the row separator default, wouldn't an empty string make more sense?

Without empty string:

┌───────┬──────┬───────┐
│  col1 │ col2 │  col3 │
├───────┼──────┼───────┤
│    45 │   33 │ hello │
│ 32.54 │ true │  null │
└───────┴──────┴───────┘

With empty string:

┌───────┬──────┬───────┐
│  col1 │ col2 │  col3 │
├───────┼──────┼───────┤
│    45 │   33 │ hello │
│       │      │       │
│ 32.54 │ true │  null │
└───────┴──────┴───────┘

Most of the prior art doesn't separate rows by default (dataframes in IPython or Jupyter, or Python's tabulate), and I think it looks better without the rows separated, too.

@Snehil-Shah
Member Author

> Re: methods. I'd opt for closer parity with sparklines

Should I also have an argument for bufferSize that denotes the max number of rows in the table? After that, "pushing" more data would remove the "oldest" data in a cyclic manner, like we do with sparklines.

> Re: alignment. This one is trickier. E.g., I may want to align columns differently, as I would in a spreadsheet. So my initial inclination is that alignment can either be a string (apply to all columns) or an array of strings, in which an alignment must be provided for each column.

So, say the class isn't initialized with the data or headers, then we should raise an error if the user provides an array of alignments? Because in general, the alignments array should be of the same length as the number of columns?

> and I would add a headers accessor for getting/setting headers. You shouldn't have to provide data again in order to update headers. This should be a separate accessor to allow independent updating.

Should we allow them to provide headers if the data doesn't exist yet (or raise an error)?

@kgryte
Member

kgryte commented Jun 22, 2024

Re: bufferSize. Yes, that can make sense for streaming contexts.

Re: alignments without data/headers. No, it just needs to be consistent. As soon as we're provided something that conveys the number of columns, from that point forward, the table has a fixed number of columns and everything just needs to be consistent.

Re: headers without data. Yes, that seems reasonable and is applicable in streaming contexts, where you know the headers beforehand and are still awaiting data to be pushed.

Re: row separator and empty string. By empty string, I did not mean create a row separator using the empty string. I meant using the empty string as a sentinel to convey that no row separator should be rendered.
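
Returning to the `bufferSize` point above, the streaming behavior could be sketched as follows. The `pushRow` helper is hypothetical and uses a plain array for clarity rather than a dedicated circular buffer.

```javascript
/**
* Appends a row, evicting the oldest row once the buffer is full.
*
* NOTE: hypothetical helper for illustration.
*
* @param {Array<Array>} rows - current rows (mutated)
* @param {Array} row - row to append
* @param {number} bufferSize - maximum number of rows to retain
* @returns {Array<Array>} updated rows
*/
function pushRow( rows, row, bufferSize ) {
    rows.push( row );
    if ( rows.length > bufferSize ) {
        // Drop the oldest row:
        rows.shift();
    }
    return rows;
}

var rows = [];
pushRow( rows, [ 1 ], 2 );
pushRow( rows, [ 2 ], 2 );
pushRow( rows, [ 3 ], 2 ); // `rows` is now [ [ 2 ], [ 3 ] ]
```

Headers can be set before any rows arrive, so a consumer can configure the table up front and push rows as they stream in.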

@Snehil-Shah
Member Author

> Re: alignments without data/headers. No, it just needs to be consistent. As soon as we're provided something that conveys the number of columns, from that point forward, the table has a fixed number of columns and everything just needs to be consistent

Just to make sure I understand you correctly: say there is no data or headers yet, and we set the alignment array for 5 columns. If the user now sets the data that depicts 9 columns, we raise an error, right?

> Re: row separator and empty string. By empty string, I did not mean create a row separator using the empty string. I meant using the empty string as a sentinel to convey that no row separator should be rendered.

Ah, understood, yes that makes sense.

@kgryte
Member

kgryte commented Jun 22, 2024

> If the user now sets the data that depicts 9 columns, we raise an error, right?

Correct. Can raise with something like "invalid argument. Expected %d columns, but received data having %d columns.".
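
A sketch of the "first value fixes the column count" rule, using the error message suggested above. The `ColumnGuard` name, its `check` method, and the choice of `RangeError` are assumptions for illustration.

```javascript
/**
* Tracks the number of columns once any option, header, or data conveys it.
*
* NOTE: hypothetical helper; not part of the PR.
*/
function ColumnGuard() {
    this._ncols = null;
}

/**
* Fixes the column count on first use and enforces it afterward.
*
* @param {number} n - number of columns conveyed by the provided value
* @throws {RangeError} must match the established number of columns
* @returns {number} number of columns
*/
ColumnGuard.prototype.check = function check( n ) {
    if ( this._ncols === null ) {
        // The first value to convey a column count fixes it:
        this._ncols = n;
        return n;
    }
    if ( n !== this._ncols ) {
        throw new RangeError( 'invalid argument. Expected ' + this._ncols + ' columns, but received data having ' + n + ' columns.' );
    }
    return n;
};

var guard = new ColumnGuard();
guard.check( [ 'left', 'left', 'right', 'right', 'center' ].length ); // alignment fixes the count at 5
```

Calling `guard.check( 9 )` afterward, for data with nine columns, would throw.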
