Fix/type preservation empty dataframes #301

Open
wants to merge 2 commits into
base: main
from

Conversation

dan-corneanu
It looks like manipulating a column in an empty data frame resets the result's type to :string instead of preserving the column's original type.

Member

@kou left a comment

Could you also add a fix to this PR?

diff --git a/lib/red_amber/data_frame_variable_operation.rb b/lib/red_amber/data_frame_variable_operation.rb
index 7a5179e..62b0706 100755
--- a/lib/red_amber/data_frame_variable_operation.rb
+++ b/lib/red_amber/data_frame_variable_operation.rb
@@ -675,9 +675,18 @@ module RedAmber
           raise DataFrameArgumentError, "Data size mismatch (#{data.size} != #{size})"
         end
 
-        a = Arrow::Array.new(data.is_a?(Vector) ? data.to_a : data)
+        if data.respond_to?(:to_arrow_chunked_array)
+          chunked_array = data.to_arrow_chunked_array
+        else
+          if data.respond_to?(:to_arrow_array)
+            a = data.to_arrow_array
+          else
+            a = Arrow::Array.new(data)
+          end
+          chunked_array = Arrow::ChunkedArray.new([a])
+        end
-        fields[i] = Arrow::Field.new(key, a.value_data_type)
-        arrays[i] = Arrow::ChunkedArray.new([a])
+        fields[i] = Arrow::Field.new(key, chunked_array.value_data_type)
+        arrays[i] = chunked_array
       end
       [fields, arrays]
     end
diff --git a/lib/red_amber/vector.rb b/lib/red_amber/vector.rb
index 7237807..5267eb6 100644
--- a/lib/red_amber/vector.rb
+++ b/lib/red_amber/vector.rb
@@ -198,6 +198,22 @@ module RedAmber
     alias_method :values, :to_ary
     alias_method :entries, :to_ary
 
+    # Convert to an Arrow::Array.
+    #
+    # @return [Arrow::Array]
+    #   Apache Arrow array representation.
+    def to_arrow_array
+      @data.to_arrow_array
+    end
+
+    # Convert to an Arrow::ChunkedArray.
+    #
+    # @return [Arrow::ChunkedArray]
+    #   Apache Arrow chunked array representation.
+    def to_arrow_chunked_array
+      @data.to_arrow_chunked_array
+    end
+
     # Indeces from 0 to size-1 by Array.
     #
     # @return [Array]
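
To illustrate why the patch above prefers a typed conversion over `to_a`, here is a minimal plain-Ruby sketch. It does not use Red Arrow or RedAmber: `TypedArray`, `ToyVector`, `infer_type`, `to_typed_array`, and `column_type` are hypothetical stand-ins, and the `:string` fallback for empty input is an assumption chosen only to mirror the behavior reported in this PR.

```ruby
# Hypothetical stand-in for typed storage; not a Red Arrow class.
TypedArray = Struct.new(:values, :type)

# Toy type inference from plain Ruby values. With nothing to inspect
# (empty input), it falls back to :string. This fallback is an assumption
# made to mirror the report, not taken from Red Arrow itself.
def infer_type(values)
  case values.first
  when Integer then :int
  when Float then :double
  else :string
  end
end

# Hypothetical stand-in for a vector that remembers its type.
class ToyVector
  def initialize(values, type)
    @data = TypedArray.new(values, type)
  end

  # Plain Ruby values only; the type is not carried along.
  def to_a
    @data.values
  end

  # Hands over the typed storage itself instead of bare values.
  def to_typed_array
    @data
  end
end

# Prefer the typed conversion when the object offers one,
# otherwise infer the type from the values.
def column_type(data)
  if data.respond_to?(:to_typed_array)
    data.to_typed_array.type
  else
    infer_type(data.to_a)
  end
end

empty = ToyVector.new([], :double)
old_path = infer_type(empty.to_a) # goes through to_a, so the type is lost
new_path = column_type(empty)     # asks the vector directly, so the type is kept
puts "via to_a: #{old_path}, via typed conversion: #{new_path}"
```

The `respond_to?` check keeps plain Ruby arrays working through the inference path, while any object that can hand over typed storage bypasses inference entirely, which is the case that matters when there are no values to inspect.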

@@ -439,6 +439,22 @@ class DataFrameVariableOperationTest < Test::Unit::TestCase
assert_equal str2, @df2.assign { assigner2.to_a }.tdr_str
end

sub_test_case 'Dataframe with zero n_records' do
test 'assign by block' do
assert_equal :double, @df.b.type
Member

We don't need this.

Suggested change (delete this line):
assert_equal :double, @df.b.type
