
Update Visitor traversal logic to use statically defined Field descriptors #1504

Open
wants to merge 3 commits into base: main

Conversation

@mrabiciu commented Nov 18, 2023

Summary

This diff changes how `traverse` is implemented so that the implementation can be internal to SwiftProtobuf.

This allows the mechanism of traversal to be modified in the future without making breaking changes to the API. For example, we can add VisitorOptions to the API, or introduce a NamedVisitor type, while remaining backwards compatible. Additionally, this can be expanded upon to get rid of _ProtoNameProviding, since name information can be included in Field, which can in turn be used to speed up JSON serialization.

This comes with a slight hit to runtime performance and little impact on bundle size.

I intend to use this change as the basis for adding a VisitorOptions argument to traverse which will allow the visitor to visit fields that are set to the default value.

This is a pretty massive change, but mostly because it required me to run `make regenerate`, and because `Message` now has to be expressed as `any Message` due to the `Self` constraints.

The important files to look at are:

  • Sources/SwiftProtobuf/Field.swift
  • Sources/SwiftProtobuf/Message.swift
  • Sources/protoc-gen-swift/MessageGenerator.swift
  • Sources/protoc-gen-swift/MessageFieldGenerator.swift
  • Sources/protoc-gen-swift/OneOfGenerator.swift
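For readers skimming the diff, the core shape of the change can be sketched like this. All names here (`MiniField`, `FieldCollector`, `MiniMessage`) are invented stand-ins, not the real SwiftProtobuf API; the real descriptors live in Field.swift and the real Visitor has many more visit methods:

```swift
// Illustrative sketch only: a field descriptor captures a getter and the
// default-skipping rule that generated traverse bodies encode today, and
// the traversal loop moves from generated code into the library.
final class FieldCollector {
    // Records (fieldNumber, value) pairs, standing in for a real Visitor.
    var visited: [(Int, Int64)] = []
    func visitInt64(_ value: Int64, fieldNumber: Int) {
        visited.append((fieldNumber, value))
    }
}

struct MiniField<M> {
    let fieldNumber: Int
    let visit: (M, FieldCollector) throws -> Void

    static func singularInt64(_ get: @escaping (M) -> Int64,
                              fieldNumber: Int,
                              defaultValue: Int64 = 0) -> MiniField {
        MiniField(fieldNumber: fieldNumber) { message, visitor in
            let value = get(message)
            if value != defaultValue {  // skip default values, as today
                visitor.visitInt64(value, fieldNumber: fieldNumber)
            }
        }
    }
}

protocol MiniMessage {
    static var _fields: [MiniField<Self>] { get }
}

extension MiniMessage {
    // The loop lives in the library, so its mechanics can change
    // (options, ordering, ...) without touching generated code.
    func traverse(visitor: FieldCollector) throws {
        for field in Self._fields {
            try field.visit(self, visitor)
        }
    }
}

struct MiniDuration: MiniMessage {
    var seconds: Int64 = 0
    var nanos: Int64 = 0
    // Generated code shrinks to static data instead of a traverse body.
    static let _fields: [MiniField<MiniDuration>] = [
        .singularInt64({ $0.seconds }, fieldNumber: 1),
        .singularInt64({ $0.nanos }, fieldNumber: 2),
    ]
}

let collector = FieldCollector()
try! MiniDuration(seconds: 5, nanos: 0).traverse(visitor: collector)
print(collector.visited)  // only the non-default field: [(1, 5)]
```

This is the sense in which the PR makes traversal "internal": the `for field in _fields` loop is a library implementation detail rather than generated code.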

Testing

Unit tests

All unit tests pass

Runtime Performance

I tested with various Message configurations. Overall this solution does have a slight runtime performance hit. I also experimented with using enums with associated values, and with key paths, but both produced worse performance than what I have in this PR.

Release Builds

Method | Performance
--- | ---
Binary encode, all fields unset | ~1.2x–2x slower
Binary encode, all fields set | ~1x–1.2x slower
JSON encode, all fields unset | ~1x–1.3x slower
JSON encode, all fields set | no difference

Debug Builds

Method | Performance
--- | ---
Binary encode, all fields unset | ~3x slower
Binary encode, all fields set | ~1.2x slower
JSON encode, all fields unset | ~2x slower
JSON encode, all fields set | ~1.1x slower

Bundle size

I saw little to no impact on bundle size from this change. In some cases the bundle size increased slightly, and in others it decreased. It seems to depend on the specific shape of the Message types being built.

@FranzBusch (Contributor) left a comment

Before I dive into the details of the PR, can we please split out the `any` changes into a separate PR? These changes generate a ton of noise.

@mrabiciu (Author)

@thomasvl (Collaborator)

Re the size changes: it's been a while since I looked into this, but at one point (I believe for readability, and maybe for debug code size) we were special-casing generation when a switch would only get one case, and generating an if instead. No idea if that would still make sense for some of your additions, but it might be something to look at again.

@tbkka (Contributor) commented Nov 27, 2023

Do you see any impact on runtime memory use?

@tbkka (Contributor) commented Nov 28, 2023

I'm debating whether your approach is preferable to changing the existing traverse methods to visit every field unconditionally. For example, I wonder if duration.pb.swift should change from

  public func traverse<V: Visitor>(visitor: inout V) throws {
    if self.seconds != 0 {
      try visitor.visitSingularInt64Field(value: self.seconds, fieldNumber: 1)
    }
    if self.nanos != 0 {
      try visitor.visitSingularInt32Field(value: self.nanos, fieldNumber: 2)
    }
    try unknownFields.traverse(visitor: &visitor)
  }

to

  public func traverse<V: Visitor>(visitor: inout V) throws {
    try visitor.visitSingularInt64Field(value: self.seconds, fieldNumber: 1, isNonDefault: self.seconds != 0)
    try visitor.visitSingularInt32Field(value: self.nanos, fieldNumber: 2, isNonDefault: self.nanos != 0)
    try unknownFields.traverse(visitor: &visitor)
  }

The latter form would likely have performance similar to yours, with code size and memory usage similar to the existing code.

Specifically, your code introduces several new generic types, which might need to be instantiated at runtime, and it adds some new large data structures that might get lazily allocated at runtime. For large projects with lots of protos, those could be problems. (I'm not certain how these different impacts will play out, but it's something I'd like to measure before committing to a particular approach.)

@mrabiciu (Author)

I haven't done any memory performance benchmarking yet. I can work on that tonight and tomorrow. I expect the addition of these static fields will cause an increase in memory usage, although I don't expect the impact to be too high since Field is a pretty small struct.

I think your outlined func traverse example is the optimal solution in terms of performance and I'm fine pivoting to that approach if you'd like. I think it would be a much simpler change.

The main downside of that approach is that it makes it impossible to change the implementation of traverse without introducing a major version bump. This is the reason #861 has remained open for so long. By making the implementation of traverse internal to SwiftProtobuf, we can add new visitor options without making breaking changes, like adding the ability to visit default fields, the ability to visit unset fields, or maybe even some multithreaded/parallelized version of traverse.

@tbkka (Contributor) commented Nov 28, 2023

... I'm fine pivoting to that approach if you'd like. ...

I'm not sure yet, that's why I'd like to get more information about the overhead of different approaches. I recall experimenting with a lot of different variations before settling on the current traverse implementation. Your approach intrigues me because it is one that I did not consider. 🤔 And to be honest, runtime memory usage is something I did not test as carefully as I should have, so it's quite possible that the existing code is not as efficient in that respect as I'd like.

The future possibilities you list don't really sound very compelling to me: visiting default/unset fields is something we can implement for 2.0 as a breaking change in a couple of different ways. (And note that your change does break the API between the generated code and the library, so we can only adopt it for 2.0.) No other new options have been suggested in a few years now, so I'm not too worried about trying to allow for future expansion there. Parallelizing also doesn't seem very appealing -- I doubt the overhead would be worth it.

The most interesting aspect of your proposal to me is that it could let us re-implement == as a pure library implementation, without any generated code. That's something that we cannot easily do with the current design, and could be a significant code size win. But that assumes that your proposal doesn't have other costs that we don't yet understand.

@thomasvl (Collaborator)

[...]

  public func traverse<V: Visitor>(visitor: inout V) throws {
    try visitor.visitSingularInt64Field(value: self.seconds, fieldNumber: 1, isNonDefault: self.seconds != 0)
    try visitor.visitSingularInt32Field(value: self.nanos, fieldNumber: 2, isNonDefault: self.nanos != 0)
    try unknownFields.traverse(visitor: &visitor)
  }

The latter form would likely be similar performance to yours, with similar code size and memory usage to the existing code.

I never really thought about it until looking at this snippet: we fetch the value twice in these calls. What does the compiler do in these cases? Does it optimize it down to a single fetch, or are they really two property accesses? Is it worth generating different code that uses a local to fetch the value only once? Is that smaller or larger? Faster?

The most interesting aspect of your proposal to me is that it could let us re-implement == as a pure library implementation, without any generated code. That's something that we cannot easily do with the current design, and could be a significant code size win. But that assumes that your proposal doesn't have other costs that we don't yet understand.

I do like this idea; not having to generate equality would be nice.

Having the fields as a list also makes it that much more likely that we can revisit the binary serialization and do it in reverse to remove the sizing pass.

@tbkka (Contributor) commented Nov 28, 2023

I never really thought about it until looking at this snippet, we fetch the value twice in these calls, what does the compiler do in these cases? Does it optimize it down to a single fetch ... ?

Since all the code here is in the same source file and the getters are pretty trivial, the compiler should routinely inline the getter and the usual compiler optimizations should do the right thing from that point on. If anyone sees otherwise (in Release builds), the compiler team would appreciate a bug report.

@tbkka (Contributor) commented Nov 28, 2023

Having the fields as a list also makes it that much more likely that we can revisit the binary serialization and do it in reverse to remove the sizing pass.

I recently managed to convince myself that reversing the order of fields is not a problem. So I'm just waiting for some spare time to sit down and actually implement this.

@mrabiciu (Author)

I did some memory testing and the proposed solution appears to use ~1.1x more memory.

Testing strategy:

  1. Generated a test.proto file which has 100 messages with 50 random fields each (test.proto.zip)

  2. Created a Swift CLI, built in release mode, that assigns all fields on each message and then encodes them using both binary and JSON encoding 10,000 times

  3. Observed the memory usage displayed in Xcode

The existing implementation used ~15.3 MB; the proposed implementation used ~16.7 MB.

@tbkka (Contributor) commented Nov 30, 2023

I want to go forward with this, since it gives us a lot of nice things.

The tricky work will be experimenting with different ways to implement the list of fields to properly balance code size, performance, and memory usage. A 10% growth in memory usage is probably more of a concern than the minor performance loss described above, so I think that's what we should focus the most attention on.

@tbkka (Contributor) left a comment

Here are a bunch more questions for you. These are mostly to help me better understand what options you've already considered and why you decided this particular approach was best. I also hope to find some time to do more of my own testing of this PR soon.

Generally speaking, I'm most interested in reducing code size and memory usage at this point. Our performance is already pretty good, so I'm willing to trade some of that for reduced code size and/or memory usage. Code in the library itself is relatively cheap; if 10 copies of a function in the library will let us drop one function from the generated code or save a few kilobytes for each message in memory, it's almost certainly worth it. Our library is relatively small compared to the volume of generated code for people who have hundreds of protobuf messages.

As I mentioned, I think the really big win here will come from reimplementing == using the field information to traverse both objects together.
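To make that concrete, here is a hedged sketch of the == idea. All type and property names here are invented for illustration, and a real version would also have to handle unknownFields and extensions: each field descriptor carries an equality closure, and the library implements equality once.

```swift
// Invented names throughout; not the real SwiftProtobuf API.
struct EqField<M> {
    let isEqual: (M, M) -> Bool

    // One factory per field shape; here just singular Equatable fields.
    static func singular<T: Equatable>(_ get: @escaping (M) -> T) -> EqField {
        EqField { get($0) == get($1) }
    }
}

protocol FieldEquatable {
    static var _eqFields: [EqField<Self>] { get }
}

extension FieldEquatable {
    // One shared library implementation replaces every generated `==`.
    static func fieldsEqual(_ lhs: Self, _ rhs: Self) -> Bool {
        Self._eqFields.allSatisfy { $0.isEqual(lhs, rhs) }
    }
}

struct EqDuration: FieldEquatable {
    var seconds: Int64 = 0
    var nanos: Int32 = 0
    static let _eqFields: [EqField<EqDuration>] = [
        .singular { $0.seconds },
        .singular { $0.nanos },
    ]
}

assert(EqDuration.fieldsEqual(EqDuration(seconds: 1, nanos: 2),
                              EqDuration(seconds: 1, nanos: 2)))
assert(!EqDuration.fieldsEqual(EqDuration(seconds: 1, nanos: 2),
                               EqDuration(seconds: 1, nanos: 3)))
```

The code-size win comes from the `_eqFields` data already existing for traversal, so the per-message generated `==` body disappears entirely.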

}
},
.singularEnum({ $0.requestedOutputFormat }, fieldNumber: 3, defaultValue: .unspecified),
.singularString({ $0.messageType }, fieldNumber: 4),
Contributor

How does performance and memory usage change if you use a key path here (\.messageType) instead of an explicit closure?

Author

My original approach used key paths, but they seemed to have slightly worse runtime performance than closures, which I found surprising. I haven't looked at memory performance though.

Contributor

Speculatively, a lot of folks use key paths, so there's pressure on the compiler folks to speed those up. Using key paths would let us benefit from those improvements. Hmmm.... That assumes they actually happen, of course, and we don't know how much improvement might come about someday, so it's not obvious that we should go down that path right now.

Author

Yeah, I'm indifferent; I think both are fine. I just picked the one that was faster at runtime.

return nil
}
},
.singularEnum({ $0.requestedOutputFormat }, fieldNumber: 3, defaultValue: .unspecified),
Contributor

For Proto3, the default enum value is always the zero value. Does it help to split out the proto2 and proto3 enum cases?

Collaborator

Don't special-case based on syntax alone. We can likely reduce code size by making the cases with presence clear, and then, for the ones that don't have presence, special-casing zero likely also makes sense.

Sources/Conformance/conformance.pb.swift (outdated; resolved)
return nil
}
},
.singularBool({ $0.printUnknownFields }, fieldNumber: 9),
Contributor

Would it help to combine this with the _protobuf_nameMap information? Superficially, it seems like it would only require adding name information to these items. But the nameMap is a dictionary in order to speed up field number => field name conversions, and it's not clear whether it would make sense to use a dictionary here.

A dictionary keyed by field number, mapping to enums that carry the field name and lookup functions, could provide much of what's needed here. That wouldn't obviously encode oneOf data, but I wonder if we really need oneOf to be encoded here? (We could query each item of the oneOf separately and take whichever one was present.) That feels like it might be slower for some things, but I wonder what the real impact would be.

Author

Would it help to combine this with the _protobuf_nameMap information ? Superficially, it seems like it would only require adding name information to these items

Yes, absolutely. I think this design allows us to remove the _protobuf_nameMap and probably speed up JSON encoding as a result, since we can get rid of the dictionary lookup. I didn't include it in this PR to keep it focused, but I think that absolutely makes sense.

We could have _fields be an OrderedDictionary keyed by field number, which would completely remove the need for _protobuf_nameMap: we could still traverse fields in order while also being able to look up a field name from a given field number for decoding purposes.
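As a sketch of that lookup idea without pulling in OrderedDictionary (names invented; the real design might build the map lazily, similar to _NameMap): keep the ordered field array for traversal, and derive a field-number-to-index dictionary from it for decoding.

```swift
// Sketch only: the ordered array drives traversal; a derived dictionary
// gives O(1) lookup by field number for decoding.
struct FieldInfo {
    let fieldNumber: Int
    let jsonName: String
}

let fields: [FieldInfo] = [  // in generated (ordered) form
    FieldInfo(fieldNumber: 1, jsonName: "seconds"),
    FieldInfo(fieldNumber: 2, jsonName: "nanos"),
]

// Built eagerly here for brevity; could be lazy like _NameMap.
let indexByNumber = Dictionary(uniqueKeysWithValues:
    fields.enumerated().map { ($0.element.fieldNumber, $0.offset) })

// Traversal walks `fields` in order; decoding looks up by number.
let name = indexByNumber[2].map { fields[$0].jsonName }
print(name ?? "unknown")  // nanos
```

Storing an index rather than a second copy of the field keeps the dictionary small, at the cost of one extra array access per lookup.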

Contributor

Hmmm.... So many trade-offs! OrderedDictionary would have additional runtime memory overhead, of course.

I know the compiler folks are working on techniques to make the compiler better at coding constant arrays in TEXT at compile-time rather than building them in dirty memory at runtime. But to take advantage of that, we'd need to simplify that array construction a lot. That's somewhat speculative, but would suggest that it would be worth building the Fields array as a simple list of constant enum cases, eliminating as much indirection as possible.

Collaborator

The lookup is just for parsing, right? With a low number of fields, walking a list likely beats the cost of searching a dictionary. And we might even be able to tweak things to optimize for ordered input, i.e., a table-driven parse might be something to also look at, reducing code size even more.

Author

OrderedDictionary would have additional runtime memory overhead, of course

Yeah, and I'm honestly not very familiar with it; I just know that it exists. There are other ways to do this, like lazily constructing a static dictionary stored alongside the fields array that maps field numbers to Field<>, or even just to an index into the field array to save space. Basically, something similar to _NameMap could be done.

but would suggest that it would be worth building the Fields array as a simple list of constant enum cases, eliminating as much indirection as possible.

This was my original design!

enum Field<M: Message> {
    case singularFloat(getter: (M) -> Float, isDefault: (M) -> Bool)
    // ...
}

However, modeling sub-messages, enums, and maps proved difficult, and required creating abstractions for indirection using protocols.

case singularMessage(getter: (M) -> any Message, isUnset: (M) -> Bool)
// ...

private extension Message {
    func traverseAsField<V: Visitor>(visitor: inout V, fieldNumber: Int) throws {
        try visitor.visitSingularMessage(message: self, fieldNumber: fieldNumber)
    }
}

Also, the switch statement over the enum turned out to be more expensive than I thought (the switch itself was slow, and the compiler was deciding to create copies of the enum case and its content during the switch for some reason). More expensive than the protocol dispatching I'm doing now with FieldItem. My original enum-based implementation was on the order of 10x–30x slower than the implementation of traverse on the main branch, due to paying both the cost of the slow enum switch statement and the cost of a dynamic dispatch when calling traverseAsField for enums, messages, and maps.

Contributor

... the compiler was deciding to create copies of the enum case and its content during the switch ...

Ah. Yeah, I know that problem well. 😞

for field in Self._fields {
    try field.traverse(message: self, visitor: &visitor)
}
try unknownFields.traverse(visitor: &visitor)
@tbkka (Contributor) Nov 30, 2023

In experimenting with #1518, I wonder if we want to leave unknownFields to be queried by the individual encoder? #1518 wants to look at the unknownFields first rather than last.

Author

How would that work? Would we just not visit unknown fields at all in the implementation of traverse? Or would this maybe be an argument to traverse to decide the order of traversal?

Contributor

My idea: traverse wouldn't visit unknown fields at all. The encoders would be changed to use traverse to visit the regular fields and request the unknown fields directly from the message.

Collaborator

@tbkka rather than having visitors collect it, does it make sense to change up the visit API at the same time so a visitor can ask to go forward or backwards? Then unknowns can always be "last", independent of the field order traversed. We could even support flags to say whether they want extensions included/skipped and/or unknowns first/last/skipped, to provide much more customization.

@mrabiciu (Author) Nov 30, 2023

traverse wouldn't visit unknown fields at all

Could that break existing implementations of Visitor that are expecting a call to visitUnknown? The compiler will not complain even if visitUnknown is removed from the protocol, and that could cause bugs for those who are expecting it to be called on their visitor implementations.

Collaborator

Here's one rough idea of how I was thinking Visitor might control the behavior:

public enum UnknownMode {
  // Visit any unknowns before the fields/extensions.
  case first
  // Visit any unknowns after the fields/extensions.
  case last
  // Don't visit any unknowns.
  case skip
}

public enum FieldOrder {
  // Visit fields/extensions in increasing number order.
  case forward
  // Visit fields/extensions in decreasing number order.
  case reverse
}

public protocol Visitor {

  static var unknownMode: UnknownMode { get }

  static var fieldOrder: FieldOrder { get }

  // ...
}

// Defaults
extension Visitor {
  public static var unknownMode: UnknownMode { return .last }
  public static var fieldOrder: FieldOrder { return .forward }
}

And then the default for Message traverse() could be something like:

  public func traverse<V: Visitor>(visitor: inout V) throws {
    let unknownMode = V.unknownMode
    if unknownMode == .first {
      try unknownFields.traverse(visitor: &visitor)
    }

    switch V.fieldOrder {
    case .forward:
      for field in Self._fields {
        try field.traverse(message: self, visitor: &visitor)
      }
    case .reverse:
      for field in Self._fields.reversed() {
        try field.traverse(message: self, visitor: &visitor)
      }
    }

    if unknownMode == .last {
      try unknownFields.traverse(visitor: &visitor)
    }
  }

Or something like that.

Author

Should this be controlled by the visitor, or just be an argument to traverse?
I was imagining this:

struct TraversalOptions {
  public enum UnknownMode {
    // Visit any unknowns before the fields/extensions.
    case first
    // Visit any unknowns after the fields/extensions.
    case last
    // Don't visit any unknowns.
    case skip
  }
  public enum FieldOrder {
    // Visit fields/extensions in increasing number order.
    case forward
    // Visit fields/extensions in decreasing number order.
    case reverse
  }
  struct AdditionalVisits: OptionSet { // needs a better name
    let rawValue: Int

    static let visitDefaultFields = AdditionalVisits(rawValue: 1 << 0)
    static let visitUnsetFields = AdditionalVisits(rawValue: 1 << 1)
  }
  var unknownMode: UnknownMode = .last
  var fieldOrder: FieldOrder = .forward
  var additionalVisits: AdditionalVisits = []
  
  static let defaultOptions = TraversalOptions()
}

func traverse<V: Visitor>(visitor: inout V, options: TraversalOptions = .defaultOptions) throws

Collaborator

This is sorta what I was originally thinking, but since the visit*Message() impls usually have to recurse, if they forget to forward the options, the call would fall back to .defaultOptions, which could be a subtle bug.

I also wondered if a given Visitor will always use the same options, in which case making them properties of the Visitor might make sense.

But I'm not tied to one vs. the other; the potential danger of having to pass through the options does worry me a little.

Contributor

In practice, visitors always traverse in the same way, and call the traverse method from just one of a very few places in their source. So I would suggest using either arguments or separate functions for the different traversal styles. In practice, I'm pretty sure that reverse traversal will always visit unknown fields first, and forward traversal will always visit them last. The only exception is -- if I remember correctly -- that JSON doesn't actually support unknown fields at all.

Collaborator

If I'm remembering correctly, I think Go might have support for traversing fields within a message in random order so folks can't bake assumptions into the binary encoding (they might also space-pad JSON for the same sort of reasons). That could be exposed in the public EncodingOptions to let folks pick the default (non-deterministic) or deterministic ordering, like we have for map<>s. So if that's something we ever thought we might want to support, it might be a reason not to go with two different traversal methods, but instead to stick with some way to control it via options.

@@ -110,6 +110,9 @@ public protocol Message: _CommonMessageConformances {
/// normal `Equatable`. `Equatable` is provided with specific generated
/// types.
func isEqualTo(message: Message) -> Bool

/// Provides `Field` information for this `Message` type used to provide a default implementation of `traverse`
static var _fields: [Field<Self>] { get }
Contributor

In working with the Swift standard library team, we've learned that generic types are not free at runtime -- in particular, the runtime typically ends up having to actually create a runtime representation for each generic type. So Field<M> will end up producing a separate runtime type representation for every message type M (much the same as if you had created a type Field_MessageName instead). So it can make sense to combine or otherwise minimize the number of different generic types in a library where memory usage might be an issue.

I'm not sure there's a good alternative in this case: Field<> does seem like it needs to be generic over the message type.

Author

Agreed, I think the reified generics have a negative impact on binary size. I don't know what kind of impact they have on memory, but it's probably negative too.

I did have a different approach I explored where I created a non-generic and less safe version of Field:

public struct UnsafeField {
    private let item: UnsafeFieldItem
    func unsafeTraverse<M: Message, V: Visitor>(message: M, visitor: inout V) throws {
        try item.unsafeTraverse(message: message, visitor: &visitor)
    }
    
    public static func singularFloat<M: Message>(_ message: M.Type, _ getValue: @escaping (M) -> Float, fieldNumber: Int, defaultValue: Float = 0) -> Self {
        Self(item: UnsafeSingularFloatField(fieldNumber: fieldNumber, getValue: getValue, isDefault: { getValue($0) == defaultValue }))
    }
}

private protocol UnsafeFieldItem {
    func unsafeTraverse<M: Message, V: Visitor>(message: M, visitor: inout V) throws
}

private protocol UnsafeGetterProtocol<ReturnValue> {
    associatedtype ReturnValue
    func invoke<M: Message>(message: M, then action: (ReturnValue) throws -> Void) throws
}

private struct UnsafeGetter<Msg: Message, ReturnValue>: UnsafeGetterProtocol {
    let getter: (Msg) -> ReturnValue
    let shouldSkip: (Msg) -> Bool

    func invoke<M: Message>(message: M, then action: (ReturnValue) throws -> Void) throws {
        let msg = unsafeBitCast(message, to: Msg.self)
        if !shouldSkip(msg) {
            return try action(getter(msg))
        }
    }
}

private struct UnsafeSingularFloatField: UnsafeFieldItem {
    private let fieldNumber: Int
    private let getter: any UnsafeGetterProtocol<Float>
    
    fileprivate init<M: Message>(fieldNumber: Int, getValue: @escaping (M) -> Float, isDefault: @escaping (M) -> Bool) {
        self.fieldNumber = fieldNumber
        self.getter = UnsafeGetter(getter: getValue, shouldSkip: isDefault)
    }
    
    func unsafeTraverse<M: Message, V: Visitor>(message: M, visitor: inout V) throws {
        try getter.invoke(message: message) {
            try visitor.visitSingularFloatField(value: $0, fieldNumber: fieldNumber)
        }
    }
}

This had runtime performance identical to the Field<> solution. However, I think it ultimately still created the same number of generic types and just pushed the cost to UnsafeGetter, and I didn't see a noticeable difference in binary size impact.

The UnsafeGetter design was done this way to allow me to use unsafeBitCast, which doesn't have a runtime penalty compared to an as!.

However if the goal is to remove as many generics as possible UnsafeGetter could be done like this:

private struct UnsafeGetter<ReturnValue>: UnsafeGetterProtocol {

    let invokeIfShouldNotSkip: (any Message, (ReturnValue) throws -> Void) throws -> Void

    init<M: Message>(getter: @escaping (M) -> ReturnValue, shouldSkip: @escaping (M) -> Bool) {
        invokeIfShouldNotSkip = { msg, callback in
            let message = msg as! M
            if !shouldSkip(message) {
                try callback(getter(message))
            }
        }
    }

    func invoke<M: Message>(message: M, then action: (ReturnValue) throws -> Void) throws {
        try invokeIfShouldNotSkip(message, action)
    }
}

Contributor

private struct UnsafeSingularFloatField: UnsafeFieldItem {
    fileprivate init<M: Message>(...) { ... }    
    func unsafeTraverse<M: Message, V: Visitor>(message: M, visitor: inout V) throws { ... }
}

This kind of design is very tempting, actually. It avoids the generic type (avoiding the need to reify a lot of runtime type information) by pushing the generic parameters down to the functions. The perf impact here comes from looking up the type arguments for those functions. If the function is called from a separately-compiled module, the call will end up looking like this:

  M = runtime_type_lookup("name of message type")
  V = runtime_type_lookup("name of visitor type")
  unsafeTraverse(self, message, visitor, M, V)

Of course, if the M and V arguments are available already, then repeat lookups can be avoided. For example, in your code above, it looks like there's a sequence of unsafeTraverse() functions that all take the same M and V type arguments, so it should be possible for the compiler to just pass those through.

The lookups can also be avoided if the function can be inlined into a place where the type arguments are constants. (This is one reason the standard library is so aggressive about inlining everything, but the cost can be high, since heavy inlining can prevent you from changing the code in the future.)

Contributor

        let msg = unsafeBitCast(message, to: Msg.self)

Does this actually work? I somehow thought unsafeBitCast was only for pointer/object types?

Author

Does this actually work

Yes, it works, but only if the types have the exact same size in memory. It does not work if message is an Any, which is why UnsafeGetter captured Msg: Message and the function captures M: Message. Otherwise the only way to do the cast is with an as?/as!.
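
As a quick illustration of that size constraint (toy types here, nothing from SwiftProtobuf): unsafeBitCast reinterprets the bits of one concrete type as another of identical size and layout, but a value erased to Any lives in an existential container with a different layout, so it has to come back out with a dynamic cast.

```swift
struct Point { var x: Int; var y: Int }
struct Pair { var first: Int; var second: Int }

let p = Point(x: 1, y: 2)

// Same size and layout, so the raw bits reinterpret cleanly.
precondition(MemoryLayout<Point>.size == MemoryLayout<Pair>.size)
let q = unsafeBitCast(p, to: Pair.self)
assert(q.first == 1 && q.second == 2)

// An `Any` existential is not layout-compatible with the concrete struct,
// so recovering the value requires `as!` (or `as?`).
let erased: Any = p
let recovered = erased as! Point
assert(recovered.y == 2)
```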

Contributor

Have you tested this for cases where the message has indirect storage? In that case, the message has a reference-counted pointer to the class that actually stores the data, and I wasn't aware it was safe to use unsafeBitCast to copy a struct that has reference-counted pointers.

Author

I'm not sure, I'd have to dig up my old branch and see. Either way, the approach that used unsafeBitCast is probably not the right way to go, since in the end it created the same number of generic classes due to UnsafeGetter<Msg: Message, ReturnValue> still capturing the message type. If the goal is reducing the number of generics, then the second approach I posted with struct UnsafeGetter<ReturnValue> would be better, even though the as! is less performant.

Contributor

I want to reduce memory usage without hurting performance "too much." How much slower is your second approach?

Author

The difference is significant. Around 2x slower with as!

@mrabiciu (Author) commented Dec 1, 2023

The most interesting aspect of your proposal to me is that it could let us re-implement == as a pure library implementation, without any generated code

I gave this a try last night and it was pretty straightforward to build. The end result was a ~10% binary size reduction in my test app that contained 100 messages with 50 fields each.

@mrabiciu (Author) commented Dec 1, 2023

Just pushed 2 updates:

  1. Figured out how to avoid the any Message syntax by putting static var _fields into _MessageImplementationBase
  2. Remove isUnset from message fields

@mrabiciu (Author) commented Dec 6, 2023

@tbkka Do you have any outstanding feedback for me? Anything I can do to help push this through?

@tbkka (Contributor) commented Dec 6, 2023

Compared your branch to main using the Perf Harness that's in the tree (after I fixed it up in #1521). That shows a small increase in overall code size along with a 50% reduction in binary encoding performance. I'm also surprised to see that JSON and Text encoding show a much smaller absolute slowdown than binary: Binary added about 60µs (42->105) but JSON only increased by 22µs (78->100) and Text by 19µs (152->171). I would expect all three to slow down by about the same amount. Hmmm....

What would we need to do to also support decoding with this approach? Minimum, we'd need a way to quickly look up field info from field number and/or name. And we'd need to support setting a field, not just reading it.

@tbkka (Contributor) commented Dec 6, 2023

Encoding performance is still pretty proportional to the number of fields, so we haven't significantly increased our fixed costs.

@tbkka (Contributor) commented Dec 6, 2023

@mrabiciu Could you merge main into your branch? (Or rebase, whatever you're happier with right now.). That will make some of my experiments a little simpler.

@tbkka (Contributor) commented Dec 7, 2023

Hmmm... TestJSONPerformance/testEncoding_manyEnumMapsEncoding_shouldBeUnder1s is almost 4s on my machine right now.

@tbkka (Contributor) commented Dec 7, 2023

I'm also surprised to see that JSON and Text encoding show a much smaller absolute slowdown than binary ...

Ah. Binary encoding does two traversals -- one to measure the total size and another to actually serialize things -- which explains this. Eliminating this would require a resizable buffer that we could expand as necessary during the regular serialization. I've long thought such expansion would actually be slower than the second traversal, but I'm starting to wonder if that's actually true or not. Hmmm.... That may require some future experimentation.
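
In miniature, the two-traversal shape looks like this (hand-rolled varint helpers for illustration only; this is not SwiftProtobuf's encoder): the first pass only measures, and the second writes into storage reserved once at exactly that size.

```swift
// Size of a base-128 varint encoding of `value`.
func varintSize(_ value: UInt64) -> Int {
    var v = value
    var size = 1
    while v >= 0x80 { v >>= 7; size += 1 }
    return size
}

// Append the varint encoding of `value` to `buffer`.
func encodeVarint(_ value: UInt64, into buffer: inout [UInt8]) {
    var v = value
    while v >= 0x80 {
        buffer.append(UInt8(v & 0x7F) | 0x80)
        v >>= 7
    }
    buffer.append(UInt8(v))
}

let values: [UInt64] = [1, 300, 70_000]

// Pass 1: traverse once to measure the total size.
let total = values.reduce(0) { $0 + varintSize($1) }
assert(total == 6)

// Pass 2: traverse again, writing into storage allocated exactly once.
var out: [UInt8] = []
out.reserveCapacity(total)
for v in values { encodeVarint(v, into: &out) }
assert(out.count == total)
```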

@tbkka (Contributor) commented Dec 7, 2023

I'm seeing some retain/release traffic during the traversals. I'm not sure where that's coming from. I'm also see copy and destroy of the Field objects, which might be due to some local changes I'm experimenting with.

@mrabiciu (Author) commented Dec 7, 2023

That shows a small increase in overall code size

Yes, I saw the increase too; however, after experimenting with implementing == using Field, I saw the bundle size reduced to smaller than what's currently on main.

What would we need to do to also support decoding with this approach? Minimum, we'd need a way to quickly look up field info from field number and/or name. And we'd need to support setting a field, not just reading it.

I haven't familiarized myself with decoding yet, but I think having _fields be [Int: Field<Self>] along with storing a NameDescription on Field would be everything needed to make decoding work. This would allow us to eliminate _NameMap, further decreasing the code size.
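
A sketch of that direction (all names here are hypothetical stand-ins, not SwiftProtobuf API): a descriptor that carries the field number plus its proto/JSON names, keyed by number for decode-time lookup.

```swift
struct NameDescription {
    let proto: String
    let json: String
}

struct FieldDescriptor {
    let number: Int
    let name: NameDescription
}

enum ExampleMessage {
    // Hypothetical stand-in for a generated `static var _fields`.
    static let _fields: [Int: FieldDescriptor] = [
        1: FieldDescriptor(number: 1, name: NameDescription(proto: "user_id", json: "userId")),
        2: FieldDescriptor(number: 2, name: NameDescription(proto: "display_name", json: "displayName")),
    ]
}

// Decode-time lookup by field number:
assert(ExampleMessage._fields[1]?.name.json == "userId")
assert(ExampleMessage._fields[9] == nil)
```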

Could you merge main into your branch

Yep I can do that!

TestJSONPerformance/testEncoding_manyEnumMapsEncoding_shouldBeUnder1s is almost 4s on my machine right now.

Do you have any instructions for how I can run and profile this?

@thomasvl (Collaborator) commented Dec 7, 2023

What would we need to do to also support decoding with this approach? Minimum, we'd need a way to quickly look up field info from field number and/or name. And we'd need to support setting a field, not just reading it.

I haven't familiarized myself with decoding yet but I think having _fields be [Int: Field<Self>] along with storing a NameDescription in on Field would be everything needed to make decoding work. This allows us to eliminate _NameMap, further decreasing the code size.

I'm still not sure we need to take this all the way to a Dictionary. I'd think most messages have few enough fields that a linear search (or maybe a binary search if the list is over a certain size) wouldn't be that bad (and might be better in performance and memory tradeoffs compared to having to make the Dictionary).

We might be able to provide an option to indicate if the input is expected to be in order. When in order, you could keep a current index, and after each field, we just walk in order from that index to find the next field. For Binary and TextFormat, where they should be well ordered, simply walking in order to the next field is likely going to be much better. It would take a really sparse message with lots of fields for that to be bad.

We also might be able to store some extra flags about the table, say if there are few gaps in the field numbers, we could null pad things, so that we could just index the field list by the field number and make all those lookups O(1).
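
The "expected in order" idea above could look roughly like this (hypothetical types): keep a cursor into the field list and scan forward from the last match, wrapping at most once, so well-ordered input finds each next field in amortized constant time.

```swift
struct FieldEntry { let number: Int; let name: String }

struct OrderedFieldCursor {
    let fields: [FieldEntry]  // sorted by field number
    var index = 0             // where the last match left off

    mutating func next(matching number: Int) -> FieldEntry? {
        var scanned = 0
        while scanned < fields.count {
            if index == fields.count { index = 0 }  // wrap once for out-of-order input
            let entry = fields[index]
            index += 1
            scanned += 1
            if entry.number == number { return entry }
        }
        return nil
    }
}

var reader = OrderedFieldCursor(fields: [
    FieldEntry(number: 1, name: "id"),
    FieldEntry(number: 3, name: "email"),
    FieldEntry(number: 7, name: "tags"),
])

// In-order input advances one step per field:
assert(reader.next(matching: 1)?.name == "id")
assert(reader.next(matching: 3)?.name == "email")
assert(reader.next(matching: 7)?.name == "tags")
// Out-of-order input still resolves by wrapping around once:
assert(reader.next(matching: 3)?.name == "email")
assert(reader.next(matching: 99) == nil)
```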

@tbkka (Contributor) commented Dec 7, 2023

TestJSONPerformance/testEncoding_manyEnumMapsEncoding_shouldBeUnder1s is almost 4s on my machine right now.

Do you have any instructions for how I can run and profile this?

You can use swift test to run the tests. That has various options to select specific tests, etc. (If that has any problems, try make test first -- the Makefile has some logic to update various pieces before it runs swift test.)

@mrabiciu (Author) commented Dec 8, 2023

I've been profiling testEncoding_manyEnumMapsEncoding_shouldBeUnder1s and it seems isDefault is slower than I expected. I think this and the retain/release traffic are related. I see the message being tested is backed by a storage class, so I think the retain/release calls are happening when getters like { $0._storage._repeatedMessage } are executed.

I noticed we do withExtendedLifetime in a bunch of places. Does that avoid doing retain/release on the storage class?

@tbkka (Contributor) commented Dec 8, 2023

... it seems isDefault is behaving slower than I expected. I think this and the retain/release traffic are related.

That seems like a reasonable guess. Accessing data on the storage class does require retaining that class object. withExtendedLifetime might help -- that tells the compiler to consider the object "in use" for at least a particular range of code; the compiler can then retain once at the beginning and release once at the end, avoiding intermediate retains/releases. But that doesn't necessarily avoid retain/release in intermediate functions.
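
A minimal sketch of the pattern (toy types; the real generated code threads this through its Visitor machinery): withExtendedLifetime tells the compiler to keep the object alive across the whole closure body, so it can retain once up front and release once at the end rather than around each access.

```swift
final class Storage {
    var a = 1
    var b = 2
}

struct Facade {
    let storage = Storage()

    func sum() -> Int {
        // Without this, the compiler may bracket each individual access to
        // `storage` with its own retain/release pair.
        withExtendedLifetime(storage) {
            storage.a + storage.b
        }
    }
}

assert(Facade().sum() == 3)
```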

Another approach might be to combine the fetch and default handling so that you make one function call and get both pieces of information at once with one round-trip down to the storage class instead of two.
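
One hypothetical shape of that suggestion (illustrative names, not a proposed API): the getter returns the value only when it differs from the default, so the traversal makes a single call per field instead of a separate isDefault check followed by a fetch.

```swift
final class FieldStorage {
    var count: Int = 0
    var label: String = "hello"
}

// One call both fetches the value and reports whether it's the default:
// nil means "default, skip this field during traversal".
func valueIfNotDefault<T: Equatable>(_ value: T, default defaultValue: T) -> T? {
    value == defaultValue ? nil : value
}

let storage = FieldStorage()
assert(valueIfNotDefault(storage.count, default: 0) == nil)       // skipped
assert(valueIfNotDefault(storage.label, default: "") == "hello")  // visited
```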

@tbkka (Contributor) commented Dec 8, 2023

The functional tests are passing now, which is promising. Failures above seem to only be because this branch has not re-generated the reference files, so the plugin tests are not generating the expected output. Please don't re-generate until we're ready to land this; that creates a huge number of differences that make it a little hard to see the actual code differences.

@tbkka (Contributor) commented Dec 8, 2023

I'm still not sure we need to take this all the way to a Dictionary.

Good point. Dictionary has the drawback that it's hard to compile it down to static text. Where possible, it would be nice to avoid Dictionary and stick with simple Arrays. A few options:

  • If there are few fields, binary search on an array makes a lot of sense.
  • If the field numbers form a compact set, we can order the array to provide field number -> field info lookup directly. (This obviously won't work for an item with two fields numbered 1 and 50,000.)
  • We could also emit a small dictionary that just mapped field number and/or name to the index into the field array. That would leave the bulk of the data in something that could conceivably be optimized into static text.

@thomasvl (Collaborator) commented Dec 8, 2023

I'm still not sure we need to take this all the way to a Dictionary.

Good point. Dictionary has the drawback that it's hard to compile it down to static text. Where possible, it would be nice to avoid Dictionary and stick with simple Arrays. A few options:

  • If there are few fields, binary search on an array makes a lot of sense.
  • If the field numbers form a compact set, we can order the array to provide field number -> field info lookup directly. (This obviously won't work for an item with two fields numbered 1 and 50,000.)
  • We could also emit a small dictionary that just mapped field number and/or name to the index into the field array. That would leave the bulk of the data in something that could conceivably be optimized into static text.

The C++ generation these days uses some tables for a bunch of things also, so there might be things to draw from them.

One thing I believe they do is have a header on the table; we could model that as a second static, or maybe make the static an enum with two cases: (a) packed array/table, (b) sparse array/table. For (a), any unused field number gets a marker that it is unused, and we can do direct field-number-to-array indexing (minus one, since there is no zero field number). For (b), we do the binary search over the list. And if we decided to do the map for field to offset, we could always compute that at runtime the first time we need it.
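
A rough sketch of that two-case table idea (hypothetical types, not generated code): a packed table indexed directly by field number, or a sparse table resolved by binary search, chosen per message.

```swift
struct FieldInfo { let number: Int; let name: String }

enum FieldTable {
    case packed([FieldInfo?])   // slot index = fieldNumber - 1; nil marks a gap
    case sparse([FieldInfo])    // sorted by field number; binary-searched

    func lookup(_ number: Int) -> FieldInfo? {
        switch self {
        case .packed(let slots):
            let index = number - 1
            guard slots.indices.contains(index) else { return nil }
            return slots[index]
        case .sparse(let entries):
            var lo = 0
            var hi = entries.count - 1
            while lo <= hi {
                let mid = (lo + hi) / 2
                if entries[mid].number == number { return entries[mid] }
                if entries[mid].number < number { lo = mid + 1 } else { hi = mid - 1 }
            }
            return nil
        }
    }
}

// Few gaps: direct O(1) indexing.
let packed = FieldTable.packed([FieldInfo(number: 1, name: "id"), nil, FieldInfo(number: 3, name: "tag")])
assert(packed.lookup(1)?.name == "id")
assert(packed.lookup(2) == nil)

// Sparse numbers: binary search.
let sparse = FieldTable.sparse([FieldInfo(number: 1, name: "id"), FieldInfo(number: 50_000, name: "blob")])
assert(sparse.lookup(50_000)?.name == "blob")
```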

@tbkka (Contributor) commented Dec 8, 2023

Haven't thought this through, but I wonder if it would help at all if the visitor could visit the indirect storage class directly when there was one, instead of always going through the façade struct. I'm not sure what that would entail... Hmmm...

(I've experimentally combined my reverse binary encoder with this and it seems to be working pretty well.)

@thomasvl (Collaborator) commented Dec 8, 2023

Haven't thought this through, but I wonder if it would help at all if the visitor could visit the indirect storage class directly when there was one, instead of always going through the façade struct. I'm not sure what that would entail... Hmmm...

I've long wondered about moving our generated traverse/decode into the Storage class (when it exists); it means moving the unknowns/extensions there also, and then we could skip the need for the extended-lifetime things. But if we move to table-driven, I'm not sure how that would work out. I guess we could have a default impl for messages without Storage that directly uses the table there, but we'd generate a stub for storage that forwards to the Storage class, and provide a public function that is generic over some Storage protocol and gets the table passed to it? (I think the C++ currently always generates their calls, but the calls are now really short and just call the table-based helper, passing in the class-specific args, so we could do something like that.)

@mrabiciu (Author) commented Dec 8, 2023

Another approach might be to combine the fetch and default handling so that you make one function call and get both pieces of information at once with one round-trip down to the storage class instead of two.

Yeah I had the same idea, I'm in the middle of prototyping that now.

We could also emit a small dictionary that just mapped field number and/or name to the index into the field array. That would leave the bulk of the data in something that could conceivably be optimized into static text.

I was thinking the same thing. The dictionary could also be generated at runtime on first access of _fields rather than being pre-compiled.
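
That runtime-on-first-access idea can lean on Swift directly (illustrative types): `static let` initialization is already lazy and thread-safe, so the generated data stays a plain array and the dictionary is only built the first time decoding asks for it.

```swift
struct TableField { let number: Int; let name: String }

enum GeneratedTable {
    // The generated, compile-time data stays a plain array.
    static let fields: [TableField] = [
        TableField(number: 1, name: "id"),
        TableField(number: 5, name: "email"),
    ]

    // Built lazily on first access, not pre-compiled into the binary.
    static let indexByNumber: [Int: Int] = Dictionary(
        uniqueKeysWithValues: fields.enumerated().map { ($0.element.number, $0.offset) }
    )
}

assert(GeneratedTable.indexByNumber[5] == 1)
assert(GeneratedTable.fields[GeneratedTable.indexByNumber[5]!].name == "email")
```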

Haven't thought this through, but I wonder if it would help at all if the visitor could visit the indirect storage class directly when there was one, instead of always going through the façade struct

I think this could be possible. The Storage class may need to become public in order for us to have public Field<_StorageClass> properties for this to work but we could keep all the fields private which is probably fine?

@mrabiciu (Author)

I've tried all of the above but it did not appear to improve the performance of the tests. I'm still seeing a lot of retain/release traffic despite doing something like this:

struct SomeMessage {
    class _StorageClass {
        var _singularInt32: Int32 = 0
        // ...

        static let _fields: [Field<Unmanaged<_StorageClass>>] = [
            .singularInt32({ $0.takeUnretainedValue()._singularInt32 }, fieldNumber: 1),
            // ...
        ]
    }

    func traverse<V: Visitor>(visitor: inout V) throws {
        let unmanagedStorage = Unmanaged.passUnretained(_storage)
        for field in _StorageClass._fields {
            try field.traverse(unmanagedStorage, visitor: &visitor)
        }
    }
}

I'm not really sure why since I thought Unmanaged is supposed to stop adding retain/release calls. Unless the retain/release calls are on something else? Is there any good way to debug this and know what exactly is being retained and released?

@mrabiciu (Author)

As a side-note my original goal with this PR was to enable traversal of default and unset fields for the v2 release. Would you be opposed to me opening a second PR where I add a more naive version of default and unset field traversal that uses the existing traverse implementation? I'm happy to continue experimenting with this refactor but I would also like to unblock my project that depends on needing default fields in json.

@tbkka (Contributor) commented Dec 12, 2023

Would you be opposed to me opening a second PR where I add a more naive version of default and unset field traversal that uses the existing traverse implementation?

Not at all. It's possible we would have objections to the specific implementation, but I'd have to see it first. Regardless, it's always worth discussing incremental improvements -- we don't have to get it all perfect in a single shot. 😁

@thomasvl (Collaborator)

As a side-note my original goal with this PR was to enable traversal of default and unset fields for the v2 release. Would you be opposed to me opening a second PR where I add a more naive version of default and unset field traversal that uses the existing traverse implementation? I'm happy to continue experimenting with this refactor but I would also like to unblock my project that depends on needing default fields in json.

Do you all use main as-is vs. using tagged releases? I'm just making sure I understand how that would "unblock" your project, since I believe this was only going to be a 2.x feature, and at the moment we don't have a timetable for making the first 2.x release.

@mrabiciu (Author)

Once my PR is merged my intention is to either pin to the specific commit hash that includes my PR, or if main happens to be unstable at that commit I can just apply my PR as a Bazel patch on top of whatever the latest stable 1.x version is until 2.x is released.

@iluvcapra

I hope there's still activity on this; I'm working with a server that requires a request to have all of its options filled out, even if their values are false or an empty string.

@thomasvl (Collaborator) commented Feb 2, 2024

I hope there's still activity on this, I'm working with a server that requires a request to have all of its options filled-out, even if they're values are false or an empty string.

This change is somewhat larger, and got paused over some issues around it. But it sounds like what you are interested in is in #1526. However, due to how the library has to change, it won't be in a 1.x release when done; you'll likely have to wait for a 2.0 release.
